
Two-Way ANOVA

Introduction to Two-Way ANOVA

You can use the function anova2 to perform a balanced two-way analysis of variance (ANOVA). To perform two-way ANOVA for an unbalanced design, use anovan. For an example, see Two-Way ANOVA for Unbalanced Design.
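For reference, anovan takes the response as a vector together with one grouping variable per factor, rather than the matrix layout that anova2 expects. The following is a minimal sketch with made-up, unbalanced data; the variable names and values are illustrative only:

% Hypothetical unbalanced data: a response vector and one grouping
% variable per factor (here, factory and car model).
y  = [33.3; 34.5; 37.4; 33.4; 34.8; 36.8; 36.6];
g1 = {'f1';'f1';'f1';'f2';'f2';'f2';'f2'};     % factory
g2 = {'m1';'m2';'m3';'m1';'m2';'m3';'m3'};     % model

% anovan handles the unbalanced layout; include the interaction term.
p = anovan(y,{g1,g2},'model','interaction','varnames',{'factory','model'});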

As in one-way ANOVA, the data for a two-way ANOVA study can be experimental or observational. The difference between one-way and two-way ANOVA is that in two-way ANOVA, the effects of two factors on a response variable are of interest. The two factors can be independent and have no interaction effect, or the effect of one factor on the response variable can depend on the group (level) of the other factor. If the two factors have no interaction, the model is called an additive model.

Suppose an automobile company has two factories, and each factory makes the same three car models. The gas mileage of the cars can vary from factory to factory and from model to model. These two factors, factory and model, explain the differences in mileage, that is, the response. One measure of interest is the difference in mileage due to the production methods between factories. Another measure of interest is the difference in the mileage of the models (irrespective of the factory) due to different design specifications. The effects of these measures of interest are additive. Now suppose instead that one model has different gas mileage between factories, while the mileage of the other two models is the same in both factories. This is called an interaction effect. To measure an interaction effect, there must be multiple observations for some combination of factory and car model. These multiple observations are called replications.

Two-way ANOVA is a special case of the linear model. The two-way ANOVA form of the model is

y_{ijr} = \mu + \alpha_i + \beta_j + (\alpha\beta)_{ij} + \varepsilon_{ijr}

where,

  • y_ijr is an observation of the response variable.

    • i represents group i of row factor A, i = 1, 2, ..., I.

    • j represents group j of column factor B, j = 1, 2, ..., J.

    • r represents the replication number, r = 1, 2, ..., R.

    There are a total of N = I*J*R observations.

  • μ is the overall mean.

  • α_i are the deviations of groups defined by row factor A from the overall mean μ. The values of α_i sum to 0:

    \sum_{i=1}^{I} \alpha_i = 0.

  • β_j are the deviations of groups defined by column factor B from the overall mean μ. The values of β_j sum to 0:

    \sum_{j=1}^{J} \beta_j = 0.

  • (αβ)_ij are the interactions. The values in each row and in each column of (αβ)_ij sum to 0:

    \sum_{i=1}^{I} (\alpha\beta)_{ij} = \sum_{j=1}^{J} (\alpha\beta)_{ij} = 0.

  • ε_ijr are the random disturbances. They are assumed to be independent, normally distributed, and have constant variance.

In the mileage example:

  • y_ijr are the gas mileage observations, and μ is the overall mean gas mileage.

  • α_i are the deviations of each car's gas mileage from the mean gas mileage μ due to the car's model.

  • β_j are the deviations of each car's gas mileage from the mean gas mileage μ due to the car's factory.

anova2 requires that the data be balanced, so each combination of model and factory must have the same number of cars.
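To make the model equation concrete, the following sketch simulates balanced data from known additive effects and fits it with anova2. The effect values are made up for illustration; because the simulation has no interaction term, the interaction p-value should be large.

rng('default')                   % for reproducibility
I = 3; J = 2; R = 4;             % row groups, column groups, replications
mu    = 30;                      % overall mean
alpha = [-2 0 2];                % row-factor deviations (sum to 0)
beta  = [-1 1];                  % column-factor deviations (sum to 0)

% Build the (I*R)-by-J matrix that anova2 expects: columns are groups of
% the column factor B, and each group of the row factor A occupies a
% block of R consecutive rows.
y = zeros(I*R,J);
for i = 1:I
    for j = 1:J
        y((i-1)*R+(1:R),j) = mu + alpha(i) + beta(j) + randn(R,1);
    end
end

% p(1): column factor B, p(2): row factor A, p(3): interaction
p = anova2(y,R,'off')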

Two-way ANOVA tests hypotheses about the effects of factors A and B, and their interaction, on the response variable y. The hypotheses about the equality of the mean response for groups of row factor A are

H_0: \alpha_1 = \alpha_2 = \cdots = \alpha_I
H_1: \text{at least one } \alpha_i \text{ is different}, \quad i = 1, 2, \ldots, I.

The hypotheses about the equality of the mean response for groups of column factor B are

H_0: \beta_1 = \beta_2 = \cdots = \beta_J
H_1: \text{at least one } \beta_j \text{ is different}, \quad j = 1, 2, \ldots, J.

The hypotheses about the interaction of the column and row factors are

H_0: (\alpha\beta)_{ij} = 0
H_1: \text{at least one } (\alpha\beta)_{ij} \neq 0.

Prepare Data for Balanced Two-Way ANOVA

To perform balanced two-way ANOVA using anova2, you must arrange the data in a specific matrix form. The columns of the matrix must correspond to groups of the column factor, B. The rows must correspond to the groups of the row factor, A, with the same number of replications for each combination of the groups of factors A and B.

Suppose that row factor A has three groups and column factor B has two groups (levels). Also suppose that each combination of factors A and B has two measurements or observations (reps = 2). Then, each group of factor A has four observations and each group of factor B has six observations.

\begin{array}{c@{\;}c}
 & \begin{array}{cc} B = 1 & \;\; B = 2 \end{array} \\
\begin{array}{r} A = 1 \\ A = 1 \\ A = 2 \\ A = 2 \\ A = 3 \\ A = 3 \end{array} &
\left[ \begin{array}{cc}
y_{111} & y_{121} \\
y_{112} & y_{122} \\
y_{211} & y_{221} \\
y_{212} & y_{222} \\
y_{311} & y_{321} \\
y_{312} & y_{322}
\end{array} \right]
\end{array}

The subscripts indicate row, column, and replication, respectively. For example, y_221 corresponds to the measurement for the second group of factor A, the second group of factor B, and the first replication for this combination.
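For example, with three groups of factor A, two groups of factor B, and two replications per combination, the input to anova2 is a 6-by-2 matrix, and you pass the number of replications as the second argument. A minimal sketch with placeholder values:

% Rows 1-2 are the two replications for A = 1, rows 3-4 for A = 2, and
% rows 5-6 for A = 3; column 1 is B = 1 and column 2 is B = 2.
y = [10.1  12.0     % y111 y121
     10.4  11.8     % y112 y122
      9.7  12.5     % y211 y221
      9.9  12.2     % y212 y222
     10.8  11.9     % y311 y321
     10.5  12.1];   % y312 y322

p = anova2(y,2,'off');   % reps = 2 replications per combination of A and B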

Perform Two-Way ANOVA

This example shows how to perform two-way ANOVA to determine the effect of car model and factory on the mileage rating of cars.

Load and display the sample data.

load mileage
mileage

mileage = 6×3

    33.3000   34.5000   37.4000
    33.4000   34.8000   36.8000
    32.9000   33.8000   37.6000
    32.6000   33.4000   36.6000
    32.5000   33.7000   37.0000
    33.0000   33.9000   36.7000

There are three car models (columns) and two factories (rows). The data has six mileage rows because each factory provided three cars of each model for the study (that is, the replication number is three). The data from the first factory is in the first three rows, and the data from the second factory is in the last three rows.

Perform two-way ANOVA. Return the structure of statistics, stats, to use in multiple comparisons.

nmbcars = 3;  % Number of cars from each model, i.e., number of replications
[~,~,stats] = anova2(mileage,nmbcars);

(anova2 opens a figure named "Two-way ANOVA" that displays the standard ANOVA table.)

You can use the F-statistics to do hypothesis tests to find out if the mileage is the same across models, factories, and model-factory pairs. Before performing these tests, you must adjust for the additive effects. anova2 returns the p-values from these tests.

The p-value for the model effect (Columns) is zero to four decimal places. This result is a strong indication that the mileage varies from one model to another.

The p-value for the factory effect (Rows) is 0.0039, which is also highly significant. This value indicates that one factory is outperforming the other in the gas mileage of the cars it produces. The observed p-value indicates that an F-statistic as extreme as the observed F would occur by chance about four times in 1000 if the gas mileage were truly equal from factory to factory.

The factories and models appear to have no interaction. The p-value, 0.8411, means that the observed result is likely (about 84 times in 100) if there is no interaction.
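In a script, you can capture these p-values directly from the first output of anova2 instead of reading them off the figure; the 'off' option suppresses the figure window:

% p(1): model (Columns), p(2): factory (Rows), p(3): interaction
p = anova2(mileage,nmbcars,'off')
% p is approximately [0.0000 0.0039 0.8411]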

Perform Multiple Comparisons to find out which pairs of the three car models are significantly different.

c = multcompare(stats);
Note: Your model includes an interaction term. A test of main effects can be difficult to interpret when the model includes interactions.

(multcompare opens the figure "Multiple comparison of column means", titled "Click on the group you want to test", which plots a comparison interval for each car model.)

In the figure, the blue bar is the comparison interval for the mean mileage of the first car model. The red bars are the comparison intervals for the mean mileage of the second and third car models. Neither of those comparison intervals overlaps the first comparison interval, indicating that the mean mileage of the first car model is different from the mean mileage of the second and the third car models. If you click on one of the other bars, you can test the other car models. None of the comparison intervals overlap, indicating that the mean mileage of each car model is significantly different from the other two.

Display the multiple comparison results in a table.

tbl = array2table(c,"VariableNames", ...
    ["Group A","Group B","Lower Limit","A-B","Upper Limit","P-value"])

tbl = 3×6 table

    Group A    Group B    Lower Limit     A-B       Upper Limit     P-value
    _______    _______    ___________    _______    ___________    __________

       1          2         -1.5865      -1.0667      -0.54686     0.00038574
       1          3         -4.5865      -4.0667       -3.5469     1.7898e-10
       2          3         -3.5198           -3       -2.4802     7.8407e-09

In the matrix c, the first two columns show the pairs of car models that are compared. The last column shows the p-values for the tests. All p-values are small, which indicates that the mean mileages of the car models are significantly different from each other.
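With only three models, every pair is listed, but for larger studies you can pick out the significant pairs programmatically; a small sketch using a 0.05 significance level:

sigLevel = 0.05;
sigPairs = c(c(:,6) < sigLevel, 1:2)   % pairs of models whose p-value < 0.05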

Mathematical Details

The two-factor ANOVA partitions the total variation into the following components:

  • Variation of row factor group means from the overall mean, \bar{y}_{i..} - \bar{y}_{...}

  • Variation of column factor group means from the overall mean, \bar{y}_{.j.} - \bar{y}_{...}

  • Variation of overall mean plus the replication mean from the column factor group mean plus the row factor group mean, \bar{y}_{ij.} - \bar{y}_{i..} - \bar{y}_{.j.} + \bar{y}_{...}

  • Variation of observations from the replication means, y_{ijr} - \bar{y}_{ij.}

ANOVA partitions the total sum of squares (SST) into the sum of squares due to row factor A (SSA), the sum of squares due to column factor B (SSB), the sum of squares due to the interaction between A and B (SSAB), and the sum of squares due to error (SSE). Here m is the number of groups of the row factor A and k is the number of groups of the column factor B.

\underbrace{\sum_{i=1}^{m}\sum_{j=1}^{k}\sum_{r=1}^{R}\left(y_{ijr}-\bar{y}_{...}\right)^2}_{SST}
= \underbrace{kR\sum_{i=1}^{m}\left(\bar{y}_{i..}-\bar{y}_{...}\right)^2}_{SSA}
+ \underbrace{mR\sum_{j=1}^{k}\left(\bar{y}_{.j.}-\bar{y}_{...}\right)^2}_{SSB}
+ \underbrace{R\sum_{i=1}^{m}\sum_{j=1}^{k}\left(\bar{y}_{ij.}-\bar{y}_{i..}-\bar{y}_{.j.}+\bar{y}_{...}\right)^2}_{SSAB}
+ \underbrace{\sum_{i=1}^{m}\sum_{j=1}^{k}\sum_{r=1}^{R}\left(y_{ijr}-\bar{y}_{ij.}\right)^2}_{SSE}

ANOVA takes the variation due to the factor or interaction and compares it to the variation due to error. If the ratio of the two variations is high, then the effect of the factor or the interaction effect is statistically significant. You can measure the statistical significance using a test statistic that has an F-distribution.

For the null hypothesis that the mean responses for the groups of row factor A are equal, the test statistic is

F = \frac{SSA/(m-1)}{SSE/\left(mk(R-1)\right)} \sim F_{m-1,\; mk(R-1)}.

For the null hypothesis that the mean responses for the groups of column factor B are equal, the test statistic is

F = \frac{SSB/(k-1)}{SSE/\left(mk(R-1)\right)} \sim F_{k-1,\; mk(R-1)}.

For the null hypothesis that the interaction of the column and row factors is equal to zero, the test statistic is

F = \frac{SSAB/\left((m-1)(k-1)\right)}{SSE/\left(mk(R-1)\right)} \sim F_{(m-1)(k-1),\; mk(R-1)}.

If the p-value for the F-statistic is smaller than the significance level, then ANOVA rejects the null hypothesis. The most common significance levels are 0.01 and 0.05.
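To see these formulas in action, the following sketch computes the sums of squares, F-statistics, and p-values by hand for the mileage data (row factor: factory, m = 2; column factor: model, k = 3; R = 3 replications) and should match the anova2 table up to rounding. It is an illustration of the partition above, not a replacement for anova2, and it assumes a recent MATLAB release (implicit expansion and sum(...,'all')):

load mileage
R = 3;                                    % replications per cell
[nRows,k] = size(mileage);                % k = number of column-factor groups
m = nRows/R;                              % m = number of row-factor groups

% Cell means: average the R replications in each block of rows.
cellMean = zeros(m,k);
for i = 1:m
    cellMean(i,:) = mean(mileage((i-1)*R+(1:R),:),1);
end
grandMean = mean(mileage(:));             % overall mean
rowMean   = mean(cellMean,2);             % row-factor group means (m-by-1)
colMean   = mean(cellMean,1);             % column-factor group means (1-by-k)

% Sums of squares from the partition above.
SSA  = k*R*sum((rowMean - grandMean).^2);                       % row factor
SSB  = m*R*sum((colMean - grandMean).^2);                       % column factor
SSAB = R*sum((cellMean - rowMean - colMean + grandMean).^2,'all');
SST  = sum((mileage - grandMean).^2,'all');
SSE  = SST - SSA - SSB - SSAB;

% F-statistics and p-values from the F cdf.
dfE = m*k*(R-1);
FA  = (SSA/(m-1))/(SSE/dfE);           pA  = 1 - fcdf(FA,m-1,dfE)
FB  = (SSB/(k-1))/(SSE/dfE);           pB  = 1 - fcdf(FB,k-1,dfE)
FAB = (SSAB/((m-1)*(k-1)))/(SSE/dfE);  pAB = 1 - fcdf(FAB,(m-1)*(k-1),dfE)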

ANOVA Table

The ANOVA table captures the variability in the model by source, the F-statistic for testing the significance of this variability, and the p-value for deciding on the significance of this variability. The p-values returned by anova2 depend on assumptions about the random disturbances, ε_ijr, in the model equation. For the p-values to be correct, these disturbances need to be independent, normally distributed, and have constant variance. The standard ANOVA table has the form described below.

anova2 returns the standard ANOVA table as a cell array with six columns.

Column    Definition
Source    The source of the variability.
SS    The sum of squares due to each source.
df    The degrees of freedom associated with each source. Suppose J is the number of groups in the column factor, I is the number of groups in the row factor, and R is the number of replications. Then, the total number of observations is IJR and the total degrees of freedom is IJR – 1. I – 1 is the degrees of freedom for the row factor, J – 1 is the degrees of freedom for the column factor, (I – 1)(J – 1) is the interaction degrees of freedom, and IJ(R – 1) is the error degrees of freedom.
MS    The mean squares for each source, which is the ratio SS/df.
F    The F-statistic, which is the ratio of the mean squares.
Prob>F    The p-value, which is the probability that the F-statistic can take a value larger than the computed test-statistic value. anova2 derives this probability from the cdf of the F-distribution.

The rows of the ANOVA table show the variability in the data, divided by source.

Row (Source) Definition
Columns Variability due to the column factor
Rows Variability due to the row factor
Interaction Variability due to the interaction of the row and column factors
Error Variability due to the differences between the data in each group and the group mean (variabilitywithingroups)
Total Total variability
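You can capture this table programmatically as the second output of anova2; for example, with the mileage data used above:

[p,tbl] = anova2(mileage,3,'off');
tbl      % cell array with columns Source, SS, df, MS, F, Prob>F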

