anova1
One-way analysis of variance
Syntax
p = anova1(y)
p = anova1(y,group,displayopt)
[p,tbl,stats] = anova1(___)
Description
p = anova1(y) performs one-way ANOVA for the sample data y and returns the p-value. anova1 treats each column of y as a separate group. The function tests the hypothesis that the samples in the columns of y are drawn from populations with the same mean against the alternative hypothesis that the population means are not all the same. The function also displays the box plot for each group in y and the standard ANOVA table (tbl).
p = anova1(y,group,displayopt) enables the ANOVA table and box plot displays when displayopt is 'on' (default) and suppresses the displays when displayopt is 'off'.
[p,tbl,stats] = anova1(___) returns a structure, stats, which you can use to perform a multiple comparison test. A multiple comparison test enables you to determine which pairs of group means are significantly different. To perform this test, use multcompare, providing the stats structure as an input argument.
Examples
One-Way ANOVA
Create sample data matrix y with columns that are constants, plus random normal disturbances with mean 0 and standard deviation 1.
y = meshgrid(1:5);
rng default; % For reproducibility
y = y + normrnd(0,1,5,5)
y = 5×5
    1.5377    0.6923    1.6501    3.7950    5.6715
    2.8339    1.5664    6.0349    3.8759    3.7925
   -1.2588    2.3426    3.7254    5.4897    5.7172
    1.8622    5.5784    2.9369    5.4090    6.6302
    1.3188    4.7694    3.7147    5.4172    5.4889
Perform one-way ANOVA.
p = anova1(y)
p = 0.0023
The ANOVA table shows the between-groups variation (Columns) and within-groups variation (Error). SS is the sum of squares, and df is the degrees of freedom. The total degrees of freedom is the total number of observations minus one, which is 25 - 1 = 24. The between-groups degrees of freedom is the number of groups minus one, which is 5 - 1 = 4. The within-groups degrees of freedom is the total degrees of freedom minus the between-groups degrees of freedom, which is 24 - 4 = 20.
MS is the mean square, which is SS/df for each source of variation. The F-statistic is the ratio of the mean squares (13.4309/2.2204). The p-value is the probability that the test statistic can take a value greater than the value of the computed test statistic, i.e., P(F > 6.05). The small p-value of 0.0023 indicates that differences between column means are significant.
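The arithmetic in the two paragraphs above can be checked by hand. The following is a minimal sketch that starts from the two mean squares quoted in the text (13.4309 and 2.2204); the variable names are illustrative, and it assumes that fcdf from Statistics and Machine Learning Toolbox is available.

```matlab
% Degrees of freedom for a 5-by-5 data matrix (5 groups, 25 observations)
N = 25;                          % total number of observations
k = 5;                           % number of groups (columns of y)
dfTotal  = N - 1;                % total degrees of freedom
dfGroups = k - 1;                % between-groups degrees of freedom
dfError  = dfTotal - dfGroups;   % within-groups degrees of freedom

% Mean squares as quoted from the ANOVA table in the text
msGroups = 13.4309;
msError  = 2.2204;

% F-statistic and upper-tail probability P(F > Fstat)
Fstat = msGroups / msError;
pval  = 1 - fcdf(Fstat, dfGroups, dfError);
```

Under these assumptions, Fstat rounds to the 6.05 used in the text, and pval should be close to the reported p-value.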
Compare Beam Strength Using One-Way ANOVA
Input the sample data.
strength = [82 86 79 83 84 85 86 87 74 82 ...
            78 75 76 77 79 79 77 78 82 79];
alloy = {'st','st','st','st','st','st','st','st', ...
         'al1','al1','al1','al1','al1','al1', ...
         'al2','al2','al2','al2','al2','al2'};
The data are from a study of the strength of structural beams in Hogg (1987). The vector strength measures deflections of beams in thousandths of an inch under 3000 pounds of force. The vector alloy identifies each beam as steel ('st'), alloy 1 ('al1'), or alloy 2 ('al2'). Although alloy is sorted in this example, grouping variables do not need to be sorted.
Test the null hypothesis that the steel beams are equal in strength to the beams made of the two more expensive alloys. Turn the figure display off and return the ANOVA results in a cell array.
[p,tbl] = anova1(strength,alloy,'off')
p = 1.5264e-04
tbl=4×6 cell array
  Columns 1 through 5
    {'Source'}    {'SS'      }    {'df'}    {'MS'      }    {'F'       }
    {'Groups'}    {[184.8000]}    {[ 2]}    {[ 92.4000]}    {[ 15.4000]}
    {'Error' }    {[102.0000]}    {[17]}    {[  6.0000]}    {0x0 double}
    {'Total' }    {[286.8000]}    {[19]}    {0x0 double}    {0x0 double}
  Column 6
    {'Prob>F'    }
    {[1.5264e-04]}
    {0x0 double  }
    {0x0 double  }
The total degrees of freedom is the total number of observations minus one, which is 20 - 1 = 19. The between-groups degrees of freedom is the number of groups minus one, which is 3 - 1 = 2. The within-groups degrees of freedom is the total degrees of freedom minus the between-groups degrees of freedom, which is 19 - 2 = 17.
MS is the mean square, which is SS/df for each source of variation. The F-statistic is the ratio of the mean squares. The p-value is the probability that the test statistic can take a value greater than or equal to the value of the test statistic. The p-value of 1.5264e-04 suggests rejection of the null hypothesis.
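To see where the entries of this ANOVA table come from, the sums of squares can be recomputed from the raw data. The following is a minimal sketch of the standard one-way ANOVA formulas, not a description of how anova1 is implemented internally; all variable names other than strength and alloy are illustrative.

```matlab
strength = [82 86 79 83 84 85 86 87 74 82 ...
            78 75 76 77 79 79 77 78 82 79];
alloy = {'st','st','st','st','st','st','st','st', ...
         'al1','al1','al1','al1','al1','al1', ...
         'al2','al2','al2','al2','al2','al2'};

% Map each observation to a group index, keeping the order of first appearance
[names,~,idx] = unique(alloy,'stable');
N = numel(strength);                 % total number of observations
k = numel(names);                    % number of groups

groupN    = accumarray(idx(:),1);                       % observations per group
groupMean = accumarray(idx(:),strength(:)) ./ groupN;   % mean of each group
grandMean = mean(strength);

% Between-groups and within-groups sums of squares
ssGroups = sum(groupN .* (groupMean - grandMean).^2);
ssError  = sum((strength(:) - groupMean(idx(:))).^2);

dfGroups = k - 1;
dfError  = N - k;
msGroups = ssGroups / dfGroups;
msError  = ssError / dfError;
Fstat    = msGroups / msError;
```

For these data the sketch reproduces the SS, df, MS, and F entries of the Groups and Error rows (184.8, 102, 2, 17, 92.4, 6, and 15.4).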
You can retrieve the values in the ANOVA table by indexing into the cell array. Save the F-statistic value and the p-value in the new variables Fstat and pvalue.
Fstat = tbl{2,5}
Fstat = 15.4000
pvalue = tbl{2,6}
pvalue = 1.5264e-04
Multiple Comparisons for One-Way ANOVA
Input the sample data.
strength = [82 86 79 83 84 85 86 87 74 82 ...
            78 75 76 77 79 79 77 78 82 79];
alloy = {'st','st','st','st','st','st','st','st', ...
         'al1','al1','al1','al1','al1','al1', ...
         'al2','al2','al2','al2','al2','al2'};
The data are from a study of the strength of structural beams in Hogg (1987). The vector strength measures deflections of beams in thousandths of an inch under 3000 pounds of force. The vector alloy identifies each beam as steel ('st'), alloy 1 ('al1'), or alloy 2 ('al2'). Although alloy is sorted in this example, grouping variables do not need to be sorted.
Perform one-way ANOVA using anova1. Return the structure stats, which contains the statistics multcompare needs for performing multiple comparisons.
[~,~,stats] = anova1(strength,alloy);
The small p-value of 0.0002 suggests that the strength of the beams is not the same.
Perform a multiple comparison of the mean strength of the beams.
[c,~,~,gnames] = multcompare(stats);
In the figure, the blue bar represents the comparison interval for mean material strength for steel. The red bars represent the comparison intervals for the mean material strength for alloy 1 and alloy 2. Neither of the red bars overlaps with the blue bar, which indicates that the mean material strength for steel is significantly different from that of alloy 1 and alloy 2. You can confirm the significant difference by clicking the bars that represent alloy 1 and 2.
Display the multiple comparison results and the corresponding group names in a table.
tbl = array2table(c,"VariableNames", ...
    ["Group A","Group B","Lower Limit","A-B","Upper Limit","P-value"]);
tbl.("Group A") = gnames(tbl.("Group A"));
tbl.("Group B") = gnames(tbl.("Group B"))
tbl=3×6 table
    Group A    Group B    Lower Limit    A-B    Upper Limit     P-value
    _______    _______    ___________    ___    ___________    __________
    {'st' }    {'al1'}      3.6064        7       10.394       0.00016831
    {'st' }    {'al2'}      1.6064        5       8.3936        0.0040464
    {'al1'}    {'al2'}      -5.628       -2        1.628          0.35601
The first two columns show the pair of groups that are compared. The fourth column shows the difference between the estimated group means. The third and fifth columns show the lower and upper limits for the 95% confidence intervals of the true difference of means. The sixth column shows the p-value for a hypothesis that the true difference of means for the corresponding groups is equal to zero.
The first two rows show that both comparisons involving the first group (steel) have confidence intervals that do not include zero. Because the corresponding p-values (1.6831e-04 and 0.0040, respectively) are small, those differences are significant.
The third row shows that the difference in strength between the two alloys is not significant. A 95% confidence interval for the difference is [-5.6,1.6], so you cannot reject the hypothesis that the true difference is zero. The corresponding p-value of 0.3560 in the sixth column confirms this result.
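The reading of the table described above can also be done programmatically. The following is a minimal sketch that copies the displayed results into a matrix rather than calling multcompare again; the significance level alpha and the variable names isSig, excludesZero, and sigPairs are assumptions made for illustration.

```matlab
% Multiple comparison results copied from the table above
% Columns: group A index, group B index, lower limit, A-B, upper limit, p-value
c = [1 2  3.6064  7 10.394  0.00016831
     1 3  1.6064  5  8.3936 0.0040464
     2 3 -5.628  -2  1.628  0.35601];
gnames = {'st'; 'al1'; 'al2'};

alpha = 0.05;                              % assumed significance level
isSig = c(:,6) < alpha;                    % rows with a small p-value
excludesZero = c(:,3) > 0 | c(:,5) < 0;    % rows whose interval excludes zero

% Names of the group pairs that differ significantly
sigPairs = [gnames(c(isSig,1)), gnames(c(isSig,2))];
```

For these results the two criteria select the same rows, namely the two comparisons that involve steel.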
Input Arguments
y — Sample data
vector | matrix
Sample data, specified as a vector or matrix.
- If y is a vector, you must specify the group input argument. Each element in group represents a group name of the corresponding element in y. The anova1 function treats the y values corresponding to the same value of group as part of the same group. Use this design when groups have different numbers of elements (unbalanced ANOVA).
- If y is a matrix and you do not specify group, then anova1 treats each column of y as a separate group. In this design, the function evaluates whether the population means of the columns are equal. Use this design when each group has the same number of elements (balanced ANOVA).
- If y is a matrix and you specify group, then each element in group represents a group name for the corresponding column in y. The anova1 function treats the columns that have the same group name as part of the same group.
Note
anova1 ignores any NaN values in y. Also, if group contains empty or NaN values, anova1 ignores the corresponding observations in y. The anova1 function performs balanced ANOVA if each group has the same number of observations after the function disregards empty or NaN values. Otherwise, anova1 performs unbalanced ANOVA.
Data Types: single | double
group — Grouping variable
numeric vector | logical vector | categorical vector | character array | string array | cell array of character vectors
Grouping variable containing group names, specified as a numeric vector, logical vector, categorical vector, character array, string array, or cell array of character vectors.
- If y is a vector, then each element in group represents a group name of the corresponding element in y. The anova1 function treats the y values corresponding to the same value of group as part of the same group. N is the total number of observations.
- If y is a matrix, then each element in group represents a group name for the corresponding column in y. The anova1 function treats the columns of y that have the same group name as part of the same group.
If you do not want to specify group names for the matrix sample data y, enter an empty array ([]) or omit this argument. In this case, anova1 treats each column of y as a separate group.
If group contains empty or NaN values, anova1 ignores the corresponding observations in y.
For more information on grouping variables, see Grouping Variables.
Example: 'group',[1,2,1,3,1,...,3,1] when y is a vector with observations categorized into groups 1, 2, and 3
Example: 'group',{'white','red','white','black','red'} when y is a matrix with five columns categorized into groups red, white, and black
Data Types: single | double | logical | categorical | char | string | cell
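The two ways of pairing group with y can be compared side by side. The following is a minimal sketch with made-up data (the values in y and the group names 'a' and 'b' are hypothetical): it passes a matrix with one group name per column, then passes the same observations as a vector with one group name per observation.

```matlab
% Hypothetical data: four columns, but only two distinct groups
y = [10 12 20 21
     11 13 19 22
      9 14 21 20];
group = {'a','a','b','b'};           % one name per column of y

% Matrix form: columns 1-2 are pooled into 'a', columns 3-4 into 'b'
pMatrix = anova1(y,group,'off');

% Vector form: one group name per observation, in the column-major order of y(:)
yVec     = y(:);
groupVec = reshape(repmat(group,size(y,1),1),[],1);
pVector  = anova1(yVec,groupVec,'off');
```

Because both calls describe the same observations and the same grouping, pMatrix and pVector are expected to agree.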
displayopt — Indicator to display ANOVA table and box plot
'on' (default) | 'off'
Indicator to display the ANOVA table and box plot, specified as 'on' or 'off'. When displayopt is 'off', anova1 returns only the output arguments. It does not display the standard ANOVA table and box plot.
Example: p = anova1(x,group,'off')
Output Arguments
p — p-value for the F-test
scalar value
p-value for the F-test, returned as a scalar value. The p-value is the probability that the F-statistic can take a value larger than the computed test-statistic value. anova1 tests the null hypothesis that all group means are equal to each other against the alternative hypothesis that at least one group mean is different from the others. The function derives the p-value from the cdf of the F-distribution.
A p-value that is smaller than the significance level indicates that at least one of the sample means is significantly different from the others. Common significance levels are 0.05 or 0.01.
tbl — ANOVA table
cell array
ANOVA table, returned as a cell array. tbl has six columns.
Column | Definition |
---|---|
source | The source of the variability. |
SS | The sum of squares due to each source. |
df | The degrees of freedom associated with each source. Suppose N is the total number of observations and k is the number of groups. Then, N – k is the within-groups degrees of freedom (Error), k – 1 is the between-groups degrees of freedom (Columns), and N – 1 is the total degrees of freedom. N – 1 = (N – k) + (k – 1) |
MS | The mean squares for each source, which is the ratio SS/df. |
F | F-statistic, which is the ratio of the mean squares. |
Prob>F | The p-value, which is the probability that the F-statistic can take a value larger than the computed test-statistic value. anova1 derives this probability from the cdf of the F-distribution. |
The rows of the ANOVA table show the variability in the data, divided by source.
Row | Definition |
---|---|
Groups | Variability due to the differences among the group means (variability between groups) |
Error | Variability due to the differences between the data in each group and the group mean (variability within groups) |
Total | Total variability |
stats — Statistics for multiple comparison tests
structure
Statistics for multiple comparison tests, returned as a structure with the fields described in this table.
Field name | Definition |
---|---|
gnames | Names of the groups |
n | Number of observations in each group |
source | Source of the stats output |
means | Estimated values of the means |
df | Error (within-groups) degrees of freedom (N – k, where N is the total number of observations and k is the number of groups) |
s | Square root of the mean squared error |
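The fields in this table can be read directly from the returned structure. The following is a minimal sketch that reuses the beam data from the examples; the derived quantity seMean (a standard error for each group mean based on the pooled error estimate) is an illustrative calculation and not an output of anova1.

```matlab
strength = [82 86 79 83 84 85 86 87 74 82 ...
            78 75 76 77 79 79 77 78 82 79];
alloy = {'st','st','st','st','st','st','st','st', ...
         'al1','al1','al1','al1','al1','al1', ...
         'al2','al2','al2','al2','al2','al2'};

% Suppress the displays and keep only the stats structure
[~,~,stats] = anova1(strength,alloy,'off');

groupMeans = stats.means(:);     % estimated mean of each group
groupSizes = stats.n(:);         % number of observations in each group
rootMSE    = stats.s;            % square root of the mean squared error
dfError    = stats.df;           % within-groups degrees of freedom (N - k)

% Illustrative use: standard error of each group mean from the pooled estimate
seMean = rootMSE ./ sqrt(groupSizes);
```

For these data, rootMSE squared should match the Error mean square in the ANOVA table, and dfError should match the Error degrees of freedom.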
More About
Box Plot
anova1 displays a box plot of the observations for each group in y. Box plots provide a visual comparison of the group location parameters.
On each box, the central mark is the median (2nd quartile, q2) and the edges of the box are the 25th and 75th percentiles (1st and 3rd quartiles, q1 and q3, respectively). The whiskers extend to the most extreme data points that are not considered outliers. The outliers are plotted individually using the '+' symbol. The extremes of the whiskers correspond to q3 + 1.5 × (q3 – q1) and q1 – 1.5 × (q3 – q1).
Box plots include notches for the comparison of the median values. Two medians are significantly different at the 5% significance level if their intervals, represented by notches, do not overlap. This test is different from the F-test that ANOVA performs; however, large differences in the center lines of the boxes correspond to a large F-statistic value and correspondingly a small p-value. The extremes of the notches correspond to q2 – 1.57(q3 – q1)/sqrt(n) and q2 + 1.57(q3 – q1)/sqrt(n), where n is the number of observations without any NaN values.
For more information about box plots, see 'Whisker' and 'Notch' of boxplot.
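The whisker and notch formulas above can be evaluated for a single group. The following is a minimal sketch that uses the steel measurements from the examples and assumes that quantile is available; the exact quartile values depend on the quantile definition in use, so they may differ slightly from what boxplot draws.

```matlab
x = [82 86 79 83 84 85 86 87];   % steel beam measurements from the examples
x = x(~isnan(x));                % NaN values are not counted in n
n = numel(x);

q  = quantile(x,[0.25 0.50 0.75]);
q1 = q(1);  q2 = q(2);  q3 = q(3);
boxHeight = q3 - q1;             % interquartile range

% Whisker limits: points outside these limits are treated as outliers
whiskerHigh = q3 + 1.5*boxHeight;
whiskerLow  = q1 - 1.5*boxHeight;
isOutlier   = x > whiskerHigh | x < whiskerLow;

% Notch extremes around the median
notchLow  = q2 - 1.57*boxHeight/sqrt(n);
notchHigh = q2 + 1.57*boxHeight/sqrt(n);
```

Repeating the notch calculation for two groups and checking whether the intervals [notchLow, notchHigh] overlap mirrors the visual comparison of medians described above.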
References
[1] Hogg, R. V., and J. Ledolter. Engineering Statistics. New York: MacMillan, 1987.