kstest
One-sample Kolmogorov-Smirnov test
Description
h = kstest(x) returns a test decision for the null hypothesis that the data in vector x comes from a standard normal distribution, against the alternative that it does not come from such a distribution, using the one-sample Kolmogorov-Smirnov test. The result h is 1 if the test rejects the null hypothesis at the 5% significance level, or 0 otherwise.
h = kstest(x,Name,Value) returns a test decision for the one-sample Kolmogorov-Smirnov test with additional options specified by one or more name-value pair arguments. For example, you can test for a distribution other than standard normal, change the significance level, or conduct a one-sided test.
Examples
Test for Standard Normal Distribution
Perform the one-sample Kolmogorov-Smirnov test by using kstest. Confirm the test decision by visually comparing the empirical cumulative distribution function (cdf) to the standard normal cdf.
Load the examgrades data set. Create a vector containing the first column of the exam grade data.
load examgrades
test1 = grades(:,1);
Test the null hypothesis that the data comes from a normal distribution with a mean of 75 and a standard deviation of 10. Use these parameters to center and scale each element of the data vector, because kstest tests for a standard normal distribution by default.
x = (test1-75)/10;
h = kstest(x)
h = logical
   0
The returned value of h = 0 indicates that kstest fails to reject the null hypothesis at the default 5% significance level.
Plot the empirical cdf and the standard normal cdf for a visual comparison.
cdfplot(x)
hold on
x_values = linspace(min(x),max(x));
plot(x_values,normcdf(x_values,0,1),'r-')
legend('Empirical CDF','Standard Normal CDF','Location','best')
The figure shows the similarity between the empirical cdf of the centered and scaled data vector and the cdf of the standard normal distribution.
Specify the Hypothesized Distribution Using a Two-Column Matrix
Load the sample data. Create a vector containing the first column of the students’ exam grades data.
load examgrades;
x = grades(:,1);
Specify the hypothesized distribution as a two-column matrix. Column 1 contains the data vector x. Column 2 contains cdf values evaluated at each value in x for a hypothesized Student’s t distribution with a location parameter of 75, a scale parameter of 10, and one degree of freedom.
test_cdf = [x,cdf('tlocationscale',x,75,10,1)];
Test if the data are from the hypothesized distribution.
h = kstest(x,'CDF',test_cdf)
h = logical
   1
The returned value of h = 1 indicates that kstest rejects the null hypothesis at the default 5% significance level.
Specify the Hypothesized Distribution Using a Probability Distribution Object
Load the sample data. Create a vector containing the first column of the students’ exam grades data.
load examgrades;
x = grades(:,1);
Create a probability distribution object to test if the data comes from a Student’s t distribution with a location parameter of 75, a scale parameter of 10, and one degree of freedom.
test_cdf = makedist('tlocationscale','mu',75,'sigma',10,'nu',1);
Test the null hypothesis that the data comes from the hypothesized distribution.
h = kstest(x,'CDF',test_cdf)
h = logical
   1
The returned value of h = 1 indicates that kstest rejects the null hypothesis at the default 5% significance level.
Test the Hypothesis at Different Significance Levels
Load the sample data. Create a vector containing the first column of the students’ exam grades.
load examgrades;
x = grades(:,1);
Create a probability distribution object to test if the data comes from a Student’s t distribution with a location parameter of 75, a scale parameter of 10, and one degree of freedom.
test_cdf = makedist('tlocationscale','mu',75,'sigma',10,'nu',1);
Test the null hypothesis that the data comes from the hypothesized distribution at the 1% significance level.
[h,p] = kstest(x,'CDF',test_cdf,'Alpha',0.01)
h = logical
   1
p = 0.0021
The returned value of h = 1 indicates that kstest rejects the null hypothesis at the 1% significance level.
Conduct a One-Sided Hypothesis Test
Load the sample data. Create a vector containing the third column of the stock return data matrix.
load stockreturns;
x = stocks(:,3);
Test the null hypothesis that the data comes from a standard normal distribution, against the alternative hypothesis that the population cdf of the data is larger than the standard normal cdf.
[h,p,k,c] = kstest(x,'Tail','larger')
h = logical
   1
p = 5.0854e-05
k = 0.2197
c = 0.1207
The returned value of h = 1 indicates that kstest rejects the null hypothesis in favor of the alternative hypothesis at the default 5% significance level.
Plot the empirical cdf and the standard normal cdf for a visual comparison.
[f,x_values] = ecdf(x);
J = plot(x_values,f);
hold on;
K = plot(x_values,normcdf(x_values),'r--');
set(J,'LineWidth',2);
set(K,'LineWidth',2);
legend([J K],'Empirical CDF','Standard Normal CDF','Location','SE');
The plot shows the difference between the empirical cdf of the data vector x and the cdf of the standard normal distribution.
Input Arguments
x — Sample data
vector
Sample data, specified as a vector.
Data Types: single | double
Name-Value Arguments
Specify optional pairs of arguments as Name1=Value1,...,NameN=ValueN, where Name is the argument name and Value is the corresponding value. Name-value arguments must appear after other arguments, but the order of the pairs does not matter.
Before R2021a, use commas to separate each name and value, and enclose Name in quotes.
Example: 'Tail','larger','Alpha',0.01 specifies a test using the alternative hypothesis that the cdf of the population from which the sample data is drawn is greater than the cdf of the hypothesized distribution, conducted at the 1% significance level.
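The two ways of passing name-value arguments described above can be compared side by side. This is a minimal sketch: the sample x and the seed are hypothetical, chosen only for illustration, and the Name=Value form assumes R2021a or later.

```matlab
rng(0)                  % fixed seed, used only to make the sketch reproducible
x = randn(50,1);        % hypothetical sample

% Name=Value syntax (R2021a and later)
h1 = kstest(x,Tail="larger",Alpha=0.01);

% Comma-separated syntax with quoted names, which also works in earlier releases
h2 = kstest(x,'Tail','larger','Alpha',0.01);
```

Both calls request the same test, so h1 and h2 should hold the same decision.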
Alpha — Significance level
0.05 (default) | scalar value in the range (0,1)
Significance level of the hypothesis test, specified as the comma-separated pair consisting of 'Alpha' and a scalar value in the range (0,1).
Example: 'Alpha',0.01
Data Types: single | double
CDF — cdf of hypothesized continuous distribution
matrix | probability distribution object
cdf of hypothesized continuous distribution, specified as the comma-separated pair consisting of 'CDF' and either a two-column matrix or a continuous probability distribution object. When CDF is a matrix, column 1 contains a set of possible x values, and column 2 contains the corresponding hypothesized cumulative distribution function values G(x). The calculation is most efficient if CDF is specified such that column 1 contains the values in the data vector x. If there are values in x not found in column 1 of CDF, kstest approximates G(x) by interpolation. All values in x must lie in the interval between the smallest and largest values in the first column of CDF. By default, kstest tests for a standard normal distribution.
The one-sample Kolmogorov-Smirnov test is only valid for continuous cumulative distribution functions, and requires CDF to be predetermined. The result is not accurate if CDF is estimated from the data. To test x against the normal, lognormal, extreme value, Weibull, or exponential distribution without specifying distribution parameters, use lillietest instead.
Data Types: single | double
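As an illustration of the matrix form, the sketch below supplies the hypothesized cdf on a grid that does not contain the data values, so kstest has to interpolate G(x). The sample x, the seed, and the names xGrid, cdfMatrix, hMatrix, and hObject are hypothetical; the grid is padded by an arbitrary margin of 1 so that every value in x lies inside the range of column 1.

```matlab
rng(0)                                   % fixed seed for reproducibility only
x = randn(100,1);                        % hypothetical sample

% Column 1: grid covering every value in x. Column 2: hypothesized cdf G.
xGrid = linspace(min(x)-1, max(x)+1, 500)';
cdfMatrix = [xGrid, normcdf(xGrid,0,1)];

% Matrix form: G(x) is approximated by interpolation on the grid.
hMatrix = kstest(x,'CDF',cdfMatrix);

% Probability distribution object form for the same hypothesized distribution.
hObject = kstest(x,'CDF',makedist('Normal','mu',0,'sigma',1));
```

With a grid this fine the two decisions would normally agree, although the interpolated cdf is only an approximation of the exact one.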
Tail — Type of alternative hypothesis
'unequal' (default) | 'larger' | 'smaller'
Type of alternative hypothesis to evaluate, specified as the comma-separated pair consisting of 'Tail' and one of the following.
'unequal' | Test the alternative hypothesis that the cdf of the population from which x is drawn is not equal to the cdf of the hypothesized distribution.
'larger' | Test the alternative hypothesis that the cdf of the population from which x is drawn is greater than the cdf of the hypothesized distribution.
'smaller' | Test the alternative hypothesis that the cdf of the population from which x is drawn is less than the cdf of the hypothesized distribution.
If the values in the data vector x tend to be larger than expected from the hypothesized distribution, the empirical distribution function of x tends to be smaller, and vice versa.
Example: 'Tail','larger'
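The direction of the one-sided alternatives is easy to mix up, so a small sketch may help. It uses hypothetical data drawn from a normal distribution with mean 1, so the values tend to be larger than a standard normal sample and the empirical cdf tends to lie below the standard normal cdf. The shift, sample size, and seed are arbitrary choices for illustration.

```matlab
rng(0)                        % fixed seed for reproducibility only
x = randn(200,1) + 1;         % hypothetical data shifted above the standard normal

% Alternative: the population cdf is less than the hypothesized cdf.
[hSmaller,pSmaller] = kstest(x,'Tail','smaller');

% Alternative: the population cdf is greater than the hypothesized cdf.
[hLarger,pLarger] = kstest(x,'Tail','larger');
```

For data shifted upward like this, 'smaller' is the alternative the test is expected to favor.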
Output Arguments
h — Hypothesis test result
1 | 0
Hypothesis test result, returned as a logical value.
If h = 1, this indicates the rejection of the null hypothesis at the Alpha significance level.
If h = 0, this indicates a failure to reject the null hypothesis at the Alpha significance level.
p — p-value
scalar value in the range [0,1]
p-value of the test, returned as a scalar value in the range [0,1]. p is the probability of observing a test statistic as extreme as, or more extreme than, the observed value under the null hypothesis. Small values of p cast doubt on the validity of the null hypothesis.
ksstat — Test statistic
nonnegative scalar value
Test statistic of the hypothesis test, returned as a nonnegative scalar value.
cv — Critical value
nonnegative scalar value
Critical value, returned as a nonnegative scalar value.
More About
One-Sample Kolmogorov-Smirnov Test
The one-sample Kolmogorov-Smirnov test is a nonparametric test of the null hypothesis that the population cdf of the data is equal to the hypothesized cdf.
The two-sided test for “unequal” cdf functions tests the null hypothesis against the alternative that the population cdf of the data is not equal to the hypothesized cdf. The test statistic is the maximum absolute difference between the empirical cdf calculated from x and the hypothesized cdf:
$$D^* = \max_x \left| \hat{F}(x) - G(x) \right|,$$
where $\hat{F}(x)$ is the empirical cdf and $G(x)$ is the cdf of the hypothesized distribution.
The one-sided test for a “larger” cdf function tests the null hypothesis against the alternative that the population cdf of the data is greater than the hypothesized cdf. The test statistic is the maximum amount by which the empirical cdf calculated from x exceeds the hypothesized cdf:
$$D^* = \max_x \left( \hat{F}(x) - G(x) \right).$$
The one-sided test for a “smaller” cdf function tests the null hypothesis against the alternative that the population cdf of the data is less than the hypothesized cdf. The test statistic is the maximum amount by which the hypothesized cdf exceeds the empirical cdf calculated from x:
$$D^* = \max_x \left( G(x) - \hat{F}(x) \right).$$
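The three statistics described above can be sketched by hand for a standard normal hypothesis. This is an illustrative computation under stated assumptions, not a description of how kstest is implemented: the sample values and all variable names are hypothetical, the data is assumed to contain no ties, and the standard normal cdf is written with erfc so that only base MATLAB is needed. Because the empirical cdf is a step function, the sketch checks both sides of each jump.

```matlab
x = [-1.2; -0.4; 0.1; 0.3; 0.9; 1.7];   % hypothetical sample without ties
n = numel(x);
xs = sort(x);

% Hypothesized cdf at the sorted data: standard normal, expressed with erfc.
G = 0.5 * erfc(-xs / sqrt(2));

% The empirical cdf jumps from (i-1)/n to i/n at the i-th sorted value.
Fupper = (1:n)' / n;        % value at and just after each data point
Flower = (0:n-1)' / n;      % value just before each data point

dLarger  = max(Fupper - G);            % empirical cdf exceeds hypothesized cdf
dSmaller = max(G - Flower);            % hypothesized cdf exceeds empirical cdf
dUnequal = max(dLarger, dSmaller);     % maximum absolute difference
```

The two-sided statistic is the larger of the two one-sided statistics, which is why a single pass over the sorted data is enough for all three.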
kstest computes the critical value cv using an approximate formula or by interpolation in a table. The formula and table cover the range 0.01 ≤ alpha ≤ 0.2 for two-sided tests and 0.005 ≤ alpha ≤ 0.1 for one-sided tests. cv is returned as NaN if alpha is outside this range.
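The range limits on the critical value can be checked with a short sketch. The sample and the two significance levels are hypothetical choices: 0.05 lies inside the stated two-sided range, and 0.5 lies outside it.

```matlab
rng(0)                      % fixed seed for reproducibility only
x = randn(100,1);           % hypothetical sample

% Two-sided test with alpha inside the range 0.01 <= alpha <= 0.2.
[~,~,~,cvInside] = kstest(x,'Alpha',0.05);

% Two-sided test with alpha outside that range.
[~,~,~,cvOutside] = kstest(x,'Alpha',0.5);

cvIsMissing = isnan(cvOutside);
```

The test decision h and the p-value are still returned in the second call; only cv is unavailable.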
Algorithms
kstest decides to reject the null hypothesis by comparing the p-value p with the significance level Alpha, not by comparing the test statistic ksstat with the critical value cv. Since cv is approximate, comparing ksstat with cv occasionally leads to a different conclusion than comparing p with Alpha.
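To see whether the two decision rules agree for a particular sample, one can request all four outputs and form the critical-value decision by hand. This is a sketch with a hypothetical sample; the names hFromCv and rulesAgree are illustrative, and the strict inequality used for the critical-value rule is an assumption.

```matlab
rng(0)                      % fixed seed for reproducibility only
x = randn(100,1);           % hypothetical sample
alpha = 0.05;

[h,p,ksstat,cv] = kstest(x,'Alpha',alpha);

% Decision based on the approximate critical value (assumed rule: reject if ksstat > cv).
hFromCv = ksstat > cv;

% True when the p-value decision returned in h matches the critical-value decision.
rulesAgree = (h == hFromCv);
```

When rulesAgree is false, the decision in h is the one that kstest reports.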
Version History
Introduced before R2006a