Interpret Linear Regression Results


This example shows how to display and interpret linear regression output statistics.

Fit Linear Regression Model

Load the carsmall data set, a matrix input data set.

load carsmall
X = [Weight,Horsepower,Acceleration];

Fit a linear regression model by using fitlm.

lm = fitlm(X,MPG)

lm = 
Linear regression model:
    y ~ 1 + x1 + x2 + x3

Estimated Coefficients:
                    Estimate         SE         tStat        pValue  
                   __________    _________    _________    __________

    (Intercept)        47.977       3.8785        12.37    4.8957e-21
    x1             -0.0065416    0.0011274      -5.8023    9.8742e-08
    x2              -0.042943     0.024313      -1.7663       0.08078
    x3              -0.011583      0.19333    -0.059913       0.95236

Number of observations: 93, Error degrees of freedom: 89
Root Mean Squared Error: 4.09
R-squared: 0.752,  Adjusted R-Squared: 0.744
F-statistic vs. constant model: 90, p-value = 7.38e-27

The model display includes the model formula, estimated coefficients, and model summary statistics.

The model formula in the display, y ~ 1 + x1 + x2 + x3, corresponds to $y = β_{0} + β_{1} X_{1} + β_{2} X_{2} + β_{3} X_{3} + ϵ$.
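As a sketch that is not part of the original example, you can fit the same model from a table so that the display uses the variable names instead of x1, x2, and x3. The names tbl and lmNamed are illustrative, and the formula string is written in Wilkinson notation. Because fitlm drops the same NaN rows, the estimates should match the matrix fit.

```matlab
load carsmall
% Illustrative names: tbl and lmNamed exist only for this sketch.
tbl = table(Weight, Horsepower, Acceleration, MPG);

% Wilkinson-notation formula; the intercept is included by default.
lmNamed = fitlm(tbl, 'MPG ~ Weight + Horsepower + Acceleration');

% Show the formula and the coefficient names that replace x1, x2, x3.
disp(lmNamed.Formula)
disp(lmNamed.CoefficientNames)
```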

The model display shows the estimated coefficient information, which is stored in the Coefficients property. Display the Coefficients property.

lm.Coefficients

ans=4×4 table
                    Estimate         SE         tStat        pValue  
                   __________    _________    _________    __________

    (Intercept)        47.977       3.8785        12.37    4.8957e-21
    x1             -0.0065416    0.0011274      -5.8023    9.8742e-08
    x2              -0.042943     0.024313      -1.7663       0.08078
    x3              -0.011583      0.19333    -0.059913       0.95236

The Coefficients property includes these columns:

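Estimate - Coefficient estimates for each corresponding term in the model. For example, the estimate for the constant term (intercept) is 47.977.
SE - Standard error of the coefficients.
tStat - t-statistic for each coefficient to test the null hypothesis that the corresponding coefficient is zero against the alternative that it is different from zero, given the other predictors in the model. Note that tStat = Estimate/SE. For example, the t-statistic for the intercept is 47.977/3.8785 = 12.37.
pValue - p-value for the t-statistic of the two-sided hypothesis test that the corresponding coefficient is equal to zero. For example, the p-value of the t-statistic for x2 is greater than 0.05, so this term is not significant at the 5% significance level given the other terms in the model.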
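The tStat and pValue columns can be reproduced from Estimate and SE alone. The sketch below copies the rounded values from the display, so the last digit may differ slightly. To avoid depending on the toolbox function tcdf, it assumes the standard identity $P(|T| > |t|) = I_{ν/(ν + t^{2})}(ν/2, 1/2)$ for a t distribution with ν degrees of freedom, where $I$ is the regularized incomplete beta function computed by betainc.

```matlab
% Estimate and SE copied (rounded) from the Coefficients display.
est = [47.977; -0.0065416; -0.042943; -0.011583];
se  = [3.8785; 0.0011274; 0.024313; 0.19333];
dfe = 89;                                 % error degrees of freedom, n - p

tStat = est ./ se;                        % tStat = Estimate/SE

% Two-sided p-value P(|T| > |t|) for T ~ t(dfe), via the regularized
% incomplete beta function (base MATLAB, no toolbox needed).
pValue = betainc(dfe ./ (dfe + tStat.^2), dfe/2, 0.5);

disp([tStat, pValue])
```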

The summary statistics of the model are:

Number of observations - Number of rows without any NaN values. For example, Number of observations is 93 because the MPG data vector has six NaN values and the Horsepower data vector has one NaN value for a different observation, where the number of rows in X and MPG is 100.
Error degrees of freedom - $n - p$, where $n$ is the number of observations, and $p$ is the number of coefficients in the model, including the intercept. For example, the model has four coefficients (including the intercept), so Error degrees of freedom is 93 - 4 = 89.
Root Mean Squared Error - Square root of the mean squared error, which estimates the standard deviation of the error distribution.
R-squared and Adjusted R-Squared - Coefficient of determination and adjusted coefficient of determination, respectively. For example, the R-squared value suggests that the model explains approximately 75% of the variability in the response variable MPG.
F-statistic vs. constant model - Test statistic for the F-test on the regression model, which tests whether the model fits significantly better than a degenerate model consisting of only a constant term.
p-value - p-value for the F-test on the model. For example, the model is significant with a p-value of 7.3816e-27.
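The summary statistics are also stored as properties of the fitted LinearModel object. A minimal sketch that reads them and confirms how many rows were dropped because of NaN values (the variable name nDropped is illustrative):

```matlab
load carsmall
X = [Weight, Horsepower, Acceleration];
lm = fitlm(X, MPG);

% Rows in which the response or any predictor is NaN are not used in the fit.
nDropped = sum(any(isnan([X, MPG]), 2));
fprintf('Rows: %d, dropped: %d, used: %d\n', numel(MPG), nDropped, lm.NumObservations);

% Summary statistics stored as LinearModel properties.
fprintf('Error DF: %d\n', lm.DFE);
fprintf('RMSE: %.4f\n', lm.RMSE);
fprintf('R-squared: %.4f, adjusted: %.4f\n', ...
    lm.Rsquared.Ordinary, lm.Rsquared.Adjusted);
```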

ANOVA

Perform analysis of variance (ANOVA) for the model.

anova(lm,'summary')

ans=3×5 table
                SumSq     DF    MeanSq      F         pValue  
                ______    __    ______    ______    __________

    Total       6004.8    92    65.269                        
    Model         4516     3    1505.3    89.987    7.3816e-27
    Residual    1488.8    89    16.728                        

This anova display shows the following.

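SumSq - Sum of squares for the regression model, Model, the error term, Residual, and the total, Total.
DF - Degrees of freedom for each term. Degrees of freedom is $n - 1$ for the total, $p - 1$ for the model, and $n - p$ for the error term, where $n$ is the number of observations, and $p$ is the number of coefficients in the model, including the intercept. For example, the MPG data vector has six NaN values and one of the data vectors, Horsepower, has one NaN value for a different observation, so the total degrees of freedom is 93 - 1 = 92. There are four coefficients in the model, so the model DF is 4 - 1 = 3, and the DF for the error term is 93 - 4 = 89.
MeanSq - Mean squared error for each term. Note that MeanSq = SumSq/DF. For example, the mean squared error for the error term is 1488.8/89 = 16.728. The square root of this value is the root mean squared error in the linear regression display, or 4.09.
F - F-statistic value, which is the same as F-statistic vs. constant model in the linear regression display. In this example, it is 89.987, and in the linear regression display this F-statistic value is rounded up to 90.
pValue - p-value for the F-test on the model. In this example, it is 7.3816e-27.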
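Every quantity in the summary table, as well as RMSE, R-squared, and adjusted R-squared in the model display, follows from the SumSq and DF columns. This sketch recomputes them from the rounded values in the display using base MATLAB only. In place of the toolbox function fcdf, the upper-tail F probability assumes the identity $P(F > f) = I_{d_2/(d_2 + d_1 f)}(d_2/2, d_1/2)$. Because the model SumSq is shown rounded to 4516, F comes out near 89.99 rather than exactly 89.987.

```matlab
% SumSq and DF copied (rounded) from the anova(lm,'summary') display.
ssModel = 4516;   dfModel = 3;
ssError = 1488.8; dfError = 89;
ssTotal = 6004.8; dfTotal = 92;

msModel = ssModel / dfModel;              % MeanSq = SumSq/DF
msError = ssError / dfError;              % mean squared error
F = msModel / msError;                    % F-statistic vs. constant model

% Upper-tail probability of F(dfModel, dfError) via the incomplete beta function.
p = betainc(dfError / (dfError + dfModel*F), dfError/2, dfModel/2);

rmse  = sqrt(msError);                    % Root Mean Squared Error
R2    = 1 - ssError/ssTotal;              % R-squared
R2adj = 1 - msError/(ssTotal/dfTotal);    % Adjusted R-Squared

fprintf('F = %.3f, p = %.3g, RMSE = %.3f, R2 = %.4f, adj. R2 = %.4f\n', ...
    F, p, rmse, R2, R2adj);
```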

If there are higher-order terms in the regression model, anova partitions the model SumSq into the part explained by the higher-order terms and the rest. The corresponding F-statistics are for testing the significance of the linear terms and higher-order terms as separate groups.
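To see this partition, you can refit the data with higher-order terms. A sketch, assuming the 'quadratic' model specification of fitlm (intercept, linear terms, pairwise interactions, and squared terms); the model name lmQuad is illustrative, and the exact sub-row labels of the resulting table are not reproduced here.

```matlab
load carsmall
X = [Weight, Horsepower, Acceleration];

% 'quadratic' adds pairwise interactions and squared terms to the linear terms,
% giving 10 coefficients in total for three predictors.
lmQuad = fitlm(X, MPG, 'quadratic');

% Summary ANOVA: the Model sum of squares is split into the linear part
% and the higher-order (nonlinear) part.
tq = anova(lmQuad, 'summary')
```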

If the data includes replicates, or multiple measurements at the same predictor values, then anova partitions the error SumSq into the part for the replicates and the rest. The corresponding F-statistic is for testing the lack of fit by comparing the model residuals with the model-free variance estimate computed on the replicates.
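The replicate-based lack-of-fit test can be written out by hand. The sketch below uses a small hypothetical data set (not carsmall) with two responses at each of four x values, fits a straight line with the backslash operator, and splits SSE into pure error (scatter of replicates around their own means) and lack of fit (the rest). This is the textbook decomposition; that anova uses exactly the same definitions is an assumption.

```matlab
% Hypothetical data: two replicate responses at each of four x values.
x = [1; 1; 2; 2; 3; 3; 4; 4];
y = [1.1; 0.9; 2.3; 2.1; 2.8; 3.2; 4.4; 3.6];
n = numel(y);

% Straight-line least-squares fit with p = 2 coefficients.
A = [ones(n,1), x];
b = A \ y;
sse = sum((y - A*b).^2);
p = size(A, 2);

% Pure error: variation of replicates around their group means (model-free).
[~, ~, g] = unique(x);                    % group index for each observation
groupMean = accumarray(g, y, [], @mean);
ssPE = sum((y - groupMean(g)).^2);
m = max(g);                               % number of distinct x settings
dfPE = n - m;

% Lack of fit: the part of SSE that pure error does not account for.
ssLOF = sse - ssPE;
dfLOF = m - p;
F = (ssLOF/dfLOF) / (ssPE/dfPE);
pLOF = betainc(dfPE/(dfPE + dfLOF*F), dfPE/2, dfLOF/2);   % upper-tail F probability
fprintf('SSPE = %.4f, SSLOF = %.4f, F = %.4f, p = %.4f\n', ssPE, ssLOF, F, pLOF);
```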

Decompose ANOVA table for model terms.

anova(lm)

ans=4×5 table
              SumSq      DF     MeanSq        F           pValue  
             ________    __    ________    _________    __________

    x1         563.18     1      563.18       33.667    9.8742e-08
    x2         52.187     1      52.187       3.1197       0.08078
    x3       0.060046     1    0.060046    0.0035895       0.95236
    Error      1488.8    89      16.728                           

This anova display shows the following:

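First column - Terms included in the model.
SumSq - Sum of squared error for each term except for the constant.
DF - Degrees of freedom. In this example, DF is 1 for each term in the model and $n - p$ for the error term, where $n$ is the number of observations, and $p$ is the number of coefficients in the model, including the intercept. For example, the DF for the error term in this model is 93 - 4 = 89. If any of the variables in the model is a categorical variable, the DF for that variable is the number of indicator variables created for its categories (number of categories - 1).
MeanSq - Mean squared error for each term. Note that MeanSq = SumSq/DF. For example, the mean squared error for the error term is 1488.8/89 = 16.728.
F - F-values for each coefficient. The F-value is the ratio of the mean square of each term to the mean squared error, that is, F = MeanSq(xi)/MeanSq(Error). Each F-statistic has an F distribution, with the numerator degrees of freedom equal to the DF value for the corresponding term and the denominator degrees of freedom equal to $n - p$, where $n$ is the number of observations and $p$ is the number of coefficients in the model. In this example, each F-statistic has an $F_{(1, 89)}$ distribution.
pValue - p-value for each hypothesis test on the coefficient of the corresponding term in the linear model. For example, the p-value for the F-statistic of x2 is 0.08078, which is not significant at the 5% significance level given the other terms in the model.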
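The F and pValue columns can be checked from SumSq and DF. For a term with one degree of freedom, the F-statistic equals the square of that coefficient's t-statistic in the Coefficients table, so both tests give the same p-value. A base-MATLAB sketch using the rounded values from the displays (the upper-tail F probability again assumes the incomplete-beta identity rather than calling fcdf):

```matlab
% SumSq for each term and for Error, copied (rounded) from the anova(lm) display.
ssTerm  = [563.18; 52.187; 0.060046];     % x1, x2, x3
dfTerm  = [1; 1; 1];
dfe     = 89;
msError = 1488.8 / dfe;                   % MeanSq(Error)

% F = MeanSq(xi)/MeanSq(Error); each F has an F(DF, dfe) distribution.
F = (ssTerm ./ dfTerm) / msError;
p = betainc(dfe ./ (dfe + dfTerm .* F), dfe/2, dfTerm/2);   % upper-tail probability

% tStat values copied from the Coefficients table, same term order.
tStat = [-5.8023; -1.7663; -0.059913];
disp([F, tStat.^2, p])
```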

Coefficient Confidence Intervals

Display coefficient confidence intervals.

coefCI(lm)

ans = 4×2

   40.2702   55.6833
   -0.0088   -0.0043
   -0.0913    0.0054
   -0.3957    0.3726

The values in each row are the lower and upper confidence limits, respectively, for the default 95% confidence intervals for the coefficients. For example, the first row shows the lower and upper limits, 40.2702 and 55.6833, for the intercept, $β_{0}$. Likewise, the second row shows the limits for $β_{1}$, and so on. Confidence intervals provide a measure of precision for linear regression coefficient estimates. A $100(1 - α)\%$ confidence interval gives the range that the corresponding regression coefficient will be in with $100(1 - α)\%$ confidence.

You can also change the confidence level. Find the 99% confidence intervals for the coefficients.

coefCI(lm,0.01)

ans = 4×2

   37.7677   58.1858
   -0.0095   -0.0036
   -0.1069    0.0211
   -0.5205    0.4973
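Each interval is Estimate ± $t_{(1 - α/2, n - p)}$ × SE. The sketch below rebuilds the 95% intervals from the rounded Estimate and SE values. It finds the critical value with fzero on the incomplete-beta expression for the two-sided t p-value, so it needs only base MATLAB; with the toolbox, tinv(1 - alpha/2, 89) gives the same critical value. Set alpha = 0.01 to reproduce the 99% intervals.

```matlab
% Estimate and SE copied (rounded) from the Coefficients display.
est = [47.977; -0.0065416; -0.042943; -0.011583];
se  = [3.8785; 0.0011274; 0.024313; 0.19333];
dfe = 89;                                  % n - p

% Two-sided p-value of a t statistic with dfe degrees of freedom.
twoSidedP = @(t) betainc(dfe / (dfe + t^2), dfe/2, 0.5);

alpha = 0.05;                              % 95% intervals; use 0.01 for 99%
% Critical value t_(1-alpha/2, dfe): the t whose two-sided p-value equals alpha.
tcrit = fzero(@(t) twoSidedP(t) - alpha, [0 50]);

ci = [est - tcrit*se, est + tcrit*se];     % lower and upper limits per row
disp(ci)
```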

Hypothesis Test on Coefficients

Test the null hypothesis that all predictor variable coefficients are equal to zero versus the alternative hypothesis that at least one of them is different from zero.

[p,F,d] = coefTest(lm)

p = 7.3816e-27

F = 89.9874

d = 3

Here, coefTest performs an F-test for the hypothesis that all regression coefficients (except for the intercept) are zero versus at least one differs from zero, which essentially is the hypothesis on the model. It returns the p-value p, the F-statistic F, and the numerator degrees of freedom d. The F-statistic and p-value are the same as the ones in the linear regression display and anova for the model. The degrees of freedom is 4 - 1 = 3 because there are four coefficients (including the intercept) in the model.
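The default test is equivalent to passing an explicit hypothesis matrix with one row per non-intercept coefficient. A sketch comparing the two calls; the names H, pAll, and pDefault are illustrative.

```matlab
load carsmall
X = [Weight, Horsepower, Acceleration];
lm = fitlm(X, MPG);

% Columns of H follow the coefficient order: (Intercept), x1, x2, x3.
% Each row states one restriction; the null hypothesis is H*beta = 0.
H = [0 1 0 0
     0 0 1 0
     0 0 0 1];
[pAll, FAll, dAll] = coefTest(lm, H);

% Default call: all coefficients except the intercept are tested.
[pDefault, FDefault, dDefault] = coefTest(lm);
fprintf('explicit H: F = %.4f, d = %d; default: F = %.4f, d = %d\n', ...
    FAll, dAll, FDefault, dDefault);
```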

Now, perform a hypothesis test on the coefficients of the first and second predictor variables.

H = [0 1 0 0; 0 0 1 0];
[p,F,d] = coefTest(lm,H)

p = 5.1702e-23

F = 96.4873

d = 2

The numerator degrees of freedom is the number of coefficients tested, which is 2 in this example. The results indicate that at least one of $β_{1}$ and $β_{2}$ differs from zero.
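According to the coefTest reference page, the statistic is $F = (Hb - c)^{T} (H V H^{T})^{-1} (Hb - c) / r$, where $b$ is the vector of coefficient estimates, $V$ its estimated covariance, $c$ the hypothesized values (zero by default), and $r$ the number of rows of H. Assuming that definition, this sketch evaluates the formula directly from the CoefficientCovariance property and compares it with coefTest; the names c, diffH, and FManual are illustrative.

```matlab
load carsmall
X = [Weight, Horsepower, Acceleration];
lm = fitlm(X, MPG);

H = [0 1 0 0; 0 0 1 0];                  % test the x1 and x2 coefficients jointly
c = zeros(size(H, 1), 1);                % hypothesized values under the null
b = lm.Coefficients.Estimate;            % coefficient estimates
V = lm.CoefficientCovariance;            % estimated covariance of the estimates
r = size(H, 1);                          % numerator degrees of freedom

% Wald-type F-statistic and its upper-tail p-value with (r, DFE) degrees of freedom.
diffH = H*b - c;
FManual = (diffH' * ((H*V*H') \ diffH)) / r;
pManual = betainc(lm.DFE / (lm.DFE + r*FManual), lm.DFE/2, r/2);

[pTest, FTest, dTest] = coefTest(lm, H);
fprintf('manual F = %.4f, coefTest F = %.4f, d = %d\n', FManual, FTest, dTest);
```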


Statistics and Machine Learning Toolbox Documentation
