
Stepwise Regression

Stepwise Regression to Select Appropriate Models

stepwiselm creates a linear model and automatically adds terms to it or trims terms from it. To create a small model, start from a constant model. To create a large model, start with a model containing many terms. A large model usually has lower error as measured by the fit to the original data, but might not have any advantage in predicting new data.
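The claim that a larger model need not predict new data better can be checked with a holdout split. The following is a minimal sketch built on several assumptions that are not part of the original example: a 70/30 holdout split from cvpartition, a fixed random seed, and test-set RMSE as the yardstick. It fits one stepwise model starting from the constant model and one starting from the full quadratic model, then compares their errors on the training rows and on the held-out rows. The outcome depends on the split, so treat the printed numbers as illustrative only.

```matlab
% Sketch (assumptions: 70/30 holdout, rng(0), test RMSE as the metric).
load carbig
tbl = table(Acceleration,Displacement,Horsepower,Weight,MPG);
tbl = rmmissing(tbl);          % drop rows with NaN so both models use the same rows

rng(0)                         % reproducible split
c = cvpartition(height(tbl),'HoldOut',0.3);
tblTrain = tbl(training(c),:);
tblTest  = tbl(test(c),:);

% Small model: start from the constant model.
mdlSmall = stepwiselm(tblTrain,'constant','ResponseVar','MPG','Verbose',0);

% Large model: start from, and allow up to, the full quadratic model.
mdlLarge = stepwiselm(tblTrain,'quadratic','ResponseVar','MPG', ...
    'Upper','quadratic','Verbose',0);

% Root mean squared prediction error on rows the models never saw.
rmseSmall = sqrt(mean((tblTest.MPG - predict(mdlSmall,tblTest)).^2));
rmseLarge = sqrt(mean((tblTest.MPG - predict(mdlLarge,tblTest)).^2));

fprintf('Training RMSE: small %.3f, large %.3f\n',mdlSmall.RMSE,mdlLarge.RMSE)
fprintf('Test RMSE:     small %.3f, large %.3f\n',rmseSmall,rmseLarge)
```

When the large model's training RMSE is clearly lower but its test RMSE is not, that gap reflects the point made above.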

stepwiselm can use all the name-value options from fitlm, with additional options relating to the starting and bounding models. In particular:

  • For a small model, start with the default lower bounding model: 'constant' (a model that has no predictor terms).

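  • The default upper bounding model has linear terms and interaction terms (products of pairs of predictors). For an upper bounding model that also includes squared terms, set the 'Upper' name-value pair to 'quadratic'.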
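The two bounding options in the list above can be written out explicitly. In this sketch, 'Lower','constant' and 'Upper','interactions' restate the defaults described in the list, and the second call widens the upper bound to 'quadratic' so that squared terms may also enter. 'Verbose',0 only suppresses the step-by-step trace. The variable names mdlDefault and mdlQuad are illustrative.

```matlab
% Sketch: state the lower and upper bounding models explicitly.
load carbig
tbl = table(Acceleration,Displacement,Horsepower,Weight,MPG);

% Defaults written out: constant lower bound, linear + interaction upper bound.
mdlDefault = stepwiselm(tbl,'constant','ResponseVar','MPG', ...
    'Lower','constant','Upper','interactions','Verbose',0);

% Same starting model, but squared terms (for example Weight^2) may now enter.
mdlQuad = stepwiselm(tbl,'constant','ResponseVar','MPG', ...
    'Lower','constant','Upper','quadratic','Verbose',0);

% Display the selected formulas (no semicolon, so MATLAB prints them).
mdlDefault.Formula
mdlQuad.Formula
```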

Compare Large and Small Stepwise Models

This example shows how to compare the models that stepwiselm returns when it starts from a constant model and when it starts from a full interaction model.

Load the carbig data and create a table from some of the data.

load carbig
tbl = table(Acceleration,Displacement,Horsepower,Weight,MPG);

Create a stepwise mileage model starting from the constant model.

mdl1 = stepwiselm(tbl,'constant','ResponseVar','MPG')

1. Adding Weight, FStat = 888.8507, pValue = 2.9728e-103
2. Adding Horsepower, FStat = 3.8217, pValue = 0.00049608
3. Adding Horsepower:Weight, FStat = 64.8709, pValue = 9.93362e-15

mdl1 = 
Linear regression model:
    MPG ~ 1 + Horsepower*Weight

Estimated Coefficients:
                          Estimate         SE         tStat       pValue  
                         __________    __________    _______    __________

    (Intercept)              63.558        2.3429     27.127    1.2343e-91
    Horsepower             -0.25084      0.027279    -9.1952    2.3226e-18
    Weight                -0.010772    0.00077381    -13.921    5.1372e-36
    Horsepower:Weight    5.3554e-05    6.6491e-06     8.0542    9.9336e-15

Number of observations: 392, Error degrees of freedom: 388
Root Mean Squared Error: 3.93
R-squared: 0.748,  Adjusted R-Squared: 0.746
F-statistic vs. constant model: 385, p-value = 7.26e-116

Create a stepwise mileage model starting from the full interaction model.

mdl2 = stepwiselm(tbl,'interactions','ResponseVar','MPG')

1. Removing Acceleration:Displacement, FStat = 0.024186, pValue = 0.8765
2. Removing Displacement:Weight, FStat = 0.33103, pValue = 0.56539
3. Removing Acceleration:Horsepower, FStat = 1.7334, pValue = 0.18876
4. Removing Acceleration:Weight, FStat = 0.93269, pValue = 0.33477
5. Removing Horsepower:Weight, FStat = 0.64486, pValue = 0.42245

mdl2 = 
Linear regression model:
    MPG ~ 1 + Acceleration + Weight + Displacement*Horsepower

Estimated Coefficients:
                                  Estimate         SE         tStat       pValue  
                                 __________    __________    _______    __________

    (Intercept)                      61.285        2.8052     21.847    1.8593e-69
    Acceleration                   -0.34401       0.11862       -2.9     0.0039445
    Displacement                  -0.081198      0.010071    -8.0623    9.5014e-15
    Horsepower                     -0.24313      0.026068    -9.3265    8.6556e-19
    Weight                       -0.0014367    0.00084041    -1.7095      0.088166
    Displacement:Horsepower      0.00054236    5.7987e-05    …

…
Root Mean Squared Error: 3.84
R-squared: 0.761,  Adjusted R-Squared: 0.758
F-statistic vs. constant model: 246, p-value = 1.32e-117

Notice that:

  • mdl1 has four coefficients (the Estimate column), and mdl2 has six coefficients.

  • The adjusted R-squared of mdl1 is 0.746, which is slightly lower (worse) than that of mdl2, 0.758.
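Both quantities in the notes above can also be read directly from the fitted models instead of from the printed tables. This sketch assumes the LinearModel properties NumCoefficients and Rsquared.Adjusted. It refits both models with 'Verbose',0 so that it runs on its own.

```matlab
% Sketch: read coefficient counts and adjusted R-squared from the model objects.
load carbig
tbl = table(Acceleration,Displacement,Horsepower,Weight,MPG);
mdl1 = stepwiselm(tbl,'constant','ResponseVar','MPG','Verbose',0);
mdl2 = stepwiselm(tbl,'interactions','ResponseVar','MPG','Verbose',0);

% Number of rows in each Estimate column.
numCoef = [mdl1.NumCoefficients, mdl2.NumCoefficients]

% Adjusted R-squared of each model.
adjR2 = [mdl1.Rsquared.Adjusted, mdl2.Rsquared.Adjusted]
```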

Create a stepwise mileage model that uses the full quadratic model both as the upper bound and as the starting model:

mdl3 = stepwiselm(tbl,'quadratic','ResponseVar','MPG','Upper','quadratic');

1. Removing Acceleration:Horsepower, FStat = 0.075209, pValue = 0.78405
2. Removing Acceleration:Weight, FStat = 0.072756, pValue = 0.78751
3. Removing Horsepower… = 1.194, pValue = 0.27521
5. Removing Displacement:Weight, FStat = 1.2839, pValue = 0.25789
6. Removing Displacement^2, FStat = 2.069, pValue = 0.15114
7. Removing Horsepower^2, FStat = 0.744063, pValue = 0.39…

Compare the complexity of the three models by examining their formulas.

mdl1.Formula

ans = MPG ~ 1 + Horsepower*Weight

mdl2.Formula

ans = MPG ~ 1 + Acceleration + Weight + Displacement*Horsepower

mdl3.Formula

ans = MPG ~ 1 + Weight + Acceleration*Displacement + Displacement*Horsepower + Acceleration^2

The adjusted R-squared values improve slightly as the models become more complex:

RSquared = [mdl1.Rsquared.Adjusted, ...
    mdl2.Rsquared.Adjusted, mdl3.Rsquared.Adjusted]

RSquared = 1×3

    0.7465    0.7580    0.7599
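The adjusted value penalizes the ordinary R-squared for the number of estimated coefficients. This is why it rises only slightly here, and it can even fall when a new term adds little. The worked check below assumes the usual definition, with n observations and p coefficients including the intercept, so that n − p is the error degrees of freedom. It uses the mdl1 fit statistics (392 observations, 388 error degrees of freedom, R-squared 0.748):

$$
\bar{R}^2 = 1 - \left(1 - R^2\right)\frac{n-1}{n-p},
\qquad
\bar{R}^2_{\text{mdl1}} \approx 1 - (1 - 0.748)\,\frac{391}{388} \approx 0.746
$$

The small gap to the 0.7465 shown above comes from the R-squared input being rounded to three digits.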

Compare residual plots of the three models.

subplot(3,1,1)
plotResiduals(mdl1)
subplot(3,1,2)
plotResiduals(mdl2)
subplot(3,1,3)
plotResiduals(mdl3)

Figure: three stacked panels (mdl1, mdl2, mdl3 from top to bottom), each titled "Histogram of residuals".

The models have similar residuals. It is not clear which fits the data better.
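A histogram does not show how the residuals vary across the range of predictions. As an extra view that is not part of the original example, the sketch below plots residuals against fitted values for the same three models, using the 'fitted' plot type of plotResiduals. The per-panel titles are illustrative.

```matlab
% Sketch: residuals vs. fitted values for the three stepwise models.
load carbig
tbl = table(Acceleration,Displacement,Horsepower,Weight,MPG);
mdl1 = stepwiselm(tbl,'constant','ResponseVar','MPG','Verbose',0);
mdl2 = stepwiselm(tbl,'interactions','ResponseVar','MPG','Verbose',0);
mdl3 = stepwiselm(tbl,'quadratic','ResponseVar','MPG', ...
    'Upper','quadratic','Verbose',0);

figure
mdls = {mdl1, mdl2, mdl3};
for k = 1:numel(mdls)
    subplot(3,1,k)
    plotResiduals(mdls{k},'fitted')   % residuals on the y-axis, fitted values on the x-axis
    title(sprintf('mdl%d: residuals vs. fitted values',k))
end
```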

Interestingly, the more complex models have larger maximum deviations of the residuals:

Rrange1 = [min(mdl1.Residuals.Raw),max(mdl1.Residuals.Raw)];
Rrange2 = [min(mdl2.Residuals.Raw),max(mdl2.Residuals.Raw)];
Rrange3 = [min(mdl3.Residuals.Raw),max(mdl3.Residuals.Raw)];
Rranges = [Rrange1;Rrange2;Rrange3]

Rranges = 3×2

  -10.7725   14.7314
  -11.4407   16.7562
  -12.2723   16.7927
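The residual ranges above can also be collected in a loop, alongside each model's root mean squared error (the RMSE property). That gives one more number to weigh against the similar-looking histograms. This is a minimal sketch: Residuals.Raw holds NaN for rows excluded from a fit, and min and max skip NaN by default, as in the original code.

```matlab
% Sketch: residual ranges and RMSE for the three models in one loop.
load carbig
tbl = table(Acceleration,Displacement,Horsepower,Weight,MPG);
mdls = { ...
    stepwiselm(tbl,'constant','ResponseVar','MPG','Verbose',0), ...
    stepwiselm(tbl,'interactions','ResponseVar','MPG','Verbose',0), ...
    stepwiselm(tbl,'quadratic','ResponseVar','MPG','Upper','quadratic','Verbose',0)};

Rranges = zeros(numel(mdls),2);
rmse = zeros(numel(mdls),1);
for k = 1:numel(mdls)
    r = mdls{k}.Residuals.Raw;          % NaN for rows excluded from the fit
    Rranges(k,:) = [min(r), max(r)];    % min/max ignore NaN by default
    rmse(k) = mdls{k}.RMSE;
end
Rranges
rmse
```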
