Stepwise Regression
Stepwise Regression to Select Appropriate Models
stepwiselm
creates a linear model and automatically adds to or trims the model. To create a small model, start from a constant model. To create a large model, start with a model containing many terms. A large model usually has lower error as measured by the fit to the original data, but might not have any advantage in predicting new data.
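The trade-off between fit to the original data and prediction of new data can be checked with a held-out split. The following is a minimal sketch, not part of the original example: it assumes the carbig sample data set, and the 70/30 split, the random seed, and the names small, large, rmse, and errors are illustrative choices.

```matlab
% Sketch: compare in-sample and held-out error for a small and a large start.
load carbig
tbl = table(Acceleration,Displacement,Horsepower,Weight,MPG);

rng(0)                                        % arbitrary seed, for repeatability
c = cvpartition(height(tbl),'HoldOut',0.3);   % arbitrary 70/30 split
tblTrain = tbl(training(c),:);
tblTest  = tbl(test(c),:);

% Fit both stepwise models on the training rows only.
small = stepwiselm(tblTrain,'constant','ResponseVar','MPG','Verbose',0);
large = stepwiselm(tblTrain,'interactions','ResponseVar','MPG','Verbose',0);

% Root mean squared error, ignoring rows with missing values (NaN).
rmse = @(y,yhat) sqrt(mean((y - yhat).^2,'omitnan'));

errors = table( ...
    [rmse(tblTrain.MPG,predict(small,tblTrain)); rmse(tblTrain.MPG,predict(large,tblTrain))], ...
    [rmse(tblTest.MPG, predict(small,tblTest));  rmse(tblTest.MPG, predict(large,tblTest))], ...
    'VariableNames',{'TrainRMSE','TestRMSE'},'RowNames',{'small';'large'})
```

If the larger model shows a lower TrainRMSE but no lower TestRMSE, its extra terms are not helping prediction on this split. A single split is noisy, so treat the result as indicative only.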
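stepwiselm
can use all the name-value options from
fitlm
, with additional options relating to the starting and bounding models. In particular:
For a small model, start from the default lower bounding model:
'constant'
(a model that has no predictor terms).
The default upper bounding model has linear terms and interaction terms (products of pairs of predictors). For an upper bounding model that also includes squared terms, set the
Upper
name-value pair to
'quadratic'
.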
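As a minimal sketch of these options, the call below starts from the constant model and widens the upper bound to include squared terms. It assumes the carbig sample data set; the variable names tbl and mdl are illustrative, and the Lower option is set to its default value only to make the bound visible.

```matlab
% Sketch: start small, but allow the search to reach squared terms.
load carbig
tbl = table(Acceleration,Displacement,Horsepower,Weight,MPG);

mdl = stepwiselm(tbl,'constant', ...   % starting model: intercept only
    'ResponseVar','MPG', ...
    'Lower','constant', ...            % default lower bound, shown explicitly
    'Upper','quadratic', ...           % upper bound that includes squared terms
    'Verbose',0);                      % suppress the per-step display

mdl.Formula                            % inspect which terms were kept
```

Because the search moves one term at a time, the starting model and the bounds together determine which models are reachable.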
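Compare Large and Small Stepwise Models
This example shows how to compare the models that stepwiselm
returns starting from a constant model and starting from a full interaction model.
Load the carbig
data and create a table from some of the data.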
load carbig
tbl = table(Acceleration,Displacement,Horsepower,Weight,MPG);
Create a stepwise model of mileage starting from the constant model.
mdl1 = stepwiselm(tbl,'constant','ResponseVar','MPG')
1. Adding Weight, FStat = 888.8507, pValue = 2.9728e-103
2. Adding Horsepower, FStat = 3.8217, pValue = 0.00049608
3. Adding Horsepower:Weight, FStat = 64.8709, pValue = 9.93362e-15
mdl1 =
Linear regression model:
    MPG ~ 1 + Horsepower*Weight

Estimated Coefficients:
                          Estimate         SE         tStat       pValue
                         __________    __________    _______    __________
    (Intercept)              63.558        2.3429     27.127    1.2343e-91
    Horsepower             -0.25084      0.027279    -9.1952    2.3226e-18
    Weight                -0.010772    0.00077381    -13.921    5.1372e-36
    Horsepower:Weight    5.3554e-05    6.6491e-06     8.0542    9.9336e-15

Number of observations: 392, Error degrees of freedom: 388
Root Mean Squared Error: 3.93
R-squared: 0.748, Adjusted R-Squared: 0.746
F-statistic vs. constant model: 385, p-value = 7.26e-116
Create a stepwise model of mileage starting from the full interaction model.
mdl2 = stepwiselm(tbl,'interactions','ResponseVar','MPG')
1. Removing Acceleration:Displacement, FStat = 0.024186, pValue = 0.8765
2. Removing Displacement:Weight, FStat = 0.33103, pValue = 0.56539
3. Removing Acceleration:Horsepower, FStat = 1.7334, pValue = 0.18876
4. Removing Acceleration:Weight, FStat = 0.93269, pValue = 0.33477
5. Removing Horsepower:Weight, FStat = 0.64486, pValue = 0.42245
mdl2 =
Linear regression model:
    MPG ~ 1 + Acceleration + Weight + Displacement*Horsepower

Estimated Coefficients:
                                Estimate          SE         tStat       pValue
                               __________    __________    _______    __________
    (Intercept)                    61.285        2.8052     21.847    1.8593e-69
    Acceleration                 -0.34401       0.11862       -2.9     0.0039445
    Displacement                -0.081198      0.010071    -8.0623    9.5014e-15
    Horsepower                   -0.24313      0.026068    -9.3265    8.6556e-19
    Weight                     -0.0014367    0.00084041    -1.7095      0.088166
    Displacement:Horsepower    0.00054236    5.7987e-05        ...           ...

Number of observations: 392, Error degrees of freedom: 386
Root Mean Squared Error: 3.84
R-squared: 0.761, Adjusted R-Squared: 0.758
F-statistic vs. constant model: 246, p-value = 1.32e-117
Notice that:
mdl1
has four coefficients (the Estimate
column), and mdl2
has six coefficients.
The adjusted R-squared of
mdl1
is 0.746
, which is slightly lower (worse) than that of mdl2
, 0.758
.
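The same comparison can be read from the model objects instead of the printed display. This is a minimal sketch that refits both models so it runs on its own; it relies on the NumCoefficients, NumObservations, DFE, and Rsquared properties of LinearModel, and the names nCoef, adjR2, summary, and manualAdj are illustrative.

```matlab
% Sketch: tabulate model size and adjusted R-squared for both models.
load carbig
tbl = table(Acceleration,Displacement,Horsepower,Weight,MPG);
mdl1 = stepwiselm(tbl,'constant','ResponseVar','MPG','Verbose',0);
mdl2 = stepwiselm(tbl,'interactions','ResponseVar','MPG','Verbose',0);

nCoef   = [mdl1.NumCoefficients; mdl2.NumCoefficients];
adjR2   = [mdl1.Rsquared.Adjusted; mdl2.Rsquared.Adjusted];
summary = table(nCoef,adjR2,'RowNames',{'mdl1';'mdl2'})

% Adjusted R-squared penalizes extra coefficients through the error
% degrees of freedom: 1 - (1 - R^2)*(n - 1)/DFE.
manualAdj = 1 - (1 - mdl1.Rsquared.Ordinary) ...
              * (mdl1.NumObservations - 1) / mdl1.DFE;
```

The hand-computed manualAdj should agree with mdl1.Rsquared.Adjusted, which shows why adding two coefficients raises the adjusted value only modestly here.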
Create a stepwise model of mileage starting from the full quadratic model, with the full quadratic model as the upper bound:
mdl3 = stepwiselm(tbl,'quadratic','ResponseVar','MPG','Upper','quadratic');
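1. Removing Acceleration:Horsepower, FStat = 0.075209, pValue = 0.78405
2. Removing Acceleration:Weight, FStat = 0.072756, pValue = 0.78751
3. Removing Horsepower:Weight, FStat = ..., pValue = ...
4. Removing Weight^2, FStat = 1.194, pValue = 0.27521
5. Removing Displacement:Weight, FStat = 1.2839, pValue = 0.25789
6. Removing Displacement^2, FStat = 2.069, pValue = 0.15114
7. Removing Horsepower^2, FStat = 0.74063, pValue = 0.39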
Compare the complexity of the three models by examining their formulas.
mdl1.Formula
ans = MPG ~ 1 + Horsepower*Weight
mdl2.Formula
ans = MPG ~ 1 + Acceleration + Weight + Displacement*Horsepower
mdl3.Formula
ans = MPG ~ 1 + Weight + Acceleration*Displacement + Displacement*Horsepower + Acceleration^2
The adjusted R-squared values improve slightly as the models become more complex:
RSquared = [mdl1.Rsquared.Adjusted, ...
    mdl2.Rsquared.Adjusted, mdl3.Rsquared.Adjusted]
RSquared = 1×3
    0.7465    0.7580    0.7599
Compare residual plots of the three models.
subplot(3,1,1)
plotResiduals(mdl1)
subplot(3,1,2)
plotResiduals(mdl2)
subplot(3,1,3)
plotResiduals(mdl3)
The models have similar residuals. It is not clear which fits the data better.
Interestingly, the more complex models have larger maximum deviations of the residuals:
Rrange1 = [min(mdl1.Residuals.Raw),max(mdl1.Residuals.Raw)];
Rrange2 = [min(mdl2.Residuals.Raw),max(mdl2.Residuals.Raw)];
Rrange3 = [min(mdl3.Residuals.Raw),max(mdl3.Residuals.Raw)];
Rranges = [Rrange1;Rrange2;Rrange3]
Rranges = 3×2
  -10.7725   14.7314
  -11.4407   16.7562
  -12.2723   16.7927
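The residual ranges can also be gathered with a helper, which is easier to extend when more candidate models are compared. This is a sketch under the same assumptions as the example (carbig data, the three starting models); the names resRange, models, and widths are illustrative.

```matlab
% Sketch: compute [min max] of the raw residuals for a list of models.
load carbig
tbl = table(Acceleration,Displacement,Horsepower,Weight,MPG);
mdl1 = stepwiselm(tbl,'constant','ResponseVar','MPG','Verbose',0);
mdl2 = stepwiselm(tbl,'interactions','ResponseVar','MPG','Verbose',0);
mdl3 = stepwiselm(tbl,'quadratic','ResponseVar','MPG', ...
    'Upper','quadratic','Verbose',0);

% min and max skip the NaN residuals of rows with missing data.
resRange = @(m) [min(m.Residuals.Raw), max(m.Residuals.Raw)];

models  = {mdl1, mdl2, mdl3};
ranges  = cellfun(resRange,models,'UniformOutput',false);
Rranges = vertcat(ranges{:})        % one row per model: [min max]
widths  = diff(Rranges,1,2)         % max minus min for each model
```

A wider range reflects only the most extreme residuals, so it can grow even while the root mean squared error shrinks.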
See Also
fitlm
|plotResiduals
|stepwiselm
|LinearModel