robustfit

Robust regression

Syntax

b = robustfit(X,y)
b = robustfit(X,y,wfun,tune)
b = robustfit(X,y,wfun,tune,const)
[b,stats] = robustfit(...)

Description

b = robustfit(X,y) returns a (p + 1)-by-1 vector b of coefficient estimates for a robust multilinear regression of the responses in y on the predictors in X. X is an n-by-p matrix of p predictors at each of n observations. y is an n-by-1 vector of observed responses. By default, the algorithm uses iteratively reweighted least squares with a bisquare weighting function.

    Note: By default, robustfit adds a first column of 1s to X, corresponding to a constant term in the model. Do not enter a column of 1s directly into X. You can change the default behavior of robustfit using the input const, below.

robustfit treats NaNs in X or y as missing values and removes the affected observations.
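As a minimal sketch of this missing-value behavior, the following compares a fit on data containing a NaN with a fit on the same data after that observation is deleted by hand. It assumes robustfit (Statistics and Machine Learning Toolbox) is on the path; the data and the variable names bWithNaN, bManual, and keep are illustrative only.

```matlab
% Illustrative data: a linear trend with one missing response.
x = (1:10)';
rng default;                          % for reproducibility
y = 10 - 2*x + randn(10,1);
y(3) = NaN;                           % treated as a missing value

bWithNaN = robustfit(x,y);            % robustfit drops observation 3

keep = ~isnan(y);                     % delete the same observation by hand
bManual = robustfit(x(keep),y(keep));

% The two coefficient vectors are expected to agree.
disp(max(abs(bWithNaN - bManual)))
```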

b = robustfit(X,y,wfun,tune) specifies a weighting function wfun. tune is a tuning constant that is divided into the residual vector before computing weights.

The weighting function wfun can be any one of the following:

Weight Function        Equation                                        Default Tuning Constant
'andrews'              w = (abs(r)<pi) .* sin(r) ./ r                  1.339
'bisquare' (default)   w = (abs(r)<1) .* (1 - r.^2).^2                 4.685
'cauchy'               w = 1 ./ (1 + r.^2)                             2.385
'fair'                 w = 1 ./ (1 + abs(r))                           1.400
'huber'                w = 1 ./ max(1, abs(r))                         1.345
'logistic'             w = tanh(r) ./ r                                1.205
'ols'                  Ordinary least squares (no weighting function)  None
'talwar'               w = 1 * (abs(r)<1)                              2.795
'welsch'               w = exp(-(r.^2))                                2.985

If tune is unspecified, the default value in the table is used. Default tuning constants give coefficient estimates that are approximately 95% as statistically efficient as the ordinary least-squares estimates, provided the response has a normal distribution with no outliers. Decreasing the tuning constant increases the downweight assigned to large residuals; increasing the tuning constant decreases the downweight assigned to large residuals.
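The effect of the tuning constant can be illustrated with a small calculation that uses only base MATLAB. This is a sketch, not part of robustfit: u is a made-up value standing for a residual that has already been divided by its scale, so that the argument of the weight function is u divided by tune, and the 'huber' equation is copied from the table.

```matlab
% Hypothetical residual, already divided by its scale estimate.
u = 3;

% Three tuning constants: smaller than, equal to, and larger than the
% 'huber' default of 1.345.
tune = [1.0 1.345 2.0];

r = u ./ tune;                 % argument passed to the weight function
w = 1 ./ max(1, abs(r));       % 'huber' weight from the table

% A smaller tuning constant yields a smaller weight for the same residual.
disp([tune; w])
```

In this sketch the weight rises from 1/3 at tune = 1.0 to 2/3 at tune = 2.0, which matches the statement that decreasing the tuning constant downweights large residuals more strongly.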

The value r in the weight functions is

r = resid/(tune*s*sqrt(1-h))

where resid is the vector of residuals from the previous iteration, h is the vector of leverage values from a least-squares fit, and s is an estimate of the standard deviation of the error term given by

s = MAD/0.6745

Here MAD is the median absolute deviation of the residuals from their median. The constant 0.6745 makes the estimate unbiased for the normal distribution. If there are p columns in X, the smallest p absolute deviations are excluded when computing the median.
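The scaling described above can be sketched for a single reweighting step using only base MATLAB. This is an illustration written from the description in this section, not the internal implementation of robustfit, which may differ in detail. The data are noise-free with one outlier, p is taken as the number of columns of the design matrix including the constant term (an assumption of this sketch), and all variable names are illustrative.

```matlab
% Illustrative data: exact trend y = 10 - 2*x with one outlier.
x = (1:10)';
y = 10 - 2*x;
y(10) = 0;

X = [ones(10,1) x];        % design matrix including the constant term
p = size(X,2);             % assumption: p counts the constant column

b0    = X \ y;             % least-squares fit used as the starting point
resid = y - X*b0;          % residuals from the previous iteration

[Q,~] = qr(X,0);           % economy-size QR decomposition
h = sum(Q.^2,2);           % leverage values from the least-squares fit

% s = MAD/0.6745, excluding the smallest p absolute deviations.
absdev = sort(abs(resid - median(resid)));
s = median(absdev(p+1:end)) / 0.6745;

tune = 4.685;                            % default for 'bisquare'
r = resid ./ (tune*s*sqrt(1-h));         % scaled residuals
w = (abs(r)<1) .* (1 - r.^2).^2;         % 'bisquare' weights

disp([x resid r w])
```

For these data the outlier at x = 10 receives the smallest weight, so it contributes least to the next weighted least-squares fit.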

You can write your own weight function. The function must take a vector of scaled residuals as input and produce a vector of weights as output. In this case, wfun is specified using a function handle @ (as in @myfun), and the input tune is required.
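A custom weight function can be sketched with an anonymous function handle. The example below assumes robustfit (Statistics and Machine Learning Toolbox) is available and reuses the 'cauchy' equation and tuning constant from the table, so the custom fit can be checked against the built-in option. The handle name myCauchy is hypothetical.

```matlab
% Illustrative data: a linear trend with one outlier.
x = (1:10)';
rng default;                          % for reproducibility
y = 10 - 2*x + randn(10,1);
y(10) = 0;

% Hypothetical user-defined weight function: takes a vector of scaled
% residuals and returns a vector of weights of the same size.
myCauchy = @(r) 1 ./ (1 + r.^2);

% tune must be given explicitly when wfun is a function handle.
bCustom  = robustfit(x,y,myCauchy,2.385);
bBuiltin = robustfit(x,y,'cauchy',2.385);

% Both fits use the same weights, so the estimates are expected to agree.
disp([bCustom bBuiltin])
```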

b = robustfit(X,y,wfun,tune,const) controls whether the model includes a constant term. const is 'on' to include the constant term (the default), or 'off' to omit it. When const is 'on', robustfit adds a first column of 1s to X and b is a (p + 1)-by-1 vector. When const is 'off', robustfit does not alter X and b is a p-by-1 vector.
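The following sketch contrasts the two settings of const. It assumes robustfit (Statistics and Machine Learning Toolbox) is available and passes the 'bisquare' defaults from the table explicitly; the data and variable names are illustrative.

```matlab
% Illustrative data: a linear trend with one outlier.
x = (1:10)';
rng default;                          % for reproducibility
y = 10 - 2*x + randn(10,1);
y(10) = 0;

% Constant term added by robustfit: b is (p+1)-by-1.
bOn = robustfit(x,y,'bisquare',4.685,'on');

% Constant term omitted: the line is forced through the origin, b is p-by-1.
bNoConst = robustfit(x,y,'bisquare',4.685,'off');

% With const 'off', a column of 1s supplied by hand plays the role of
% the constant term, so this fit is expected to match bOn.
bOwnOnes = robustfit([ones(10,1) x],y,'bisquare',4.685,'off');

disp(size(bOn))
disp(size(bNoConst))
```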

[b,stats] = robustfit(...) returns the structure stats, whose fields contain diagnostic statistics from the regression. The fields of stats are:

  • ols_s: Sigma estimate (RMSE) from ordinary least squares

  • robust_s: Robust estimate of sigma

  • mad_s: Estimate of sigma computed using the median absolute deviation of the residuals from their median; used for scaling residuals during iterative fitting

  • s: Final estimate of sigma, the larger of robust_s and a weighted average of ols_s and robust_s

  • resid: Residual

  • rstud: Studentized residual (see regress for more information)

  • se: Standard error of coefficient estimates

  • covb: Estimated covariance matrix for coefficient estimates

  • coeffcorr: Estimated correlation of coefficient estimates

  • t: Ratio of b to se

  • p: p-values for t

  • w: Vector of weights for robust fit

  • R: R factor in QR decomposition of X

  • dfe: Degrees of freedom for error

  • h: Vector of leverage values for least-squares fit

The robustfit function estimates the variance-covariance matrix of the coefficient estimates using inv(X'*X)*stats.s^2. Standard errors and correlations are derived from this estimate.
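The relationships among these fields can be checked numerically. The sketch below assumes robustfit (Statistics and Machine Learning Toolbox) is available, and that X in the formula above includes the column of 1s added for the constant term. Names ending in Manual are illustrative.

```matlab
% Illustrative data: a linear trend with one outlier.
x = (1:10)';
rng default;                          % for reproducibility
y = 10 - 2*x + randn(10,1);
y(10) = 0;

[b,stats] = robustfit(x,y);

Xc = [ones(10,1) x];                  % design matrix with constant column

% Covariance matrix from the formula inv(X'*X)*stats.s^2.
covbManual = inv(Xc'*Xc) * stats.s^2;

% Standard errors are the square roots of the diagonal of covb.
seManual = sqrt(diag(stats.covb));

% t is the ratio of b to se.
tManual = b ./ stats.se;

disp(max(abs(covbManual(:) - stats.covb(:))))
disp(max(abs(seManual - stats.se)))
disp(max(abs(tManual - stats.t)))
```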

Examples

Generate data with the trend y = 10 - 2*x, then change one value to simulate an outlier.

x = (1:10)';
rng default; % For reproducibility
y = 10 - 2*x + randn(10,1);
y(10) = 0;

Fit a straight line using ordinary least squares regression.

bls = regress(y,[ones(10,1) x])
bls =
    7.8518
   -1.3644

Now use robust regression to estimate a straight-line fit.

brob = robustfit(x,y)
brob =
    8.4504
   -1.5278

Create a scatter plot of the data together with the fits.

scatter(x,y,'filled'); grid on; hold on
plot(x,bls(1)+bls(2)*x,'r','LineWidth',2);
plot(x,brob(1)+brob(2)*x,'g','LineWidth',2)
legend('Data','Ordinary Least Squares','Robust Regression')

The robust fit is less influenced by the outlier than the least-squares fit.
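One way to see why the robust fit resists the outlier is to inspect the weights returned in stats.w. The sketch below regenerates the same data so that it can run on its own, and assumes robustfit (Statistics and Machine Learning Toolbox) is available. One would expect the altered observation at x = 10 to receive the smallest weight, but no particular value is claimed here.

```matlab
% Same data as in the example: linear trend with one altered value.
x = (1:10)';
rng default;                          % for reproducibility
y = 10 - 2*x + randn(10,1);
y(10) = 0;

[brob,stats] = robustfit(x,y);

% Final weight assigned to each observation by the robust fit.
disp([x stats.w])

% Index of the observation that was downweighted the most.
[wMin,idx] = min(stats.w);
fprintf('Smallest weight %.4f at observation %d\n', wMin, idx);
```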

Introduced before R2006a