Lasso Regularization of Generalized Linear Models
What is Generalized Linear Model Lasso Regularization?
Lasso is a regularization technique. Use lassoglm to:
Reduce the number of predictors in a generalized linear model.
Identify important predictors.
Select among redundant predictors.
Produce shrinkage estimates with potentially lower predictive errors than ordinary least squares.
Elastic net is a related technique. Use it when you have several highly correlated variables. lassoglm provides elastic net regularization when you set the Alpha name-value pair to a number strictly between 0 and 1.
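As a rough illustration of the two modes, the following sketch fits a lasso model and an elastic net model to synthetic Poisson data. The data, the variable names, and the choice of Alpha = 0.5 are assumptions made for illustration only, and the sketch assumes that Statistics and Machine Learning Toolbox is available.

```matlab
% Synthetic data (illustrative assumption): 100 observations, 20 predictors,
% of which only predictors 5, 10, and 15 influence the response.
rng default                                   % for reproducibility
X  = randn(100,20);
mu = exp(X(:,[5 10 15])*[0.4; 0.2; 0.3] + 1);
y  = poissrnd(mu);

% Lasso: Alpha keeps its default value of 1.
[Blasso,FitLasso] = lassoglm(X,y,'poisson');

% Elastic net: Alpha is strictly between 0 and 1.
[Benet,FitEnet] = lassoglm(X,y,'poisson','Alpha',0.5);

% Each column of B holds the coefficients for one value of Lambda,
% so B has one row per predictor.
size(Blasso)
```

The second output describes each fit, including the Lambda values used and the matching intercepts.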
For details about lasso and elastic net computations and algorithms, see Generalized Linear Model Lasso and Elastic Net. For a discussion of generalized linear models, see What Are Generalized Linear Models?.
Generalized Linear Model Lasso and Elastic Net
Overview of Lasso and Elastic Net
Lasso is a regularization technique for estimating generalized linear models. Lasso includes a penalty term that constrains the size of the estimated coefficients. Therefore, it resembles ridge regression. Lasso is a shrinkage estimator: it generates coefficient estimates that are biased to be small. Nevertheless, a lasso estimator can have smaller error than an ordinary maximum likelihood estimator when you apply it to new data.
Unlike ridge regression, as the penalty term increases, the lasso technique sets more coefficients to zero. This means that the lasso estimator yields a smaller model, with fewer predictors. As such, lasso is an alternative to stepwise regression and other model selection and dimensionality reduction techniques.
Elastic net is a related technique. Elastic net is akin to a hybrid of ridge regression and lasso regularization. Like lasso, elastic net can generate reduced models by generating zero-valued coefficients. Empirical studies suggest that the elastic net technique can outperform lasso on data with highly correlated predictors.
Definition of Lasso for Generalized Linear Models
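For a nonnegative value of λ, lassoglm solves the problem

$$\min_{\beta_0,\beta}\left(\frac{1}{N}\,\mathrm{Deviance}(\beta_0,\beta)+\lambda\sum_{j=1}^{p}\left|\beta_j\right|\right)$$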
The function Deviance in this equation is the deviance of the model fit to the responses using the intercept β0 and the predictor coefficients β. The formula for Deviance depends on the distr parameter you supply to lassoglm. Minimizing the λ-penalized deviance is equivalent to maximizing the λ-penalized loglikelihood.
N is the number of observations.
λ is a nonnegative regularization parameter corresponding to one value of Lambda.
The parameters β0 and β are a scalar and a vector of length p, respectively.
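The equivalence between minimizing the λ-penalized deviance and maximizing the λ-penalized loglikelihood can be sketched in two lines. The sketch assumes the usual definition of deviance as twice the gap between the loglikelihood of the saturated model and that of the fitted model, with any dispersion parameter held fixed. The symbols ℓ and ℓ_sat are notation introduced here for illustration.

$$\mathrm{Deviance}(\beta_0,\beta)=2\left(\ell_{\mathrm{sat}}-\ell(\beta_0,\beta)\right)$$

Because ℓ_sat does not depend on β0 or β, it drops out of the optimization, and rescaling the objective by the positive constant N/2 does not move the optimum:

$$\arg\min_{\beta_0,\beta}\left(\frac{1}{N}\,\mathrm{Deviance}(\beta_0,\beta)+\lambda\sum_{j=1}^{p}\left|\beta_j\right|\right)=\arg\max_{\beta_0,\beta}\left(\ell(\beta_0,\beta)-\frac{N\lambda}{2}\sum_{j=1}^{p}\left|\beta_j\right|\right)$$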
As λ increases, the number of nonzero components of β decreases.
The lasso problem involves the L1 norm of β, as contrasted with the elastic net algorithm.
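To see how the number of nonzero coefficients changes with λ, a sketch along the following lines can help. The synthetic data and the Lambda grid are assumptions chosen for illustration, and the sketch assumes that Statistics and Machine Learning Toolbox is available.

```matlab
% Synthetic Poisson data (illustrative assumption).
rng default                                   % for reproducibility
X  = randn(100,20);
mu = exp(X(:,[5 10 15])*[0.4; 0.2; 0.3] + 1);
y  = poissrnd(mu);

% Fit over an increasing grid of Lambda values (assumed grid).
lambda = [0.001 0.01 0.05 0.1 0.5];
[B,FitInfo] = lassoglm(X,y,'poisson','Lambda',lambda);

% Count the nonzero coefficients in each column of B and compare
% the counts with the DF field of the second output.
nnzPerLambda = sum(B ~= 0, 1);
disp([FitInfo.Lambda(:) FitInfo.DF(:) nnzPerLambda(:)])
```

Each row of the displayed matrix pairs one Lambda value with the corresponding count of nonzero coefficients, which makes the sparsity trend easy to inspect.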
Definition of Elastic Net for Generalized Linear Models
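For α strictly between 0 and 1, and nonnegative λ, elastic net solves the problem

$$\min_{\beta_0,\beta}\left(\frac{1}{N}\,\mathrm{Deviance}(\beta_0,\beta)+\lambda P_\alpha(\beta)\right)$$

where

$$P_\alpha(\beta)=\frac{1-\alpha}{2}\left\|\beta\right\|_2^2+\alpha\left\|\beta\right\|_1=\sum_{j=1}^{p}\left(\frac{1-\alpha}{2}\beta_j^2+\alpha\left|\beta_j\right|\right)$$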
Elastic net is the same as lasso when α = 1. For other values of α, the penalty term Pα(β) interpolates between the L1 norm of β and the squared L2 norm of β. As α shrinks toward 0, elastic net approaches ridge regression.
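The interpolation can be checked numerically. The sketch below assumes the standard elastic net penalty Pα(β) = (1 − α)/2 · ‖β‖₂² + α‖β‖₁ and evaluates it with a hypothetical anonymous function named penalty, which is not part of the lassoglm interface. The coefficient vector is an arbitrary example.

```matlab
% Hypothetical helper: elastic net penalty for a coefficient vector beta
% and a mixing parameter alpha.
penalty = @(beta,alpha) (1-alpha)/2*sum(beta.^2) + alpha*sum(abs(beta));

beta = [1; -2; 0; 0.5];        % example coefficients (assumed)
% L1 norm of beta is 3.5; squared L2 norm is 5.25.

pLasso = penalty(beta,1);      % alpha = 1: the L1 norm, 3.5
pMid   = penalty(beta,0.5);    % 0.25*5.25 + 0.5*3.5 = 3.0625
pRidge = penalty(beta,0);      % limit alpha -> 0: half the squared L2 norm, 2.625
```

The value at α = 0 is shown only as the limiting ridge-type penalty; the definition above takes α strictly between 0 and 1, with α = 1 giving lasso.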
References
[1] Tibshirani, R. Regression Shrinkage and Selection via the Lasso. Journal of the Royal Statistical Society, Series B, Vol. 58, No. 1, pp. 267–288, 1996.
[2] Zou, H. and T. Hastie. Regularization and Variable Selection via the Elastic Net. Journal of the Royal Statistical Society, Series B, Vol. 67, No. 2, pp. 301–320, 2005.
[3] Friedman, J., R. Tibshirani, and T. Hastie. Regularization Paths for Generalized Linear Models via Coordinate Descent. Journal of Statistical Software, Vol. 33, No. 1, 2010. https://www.jstatsoft.org/v33/i01
[4] Hastie, T., R. Tibshirani, and J. Friedman. The Elements of Statistical Learning, 2nd edition. Springer, New York, 2008.
[5] McCullagh, P., and J. A. Nelder. Generalized Linear Models, 2nd edition. Chapman & Hall/CRC Press, 1989.