
Lasso Regularization of Generalized Linear Models

What Is Generalized Linear Model Lasso Regularization?

Lasso is a regularization technique. Use lassoglm to do the following (see the sketch after this list):

  • Reduce the number of predictors in a generalized linear model.

  • Identify important predictors.

  • Select among redundant predictors.

  • Produce shrinkage estimates with potentially lower predictive errors than ordinary least squares.
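The following is a minimal sketch of these uses. The simulated data, the Poisson response, and the variable names are illustrative assumptions, not part of this page:

    rng(1)                                   % for reproducibility
    X = randn(100,10);                       % 10 candidate predictors
    mu = exp(X(:,[2 5])*[1; -0.5] + 1);      % only predictors 2 and 5 affect the response
    y = poissrnd(mu);                        % Poisson response

    [B,FitInfo] = lassoglm(X,y,'poisson','CV',5);   % 5-fold cross-validated lasso fit
    idx = FitInfo.Index1SE;                  % sparsest fit within one SE of the minimum deviance
    importantPredictors = find(B(:,idx) ~= 0)       % predictors that lasso keeps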

Elastic net is a related technique. Use it when you have several highly correlated variables. lassoglm provides elastic net regularization when you set the Alpha name-value pair argument to a value strictly between 0 and 1.
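Continuing the sketch above, a hedged example of requesting elastic net regularization is to set Alpha strictly between 0 and 1 (the value 0.5 is only illustrative):

    [B,FitInfo] = lassoglm(X,y,'poisson','Alpha',0.5);   % elastic net fit with alpha = 0.5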

For details about lasso and elastic net computations and algorithms, see Generalized Linear Model Lasso and Elastic Net. For a discussion of generalized linear models, see What Are Generalized Linear Models?

Generalized Linear Model Lasso and Elastic Net

Overview of Lasso and Elastic Net

Lasso is a regularization technique for estimating generalized linear models. Lasso includes a penalty term that constrains the size of the estimated coefficients. Therefore, it resembles ridge regression. Lasso is a shrinkage estimator: it generates coefficient estimates that are biased to be small. Nevertheless, a lasso estimator can have smaller error than an ordinary maximum likelihood estimator when you apply it to new data.

Unlike ridge regression, as the penalty term increases, the lasso technique sets more coefficients to zero. This means that the lasso estimator is a smaller model, with fewer predictors. As such, lasso is an alternative to stepwise regression and other model selection and dimensionality reduction techniques.

Elastic net is a related technique. Elastic net is akin to a hybrid of ridge regression and lasso regularization. Like lasso, elastic net can generate reduced models by generating zero-valued coefficients. Empirical studies suggest that the elastic net technique can outperform lasso on data with highly correlated predictors.

Definition of Lasso for Generalized Linear Models

For a nonnegative value of λ, lassoglm solves the problem

\[
\min_{\beta_0,\beta}\left(\frac{1}{N}\operatorname{Deviance}(\beta_0,\beta)+\lambda\sum_{j=1}^{p}\lvert\beta_j\rvert\right)
\]

  • The function Deviance in this equation is the deviance of the model fit to the responses using the intercept β0 and the predictor coefficients β. The formula for the deviance depends on the distr parameter you supply to lassoglm. Minimizing the λ-penalized deviance is equivalent to maximizing the λ-penalized loglikelihood. (A sketch after this list shows how these quantities appear in the lassoglm outputs.)

  • N is the number of observations.

  • λ is a nonnegative regularization parameter corresponding to one value of Lambda.

  • The parameters β0 and β are a scalar and a vector of length p, respectively.
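The following sketch, reusing the simulated X and y from earlier, shows how these quantities map onto the lassoglm outputs; the data remain illustrative:

    [B,FitInfo] = lassoglm(X,y,'poisson');   % distr = 'poisson' selects the deviance formula
    beta0 = FitInfo.Intercept;               % intercept β0 for each Lambda value
    lam   = FitInfo.Lambda;                  % the nonnegative regularization parameters λ
    dev   = FitInfo.Deviance;                % deviance of the fit at each Lambda
    size(B)                                  % p-by-numel(lam): one coefficient vector β per Lambda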

As λ increases, the number of nonzero components of β decreases.
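Continuing the sketch above, counting the nonzero coefficients in each column of B illustrates this effect over the default Lambda sequence:

    nnzPerLambda = sum(B ~= 0,1);            % nonzero components of β at each Lambda
    plot(FitInfo.Lambda,nnzPerLambda)        % the count falls as Lambda increases
    xlabel('Lambda'); ylabel('Number of nonzero coefficients')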

The lasso problem involves the L1 norm of β, as contrasted with the elastic net algorithm.

Definition of Elastic Net for Generalized Linear Models

For α strictly between 0 and 1, and nonnegative λ, elastic net solves the problem

\[
\min_{\beta_0,\beta}\left(\frac{1}{N}\operatorname{Deviance}(\beta_0,\beta)+\lambda P_{\alpha}(\beta)\right),
\]

where

\[
P_{\alpha}(\beta)=\frac{(1-\alpha)}{2}\lVert\beta\rVert_{2}^{2}+\alpha\lVert\beta\rVert_{1}=\sum_{j=1}^{p}\left(\frac{(1-\alpha)}{2}\beta_{j}^{2}+\alpha\lvert\beta_{j}\rvert\right)
\]

Elastic net is the same as lasso when α = 1. For other values of α, the penalty term Pα(β) interpolates between the L1 norm of β and the squared L2 norm of β. As α shrinks toward 0, elastic net approaches ridge regression.
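A small numeric check of this interpolation, with illustrative values not taken from this page:

    beta  = [1.5; -2; 0; 0.5];
    alpha = 0.5;
    Palpha = (1-alpha)/2*sum(beta.^2) + alpha*sum(abs(beta))   % elastic net penalty
    % alpha = 1 gives the lasso penalty sum(abs(beta));
    % as alpha approaches 0, the penalty approaches the ridge-style term sum(beta.^2)/2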
