1 Lasso and Elastic net

Two state-of-the-art automatic variable selection techniques in predictive modeling, Lasso [1] and Elastic net [2], are provided in the glmnet package. These methods generally perform better than stepwise regression, especially when the number of predictor variables is large.

For a more theoretical review of these techniques, refer to Sections 3.4-3.8 of “The Elements of Statistical Learning” (free online). For more details on the glmnet package, you can watch Professor Trevor Hastie’s presentation.

In short, Lasso and Elastic net solve the following optimization problem [2]:

\[\begin{array}{rl} \hat{\beta}= & \underset{\beta}{\text{argmin}}\;\|y-X\beta\|^2_2 \\ \text{subject to} & (1-\alpha)\|\beta\|_1+\alpha\|\beta\|_2^2\le t \end{array}\]

Note that without the constraint, the problem above is the familiar least squares regression. When the constraint is added, depending on how small \(t\) is, some of the “unimportant” coefficients are shrunk exactly to 0 and the corresponding predictors are forced out of the model. Under this parameterization, Lasso is the special case of Elastic net with \(\alpha = 0\), and ridge regression is the special case with \(\alpha = 1\).

The two most important tuning parameters are \(t\) and \(\alpha\) in the equation above. Note that glmnet specifies the penalty in the equivalent Lagrangian form, so the strength of the constraint is controlled by lambda (the s argument when extracting coefficients or predictions) rather than \(t\), and its mixing parameter alpha uses the opposite convention: alpha = 1 gives the Lasso and alpha = 0 gives ridge regression.
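A minimal sketch of how these map to glmnet arguments (the x and y below are small simulated placeholders, not the Boston data used later):

library(glmnet)
# tiny illustrative data, only to make the calls runnable
set.seed(1)
x <- matrix(rnorm(50 * 4), 50, 4)
y <- rnorm(50)
fit_ridge <- glmnet(x, y, alpha = 0)    # ridge regression (alpha = 0)
fit_enet  <- glmnet(x, y, alpha = 0.5)  # elastic net (0 < alpha < 1)
fit_lasso <- glmnet(x, y, alpha = 1)    # Lasso (alpha = 1)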

The case for using the elastic net instead of the lasso is [3]:

  1. When \(p \gg n\), the lasso can select at most \(n\) variables before it saturates;
  2. When predictors are highly correlated, the lasso tends to pick only one variable from each correlated group; the elastic net keeps the grouping effect while still producing a sparse model.

Here we demonstrate the Lasso on the Boston Housing data. For simplicity, we do not use a training/testing split in this example.

Note: if you have a binary response, you can use the family = "binomial" option in the glmnet() function.
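For illustration only (the simulated x_bin and y_bin below are hypothetical placeholders, not part of the Boston example), a logistic Lasso fit looks like this:

library(glmnet)
# hypothetical binary-response data
set.seed(1)
x_bin <- matrix(rnorm(100 * 5), 100, 5)
y_bin <- rbinom(100, 1, plogis(x_bin[, 1] - x_bin[, 2]))
logit_lasso <- glmnet(x_bin, y_bin, family = "binomial", alpha = 1)
coef(logit_lasso, s = 0.05)                                      # coefficients at a chosen lambda
head(predict(logit_lasso, x_bin, s = 0.05, type = "response"))   # predicted probabilities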

library(MASS)
data(Boston)
colnames(Boston) 
##  [1] "crim"    "zn"      "indus"   "chas"    "nox"     "rm"      "age"    
##  [8] "dis"     "rad"     "tax"     "ptratio" "black"   "lstat"   "medv"
install.packages('glmnet')
library(glmnet)

glmnet() does not take a data frame as input, so we have to specify the x matrix and y vector explicitly.
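As a side note (a sketch only; the demonstration below simply drops the medv column instead), model.matrix() is another common way to build the x matrix, and it also expands any factor predictors into dummy variables:

# alternative way to construct x and y for glmnet (the names x_boston / y_boston are illustrative)
x_boston <- model.matrix(medv ~ ., data = Boston)[, -1]  # drop the intercept column
y_boston <- Boston$medv
# glmnet(x_boston, y_boston, alpha = 1) would fit the same Lasso path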

lasso_fit <- glmnet(x = as.matrix(Boston[, -c(which(colnames(Boston)=='medv'))]), y = Boston$medv, alpha = 1)
# lambda = 0.5
coef(lasso_fit, s = 0.5)
## 14 x 1 sparse Matrix of class "dgCMatrix"
##                        1
## (Intercept) 14.177440452
## crim        -0.013381578
## zn           .          
## indus        .          
## chas         1.565138250
## nox         -0.014613729
## rm           4.237566862
## age          .          
## dis         -0.081500956
## rad          .          
## tax          .          
## ptratio     -0.739162698
## black        0.005955512
## lstat       -0.513806573
# lambda = 1
coef(lasso_fit, s = 1)
## 14 x 1 sparse Matrix of class "dgCMatrix"
##                        1
## (Intercept) 15.283205304
## crim         .          
## zn           .          
## indus        .          
## chas         0.074725982
## nox          .          
## rm           3.862575281
## age          .          
## dis          .          
## rad          .          
## tax          .          
## ptratio     -0.620249174
## black        0.001973097
## lstat       -0.496888716

Draw the solution path of the LASSO fit:

  1. plot.glmnet() (or simply calling plot() on a glmnet object) draws the solution path, but the variable labels are not well placed;
  2. An enhanced version is plot_glmnet() in the plotmo package, which draws a better-labeled plot of the solution path.
plot(lasso_fit, xvar = "lambda", label = TRUE)

library(plotmo) # for plot_glmnet
## Loading required package: Formula
## Loading required package: plotrix
## Loading required package: TeachingDemos
plot_glmnet(lasso_fit, label=TRUE)                             # default colors

plot_glmnet(lasso_fit, label = 8, xvar = "norm")               # label the 8 biggest final coefs

Use 5-fold cross-validation to search for the optimal lambda:

# use 5-fold cross-validation to pick lambda
cv_lasso_fit <- cv.glmnet(x = as.matrix(Boston[, -c(which(colnames(Boston)=='medv'))]), y = Boston$medv, alpha = 1, nfolds = 5)
plot(cv_lasso_fit)

The best \(\lambda\) (or s) is given by:

cv_lasso_fit$lambda.min
## [1] 0.019303
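cv.glmnet() also stores lambda.1se, the largest \(\lambda\) whose cross-validated error is within one standard error of the minimum; it is a common, more conservative choice that yields a sparser model:

cv_lasso_fit$lambda.1se                 # one-standard-error rule
coef(cv_lasso_fit, s = "lambda.1se")    # coefficients at that lambda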

Given a selected s, you can use predict() as follows to get predictions:

Boston_IS_pred <- predict(lasso_fit, as.matrix(Boston[, -c(which(colnames(Boston)=='medv'))]), s = cv_lasso_fit$lambda.min)
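As a quick follow-up sketch (not part of the original walk-through), the in-sample mean squared error of these fitted values can be computed as:

# in-sample MSE at the cross-validated lambda
mean((Boston$medv - Boston_IS_pred)^2)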

Exercise: For a high-dimensional dataset with n < p (n = 200, p = 500), is it possible to fit a linear regression on it directly? Fit a model using only the first five variables. How about using LASSO for variable selection?
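A minimal sketch of the exercise setup with simulated data (the dimensions, variable names, and true coefficients below are illustrative assumptions, not a provided dataset):

# simulate a high-dimensional dataset with n = 200 observations and p = 500 predictors
set.seed(1)
n <- 200; p <- 500
x_hd <- matrix(rnorm(n * p), n, p)
y_hd <- drop(x_hd[, 1:5] %*% c(3, -2, 1.5, 1, -1) + rnorm(n))

# OLS on all 500 predictors is rank-deficient when n < p:
# lm() returns NA for the coefficients it cannot estimate
fit_all <- lm(y_hd ~ x_hd)
# a model using only the first five variables fits without trouble
fit_first5 <- lm(y_hd ~ x_hd[, 1:5])

# the Lasso handles n < p directly and selects a sparse subset of variables
library(glmnet)
cv_hd <- cv.glmnet(x_hd, y_hd, alpha = 1, nfolds = 5)
coef(cv_hd, s = "lambda.min")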

[1]: Tibshirani, R. (1996). Regression shrinkage and selection via the lasso. Journal of the Royal Statistical Society: Series B (Methodological), 58(1), 267-288.
[2]: Zou, H., & Hastie, T. (2005). Regularization and variable selection via the elastic net. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 67(2), 301-320.
[3]: “Regularization and Variable Selection via the Elastic Net” (talk), http://www.stanford.edu/~hastie/TALKS/enet_talk.pdf
