library(knitr)
library(tidyverse)
opts_chunk$set(echo=FALSE, fig.align='center', fig.width=8, fig.height=8,
               cache=TRUE, autodep=TRUE, cache.comments=FALSE,
               message=FALSE, warning=FALSE)
Your goal is to use this data to predict economic mobility. Note that there are generally more observations than predictors.
mob = read.csv('mobility.csv')
Using glmnet, estimate 4 models: the linear model, ridge regression, the lasso, and the elastic net ($\alpha=.5$).
Plot the CV curves for each of the three regularized models (easy).
Use lambda.min to get a particular model for each of the regularized ones.
Plot the coefficients for each of the 4 models on one figure. What do you notice? Which features are most important?
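For reference, the role of $\alpha$ in the models above can be read off the penalized least-squares objective that glmnet documents for Gaussian responses. The notation here ($n$ observations, rows $x_i$, intercept $\beta_0$) is introduced for illustration and is not taken from the assignment:

$$
\min_{\beta_0,\,\beta}\ \frac{1}{2n}\sum_{i=1}^{n}\left(y_i-\beta_0-x_i^\top\beta\right)^2
+\lambda\left[\frac{1-\alpha}{2}\,\|\beta\|_2^2+\alpha\,\|\beta\|_1\right]
$$

Setting $\alpha=1$ gives the lasso, $\alpha=0$ gives ridge regression, and $\alpha=.5$ weights the two penalties equally (the elastic net used here). With $\lambda=0$ the penalty vanishes and the problem reduces to ordinary least squares, which is the linear model.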
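library(glmnet)
# Fit the linear model; y=TRUE stores the response on the fitted object
linmod = lm(Mobility~.-ID-Name-State, data=mob, y=TRUE)
# Reuse its design matrix (dropping the intercept column) so all models see the same rows
X = model.matrix(linmod)[,-1]
y = linmod$y
lasso = cv.glmnet(X, y)           # alpha=1 is the default
ridge = cv.glmnet(X, y, alpha=0)
enet = cv.glmnet(X, y, alpha=.5)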
par(mfrow=c(2,2))
plot(lasso)
plot(ridge)
plot(enet)
par(mfrow=c(1,1))
For enet and lasso, lambda.1se gives sparser models. For ridge, use lambda.min (more like GCV).
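The sparsity remark can be checked with a minimal sketch on simulated data. Everything in it (`X_sim`, `y_sim`, `beta_true`, the helper `n_nonzero`, and the sample sizes) is a hypothetical stand-in and not the mobility data. It fits a cross-validated lasso and counts the nonzero coefficients at each of the two stored values of lambda.

```r
library(glmnet)

set.seed(1)
n <- 200
p <- 30
# Simulated design: only the first 5 predictors have a real effect (an assumption)
X_sim <- matrix(rnorm(n * p), n, p)
beta_true <- c(rep(2, 5), rep(0, p - 5))
y_sim <- drop(X_sim %*% beta_true + rnorm(n))

cv_fit <- cv.glmnet(X_sim, y_sim)  # lasso (alpha = 1 by default)

# Count nonzero coefficients at a given lambda, excluding the intercept
n_nonzero <- function(fit, s) {
  sum(as.matrix(coef(fit, s = s))[-1, 1] != 0)
}

nz_min <- n_nonzero(cv_fit, "lambda.min")
nz_1se <- n_nonzero(cv_fit, "lambda.1se")
c(lambda.min = nz_min, lambda.1se = nz_1se)
```

By construction `lambda.1se` is never smaller than `lambda.min`, so it applies at least as much shrinkage. With the lasso or elastic net this usually, though not always, means fewer nonzero coefficients.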
lasso1 = coef(lasso, 'lambda.min')
enet1 = coef(enet, 'lambda.min')
ridge1 = coef(ridge, 'lambda.min')
ord = order(coef(linmod))
df = data.frame(lm = coef(linmod)[ord], lasso = lasso1[ord],
                elnet = enet1[ord], ridge = ridge1[ord])
df$var = rownames(df)
gather(df, key='method', value='estimate', -var) %>%
  ggplot(aes(y=var, x=estimate, color=method)) +
  geom_point() +
  theme_minimal()
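When reading importance off a coefficient plot, keep in mind that glmnet reports coefficients on the original scale of each predictor, so their magnitudes depend on units. The sketch below illustrates this on simulated data. The two predictors, their scales, and all object names are hypothetical, and scaling the columns first is one possible way to make the estimates comparable, not necessarily the approach intended in the assignment.

```r
library(glmnet)

set.seed(2)
n <- 150
# Two hypothetical predictors measured on very different scales
X_sim <- cbind(small_scale = rnorm(n, sd = 0.01),
               large_scale = rnorm(n, sd = 100))
# Each predictor contributes about 0.5 to y per standard deviation
y_sim <- 50 * X_sim[, "small_scale"] +
  0.005 * X_sim[, "large_scale"] +
  rnorm(n, sd = 0.1)

# Ridge fits on the raw columns and on columns scaled to unit variance
raw_fit <- cv.glmnet(X_sim, y_sim, alpha = 0)
std_fit <- cv.glmnet(scale(X_sim), y_sim, alpha = 0)

# Extract coefficients at lambda.min, dropping the intercept
raw_coef <- as.matrix(coef(raw_fit, s = "lambda.min"))[-1, 1]
std_coef <- as.matrix(coef(std_fit, s = "lambda.min"))[-1, 1]
rbind(raw = raw_coef, standardized = std_coef)
```

On the raw scale the first coefficient dwarfs the second even though both predictors matter equally per standard deviation. After scaling, the two estimates are of similar size.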