library(knitr)
library(tidyverse)
opts_chunk$set(echo=FALSE, fig.align='center', fig.width=8, fig.height=8,
               cache=TRUE, autodep=TRUE, cache.comments=FALSE,
               message=FALSE, warning=FALSE)
Your goal is to use this data to predict economic mobility. Note that there are generally more observations than predictors.
mob = read.csv('mobility.csv')
Using `glmnet`, estimate 4 models: the linear model, ridge regression, the lasso, and the elastic net ($\alpha=.5$). Don't use the variables `ID`, `Name`, or `State`. (Why?)
Plot the CV curves for each of the three regularized models (easy).
Use `lambda.min` to get a particular model for each of the regularized ones.
Plot the coefficients for each of the 4 models on one figure. What do you notice? Which features are most important?
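On the "(Why?)" above, one possible illustration: identifier-like columns such as `Name` carry one distinct label per row, so expanding them into dummy variables can produce more columns than observations. The sketch below uses a small made-up data frame (`toy`, with hypothetical values and a hypothetical `Income` predictor), not the actual `mobility.csv`, and relies only on base R.

```r
# Toy data standing in for mobility.csv; every value here is invented.
set.seed(1)
n <- 12
toy <- data.frame(
  ID       = seq_len(n),                          # row identifier
  Name     = paste0("zone_", seq_len(n)),         # unique label per row
  State    = rep(c("AA", "BB", "CC", "DD"), each = 3),
  Income   = rnorm(n),                            # hypothetical predictor
  Mobility = rnorm(n)
)

# Keeping the identifiers: Name and State are expanded into dummy columns.
X_all <- model.matrix(Mobility ~ ., data = toy)[, -1]

# Dropping them, as in the exercise, leaves only the substantive predictor.
X_sub <- model.matrix(Mobility ~ . - ID - Name - State, data = toy)[, -1, drop = FALSE]

dim(X_all)  # more columns than rows in this toy example
dim(X_sub)
```

With the identifiers included, the design matrix in this toy case has more columns than rows, so an unpenalized linear model could fit the training data perfectly without telling us anything about new observations.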
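library(glmnet)
# Fit the linear model first; y=TRUE stores the response actually used,
# so X and y stay aligned if lm() drops rows with missing values.
linmod = lm(Mobility~.-ID-Name-State, data=mob, y=TRUE)
X = model.matrix(linmod)[,-1]  # drop the intercept column; glmnet adds its own
y = linmod$y
lasso = cv.glmnet(X, y)  # alpha=1 by default
ridge = cv.glmnet(X, y, alpha=0, lambda.min.ratio=1e-6)
enet = cv.glmnet(X,y,alpha=.5)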
par(mfrow=c(2,2))
plot(lasso)
plot(ridge)
plot(enet)
par(mfrow=c(1,1))
For `enet` and `lasso`, `lambda.1se` gives sparser models. For `ridge`, use `lambda.min` (more like GCV).
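To make the sparsity remark concrete, here is a minimal sketch on simulated data (the names `X_sim`, `y_sim`, `cv_fit`, and the helper `nnz` are made up for illustration and are not part of the assignment). It counts the nonzero lasso coefficients at each of the two values of lambda that `cv.glmnet` reports.

```r
library(glmnet)

# Simulated data: only the first 3 of 20 predictors have nonzero effects.
set.seed(1)
n <- 100
p <- 20
X_sim <- matrix(rnorm(n * p), n, p)
beta  <- c(2, -1.5, 1, rep(0, p - 3))
y_sim <- drop(X_sim %*% beta + rnorm(n))

cv_fit <- cv.glmnet(X_sim, y_sim)  # lasso (alpha = 1 is the default)

# Hypothetical helper: number of nonzero coefficients, intercept excluded.
nnz <- function(fit, s) sum(as.numeric(coef(fit, s = s))[-1] != 0)

c(lambda.min = nnz(cv_fit, "lambda.min"),
  lambda.1se = nnz(cv_fit, "lambda.1se"))
```

Because `lambda.1se` is never smaller than `lambda.min`, it applies at least as much shrinkage, which typically (though not always) leaves fewer nonzero coefficients.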
lasso1 = as.numeric(coef(lasso, 'lambda.min'))
enet1 = as.numeric(coef(enet, 'lambda.min'))
ridge1 = as.numeric(coef(ridge, 'lambda.min'))
# Sort all coefficient vectors by the size of the linear-model estimates
ord = order(coef(linmod))
df = data.frame(lm = coef(linmod)[ord], lasso = lasso1[ord],
                elnet = enet1[ord], ridge = ridge1[ord])
df$var = rownames(df)
# One point per (variable, method) pair, with a reference line at zero
df %>%
  mutate(var = str_replace_all(var, "_"," ")) %>%
  pivot_longer(names_to='method',values_to ='estimate', -var) %>%
  ggplot(aes(y=var,x=estimate,color=method)) +
  geom_point() +
  geom_vline(xintercept = 0) +
  cowplot::theme_cowplot() +
  scale_color_viridis_d(direction = -1) +
  theme(axis.title.y = element_blank())
  #scale_x_continuous(trans = scales::pseudo_log_trans(sigma=.1))