# the code in this chunk enables us to truncate the print output for each # chunk using the `out.lines` option # save the built-in output hook hook_output <- knitr::knit_hooks$get("output") # set a new output hook to truncate text output knitr::knit_hooks$set(output = function(x, options) { if (!is.null(n <- options$out.lines)) { x <- xfun::split_lines(x) if (length(x) > n) { # truncate the output x <- c(head(x, n), "....\n") } x <- paste(x, collapse = "\n") } hook_output(x, options) })
In this vignette, we describe how the glmnet
package can be used to fit the relaxed lasso.
The idea of the relaxed lasso is to take a glmnet
fitted object, and then for each lambda, refit the variables in the active set without any penalization. This gives the "relaxed" fit. (We note that there have been other definitions of a relaxed fit, but this is the one we prefer.) This could of course be done for elastic net fits as well as lasso. However, if the number of variables gets too close to the sample size $N$, the relaxed path will be truncated. Furthermore, for binomial and other nonlinear generalized linear models (GLMs) convergence can be an issue with our current implementation if the number of variables is too large, and perversely if the relaxed fit is too strong.
Suppose the glmnet
fitted linear predictor at $\lambda$ is $\hat\eta_\lambda(x)$ and the relaxed version is $\tilde\eta_\lambda(x)$. We also allow for shrinkage between the two:
$$\tilde \eta_{\lambda,\gamma}=(1-\gamma)\tilde
\eta_\lambda(x)+\gamma\hat\eta_\lambda(x).$$
$\gamma\in[0,1]$ is an additional tuning parameter which can be selected by cross-validation (CV). The debiasing will potentially improve prediction performance, and CV will typically select a model with a smaller number of variables.
This procedure is very competitive with forward-stepwise and best-subset regression, and has a considerable speed advantage when the number of variables is large. This is especially true for best-subset, but even so for forward stepwise. The latter has to plod through the variables one-at-a-time, while glmnet
will just plunge in
and find a good active set.
Further details on this form of relaxed fitting can be found in @best_subset; more information on glmnet and elastic-net model in general is given in @glmnet, @coxnet, @strongrules, and @block.
We demonstrate the most basic relaxed lasso fit as a first example. We load some pre-generated data and fit the relaxed lasso on it by calling glmnet
with relax = TRUE
:
library(glmnet) data(QuickStartExample) x <- QuickStartExample$x y <- QuickStartExample$y fit <- glmnet(x, y, relax = TRUE) print(fit)
In addition to the three columns usually printed for glmnet
objects
(Df
, %Dev
and Lambda
), there is an extra column %Dev R
(R
stands for "relaxed") which is the percent deviance explained by the
relaxed fit. This is always higher than its neighboring column, which
is the percent deviance exaplined for the penalized fit (on the
training data). Notice that when the Df
stays the same, the %Dev R
does not change, since this typically means the active set is the
same. (The code is also smart enough to only fit such models once, so
in the truncated display shown, 9 lasso models are fit, but only 4
relaxed fits are computed).
The fit object is of class "relaxed"
, which inherits from class "glmnet"
. Hence, the usual plot
method for "glmnet"
objects can be used. The code below demonstrates some additional flexibility that "relaxed"
objects have for plotting.
par(mfrow = c(1, 3), mar=c(4,4,5.5,1)) plot(fit, main = "gamma = 1") plot(fit, gamma = 0.5, main = "gamma = 0.5") plot(fit, gamma = 0, main = "gamma = 0")
gamma = 1
is the traditional glmnet
fit (also relax = FALSE
, the default), gamma = 0
is the unpenalized fit, and gamma = 0.5
is a mixture of the two (at the coefficient level, and hence also the linear predictors).
We can also select gamma
using cv.glmnet
, which by default uses the 5 values c(0, 0.25, 0.5, 0.75, 1)
. This returns an object of class "cv.relaxed"
.
set.seed(1) cfit <- cv.glmnet(x, y, relax = TRUE) plot(cfit)
To remove the shading of the standard error bands, pass se.bands = FALSE
:
plot(cfit, se.bands = FALSE)
As with regular "cv.glmnet"
objects, you can make predictions from a relaxed CV object. Just as the s
option (for lambda
) admits two special strings "lambda.1se"
and "lambda.min"
for special values of lambda
, the gamma
option admits two special strings "gamma.1se"
and "gamma.min"
for special values of gamma
. For example, the code below makes predictions for newx
at the lambda
and gamma
values that has the smallest CV error:
predict(cfit, newx = x[1:5, ], s = "lambda.min", gamma = "gamma.min")
Printing class "cv.relaxed"
objects gives some basic information on the cross-validation:
print(cfit)
While we only demonstrate relaxed fits for the default Gaussian family, any of the families fit by glmnet
can also be fit with the relaxed
option.
Although glmnet
has a relax
option, you can also fit relaxed lasso models by post-processing a glmnet
object with the relax.glmnet
function.
fit <- glmnet(x,y) fitr <- relax.glmnet(fit, x = x, y = y)
This will rarely need to be done; one use case is if the original fit took a long time, and the user wants to avoid refitting it. Note that the arguments are named in the call in order for them to be passed correctly via the ...
argument in relax.glmnet
.
As mentioned, a "relaxed"
object inherits from class "glmnet"
. Apart from the class modification, it has an additional component named relaxed
which is itself a glmnet
object, but with the relaxed coefficients. The default behavior of extractor functions like predict
and coef
, as well as plot
will be to present results from the glmnet
fit, unless a value of gamma
is given different from the default value gamma = 1
(see the plots above). The print
method gives additional info on the relaxed fit.
Likewise, a cv.relaxed
object inherits from class cv.glmnet
. Here the predict
method by default uses the optimal relaxed fit; if predictions from the CV-optimal original glmnet
fit are desired, one can directly use predict.cv.glmnet
. Similarly, use print
to print information for cross-validation on the relaxed fit, and print.cv.glmnet
for information on the cross-validation for the original glmnet
fit.
print(cfit) print.cv.glmnet(cfit)
glmnet
itself is used to fit the relaxed fits by using a single value of zero for lambda
. However, for nonlinear models such as family = "binomial"
, family = "multinomial"
and family="poisson"
, there can be convergence issues. This is because glmnet
does not do step size optimization, rather relying on
the pathwise fit to stay in the "quadratic" zone of the log-likelihood. We have an optional path = TRUE
option for relax.glmnet
, which actually fits a regurized path toward the lambda = 0
solution, and thus avoids
the issue. The default is path = FALSE
since this option adds to the computing time.
One use case for a relaxed fit is as a faster version of forward stepwise regression. With a large number p
of variables, forward stepwise regression can be tedious. On the other hand, because the lasso solves a convex problem, it can plunge in and identify good candidate sets of variables over 100 values of lambda
, even though p
could be in the tens of thousands. In a case like this, one can have cv.glmnet
do the selection of variables.
fitr <- cv.glmnet(x, y, gamma = 0, relax = TRUE) plot(fitr)
Notice that we only allow gamma = 0
, so in this case we are not considering the blended fits.
Any scripts or data that you put into this service are public.
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.