cv_glmnet: Fitting of regularized linear models

View source: R/cv_glmnet.R

cv_glmnet {goffda}    R Documentation

Fitting of regularized linear models

Description

Convenience function for fitting multivariate linear models with multivariate response by relying on cv.glmnet from the glmnet package. The function fits the multivariate linear model

\mathbf{Y} = \mathbf{X}\mathbf{B} + \mathbf{E},

where \mathbf{X} is a p-dimensional vector, \mathbf{Y} and \mathbf{E} are two q-dimensional vectors, and \mathbf{B} is a p\times q matrix.

If \mathbf{X} and \mathbf{Y} are centered (i.e., have zero-mean columns), the function estimates \mathbf{B} by solving, for the sample (\mathbf{X}_1, \mathbf{Y}_1), \ldots, (\mathbf{X}_n, \mathbf{Y}_n), the elastic-net optimization problem

\min_{\mathbf{B}\in R^{p \times q}} \frac{1}{2n}\sum_{i=1}^n \|\mathbf{Y}_i-\mathbf{X}_i\mathbf{B}\|^2 + \lambda\left[(1-\alpha)\|\mathbf{B}\|_\mathrm{F}^2 / 2 + \alpha \sum_{j=1}^p \|\mathbf{B}_j\|_2\right],

where \|\mathbf{B}\|_\mathrm{F} stands for the Frobenius norm of the matrix \mathbf{B} and \|\mathbf{B}_j\|_2 for the Euclidean norm of the j-th row of \mathbf{B}. The choice \alpha = 0 in the elastic-net penalization corresponds to ridge regression, whereas \alpha = 1 yields a lasso-type estimator. The unpenalized least-squares estimator is obtained with \lambda = 0.
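For concreteness, the objective above can be evaluated directly in R. The helper below is a hypothetical sketch (not a function exported by goffda), assuming centered data x (n x p), y (n x q), and a candidate matrix B (p x q):

# Hypothetical helper evaluating the elastic-net objective above; an
# illustrative sketch, not part of the goffda API
enet_objective <- function(B, x, y, lambda, alpha) {
  n <- nrow(x)
  rss <- sum((y - x %*% B)^2) / (2 * n)    # least-squares term
  frob <- (1 - alpha) * sum(B^2) / 2       # squared Frobenius penalty
  rows <- alpha * sum(sqrt(rowSums(B^2)))  # sum of row-wise Euclidean norms
  rss + lambda * (frob + rows)
}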

Usage

cv_glmnet(x, y, alpha = c("lasso", "ridge")[1], lambda = NULL,
  intercept = TRUE, thresh = 1e-10, cv_1se = TRUE, cv_nlambda = 50,
  cv_folds = NULL, cv_grouped = FALSE, cv_lambda = 10^seq(2, -3,
  length.out = cv_nlambda), cv_second = TRUE, cv_tol_second = 0.025,
  cv_log10_exp = c(-0.5, 3), cv_thresh = 1e-05, cv_parallel = FALSE,
  cv_verbose = FALSE, ...)

Arguments

x

input matrix of size c(n, p), or a vector of length n.

y

response matrix of size c(n, q), or a vector of length n.

alpha

elastic net mixing parameter, passed to the alpha argument in glmnet, with 0 \le \alpha \le 1. Alternatively, a character string indicating whether the "ridge" (\alpha = 0) or "lasso" (\alpha = 1) fit is to be performed.

lambda

scalar giving the regularization parameter \lambda. If NULL (default), the optimal \lambda is searched by cross-validation. If lambda is provided, then cross-validation is skipped and the fit is performed for the given lambda.

intercept

flag passed to the intercept argument in glmnet indicating whether the intercept should be fitted (the default, which does not assume that the data are centered) or set to zero (so that the optimization problem above is solved as-is). Defaults to TRUE.

thresh

convergence threshold of the coordinate descent algorithm, passed to the thresh argument in glmnet. Defaults to 1e-10.

cv_1se

shall the optimal lambda be the lambda.1se, as returned by cv.glmnet? This favors sparser fits. If FALSE, then the optimal lambda is lambda.min, the minimizer of the cross-validation loss. Defaults to TRUE.

cv_nlambda

the length of the sequence of \lambda values, passed to the nlambda argument in cv.glmnet for the cross-validation search. Defaults to 50.

cv_folds

number of folds to perform cross-validation. If NULL (the default), then
cv_folds <- n internally, that is, leave-one-out cross-validation is performed. Passed to the nfolds argument in cv.glmnet.

cv_grouped

passed to the grouped argument in cv.glmnet. Defaults to FALSE.

cv_lambda

passed to the lambda argument in cv.glmnet. Defaults to
10^seq(2, -3, length.out = cv_nlambda).

cv_second

flag to perform a second cross-validation search if the optimal \lambda was found at the extremes of the first \lambda sequence (indicating that the minimum may not be reliable). Defaults to TRUE.

cv_tol_second

tolerance for performing a second search if cv_second = TRUE. If the minimum is found at the 100 * cv_tol_second lower/upper percentile of the search interval, then the search interval is expanded for a second search. Defaults to 0.025.

cv_log10_exp

expansion of the \lambda sequence if the minimum is found close to its upper extreme. If that is the case, the sequence for the second search is set as 10^(log10(lambda_min) + cv_log10_exp), where lambda_min is the minimum obtained in the first sequence. If the minimum is found close to the lower extreme of the sequence, then -rev(cv_log10_exp) is considered instead. Defaults to c(-0.5, 3) (see the illustrative snippet after this argument list).

cv_thresh

convergence threshold used during cross-validation in cv.glmnet. Defaults to 1e-5.

cv_parallel

passed to the parallel argument in cv.glmnet. Defaults to FALSE.

cv_verbose

flag to display information about the cross-validation search with plots and messages. More useful if cv_second = TRUE. Defaults to FALSE.

...

further parameters to be passed to glmnet to perform the final model fit.
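To make the second search concrete, the lines below sketch how the expanded \lambda sequences described in cv_log10_exp could be constructed; lambda_min is a hypothetical minimizer from the first search and the length 50 mirrors the cv_nlambda default:

# Sketch of the expanded lambda sequences described in cv_log10_exp;
# lambda_min is a hypothetical minimizer obtained in the first search
cv_log10_exp <- c(-0.5, 3)
lambda_min <- 0.01
# Expansion if the minimum was close to the upper extreme of the sequence
10^seq(log10(lambda_min) + cv_log10_exp[1],
       log10(lambda_min) + cv_log10_exp[2], length.out = 50)
# Mirrored expansion if it was close to the lower extreme
exp_low <- -rev(cv_log10_exp)  # c(-3, 0.5)
10^seq(log10(lambda_min) + exp_low[1],
       log10(lambda_min) + exp_low[2], length.out = 50)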

Details

If \alpha = 1, then the lasso-type fit shrinks to zero, simultaneously, all the elements of certain rows of \mathbf{B}, thus aiding the selection of the most influential variables among the p predictors in \mathbf{X} for explaining/predicting \mathbf{Y}.
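A quick way of inspecting which predictors a lasso fit discards is to look for all-zero rows in beta_hat; a minimal sketch, assuming a fitted object cv1 as in the examples below:

# Indices of the predictors whose whole row in beta_hat is zero (assumes
# cv1 <- cv_glmnet(x = x, y = y, alpha = "lasso"), as in the examples)
which(rowSums(abs(cv1$beta_hat)) == 0)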

If lambda = NULL, the function first performs a cross-validation search for the optimal \lambda (using cv_thresh to control the convergence threshold). Once the optimal penalization parameter is determined, a second fit (now with convergence threshold thresh) is performed using the default \lambda sequence in glmnet. The final estimate is obtained via predict.glmnet at the optimal \lambda determined in the first step.
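Roughly, and for a vector response y, this two-step scheme corresponds to the following direct glmnet calls (a sketch of the idea, not the exact internals of cv_glmnet):

# Sketch of the two-step scheme with glmnet directly (vector response y)
cv <- glmnet::cv.glmnet(x = x, y = y, alpha = 1, thresh = 1e-5)  # CV search
fit <- glmnet::glmnet(x = x, y = y, alpha = 1, thresh = 1e-10)   # refined fit
predict(fit, type = "coefficients", s = cv$lambda.1se)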

Due to its cross-validatory nature, cv_glmnet can be computationally demanding. Approaches for reducing the computation time include: considering a smaller number of folds than n, such as cv_folds = 10 (although this leads to random partitions of the data); relaxing the convergence tolerance of the coordinate descent algorithm by increasing cv_thresh; reducing the number of candidate \lambda values with cv_nlambda; setting cv_second = FALSE to avoid a second cross-validation search; or considering cv_parallel = TRUE to use a parallel backend (which must be registered beforehand; see examples).
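For instance, a faster (and rougher) search could combine several of these options; the settings below are purely illustrative:

# A faster, rougher cross-validation search (illustrative settings)
cv_fast <- cv_glmnet(x = x, y = y, alpha = "lasso", cv_folds = 10,
                     cv_nlambda = 25, cv_thresh = 1e-3, cv_second = FALSE)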

By default, the \lambda sequence is used with standardized \mathbf{X} and \mathbf{Y} (both divided by their columnwise standard deviations); see glmnet and its standardize argument. Therefore, the optimal selected \lambda value assumes standardization and must be used with care if the variables are not standardized. For example, when using the analytical ridge solution, a prior change of scale that depends on q needs to be applied. See the examples for the details.

Value

A list with the following entries:

beta_hat

the estimated \mathbf{B}, a matrix of size c(p, q).

lambda

the optimal \lambda obtained by cross-validation and according to cv_1se.

cv

if lambda = NULL, the result of the cross-validation search for the optimal \lambda. Otherwise, NULL.

fit

the glmnet fit, computed with thresh threshold and with an automatically chosen \lambda sequence.

Author(s)

Eduardo García-Portugués. Initial contributions by Gonzalo Álvarez-Pérez.

References

Friedman, J., Hastie, T. and Tibshirani, R. (2010). Regularization paths for generalized linear models via coordinate descent. Journal of Statistical Software, 33(1):1–22. doi:10.18637/jss.v033.i01

Examples

## Quick example for multivariate linear model with multivariate response

# Simulate data
n <- 100
p <- 10; q <- 5
set.seed(123456)
x <- matrix(rnorm(n * p, sd = rep(1:p, each = n)), nrow = n, ncol = p)
e <- matrix(rnorm(n * q, sd = rep(q:1, each = n)), nrow = n, ncol = q)
beta <- matrix(((1:p - 1) / p)^2, nrow = p, ncol = q)
y <- x %*% beta + e

# Fit lasso (model with intercept, the default)
cv_glmnet(x = x, y = y, alpha = "lasso", cv_verbose = TRUE)$beta_hat

## Multivariate linear model with multivariate response

# Simulate data
n <- 100
p <- 10; q <- 5
set.seed(123456)
x <- matrix(rnorm(n * p, sd = rep(1:p, each = n)), nrow = n, ncol = p)
e <- matrix(rnorm(n * q, sd = rep(q:1, each = n)), nrow = n, ncol = q)
beta <- matrix(((1:p - 1) / p)^2, nrow = p, ncol = q)
y <- x %*% beta + e

# Fit ridge
cv0 <- cv_glmnet(x = x, y = y, alpha = "ridge", intercept = FALSE,
                 cv_verbose = TRUE)
cv0$beta_hat

# Same fit for the chosen lambda
cv_glmnet(x = x, y = y, alpha = "ridge", lambda = cv0$lambda,
          intercept = FALSE)$beta_hat

# Fit lasso (model with intercept, the default)
cv1 <- cv_glmnet(x = x, y = y, alpha = "lasso", cv_verbose = TRUE)
cv1$beta_hat

# Use cv_1se = FALSE
cv1_min <- cv_glmnet(x = x, y = y, alpha = "lasso", cv_verbose = TRUE,
                     cv_1se = FALSE)

# Compare it with the ridge analytical solution. Observe the factor in
# lambda, necessary since lambda is searched for the standardized data.
# Note also that, unlike in the case q = 1, no standardization with
# respect to y happens
sd_x <- apply(x, 2, sd) * sqrt((n - 1) / n)
cv_glmnet(x = x, y = y, alpha = "ridge", lambda = cv0$lambda,
          thresh = 1e-20, intercept = FALSE)$beta_hat
solve(crossprod(x) + diag(cv0$lambda * sd_x^2 * n)) %*% t(x) %*% y

# If x is standardized, the match between glmnet and usual ridge
# analytical expression does not require scaling of lambda
x_std <- scale(x, scale = sd_x, center = TRUE)
cv_glmnet(x = x_std, y = y, alpha = "ridge", lambda = cv0$lambda,
          intercept = FALSE, thresh = 1e-20)$beta_hat
solve(crossprod(x_std) + diag(rep(cv0$lambda * n, p))) %*% t(x_std) %*% y

## Simple linear model

# Simulate data
n <- 100
p <- 1; q <- 1
set.seed(123456)
x <- matrix(rnorm(n * p), nrow = n, ncol = p)
e <- matrix(rnorm(n * q), nrow = n, ncol = q)
beta <- 2
y <- x * beta + e

# Fit by ridge (model with intercept, the default)
cv0 <- cv_glmnet(x = x, y = y, alpha = "ridge", cv_verbose = TRUE)
cv0$beta_hat
cv0$intercept

# Comparison with linear model with intercept
lm(y ~ 1 + x)$coefficients

# Fit by ridge (model without intercept)
cv0 <- cv_glmnet(x = x, y = y, alpha = "ridge", cv_verbose = TRUE,
                 intercept = FALSE)
cv0$beta_hat

# Comparison with linear model without intercept
lm(y ~ 0 + x)$coefficients

# Same fit for the chosen lambda (and without intercept)
cv_glmnet(x = x, y = y, alpha = "ridge", lambda = cv0$lambda,
          intercept = FALSE)$beta_hat

# Same for lasso (model with intercept, the default)
cv1 <- cv_glmnet(x = x, y = y, alpha = "lasso")
cv1$beta_hat

## Multivariate linear model (p = 10, q = 1)

# Simulate data
n <- 50
p <- 10; q <- 1
set.seed(123456)
x <- matrix(rnorm(n * p, mean = 1, sd = rep(1:p, each = n)),
            nrow = n, ncol = p)
e <- matrix(rnorm(n * q), nrow = n, ncol = q)
beta <- ((1:p - 1) / p)^2
y <- x %*% beta + e

# Fit ridge (model without intercept)
cv0 <- cv_glmnet(x = x, y = y, alpha = "ridge", intercept = FALSE,
                 cv_verbose = TRUE)
cv0$beta_hat

# Same fit for the chosen lambda
cv_glmnet(x = x, y = y, alpha = "ridge", lambda = cv0$lambda,
          intercept = FALSE)$beta_hat

# Compare it with the ridge analytical solution. Observe the factor in
# lambda, necessary since lambda is searched for the standardized data
sd_x <- apply(x, 2, sd) * sqrt((n - 1) / n)
sd_y <- sd(y) * sqrt((n - 1) / n)
cv_glmnet(x = x, y = y, alpha = "ridge", lambda = cv0$lambda,
          intercept = FALSE, thresh = 1e-20)$beta_hat
solve(crossprod(x) + diag(cv0$lambda * sd_x^2 / sd_y * n)) %*% t(x) %*% y

# If x and y are standardized, the match between glmnet and usual ridge
# analytical expression does not require scaling of lambda
x_std <- scale(x, scale = sd_x, center = TRUE)
y_std <- scale(y, scale = sd_y, center = TRUE)
cv_glmnet(x = x_std, y = y_std, alpha = "ridge", lambda = cv0$lambda,
          intercept = FALSE, thresh = 1e-20)$beta_hat
solve(crossprod(x_std) + diag(rep(cv0$lambda * n, p))) %*% t(x_std) %*% y_std

# Fit lasso (model with intercept, the default)
cv1 <- cv_glmnet(x = x, y = y, alpha = "lasso", cv_verbose = TRUE)
cv1$beta_hat

# ## Parallelization
#
# # Parallel
# doMC::registerDoMC(cores = 2)
# microbenchmark::microbenchmark(
#   cv_glmnet(x = x, y = y, nlambda = 100, cv_parallel = TRUE),
#   cv_glmnet(x = x, y = y, nlambda = 100, cv_parallel = FALSE),
#   times = 10)

