grpnet: fit a GLM with group lasso or group elastic-net...


grpnet    R Documentation

fit a GLM with group lasso or group elastic-net regularization

Description

Computes a group elastic-net regularization path for a variety of GLM and other families, including the Cox model. This function extends the abilities of the glmnet package to allow for grouped regularization. The code is very efficient (core routines are written in C++), and allows for specialized matrix classes.

Usage

grpnet(
  X,
  glm,
  constraints = NULL,
  groups = NULL,
  alpha = 1,
  penalty = NULL,
  offsets = NULL,
  lambda = NULL,
  standardize = TRUE,
  irls_max_iters = as.integer(10000),
  irls_tol = 1e-07,
  max_iters = as.integer(1e+05),
  tol = 1e-07,
  adev_tol = 0.9,
  ddev_tol = 0,
  newton_tol = 1e-12,
  newton_max_iters = 1000,
  n_threads = 1,
  early_exit = TRUE,
  intercept = TRUE,
  screen_rule = c("pivot", "strong"),
  min_ratio = 0.01,
  lmda_path_size = 100,
  max_screen_size = NULL,
  max_active_size = NULL,
  pivot_subset_ratio = 0.1,
  pivot_subset_min = 1,
  pivot_slack_ratio = 1.25,
  check_state = FALSE,
  progress_bar = FALSE,
  warm_start = NULL
)

Arguments

X

Feature matrix. Either a regular R matrix, an adelie custom matrix class, or a concatenation of such.

glm

GLM family/response object. This is an expression that represents the family, the response and other arguments such as weights, if present. The choices are glm.gaussian(), glm.binomial(), glm.poisson(), glm.multinomial(), glm.cox(), and glm.multigaussian(). This is a required argument, and there is no default. In the simple example below, we use glm.gaussian(y).

constraints

Group-wise constraints on the parameters, supplied as a list with an element for each group. Default is NULL, which means no constraints. List elements can be NULL as well. Currently only 'box constraints' are supported, which means upper and lower limits. The function constraint.box() must be used to set the constraints for each group that has constraints. Details are given in the documentation for constraint.box.

groups

This is an ordered vector of integers that represents the groupings, with each entry indicating where a group begins. The entries refer to column numbers in the feature matrix, and hence the members of a group have to be contiguous. If there are p features, the default is 1:p (no groups; i.e. p groups each of size 1). So the length of groups is the number of groups. (Note that in the state output of grpnet this vector might be shifted to start from 0, since internally adelie uses zero-based indexing.)
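The start-position encoding described above can be illustrated in base R. This is a minimal sketch with made-up values (p = 10 and four hypothetical group starts); it does not call adelie and only shows how group sizes and memberships follow from the groups vector.

```r
# Hypothetical example: p = 10 features split into 4 contiguous groups.
p <- 10
groups <- c(1, 4, 6, 9)   # 1-based start column of each group

# Each group runs from its start up to the column before the next start,
# so the sizes are the differences of consecutive starts (with p + 1 appended).
group_sizes <- diff(c(groups, p + 1))

# Map every column of X to the index of the group it belongs to.
membership <- rep(seq_along(groups), times = group_sizes)

print(group_sizes)
print(membership)
```

Here the first group covers columns 1 to 3, the second columns 4 and 5, and so on; the length of groups (4) is the number of groups.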

alpha

The elasticnet mixing parameter, with 0\le\alpha\le 1. The penalty is defined as

\frac{1-\alpha}{2}\sum_j\|\beta_j\|_2^2+\alpha\sum_j\|\beta_j\|_2,

where the sum is over groups. alpha=1 is the pure group lasso penalty, and alpha=0 the pure ridge penalty.
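As a worked illustration of the penalty formula, the base-R sketch below evaluates it for a small coefficient vector. The helper group_enet_penalty() is hypothetical (not part of adelie), and it deliberately leaves out lambda and the per-group penalty factors, which scale the terms in the actual fit.

```r
# Hypothetical helper: evaluate the group elastic-net penalty for a
# coefficient vector `beta` whose groups start at the columns in `groups`.
group_enet_penalty <- function(beta, groups, alpha = 1) {
  sizes <- diff(c(groups, length(beta) + 1))
  membership <- rep(seq_along(groups), times = sizes)
  # Euclidean norm of each group's coefficient block
  norms <- as.numeric(tapply(beta, membership, function(b) sqrt(sum(b^2))))
  (1 - alpha) / 2 * sum(norms^2) + alpha * sum(norms)
}

beta <- c(3, 4, 0, 0, 1)   # three groups: (3, 4), (0, 0), (1)
groups <- c(1, 3, 5)

lasso_pen <- group_enet_penalty(beta, groups, alpha = 1)    # 5 + 0 + 1
ridge_pen <- group_enet_penalty(beta, groups, alpha = 0)    # (25 + 0 + 1) / 2
mixed_pen <- group_enet_penalty(beta, groups, alpha = 0.5)
print(c(lasso_pen, ridge_pen, mixed_pen))
```

The group norms are 5, 0 and 1, so the three values are 6, 13 and 9.5.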

penalty

Separate penalty factors can be applied to each group of coefficients. This is a number that multiplies lambda to allow differential shrinkage for groups. Can be 0 for some groups, which implies no shrinkage, and that group is always included in the model. Default is square-root of group sizes for each group.
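A short base-R sketch of how such a vector of penalty factors might be built. The group layout is made up, and the commented-out grpnet() call assumes X and y already exist; only the documented groups and penalty arguments are used.

```r
# Hypothetical layout: 10 features in 4 contiguous groups.
p <- 10
groups <- c(1, 4, 6, 9)
group_sizes <- diff(c(groups, p + 1))

# Documented default: square root of each group's size.
penalty <- sqrt(group_sizes)

# Setting a factor to 0 leaves that group unpenalized, so it is always
# included in the model.
penalty[1] <- 0
print(penalty)

# Assuming X (n x 10) and y are available:
# fit <- grpnet(X, glm.gaussian(y), groups = groups, penalty = penalty)
```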

offsets

Offsets, default is NULL. If present, this is a fixed vector or matrix corresponding to the shape of the natural parameter, and is added to the fit.

lambda

A user supplied lambda sequence. Typical usage is to have the program compute its own lambda sequence based on lmda_path_size and min_ratio. This is returned with the fit.

standardize

If TRUE (the default), the columns of X are standardized before the fit is computed. This is good practice if the features are on different scales, because it has an impact on the penalty. The regularization path is computed using the standardized features, and the standardization information is saved on the object for making future predictions. The different matrix classes have their own methods for standardization. For example, for a sparse matrix the standardization information will be computed, but not actually applied (e.g., centering would destroy the sparsity). Rather, the methods for matrix multiplication will be aware of, and incorporate, the standardization information.
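The idea of applying standardization inside the matrix multiply, rather than to the matrix itself, can be sketched in base R. This is an illustration of the algebra only, not adelie's implementation; in particular, the use of sd() as the scale is an assumption made for the example.

```r
set.seed(1)
X <- matrix(rnorm(20), nrow = 5, ncol = 4)
beta <- c(0.5, -1, 0, 2)

centers <- colMeans(X)
scales <- apply(X, 2, sd)   # assumed scale; the package may use another convention

# Explicit route: build the standardized matrix, then multiply.
X_std <- sweep(sweep(X, 2, centers, "-"), 2, scales, "/")
explicit <- drop(X_std %*% beta)

# Implicit route: never form X_std. Fold the scaling into beta and
# subtract the constant contributed by the centering.
scaled_beta <- beta / scales
implicit <- drop(X %*% scaled_beta) - sum(centers * scaled_beta)

print(all.equal(explicit, implicit))
```

Because the implicit route only multiplies by the original X, a sparse X keeps its sparsity.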

irls_max_iters

Maximum number of IRLS iterations, default is 1e4.

irls_tol

IRLS convergence tolerance, default is 1e-7.

max_iters

Maximum total number of coordinate descent iterations, default is 1e5.

tol

Coordinate descent convergence tolerance, default 1e-7.

adev_tol

Tolerance on the fraction of deviance explained, default 0.9. The path is truncated once the fit explains this fraction of the training deviance, which can be seen as a limit on overfitting the training data.

ddev_tol

Tolerance on the change in the fraction of deviance explained, default 0. If a step in the path changes the fraction of deviance explained by this amount or less, the algorithm truncates the path.
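The two deviance tolerances can be read as stopping rules along the path. The base-R sketch below is a hypothetical illustration: truncate_path() is not an adelie function, the deviance values are made up, and the exact comparisons used by the solver (and how they interact with early_exit) are assumptions.

```r
# Hypothetical helper: given the fraction of deviance explained at each
# lambda on the path, return the index at which the path would stop.
truncate_path <- function(dev_explained, adev_tol = 0.9, ddev_tol = 0) {
  for (k in seq_along(dev_explained)) {
    # Stop once enough deviance is explained (assumed comparison: >=).
    if (dev_explained[k] >= adev_tol) return(k)
    # Stop if the last step improved by ddev_tol or less.
    if (k > 1 && dev_explained[k] - dev_explained[k - 1] <= ddev_tol) return(k)
  }
  length(dev_explained)
}

dev <- c(0.00, 0.35, 0.62, 0.80, 0.91, 0.95)   # made-up path
stop_default <- truncate_path(dev)                                  # adev_tol rule fires
stop_ddev <- truncate_path(dev, adev_tol = 0.99, ddev_tol = 0.2)    # ddev_tol rule fires
print(c(stop_default, stop_ddev))
```

With the defaults the path stops at the fifth value (0.91 reaches 0.9); with ddev_tol = 0.2 it stops at the fourth, where the improvement drops to 0.18.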

newton_tol

Convergence tolerance for the BCD update, default 1e-12. This parameter controls the iterations in each block-coordinate step to establish the block solution.

newton_max_iters

Maximum number of iterations for the BCD update, default 1000.

n_threads

Number of threads, default 1.

early_exit

If TRUE (the default), the function is allowed to exit early, before reaching the end of the lambda path.

intercept

Default TRUE to include an unpenalized intercept.

screen_rule

Screening rule, with default "pivot" (an empirical improvement over "strong", the other option).

min_ratio

Ratio between smallest and largest value of lambda. Default is 1e-2.

lmda_path_size

Number of values for lambda, if generated automatically. Default is 100.
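How min_ratio and lmda_path_size could determine an automatic lambda sequence is sketched below in base R. The log-spaced grid (as used by glmnet) is an assumption here, since the text above states only the ratio and the count, and lambda_max is a made-up value; in practice the largest lambda is derived from the data.

```r
lambda_max <- 2.5         # hypothetical largest lambda
min_ratio <- 0.01         # smallest lambda = min_ratio * lambda_max
lmda_path_size <- 100     # number of lambda values

# Assumed log-spaced grid from lambda_max down to min_ratio * lambda_max.
lambda_path <- lambda_max * min_ratio^seq(0, 1, length.out = lmda_path_size)

print(range(lambda_path))
print(length(lambda_path))
```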

max_screen_size

Maximum number of screen groups. Default is NULL.

max_active_size

Maximum number of active groups. Default is NULL.

pivot_subset_ratio

Subset ratio of the pivot rule. Default is 0.1. Users are not expected to change this.

pivot_subset_min

Minimum subset of the pivot rule. Default is 1. Users are not expected to change this.

pivot_slack_ratio

Slack ratio of the pivot rule. Default is 1.25. Users are not expected to change this. See the references for details.

check_state

Check state. Internal parameter, with default FALSE.

progress_bar

Progress bar. Default is FALSE.

warm_start

Warm start (default is NULL). Internal parameter.

Value

A list of class "grpnet". This has a main component called state which represents the fitted path, and a few extra useful components such as the call, the family name, groups and group_sizes. Users are encouraged to use methods like predict(), coef(), print(), plot() etc to examine the object.

Author(s)

James Yang, Trevor Hastie, and Balasubramanian Narasimhan
Maintainer: Trevor Hastie hastie@stanford.edu

References

Yang, J. and Hastie, T. (2024) A Fast and Scalable Pathwise-Solver for Group Lasso and Elastic Net Penalized Regression via Block-Coordinate Descent, arXiv, doi:10.48550/arXiv.2405.08631.
Friedman, J., Hastie, T. and Tibshirani, R. (2010) Regularization Paths for Generalized Linear Models via Coordinate Descent, Journal of Statistical Software, Vol. 33(1), 1-22, doi:10.18637/jss.v033.i01.
Simon, N., Friedman, J., Hastie, T. and Tibshirani, R. (2011) Regularization Paths for Cox's Proportional Hazards Model via Coordinate Descent, Journal of Statistical Software, Vol. 39(5), 1-13, doi:10.18637/jss.v039.i05.
Tibshirani, Robert, Bien, J., Friedman, J., Hastie, T., Simon, N., Taylor, J. and Tibshirani, Ryan. (2012) Strong Rules for Discarding Predictors in Lasso-type Problems, JRSSB, Vol. 74(2), 245-266, https://arxiv.org/abs/1011.2234.

See Also

cv.grpnet, predict.grpnet, coef.grpnet, plot.grpnet, print.grpnet.

Examples

set.seed(0)
n <- 100
p <- 200
X <- matrix(rnorm(n * p), n, p)
y <- X[, 1] * rnorm(1) + rnorm(n)
## Here we create 61 groups randomly: the first group starts at column 1, and 60 further
## start positions are sampled. Groups need to be contiguous, and the `groups` variable
## indicates the beginning position of each group.
groups <- c(1, sample(2:199, 60, replace = FALSE))
groups <- sort(groups)
print(groups)
fit <- grpnet(X, glm.gaussian(y), groups = groups)
print(fit)
plot(fit)
coef(fit)
cvfit <- cv.grpnet(X, glm.gaussian(y), groups = groups)
print(cvfit)
plot(cvfit)
predict(cvfit, newx = X[1:5, ], lambda = "lambda.min")

adelie documentation built on April 3, 2025, 8:58 p.m.