cv.grpnet: Cross-validation for grpnet



Cross-validation for grpnet

Description

Does k-fold cross-validation for grpnet.

Usage

cv.grpnet(
  X,
  glm,
  n_folds = 10,
  foldid = NULL,
  min_ratio = 0.01,
  lmda_path_size = 100,
  offsets = NULL,
  progress_bar = FALSE,
  n_threads = 1,
  ...
)

Arguments

X

Feature matrix. Either a regular R matrix, an adelie custom matrix class, or a concatenation of such.

glm

GLM family/response object. This is an expression that represents the family, the response, and other arguments such as weights, if present. The choices are glm.gaussian(), glm.binomial(), glm.poisson(), glm.multinomial(), glm.cox(), and glm.multigaussian(). This is a required argument, and there is no default. In the simple example below, we use glm.gaussian(y). See also the sketch following this argument list.

n_folds

Number of folds (default 10). Although n_folds can be as large as the sample size (leave-one-out CV), this is not recommended for large datasets. The smallest allowable value is n_folds=3.

foldid

An optional vector of values between 1 and n_folds identifying what fold each observation is in. If supplied, n_folds can be missing.

min_ratio

Ratio between smallest and largest value of lambda. Default is 1e-2.

lmda_path_size

Number of values for lambda, if generated automatically. Default is 100.

offsets

Offsets, default is NULL. If present, this is a fixed vector or matrix corresponding to the shape of the natural parameter, and is added to the fit.

progress_bar

Progress bar. Default is FALSE.

n_threads

Number of threads, default 1.

...

Other arguments that can be passed to grpnet.
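
For illustration, here is a minimal, hedged sketch of how these arguments fit together. The data, fold assignments, and object names are invented for the example, and a 0/1 binary response is assumed to be acceptable to glm.binomial().

library(adelie)

set.seed(1)
n <- 50
p <- 20
X <- matrix(rnorm(n * p), n, p)
y <- rbinom(n, 1, 0.5)

# Pre-computed fold assignments: values between 1 and the number of folds
fold_ids <- sample(rep(1:5, length.out = n))

cvfit <- cv.grpnet(
  X,
  glm.binomial(y),        # required GLM family/response object
  foldid = fold_ids,      # optional; the folds are then taken from foldid
  min_ratio = 0.01,       # ratio of smallest to largest lambda
  lmda_path_size = 100,   # number of lambda values in the path
  n_threads = 1
)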

Details

The function runs grpnet n_folds+1 times: the first run computes the lambda sequence, and the remaining runs compute the fit with each fold omitted in turn. The out-of-fold deviance is accumulated, and the average deviance and standard deviation over the folds are computed. Note that cv.grpnet does NOT search over values of alpha. A specific value should be supplied, otherwise alpha=1 is assumed by default. Users who wish to cross-validate alpha as well should call cv.grpnet with a pre-computed foldid vector, and then use the same foldid vector in separate calls to cv.grpnet with different values of alpha (as sketched below). Note also that the results of cv.grpnet are random, since the folds are selected at random. Users can reduce this randomness by running cv.grpnet many times and averaging the error curves.
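
As a concrete, hedged sketch of the alpha strategy described above (data and object names are invented; alpha is forwarded to grpnet through the ... argument):

set.seed(2)
n <- 80
p <- 30
X <- matrix(rnorm(n * p), n, p)
y <- rnorm(n)

# One foldid vector shared across all values of alpha
fold_ids <- sample(rep(1:10, length.out = n))

cv_a1  <- cv.grpnet(X, glm.gaussian(y), foldid = fold_ids, alpha = 1)    # lasso
cv_a05 <- cv.grpnet(X, glm.gaussian(y), foldid = fold_ids, alpha = 0.5)  # elastic net
cv_a0  <- cv.grpnet(X, glm.gaussian(y), foldid = fold_ids, alpha = 0)    # ridge

# With identical folds, the minimum mean CV deviances are directly comparable
c(alpha_1 = min(cv_a1$cvm), alpha_0.5 = min(cv_a05$cvm), alpha_0 = min(cv_a0$cvm))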

Value

An object of class "cv.grpnet" is returned, which is a list with the ingredients of the cross-validation fit (see the short sketch after the component list below).

lambda

the values of lambda used in the fits.

cvm

The mean cross-validated deviance - a vector of length length(lambda).

cvsd

estimate of standard error of cvm.

cvup

upper curve = cvm+cvsd.

cvlo

lower curve = cvm-cvsd.

nzero

number of non-zero coefficients at each lambda.

name

a text string indicating the type of measure (for plotting purposes). Currently this is "deviance".

grpnet.fit

a fitted grpnet object for the full data.

lambda.min

value of lambda that gives minimum cvm.

lambda.1se

largest value of lambda such that mean deviance is within 1 standard error of the minimum.

index

a one-column matrix with the indices of lambda.min and lambda.1se in the sequence of coefficients, fits, etc.
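
Below is a short sketch, assuming a fitted object cvfit returned by cv.grpnet, of how these components might be inspected; only the component names listed above are used.

cvfit$lambda.min      # lambda giving the minimum mean CV deviance
cvfit$lambda.1se      # largest lambda within one standard error of the minimum
cvfit$index           # positions of lambda.min and lambda.1se in the lambda sequence
head(cbind(lambda = cvfit$lambda,
           cvm    = cvfit$cvm,
           cvsd   = cvfit$cvsd,
           nzero  = cvfit$nzero))
cvfit$grpnet.fit      # grpnet fit on the full data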

Author(s)

James Yang, Trevor Hastie, and Balasubramanian Narasimhan
Maintainer: Trevor Hastie hastie@stanford.edu

References

Yang, James and Hastie, Trevor (2024) A Fast and Scalable Pathwise-Solver for Group Lasso and Elastic Net Penalized Regression via Block-Coordinate Descent. arXiv preprint, doi:10.48550/arXiv.2405.08631.
Friedman, J., Hastie, T. and Tibshirani, R. (2010) Regularization Paths for Generalized Linear Models via Coordinate Descent, Journal of Statistical Software, Vol. 33(1), 1-22, doi:10.18637/jss.v033.i01.
Simon, N., Friedman, J., Hastie, T. and Tibshirani, R. (2011) Regularization Paths for Cox's Proportional Hazards Model via Coordinate Descent, Journal of Statistical Software, Vol. 39(5), 1-13, doi:10.18637/jss.v039.i05.
Tibshirani, Robert, Bien, J., Friedman, J., Hastie, T., Simon, N., Taylor, J. and Tibshirani, Ryan (2012) Strong Rules for Discarding Predictors in Lasso-type Problems, JRSSB, Vol. 74(2), 245-266, https://arxiv.org/abs/1011.2234.

Examples

set.seed(0)
n <- 100
p <- 200
X <- matrix(rnorm(n * p), n, p)
y <- X[, 1] * rnorm(1) + rnorm(n)
fit <- cv.grpnet(X, glm.gaussian(y))
print(fit)
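
# Continuing the example: the selected lambda values can be read off the fit,
# and, assuming a plot method for cv.grpnet objects exists (analogous to
# plot.cv.glmnet; not documented on this page), the CV curve can be displayed.
fit$lambda.min
fit$lambda.1se
plot(fit)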

