cv.glintnet: Cross-validation for glintnet

View source: R/glintnet.R

cv.glintnet    R Documentation

Cross-validation for glintnet

Description

Performs k-fold cross-validation for glintnet.

Usage

cv.glintnet(
  X,
  glm,
  offsets = NULL,
  intr_keys = NULL,
  intr_values,
  levels = NULL,
  n_folds = 10,
  foldid = NULL,
  n_threads = 1,
  ...
)

Arguments

X

Feature matrix. Either a regular R matrix, an adelie custom matrix class, or a concatenation of such.

glm

GLM family/response object. This is an expression that represents the family, the response, and other arguments such as weights, if present. The choices are glm.gaussian(), glm.binomial(), glm.poisson(), glm.multinomial(), glm.cox(), and glm.multigaussian(). This is a required argument, and there is no default. In the simple example below, we use glm.gaussian(y).

offsets

Offsets, default is NULL. If present, this is a fixed vector or matrix corresponding to the shape of the natural parameter, and is added to the fit.

intr_keys

List of feature indices. This is a list of all features with which interactions can be formed. Default is 1:p where p is the number of columns in X.

intr_values

List of integer vectors of feature indices. For each of the m <= p indices listed in intr_keys, there is a vector of indices indicating which columns are candidates for interaction with that feature. If a vector is NULL, all other features are candidates for interactions. The default is a list of length m where each element is NULL; that is, rep(list(NULL), m).
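As a minimal sketch of how intr_keys and intr_values pair up, the base-R snippet below builds one entry of intr_values per key. The value of p and the feature indices (1, 3, 2, 6) are illustrative assumptions, not values taken from the package.

```r
p <- 10                      # assumed number of columns in X

# Features 1 and 3 are allowed to form interactions (m = 2 keys).
intr_keys <- c(1, 3)

# One entry per key: NULL means "all other features are candidates";
# an integer vector restricts the candidates to those columns.
intr_values <- list(
  NULL,        # feature 1 may interact with every other feature
  c(2, 6)      # feature 3 may interact only with features 2 and 6
)

# The documented default is one NULL entry per key.
default_values <- rep(list(NULL), length(intr_keys))

# The two arguments must line up entry by entry.
stopifnot(length(intr_values) == length(intr_keys))
```

These objects would then be supplied as the intr_keys and intr_values arguments of cv.glintnet.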

levels

Number of levels for each of the columns of X, with 1 representing a quantitative feature. A factor with K levels should be represented by the numbers 0,1,...,K-1.
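The coding described for levels can be sketched in base R as follows. The data frame df and the names X and levels_vec are hypothetical; the snippet only illustrates turning factors into 0,...,K-1 codes and recording the number of levels per column.

```r
# Hypothetical data: one quantitative column and one 3-level factor.
df <- data.frame(
  age    = c(31, 45, 52, 28),
  colour = factor(c("red", "blue", "red", "green"))
)

# Numeric matrix: factors become their integer codes shifted to start at 0.
X <- vapply(
  df,
  function(col) if (is.factor(col)) as.numeric(col) - 1 else as.numeric(col),
  numeric(nrow(df))
)

# Levels vector: 1 for quantitative columns, K for a factor with K levels.
levels_vec <- vapply(
  df,
  function(col) if (is.factor(col)) nlevels(col) else 1L,
  integer(1)
)

X
levels_vec
```

The matrix X and the vector levels_vec are in the form that the X and levels arguments describe.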

n_folds

Number of folds (default 10). Although n_folds can be as large as the sample size (leave-one-out CV), this is not recommended for large datasets. The smallest allowable value is n_folds=3.

foldid

An optional vector of values between 1 and n_folds identifying which fold each observation is in. If supplied, n_folds can be missing.
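One common way to build such a foldid vector in base R is sketched below. The sample size and number of folds are assumed for illustration.

```r
set.seed(1)
n <- 23          # assumed sample size
n_folds <- 5     # assumed number of folds

# Repeat 1..n_folds up to length n, then shuffle: fold membership is random
# and fold sizes differ by at most one observation.
foldid <- sample(rep(seq_len(n_folds), length.out = n))

table(foldid)    # number of observations in each fold
```

Fixing foldid in this way makes repeated calls to cv.glintnet use the same partition of the observations.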

n_threads

Number of threads, default 1.

...

Additional named arguments to grpnet.

Details

The function runs glintnet n_folds+1 times: once to get the lambda sequence, and then n_folds more times to compute the fit with each of the folds omitted. The out-of-fold deviance is accumulated, and the average deviance and its standard deviation over the folds are computed.

Note that cv.glintnet does NOT search over values of alpha. A specific value should be supplied; otherwise alpha=1 is assumed by default. Users who would like to cross-validate alpha as well should call cv.glintnet with a pre-computed vector foldid, and then use this same foldid vector in separate calls to cv.glintnet with different values of alpha.

Note also that the results of cv.glintnet are random, since the folds are selected at random. Users can reduce this randomness by running cv.glintnet many times and averaging the error curves.
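The procedure for cross-validating alpha might look like the sketch below. The simulated data, the alpha grid, and the name levels_vec are illustrative. The sketch also assumes that the returned object carries a cvm component (the mean cross-validated deviance along the lambda path), as cv.glmnet objects do; check the structure of the object in your installed version before relying on it.

```r
library(adelie)

set.seed(0)
n <- 200
p <- 4
X <- matrix(rnorm(n * p), n, p)
y <- X[, 1] + X[, 1] * X[, 2] + rnorm(n)
levels_vec <- rep(1, p)               # all features quantitative

# One fold assignment shared by every call, so the error curves are comparable.
n_folds <- 5
foldid <- sample(rep(seq_len(n_folds), length.out = n))

# Illustrative grid of alpha values; alpha is passed on through `...`.
alphas <- c(0.5, 0.8, 1)
fits <- lapply(alphas, function(a) {
  cv.glintnet(X, glm.gaussian(y), levels = levels_vec, intr_keys = 1,
              foldid = foldid, alpha = a)
})

# ASSUMPTION: `cvm` holds the mean cross-validated deviance for each lambda.
best_cvm <- vapply(fits, function(f) min(f$cvm), numeric(1))
data.frame(alpha = alphas, best_cvm = best_cvm)
```

The alpha with the smallest best_cvm would be the natural candidate, subject to the fold-to-fold variability noted above.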

Value

A list of class "glintnet", which inherits from class "grpnet". This has a few additional components such as pairs, groups and levels. Users typically use methods like predict(), print(), plot(), etc. to examine the object.

Author(s)

James Yang, Trevor Hastie, and Balasubramanian Narasimhan
Maintainer: Trevor Hastie hastie@stanford.edu

References

Lim, M. and Hastie, T. (2015) Learning interactions via hierarchical group-lasso regularization, JCGS, doi:10.1080/10618600.2014.938812.
Yang, J. and Hastie, T. (2024) A Fast and Scalable Pathwise-Solver for Group Lasso and Elastic Net Penalized Regression via Block-Coordinate Descent, arXiv, doi:10.48550/arXiv.2405.08631.
Friedman, J., Hastie, T. and Tibshirani, R. (2010) Regularization Paths for Generalized Linear Models via Coordinate Descent, Journal of Statistical Software, Vol. 33(1), 1-22, doi:10.18637/jss.v033.i01.
Simon, N., Friedman, J., Hastie, T. and Tibshirani, R. (2011) Regularization Paths for Cox's Proportional Hazards Model via Coordinate Descent, Journal of Statistical Software, Vol. 39(5), 1-13, doi:10.18637/jss.v039.i05.
Tibshirani, Robert, Bien, J., Friedman, J., Hastie, T., Simon, N., Taylor, J. and Tibshirani, Ryan (2012) Strong Rules for Discarding Predictors in Lasso-type Problems, JRSSB, Vol. 74(2), 245-266, https://arxiv.org/abs/1011.2234.

See Also

glintnet, predict.glintnet, plot.glintnet, print.glintnet.

Examples

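library(adelie)

set.seed(0)
n <- 500
d_cont <- 5     # number of continuous features
d_disc <- 5     # number of categorical features

# Continuous features, then categorical features coded 0, 1, ..., K-1
Z_cont <- matrix(rnorm(n * d_cont), n, d_cont)
levels <- sample(2:5, d_disc, replace = TRUE)
Z_disc <- matrix(0, n, d_disc)
for (i in seq(d_disc)) Z_disc[, i] <- sample(0:(levels[i] - 1), n, replace = TRUE)
Z <- cbind(Z_cont, Z_disc)
levels <- c(rep(1, d_cont), levels)   # 1 marks a quantitative column

# Response: main effects and interaction of continuous feature 1
# with categorical feature 2
xmat <- model.matrix(~ Z_cont[, 1] * factor(Z_disc[, 2]))
nc <- ncol(xmat)
beta <- rnorm(nc)
y <- xmat %*% beta + rnorm(n) * 1.5

# Cross-validate, with feature 1 as the only interaction key
cvfit <- cv.glintnet(Z, glm.gaussian(y), levels = levels, intr_keys = 1)
plot(cvfit)
predict(cvfit, newx = Z[1:5, ])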


adelie documentation built on April 3, 2025, 8:58 p.m.