cv.grpnet: Cross-Validation for grpnet
In grpnet: Group Elastic Net Regularized GLMs and GAMs

cv.grpnet

R Documentation

Cross-Validation for grpnet

Description

Implements k-fold cross-validation for grpnet to find the regularization parameters that minimize the prediction error (deviance, mean squared error, mean absolute error, or misclassification rate).

Usage

cv.grpnet(x, ...)

## Default S3 method:
cv.grpnet(x, 
          y, 
          group,
          weights = NULL,
          offset = NULL,
          alpha = c(0.01, 0.25, 0.5, 0.75, 1),
          gamma = c(3, 4, 5),
          type.measure = NULL,
          nfolds = 10, 
          foldid = NULL,
          same.lambda = FALSE,
          parallel = FALSE, 
          cluster = NULL, 
          verbose = interactive(), 
          adaptive = FALSE,
          power = 1,
          ...)
           
## S3 method for class 'formula'
cv.grpnet(formula,
          data, 
          use.rk = TRUE,
          weights = NULL,
          offset = NULL,
          alpha = c(0.01, 0.25, 0.5, 0.75, 1),
          gamma = c(3, 4, 5),
          type.measure = NULL,
          nfolds = 10, 
          foldid = NULL, 
          same.lambda = FALSE,
          parallel = FALSE, 
          cluster = NULL, 
          verbose = interactive(), 
          adaptive = FALSE,
          power = 1,
          ...)

Arguments

`x`	Model (design) matrix of dimension `nobs` by `nvars` (`n \times p`).
`y`	Response vector of length `n` or matrix of dimension `n \times m`. Note that matrix inputs are (i) required for multigaussian family, (ii) allowed for binomial and multinomial families (see "Binomial and multinomial" section in `grpnet`), and (iii) not permitted for other families.
`group`	Group label vector (factor, character, or integer) of length `p`. Predictors with the same label are grouped together for regularization.
`formula`	Model formula: a symbolic description of the model to be fitted. Uses the same syntax as `lm` and `glm`.
`data`	Optional data frame containing the variables referenced in `formula`.
`use.rk`	If `TRUE` (default), the `rk.model.matrix` function is used to build the model matrix. Otherwise, the `model.matrix` function is used to build the model matrix. Additional arguments to the `rk.model.matrix` function can be passed via the `...` argument.
`weights`	Optional vector of length `n` with non-negative weights to use for weighted (penalized) likelihood estimation. Defaults to a vector of ones.
`offset`	Optional vector of length `n` with an a priori known term to be included in the model's linear predictor. Defaults to a vector of zeros.
`alpha`	Scalar or vector specifying the elastic net tuning parameter `\alpha`. If `alpha` is a vector (default), then (a) the same `foldid` is used to compute the cross-validation error for each `\alpha`, and (b) the solution for the optimal `\alpha` is returned.
`gamma`	Scalar or vector specifying the penalty hyperparameter `\gamma` for MCP or SCAD. If `gamma` is a vector (default), then (a) the same `foldid` is used to compute the cross-validation error for each `\gamma`, and (b) the solution for the optimal `\gamma` is returned.
`type.measure`	Loss function for cross-validation. Options include: `"deviance"` for model deviance, `"mse"` for mean squared error, `"mae"` for mean absolute error, or `"class"` for classification error. Note that `"class"` is only available for binomial and multinomial families. The default is classification error (for binomial and multinomial) or mean absolute error (others).
`nfolds`	Number of folds for cross-validation.
`foldid`	Optional vector of length `n` giving the fold identification for each observation. Must be coercible into a factor. After coersion, the `nfolds` argument is defined as `nfolds = nlevels(foldid)`.
`same.lambda`	Logical specfying if the same `\lambda` sequence should be used for fitting the model to each fold's data. If `FALSE` (default), the `\lambda` sequence is determined separately holding out each fold, and the `\lambda` sequence from the full model is used to align the predictions. If `TRUE`, the `\lambda` sequence from the full model is used to fit the model for each fold. The default often provides better (i.e., more stable) computational performance.
`parallel`	Logical specifying if sequential computing (default) or parallel computing should be used. If `TRUE`, the fitting for each fold is parallelized.
`cluster`	Optional cluster to use for parallel computing. If `parallel = TRUE` and `cluster = NULL`, then the cluster is defined `cluster = makeCluster(2L)`, which uses two cores. Recommended usage: `cluster = makeCluster(detectCores())`
`verbose`	Logical indicating if the fitting progress should be printed. Defaults to `TRUE` in interactive sessions and `FALSE` otherwise.
`adaptive`	Logical indicating if the adaptive group elastic net should be used (see Note).
`power`	If `adaptive = TRUE`, then the adaptive penalty weights are defined by dividing the original penalty weights by `tapply(coef, group, norm, type = "F")^power`.
`...`	Optional additional arguments for `grpnet` (e.g., `standardize`, `penalty.factor`, etc.)

Details

This function calls the grpnet function nfolds+1 times: once on the full dataset to obtain the lambda sequence, and once holding out each fold's data to evaluate the prediction error. The syntax of (the default S3 method for) this function closely mimics that of the cv.glmnet function in the glmnet package (Friedman, Hastie, & Tibshirani, 2010).

Let \mathbf{D}_u = \{\mathbf{y}_u, \mathbf{X}_u\} denote the u-th fold's data, let \mathbf{D}_{[u]} = \{\mathbf{y}_{[u]}, \mathbf{X}_{[u]}\} denote the full dataset excluding the u-th fold's data, and let \boldsymbol\beta_{\lambda [u]} denote the coefficient estimates obtained from fitting the model to \mathbf{D}_{[u]} using the regularization parameter \lambda.

The cross-validation error for the u-th fold is defined as

E_u(\lambda) = C(\boldsymbol\beta_{\lambda [u]} , \mathbf{D}_u)

where C(\cdot , \cdot) denotes the cross-validation loss function that is specified by type.measure. For example, the "mse" loss function is defined as

C(\boldsymbol\beta_{\lambda [u]} , \mathbf{D}_u) = \| \mathbf{y}_u - \mathbf{X}_u \boldsymbol\beta_{\lambda [u]} \|^2

where \| \cdot \| denotes the L2 norm.

The mean cross-validation error cvm is defined as

\bar{E}(\lambda) = \frac{1}{v} \sum_{u = 1}^v E_u(\lambda)

where v is the total number of folds. The standard error cvsd is defined as

S(\lambda) = \sqrt{ \frac{1}{v (v - 1)} \sum_{u=1}^v (E_u(\lambda) - \bar{E}(\lambda))^2 }

which is the classic definition of the standard error of the mean.

Value

`lambda`	regularization parameter sequence for the full data
`cvm`	mean cross-validation error for each `lambda`
`cvsd`	estimated standard error of `cvm`
`cvup`	upper curve: `cvm + cvsd`
`cvlo`	lower curve: `cvm - cvsd`
`nzero`	number of non-zero groups for each `lambda`
`grpnet.fit`	fitted grpnet object for the full data
`lambda.min`	value of `lambda` that minimizes `cvm`
`lambda.1se`	largest `lambda` such that `cvm` is within one `cvsd` from the minimum (see Note)
`index`	two-element vector giving the indices of `lambda.min` and `lambda.1se` in the `lambda` vector, i.e., `c(minid, se1id)` as defined in the Note
`type.measure`	loss function for cross-validation (used for plot label)
`call`	matched call
`time`	runtime in seconds to perform k-fold CV tuning
`tune`	data frame containing the tuning results, i.e., min(cvm) for each combination of `alpha` and/or `gamma`

Note

When adaptive = TRUE, the adaptive group elastic net is used:
(1) an initial fit with alpha = 0 estimates the penalty.factor
(2) a second fit using estimated penalty.factor is returned

lambda.1se is defined as follows:
minid <- which.min(cvm)
min1se <- cvm[minid] + cvsd[minid]
se1id <- which(cvm <= min1se)[1]
lambda.1se <- lambda[se1id]

Author(s)

Nathaniel E. Helwig <helwig@umn.edu>

References

Friedman, J., Hastie, T., & Tibshirani, R. (2010). Regularization paths for generalized linear models via coordinate descent. Journal of Statistical Software, 33(1), 1-22. \Sexpr[results=rd]{tools:::Rd_expr_doi("10.18637/jss.v033.i01")}

Helwig, N. E. (2025). Versatile descent algorithms for group regularization and variable selection in generalized linear models. Journal of Computational and Graphical Statistics, 34(1), 239-252. \Sexpr[results=rd]{tools:::Rd_expr_doi("10.1080/10618600.2024.2362232")}

Examples


######***######   family = "gaussian"   ######***######

# load data
data(auto)

# 10-fold cv (formula method, response = mpg)
set.seed(1)
mod <- cv.grpnet(mpg ~ ., data = auto)

# print min and 1se solution info
mod

# plot cv error curve
plot(mod)



######***######   family = "multigaussian"   ######***######

# load data
data(auto)

# 10-fold cv (formula method, response = (mpg, displacement))
y <- as.matrix(auto[,c(1,3)])
set.seed(1)
mod <- cv.grpnet(y ~ ., data = auto[,-c(1,3)], family = "multigaussian",
                 standardize.response = TRUE)

# print min and 1se solution info
mod

# plot cv error curve
plot(mod)



######***######   family = "svm1"   ######***######

# load data
data(auto)

# redefine origin (Domestic vs Foreign)
auto$origin <- ifelse(auto$origin == "American", "Domestic", "Foreign")

# 10-fold cv (default method, response = origin with 2 levels)
set.seed(1)
mod <- cv.grpnet(origin ~ ., data = auto, family = "svm1")

# print min and 1se solution info
mod

# plot cv error curve
plot(mod)



######***######   family = "svm2"   ######***######

# load data
data(auto)

# redefine origin (Domestic vs Foreign)
auto$origin <- ifelse(auto$origin == "American", "Domestic", "Foreign")

# 10-fold cv (default method, response = origin with 2 levels)
set.seed(1)
mod <- cv.grpnet(origin ~ ., data = auto, family = "svm2")

# print min and 1se solution info
mod

# plot cv error curve
plot(mod)



######***######   family = "logit"   ######***######

# load data
data(auto)

# redefine origin (Domestic vs Foreign)
auto$origin <- ifelse(auto$origin == "American", "Domestic", "Foreign")

# 10-fold cv (default method, response = origin with 2 levels)
set.seed(1)
mod <- cv.grpnet(origin ~ ., data = auto, family = "logit")

# print min and 1se solution info
mod

# plot cv error curve
plot(mod)


######***######   family = "binomial"   ######***######

# load data
data(auto)

# redefine origin (Domestic vs Foreign)
auto$origin <- ifelse(auto$origin == "American", "Domestic", "Foreign")

# 10-fold cv (default method, response = origin with 2 levels)
set.seed(1)
mod <- cv.grpnet(origin ~ ., data = auto, family = "binomial")

# print min and 1se solution info
mod

# plot cv error curve
plot(mod)



######***######   family = "multinomial"   ######***######

# load data
data(auto)

# 10-fold cv (formula method, response = origin with 3 levels)
set.seed(1)
mod <- cv.grpnet(origin ~ ., data = auto, family = "multinomial")

# print min and 1se solution info
mod

# plot cv error curve
plot(mod)



######***######   family = "poisson"   ######***######

# load data
data(auto)

# 10-fold cv (formula method, response = horsepower)
set.seed(1)
mod <- cv.grpnet(horsepower ~ ., data = auto, family = "poisson")

# print min and 1se solution info
mod

# plot cv error curve
plot(mod)



######***######   family = "negative.binomial"   ######***######

# load data
data(auto)

# 10-fold cv (formula method, response = horsepower)
set.seed(1)
mod <- cv.grpnet(horsepower ~ ., data = auto, family = "negative.binomial")

# print min and 1se solution info
mod

# plot cv error curve
plot(mod)



######***######   family = "Gamma"   ######***######

# load data
data(auto)

# 10-fold cv (formula method, response = origin)
set.seed(1)
mod <- cv.grpnet(mpg ~ ., data = auto, family = "Gamma")

# print min and 1se solution info
mod

# plot cv error curve
plot(mod)



######***######   family = "inverse.gaussian"   ######***######

# load data
data(auto)

# 10-fold cv (formula method, response = origin)
set.seed(1)
mod <- cv.grpnet(mpg ~ ., data = auto, family = "inverse.gaussian")

# print min and 1se solution info
mod

# plot cv error curve
plot(mod)

grpnet documentation built on June 10, 2025, 5:13 p.m.

grpnet index

rdrr.io home R language documentation Run R code online

CRAN packages Bioconductor packages R-Forge packages GitHub packages

Note that we can't provide technical support on individual packages. You should contact the package authors for that.

grpnet
Group Elastic Net Regularized GLMs and GAMs

cv.grpnet: Cross-Validation for grpnet
In grpnet: Group Elastic Net Regularized GLMs and GAMs

Cross-Validation for grpnet

Description

Usage

Arguments

Details

Value

Note

Author(s)

References

See Also

Examples

Related to cv.grpnet in grpnet...

R Package Documentation

Browse R Packages

We want your feedback!

grpnet Group Elastic Net Regularized GLMs and GAMs

cv.grpnet: Cross-Validation for grpnet In grpnet: Group Elastic Net Regularized GLMs and GAMs

Cross-Validation for grpnet

Description

Usage

Arguments

Details

Value

Note

Author(s)

References

See Also

Examples

Related to cv.grpnet in grpnet...

R Package Documentation

Browse R Packages

We want your feedback!

grpnet
Group Elastic Net Regularized GLMs and GAMs

cv.grpnet: Cross-Validation for grpnet
In grpnet: Group Elastic Net Regularized GLMs and GAMs