cv.grpnet {grpnet} | R Documentation
Description

Implements k-fold cross-validation for grpnet to find the regularization parameters that minimize the prediction error (deviance, mean squared error, mean absolute error, or misclassification rate).
Usage

cv.grpnet(x, ...)

## Default S3 method:
cv.grpnet(x,
          y,
          group,
          weights = NULL,
          offset = NULL,
          alpha = c(0.01, 0.25, 0.5, 0.75, 1),
          gamma = c(3, 4, 5),
          type.measure = NULL,
          nfolds = 10,
          foldid = NULL,
          same.lambda = FALSE,
          parallel = FALSE,
          cluster = NULL,
          verbose = interactive(),
          adaptive = FALSE,
          power = 1,
          ...)

## S3 method for class 'formula'
cv.grpnet(formula,
          data,
          use.rk = TRUE,
          weights = NULL,
          offset = NULL,
          alpha = c(0.01, 0.25, 0.5, 0.75, 1),
          gamma = c(3, 4, 5),
          type.measure = NULL,
          nfolds = 10,
          foldid = NULL,
          same.lambda = FALSE,
          parallel = FALSE,
          cluster = NULL,
          verbose = interactive(),
          adaptive = FALSE,
          power = 1,
          ...)
Arguments

x: Model (design) matrix of dimension nobs by nvars.

y: Response vector of length nobs.

group: Group label vector (factor, character, or integer) of length nvars, where the j-th entry identifies the group to which the j-th predictor belongs.

formula: Model formula: a symbolic description of the model to be fitted. Uses the same syntax as lm and glm.

data: Optional data frame containing the variables referenced in formula.

use.rk: If TRUE (default), the rk.model.matrix function is used to build the model matrix; otherwise, the model.matrix function is used.

weights: Optional vector of length nobs containing non-negative observation weights.

offset: Optional vector of length nobs specifying an a priori known term to be included in the linear predictor.

alpha: Scalar or vector of candidate values for the elastic net tuning parameter alpha.

gamma: Scalar or vector of candidate values for the penalty hyperparameter gamma.

type.measure: Loss function for cross-validation. Options include: "deviance", "mse" (mean squared error), "mae" (mean absolute error), or "class" (misclassification rate).

nfolds: Number of folds for cross-validation.

foldid: Optional vector of length nobs giving the fold assignment (an integer between 1 and nfolds) for each observation.

same.lambda: Logical specifying if the same lambda sequence should be used when fitting the model to each fold's data.

parallel: Logical specifying if sequential computing (default) or parallel computing should be used. If TRUE, the folds are fit in parallel using cluster.

cluster: Optional cluster to use for parallel computing. If parallel = TRUE and cluster = NULL, a default cluster is created.

verbose: Logical indicating if the fitting progress should be printed. Defaults to interactive(), i.e., TRUE in interactive sessions and FALSE otherwise.

adaptive: Logical indicating if the adaptive group elastic net should be used (see Note).

power: If adaptive = TRUE, the power used to form the adaptive penalty weights from the initial coefficient estimates.

...: Optional additional arguments for grpnet (e.g., family).
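To make the interplay of these arguments concrete, the following is a minimal sketch (not one of the documented examples below): it tunes over a smaller alpha grid and supplies an explicit fold assignment via foldid. The auto data and formula interface are taken from the Examples section; the construction of the fold ids with sample() is an illustration, not a package default.

# hedged usage sketch: 5-fold CV over two alpha values with a fixed fold assignment
data(auto)
set.seed(0)
fid <- sample(rep(1:5, length.out = nrow(auto)))   # fold id for each observation
fit <- cv.grpnet(mpg ~ ., data = auto,
                 alpha = c(0.5, 1), nfolds = 5, foldid = fid)
fit$lambda.min                                     # lambda minimizing the mean CV error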
Details

This function calls the grpnet function nfolds+1 times: once on the full dataset to obtain the lambda sequence, and once holding out each fold's data to evaluate the prediction error. The syntax of (the default S3 method for) this function closely mimics that of the cv.glmnet function in the glmnet package (Friedman, Hastie, & Tibshirani, 2010).
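Because each of these calls fits a full regularization path, cross-validation can benefit from parallel computing via the documented parallel and cluster arguments. A hedged sketch is given below; creating the cluster with the base parallel package is an illustration of one reasonable setup, not a documented default.

# hedged sketch: fit the CV folds in parallel on a 2-worker cluster
library(parallel)
data(auto)
cl <- makeCluster(2L)
fit <- cv.grpnet(mpg ~ ., data = auto, parallel = TRUE, cluster = cl)
stopCluster(cl)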
Let \mathbf{D}_u = \{\mathbf{y}_u, \mathbf{X}_u\} denote the u-th fold's data, let \mathbf{D}_{[u]} = \{\mathbf{y}_{[u]}, \mathbf{X}_{[u]}\} denote the full dataset excluding the u-th fold's data, and let \boldsymbol\beta_{\lambda [u]} denote the coefficient estimates obtained from fitting the model to \mathbf{D}_{[u]} using the regularization parameter \lambda.

The cross-validation error for the u-th fold is defined as

E_u(\lambda) = C(\boldsymbol\beta_{\lambda [u]}, \mathbf{D}_u)

where C(\cdot, \cdot) denotes the cross-validation loss function specified by type.measure. For example, the "mse" loss function is defined as

C(\boldsymbol\beta_{\lambda [u]}, \mathbf{D}_u) = \| \mathbf{y}_u - \mathbf{X}_u \boldsymbol\beta_{\lambda [u]} \|^2

where \| \cdot \| denotes the L2 norm.

The mean cross-validation error cvm is defined as

\bar{E}(\lambda) = \frac{1}{v} \sum_{u=1}^{v} E_u(\lambda)

where v is the total number of folds. The standard error cvsd is defined as

S(\lambda) = \sqrt{ \frac{1}{v(v-1)} \sum_{u=1}^{v} \left( E_u(\lambda) - \bar{E}(\lambda) \right)^2 }

which is the classic definition of the standard error of the mean.
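As a concrete illustration of these definitions (this is not code from the package), cvm and cvsd can be computed from a matrix of per-fold errors, with rows indexing folds and columns indexing lambda values. The errmat data below is made up, and defining cvup and cvlo as cvm plus/minus cvsd follows the cv.glmnet convention assumed here.

# hedged illustration of the formulas above
v <- 10                                   # number of folds
errmat <- matrix(rexp(v * 5), v, 5)       # errmat[u, j] = E_u(lambda_j), illustrative data
cvm  <- colMeans(errmat)                  # mean CV error, bar(E)(lambda)
cvsd <- apply(errmat, 2, sd) / sqrt(v)    # standard error of the mean, S(lambda)
cvup <- cvm + cvsd                        # upper curve
cvlo <- cvm - cvsd                        # lower curve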
Value

An object of class "cv.grpnet" with the following components:

lambda: regularization parameter sequence for the full data.

cvm: mean cross-validation error for each lambda.

cvsd: estimated standard error of cvm.

cvup: upper curve: cvm + cvsd.

cvlo: lower curve: cvm - cvsd.

nzero: number of non-zero groups for each lambda.

grpnet.fit: fitted grpnet object for the full data.

lambda.min: value of lambda that minimizes the mean cross-validation error cvm.

lambda.1se: largest lambda such that cvm is within one cvsd of the minimum (see Note).

index: two-element vector giving the indices of lambda.min and lambda.1se in the lambda sequence.

type.measure: loss function for cross-validation (used for plot label).

call: matched call.

time: runtime in seconds to perform k-fold CV tuning.

tune: data frame containing the tuning results, i.e., min(cvm) for each combination of alpha and/or gamma.
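For instance, after fitting the model from the first example below, the components listed above can be inspected directly. This is a sketch assuming standard list-style access to the returned object.

# hedged sketch: inspect selected components of a cv.grpnet object
data(auto)
mod <- cv.grpnet(mpg ~ ., data = auto)
mod$lambda.min                            # lambda minimizing the mean CV error
mod$lambda.1se                            # largest lambda within one SE of the minimum
head(data.frame(lambda = mod$lambda, cvm = mod$cvm,
                cvsd = mod$cvsd, nzero = mod$nzero))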
Note

When adaptive = TRUE, the adaptive group elastic net is used: (1) an initial fit with alpha = 0 estimates the penalty.factor, and (2) a second fit using the estimated penalty.factor is returned (see the sketch below).
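A minimal usage sketch of this two-stage fit, using only the documented adaptive and power arguments (the power value shown is simply the default):

# hedged sketch: request the adaptive group elastic net
data(auto)
set.seed(1)
mod <- cv.grpnet(mpg ~ ., data = auto, adaptive = TRUE, power = 1)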
lambda.1se is defined as follows:
minid <- which.min(cvm)               # index of the minimum mean CV error
min1se <- cvm[minid] + cvsd[minid]    # minimum CV error plus one standard error
se1id <- which(cvm <= min1se)[1]      # first index with cvm within one SE of the minimum
lambda.1se <- lambda[se1id]           # corresponding lambda value
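Assuming the lambda sequence is ordered from largest to smallest (as in cv.glmnet, which this function mimics), taking the first index satisfying cvm <= min1se selects the most heavily regularized (largest) lambda whose mean CV error is within one standard error of the minimum, i.e., the usual "one standard error" rule.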
Author(s)

Nathaniel E. Helwig <helwig@umn.edu>
References

Friedman, J., Hastie, T., & Tibshirani, R. (2010). Regularization paths for generalized linear models via coordinate descent. Journal of Statistical Software, 33(1), 1-22. doi:10.18637/jss.v033.i01

Helwig, N. E. (2024). Versatile descent algorithms for group regularization and variable selection in generalized linear models. Journal of Computational and Graphical Statistics. doi:10.1080/10618600.2024.2362232
See Also

plot.cv.grpnet for plotting the cross-validation error curve

predict.cv.grpnet for predicting from cv.grpnet objects

grpnet for fitting group elastic net regularization paths
######***###### family = "gaussian" ######***######
# load data
data(auto)
# 10-fold cv (formula method, response = mpg)
set.seed(1)
mod <- cv.grpnet(mpg ~ ., data = auto)
# print min and 1se solution info
mod
# plot cv error curve
plot(mod)
######***###### family = "binomial" ######***######
# load data
data(auto)
# redefine origin (Domestic vs Foreign)
auto$origin <- ifelse(auto$origin == "American", "Domestic", "Foreign")
# 10-fold cv (formula method, response = origin with 2 levels)
set.seed(1)
mod <- cv.grpnet(origin ~ ., data = auto, family = "binomial")
# print min and 1se solution info
mod
# plot cv error curve
plot(mod)
######***###### family = "multinomial" ######***######
# load data
data(auto)
# 10-fold cv (formula method, response = origin with 3 levels)
set.seed(1)
mod <- cv.grpnet(origin ~ ., data = auto, family = "multinomial")
# print min and 1se solution info
mod
# plot cv error curve
plot(mod)
######***###### family = "poisson" ######***######
# load data
data(auto)
# 10-fold cv (formula method, response = horsepower)
set.seed(1)
mod <- cv.grpnet(horsepower ~ ., data = auto, family = "poisson")
# print min and 1se solution info
mod
# plot cv error curve
plot(mod)
######***###### family = "negative.binomial" ######***######
# load data
data(auto)
# 10-fold cv (formula method, response = horsepower)
set.seed(1)
mod <- cv.grpnet(horsepower ~ ., data = auto, family = "negative.binomial")
# print min and 1se solution info
mod
# plot cv error curve
plot(mod)
######***###### family = "Gamma" ######***######
# load data
data(auto)
# 10-fold cv (formula method, response = mpg)
set.seed(1)
mod <- cv.grpnet(mpg ~ ., data = auto, family = "Gamma")
# print min and 1se solution info
mod
# plot cv error curve
plot(mod)
######***###### family = "inverse.gaussian" ######***######
# load data
data(auto)
# 10-fold cv (formula method, response = mpg)
set.seed(1)
mod <- cv.grpnet(mpg ~ ., data = auto, family = "inverse.gaussian")
# print min and 1se solution info
mod
# plot cv error curve
plot(mod)