cv.grpnet | R Documentation |

Implements k-fold cross-validation for `grpnet` to find the regularization parameters that minimize the prediction error (deviance, mean squared error, mean absolute error, or misclassification rate).

```
cv.grpnet(x, ...)
## Default S3 method:
cv.grpnet(x,
y,
group,
weights = NULL,
offset = NULL,
alpha = c(0.01, 0.25, 0.5, 0.75, 1),
gamma = c(3, 4, 5),
type.measure = NULL,
nfolds = 10,
foldid = NULL,
same.lambda = FALSE,
parallel = FALSE,
cluster = NULL,
verbose = interactive(),
adaptive = FALSE,
power = 1,
...)
## S3 method for class 'formula'
cv.grpnet(formula,
data,
use.rk = TRUE,
weights = NULL,
offset = NULL,
alpha = c(0.01, 0.25, 0.5, 0.75, 1),
gamma = c(3, 4, 5),
type.measure = NULL,
nfolds = 10,
foldid = NULL,
same.lambda = FALSE,
parallel = FALSE,
cluster = NULL,
verbose = interactive(),
adaptive = FALSE,
power = 1,
...)
```

`x` |
Model (design) matrix of dimension |

`y` |
Response vector of length |

`group` |
Group label vector (factor, character, or integer) of length |

`formula` |
Model formula: a symbolic description of the model to be fitted. Uses the same syntax as |

`data` |
Optional data frame containing the variables referenced in |

`use.rk` |
If |

`weights` |
Optional vector of length |

`offset` |
Optional vector of length |

`alpha` |
Scalar or vector specifying the elastic net tuning parameter |

`gamma` |
Scalar or vector specifying the penalty hyperparameter |

`type.measure` |
Loss function for cross-validation. Options include: |

`nfolds` |
Number of folds for cross-validation. |

`foldid` |
Optional vector of length |

`same.lambda` |
Logical specifying if the same |

`parallel` |
Logical specifying if sequential computing (default) or parallel computing should be used. If |

`cluster` |
Optional cluster to use for parallel computing. If |

`verbose` |
Logical indicating if the fitting progress should be printed. Defaults to |

`adaptive` |
Logical indicating if the adaptive group elastic net should be used (see Note). |

`power` |
If |

`...` |
Optional additional arguments for |

This function calls the `grpnet` function `nfolds+1` times: once on the full dataset to obtain the `lambda` sequence, and once holding out each fold's data to evaluate the prediction error. The syntax of (the default S3 method for) this function closely mimics that of the `cv.glmnet` function in the **glmnet** package (Friedman, Hastie, & Tibshirani, 2010).
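The `nfolds+1` fitting pattern can be sketched in base R. This is only an illustration: `fit_fun` is a hypothetical stand-in (ordinary least squares via `lm.fit`) for a penalized fit, and the fold labels are built the way a user might construct a `foldid` vector, which is an assumption rather than a description of the package internals.

```r
# Hypothetical stand-in for a penalized fit: ordinary least squares
fit_fun <- function(x, y) lm.fit(cbind(1, x), y)$coefficients

set.seed(1)
n <- 50; p <- 3; nfolds <- 10
x <- matrix(rnorm(n * p), n, p)
y <- drop(x %*% c(1, -1, 0.5)) + rnorm(n)

# balanced fold labels in 1..nfolds, randomly permuted
foldid <- sample(rep(seq_len(nfolds), length.out = n))

nfits <- 0

# fit 1: full data (in cv.grpnet this fit supplies the lambda sequence)
full_fit <- fit_fun(x, y)
nfits <- nfits + 1

# fits 2..(nfolds + 1): hold out one fold at a time
pred_cv <- rep(NA_real_, n)
for (u in seq_len(nfolds)) {
  test <- foldid == u
  beta <- fit_fun(x[!test, , drop = FALSE], y[!test])
  nfits <- nfits + 1
  # predict the held-out observations from the fit that excluded them
  pred_cv[test] <- drop(cbind(1, x[test, , drop = FALSE]) %*% beta)
}
```

Each observation is predicted exactly once, by the fit that excluded its fold, and the total number of fits is `nfolds + 1`.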

Let `\mathbf{D}_u = \{\mathbf{y}_u, \mathbf{X}_u\}` denote the `u`-th fold's data, let `\mathbf{D}_{[u]} = \{\mathbf{y}_{[u]}, \mathbf{X}_{[u]}\}` denote the full dataset excluding the `u`-th fold's data, and let `\boldsymbol\beta_{\lambda [u]}` denote the coefficient estimates obtained from fitting the model to `\mathbf{D}_{[u]}` using the regularization parameter `\lambda`.

The cross-validation error for the `u`-th fold is defined as

`E_u(\lambda) = C(\boldsymbol\beta_{\lambda [u]} , \mathbf{D}_u)`

where `C(\cdot , \cdot)` denotes the cross-validation loss function that is specified by `type.measure`. For example, the `"mse"` loss function is defined as

`C(\boldsymbol\beta_{\lambda [u]} , \mathbf{D}_u) = \| \mathbf{y}_u - \mathbf{X}_u \boldsymbol\beta_{\lambda [u]} \|^2`

where `\| \cdot \|` denotes the L2 norm.
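As a small worked instance of the `"mse"` formula exactly as written above, the following sketch evaluates the squared L2 norm of the held-out residuals for one toy fold. The function name `cv_loss_mse` and the data are hypothetical; whether the package additionally scales by the number of held-out observations is not stated here.

```r
# squared L2 norm of the held-out residuals, as in the formula above
cv_loss_mse <- function(beta, y_u, X_u) {
  resid <- y_u - drop(X_u %*% beta)
  sum(resid^2)
}

# toy held-out fold with three observations and two predictors
X_u <- rbind(c(1, 0),
             c(0, 1),
             c(1, 1))
y_u <- c(1, 2, 4)

# coefficients assumed to come from a fit that excluded this fold
beta <- c(1, 2)

# predictions are (1, 2, 3), so the residuals are (0, 0, 1)
E_u <- cv_loss_mse(beta, y_u, X_u)
```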

The mean cross-validation error `cvm` is defined as

`\bar{E}(\lambda) = \frac{1}{v} \sum_{u = 1}^v E_u(\lambda)`

where `v` is the total number of folds. The standard error `cvsd` is defined as

`S(\lambda) = \sqrt{ \frac{1}{v (v - 1)} \sum_{u=1}^v (E_u(\lambda) - \bar{E}(\lambda))^2 }`

which is the classic definition of the standard error of the mean.
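The two definitions above can be checked numerically. The sketch below assumes the fold-wise errors are stored in a hypothetical matrix `E` with one row per fold and one column per `lambda` value; the numbers are made up for illustration.

```r
# rows = folds (u = 1..v), columns = lambda values; toy numbers
E <- rbind(c(4.0, 2.0, 1.5),
           c(5.0, 2.5, 1.0),
           c(4.5, 1.5, 2.0),
           c(4.5, 2.0, 1.5))
v <- nrow(E)

# mean cross-validation error for each lambda
cvm <- colMeans(E)

# standard error of the mean: sqrt( sum((E_u - cvm)^2) / (v * (v - 1)) )
cvsd <- sqrt(colSums(sweep(E, 2, cvm)^2) / (v * (v - 1)))
```

The same `cvsd` values are obtained from `apply(E, 2, sd) / sqrt(v)`, since `sd` already divides by `v - 1`.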

`lambda` |
regularization parameter sequence for the full data |

`cvm` |
mean cross-validation error for each |

`cvsd` |
estimated standard error of |

`cvup` |
upper curve: |

`cvlo` |
lower curve: |

`nzero` |
number of non-zero groups for each |

`grpnet.fit` |
fitted grpnet object for the full data |

`lambda.min` |
value of |

`lambda.1se` |
largest |

`index` |
two-element vector giving the indices of |

`type.measure` |
loss function for cross-validation (used for plot label) |

`call` |
matched call |

`time` |
runtime in seconds to perform k-fold CV tuning |

`tune` |
data frame containing the tuning results, i.e., min(cvm) for each combo of |

When `adaptive = TRUE`, the adaptive group elastic net is used:

(1) an initial fit with `alpha = 0` estimates the `penalty.factor`

(2) a second fit using estimated `penalty.factor` is returned
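The idea behind step (1) can be sketched in base R. This is an illustration under loud assumptions, not the package's implementation: the initial fit is a closed-form ridge solution with a hypothetical penalty `lambda0`, and the group weights are assumed to be the inverse group-wise coefficient norm raised to `power`. The exact weighting used by `grpnet` is not given in this page.

```r
set.seed(1)
n <- 100
X <- matrix(rnorm(n * 4), n, 4)
group <- c(1, 1, 2, 2)                       # two groups of two columns
y <- drop(X[, 1:2] %*% c(2, -2)) + rnorm(n)  # only group 1 carries signal
power <- 1

# step (1): ridge fit (the alpha = 0 analogue), closed form,
# with a hypothetical ridge penalty lambda0
lambda0 <- 1
beta0 <- drop(solve(crossprod(X) + lambda0 * diag(ncol(X)),
                    crossprod(X, y)))

# assumed weighting: inverse group norm raised to `power`
grp_norm <- tapply(beta0, group, function(b) sqrt(sum(b^2)))
penalty_factor <- as.vector(1 / grp_norm^power)

# step (2) would refit with these weights supplied as the penalty factor,
# so that groups with small initial norms are penalized more heavily
```

Under this assumed weighting, the noise group (group 2) receives a much larger penalty factor than the signal group (group 1).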

`lambda.1se` is defined as follows:

`minid <- which.min(cvm)`

`min1se <- cvm[minid] + cvsd[minid]`

`se1id <- which(cvm <= min1se)[1]`

`lambda.1se <- lambda[se1id]`
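A toy run of these four lines makes the rule concrete. The vectors below are made up, and the `lambda` sequence is assumed to be sorted in decreasing order, so that the first index satisfying the condition corresponds to the largest qualifying `lambda`.

```r
# toy cross-validation summary (lambda assumed decreasing)
lambda <- c(1, 0.5, 0.25, 0.125, 0.0625)
cvm    <- c(5.0, 3.0, 2.2, 2.0, 2.1)
cvsd   <- c(0.5, 0.4, 0.3, 0.3, 0.3)

# index and value of the lambda that minimizes cvm
minid <- which.min(cvm)
lambda.min <- lambda[minid]

# threshold: minimum error plus one standard error
min1se <- cvm[minid] + cvsd[minid]

# first (largest) lambda whose error is within the threshold
se1id <- which(cvm <= min1se)[1]
lambda.1se <- lambda[se1id]
```

Here the minimum is at the fourth value (`lambda.min = 0.125`), the threshold is `2.0 + 0.3`, and the third value is the first to fall within it, so `lambda.1se = 0.25`.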

Nathaniel E. Helwig <helwig@umn.edu>

Friedman, J., Hastie, T., & Tibshirani, R. (2010). Regularization paths for generalized linear models via coordinate descent. *Journal of Statistical Software, 33*(1), 1-22. doi:10.18637/jss.v033.i01

Helwig, N. E. (2024). Versatile descent algorithms for group regularization and variable selection in generalized linear models. *Journal of Computational and Graphical Statistics*. doi:10.1080/10618600.2024.2362232

`plot.cv.grpnet` for plotting the cross-validation error curve

`predict.cv.grpnet` for predicting from `cv.grpnet` objects

`grpnet` for fitting group elastic net regularization paths

```
######***###### family = "gaussian" ######***######
# load data
data(auto)
# 10-fold cv (formula method, response = mpg)
set.seed(1)
mod <- cv.grpnet(mpg ~ ., data = auto)
# print min and 1se solution info
mod
# plot cv error curve
plot(mod)
######***###### family = "binomial" ######***######
# load data
data(auto)
# redefine origin (Domestic vs Foreign)
auto$origin <- ifelse(auto$origin == "American", "Domestic", "Foreign")
# 10-fold cv (formula method, response = origin with 2 levels)
set.seed(1)
mod <- cv.grpnet(origin ~ ., data = auto, family = "binomial")
# print min and 1se solution info
mod
# plot cv error curve
plot(mod)
######***###### family = "multinomial" ######***######
# load data
data(auto)
# 10-fold cv (formula method, response = origin with 3 levels)
set.seed(1)
mod <- cv.grpnet(origin ~ ., data = auto, family = "multinomial")
# print min and 1se solution info
mod
# plot cv error curve
plot(mod)
######***###### family = "poisson" ######***######
# load data
data(auto)
# 10-fold cv (formula method, response = horsepower)
set.seed(1)
mod <- cv.grpnet(horsepower ~ ., data = auto, family = "poisson")
# print min and 1se solution info
mod
# plot cv error curve
plot(mod)
######***###### family = "negative.binomial" ######***######
# load data
data(auto)
# 10-fold cv (formula method, response = horsepower)
set.seed(1)
mod <- cv.grpnet(horsepower ~ ., data = auto, family = "negative.binomial")
# print min and 1se solution info
mod
# plot cv error curve
plot(mod)
######***###### family = "Gamma" ######***######
# load data
data(auto)
# 10-fold cv (formula method, response = mpg)
set.seed(1)
mod <- cv.grpnet(mpg ~ ., data = auto, family = "Gamma")
# print min and 1se solution info
mod
# plot cv error curve
plot(mod)
######***###### family = "inverse.gaussian" ######***######
# load data
data(auto)
# 10-fold cv (formula method, response = mpg)
set.seed(1)
mod <- cv.grpnet(mpg ~ ., data = auto, family = "inverse.gaussian")
# print min and 1se solution info
mod
# plot cv error curve
plot(mod)
```

