repeatcv: Repeated nested CV

Description

Calls a nestedcv model repeatedly to assess how its performance varies across repeated runs of nested CV.

Usage

repeatcv(
  expr,
  n = 5,
  repeat_folds = NULL,
  keep = TRUE,
  extra = FALSE,
  progress = TRUE,
  rep.cores = 1L
)

Arguments

expr

An expression containing a call to nestcv.glmnet(), nestcv.train(), nestcv.SuperLearner() or outercv().

n

Number of repeats

repeat_folds

Optional list containing fold indices to be applied to the outer CV folds.

keep

Logical whether to save the outer CV predictions from each repeat, e.g. for plotting ROC curves.

extra

Logical whether additional performance metrics are gathered for binary classification models. See metrics().

progress

Logical whether to show progress.

rep.cores

Integer specifying the number of cores/threads used to parallelise over repeats.

Details

We recommend using this with the R pipe |> (see examples).

When comparing performance between models, it is recommended to fix the sets of outer CV folds used in each repeat, so that every model is evaluated on the same splits. The function repeatfolds() can be used to create a fixed set of outer CV folds for each repeat.
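As a minimal sketch of such a comparison, the code below fits two glmnet models on the same repeated folds. It reuses the iris setup from the examples; the choice of lasso (alphaSet = 1) versus ridge (alphaSet = 0) and the object names res_lasso and res_ridge are illustrative assumptions, not a recommendation.

```r
library(nestedcv)

data("iris")
y <- iris$Species
x <- iris[, 1:4]

# One fixed set of outer CV folds per repeat, shared by both models
set.seed(123, "L'Ecuyer-CMRG")
folds <- repeatfolds(y, repeats = 3, n_outer_folds = 4)

# Model 1: lasso (illustrative choice)
res_lasso <- nestcv.glmnet(y, x, family = "multinomial", alphaSet = 1,
                           n_outer_folds = 4) |>
  repeatcv(3, repeat_folds = folds)

# Model 2: ridge (illustrative choice), evaluated on the same folds
res_ridge <- nestcv.glmnet(y, x, family = "multinomial", alphaSet = 0,
                           n_outer_folds = 4) |>
  repeatcv(3, repeat_folds = folds)

summary(res_lasso)
summary(res_ridge)
```

Because both calls receive the same folds object, differences between the two summaries should reflect the models rather than the way the data happened to be split.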

Parallelisation over repeats is performed using parallel::mclapply(), which relies on forking and therefore does not run in parallel on Windows. Note that cv.cores can still be set within calls to nestedcv models, which results in nested parallelisation: rep.cores x cv.cores processes/forks will be spawned in total, so take care not to overload your CPU. In general, parallelising over repeats with rep.cores is faster than parallelising with cv.cores.
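One way to keep the total number of processes within the machine's limits is to derive rep.cores from the detected core count. This is only a sketch: the variable names (n_avail, cv_cores, rep_cores, n_repeats) are hypothetical, and the rule of never exceeding the detected cores is an assumption rather than package behaviour.

```r
library(parallel)

# Number of cores reported by the system (can be NA on some platforms)
n_avail <- detectCores()
if (is.na(n_avail)) n_avail <- 1L

n_repeats <- 3L  # hypothetical: number of repeats to be passed to repeatcv()
cv_cores <- 1L   # hypothetical: value of cv.cores used inside the model call

# Forking is unavailable on Windows, so fall back to a single process there;
# elsewhere use at most one process per repeat without exceeding n_avail
rep_cores <- if (.Platform$OS.type == "windows") {
  1L
} else {
  max(1L, min(n_repeats, n_avail %/% cv_cores))
}

# Total number of processes that nested parallelisation would spawn
total_procs <- rep_cores * cv_cores
total_procs
```

The resulting rep_cores value can then be passed as rep.cores in the call to repeatcv().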

Value

List of S3 class 'repeatcv' containing:

call

the model call

result

matrix of performance metrics

output

(if keep = TRUE) a matrix or data frame containing the outer CV predictions from each repeat

roc

(binary classification models only) a ROC curve object based on predictions across all repeats as returned in output, generated by pROC::roc()
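The elements listed above can be inspected directly on the returned object. The sketch below assumes a binary problem built by dropping one species from iris (an illustrative choice, so that the roc element is present) and only prints each component; the exact layout of result and output is not assumed.

```r
library(nestedcv)

# Two-class subset of iris (illustrative), so that a ROC curve is produced
data("iris")
dat <- droplevels(iris[iris$Species != "setosa", ])
y <- dat$Species
x <- dat[, 1:4]

res <- nestcv.glmnet(y, x, family = "binomial", alphaSet = 1,
                     n_outer_folds = 4) |>
  repeatcv(3)

res$call          # the model call
res$result        # matrix of performance metrics
head(res$output)  # outer CV predictions (present because keep = TRUE)
class(res$roc)    # ROC curve object generated by pROC::roc()
```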

Examples


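library(nestedcv)

## multiclass example using the iris data
data("iris")
dat <- iris
y <- dat$Species   # outcome: factor with 3 levels
x <- dat[, 1:4]    # predictors: the 4 numeric measurements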

res <- nestcv.glmnet(y, x, family = "multinomial", alphaSet = 1,
                     n_outer_folds = 4) |>
       repeatcv(3, rep.cores = 2)
res
summary(res)

## set up fixed fold indices
set.seed(123, "L'Ecuyer-CMRG")
folds <- repeatfolds(y, repeats = 3, n_outer_folds = 4)
res <- nestcv.glmnet(y, x, family = "multinomial", alphaSet = 1,
                     n_outer_folds = 4) |>
       repeatcv(3, repeat_folds = folds, rep.cores = 2)
res


nestedcv documentation built on June 22, 2024, 11:30 a.m.