repeatcv: Repeated nested CV

Description

Performs repeated calls to a nestedcv model to determine performance across repeated runs of nested CV.

Usage

repeatcv(
  expr,
  n = 5,
  repeat_folds = NULL,
  keep = FALSE,
  extra = FALSE,
  progress = TRUE,
  rep_parallel = "mclapply",
  rep.cores = 1L
)

Arguments

expr

An expression containing a call to nestcv.glmnet(), nestcv.train(), nestcv.SuperLearner() or outercv().

n

Number of repeats

repeat_folds

Optional list of fold indices to be applied to the outer CV folds, with one element per repeat, e.g. as created by repeatfolds().

keep

Logical whether to save repeated outer CV fitted models for variable importance, SHAP etc. Note this can make the resulting object very large.

extra

Logical whether additional performance metrics are gathered for binary classification models. See metrics().

progress

Logical whether to show progress.

rep_parallel

Either "mclapply" or "future". This determines which parallel backend to use.

rep.cores

Integer specifying number of cores/threads to invoke. Ignored if rep_parallel = "future".

Details

We recommend using this with the native R pipe |> (available from R 4.1.0; see examples).
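Because the native pipe is rewritten by the parser, piping a model call into repeatcv() is equivalent to passing that call, unevaluated, as the expr argument. The short base-R check below illustrates this with quote(). It only builds call objects, so nestedcv does not need to be loaded and no model is fitted.

```r
# quote() returns the parsed call without evaluating it.
# The parser rewrites `lhs |> f(args)` as `f(lhs, args)` (R >= 4.1.0).
piped  <- quote(nestcv.glmnet(y, x, family = "multinomial") |> repeatcv(3))
nested <- quote(repeatcv(nestcv.glmnet(y, x, family = "multinomial"), 3))

identical(piped, nested)  # TRUE: both are the same call
piped                     # prints the rewritten, non-piped call
```

In both forms the model call arrives as the first argument (expr) and the 3 as the second argument (n).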

When comparing models, we recommend fixing the set of outer CV folds used in each repeat, so that every model is evaluated on exactly the same splits. The function repeatfolds() can be used to create a fixed set of outer CV folds for each repeat.
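A minimal sketch of such a comparison, assuming the same iris setup as in the examples: a lasso model (alphaSet = 1) and a ridge model (alphaSet = 0, chosen here only for illustration) are each repeated three times over one shared set of folds from repeatfolds(). The result element is the matrix of performance metrics described under Value; assuming each row holds one repeat, colMeans() gives the mean of each metric across repeats.

```r
library(nestedcv)

data("iris")
y <- iris$Species
x <- iris[, 1:4]

# One fixed set of outer CV folds per repeat, shared by both models
set.seed(123, "L'Ecuyer-CMRG")
folds <- repeatfolds(y, repeats = 3, n_outer_folds = 4)

# Lasso (alpha = 1) and ridge (alpha = 0) evaluated on identical splits
res_lasso <- nestcv.glmnet(y, x, family = "multinomial", alphaSet = 1,
                           n_outer_folds = 4) |>
  repeatcv(3, repeat_folds = folds)
res_ridge <- nestcv.glmnet(y, x, family = "multinomial", alphaSet = 0,
                           n_outer_folds = 4) |>
  repeatcv(3, repeat_folds = folds)

# Mean of each performance metric across the repeats
colMeans(res_lasso$result)
colMeans(res_ridge$result)
```

Because both models see the same splits in every repeat, differences in their metrics are not confounded by differences in fold allocation.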

Parallelisation over repeats is performed using either parallel::mclapply() (forking is not available on Windows) or the future framework, depending on how rep_parallel is set. Beware that cv.cores can still be set within calls to nestedcv models, giving nested parallelisation: up to rep.cores × cv.cores processes/forks will be spawned, so be careful not to overload your CPU. In general, parallelising repeats using rep.cores is faster than parallelising within each model using cv.cores. rep.cores is ignored if you are using future; set the number of workers with future::plan() instead.
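As a rough guide to the rep.cores × cv.cores arithmetic, the hypothetical helper suggest_rep_settings() below (not part of nestedcv) divides the available cores by cv.cores to pick rep.cores, and chooses "future" on Windows, where forking is unavailable. It is a sketch under those assumptions only; sensible settings also depend on memory use and the model being fitted.

```r
# Hypothetical helper (not part of nestedcv): suggest repeatcv() settings
# that keep rep.cores * cv.cores within the available CPU cores.
suggest_rep_settings <- function(cv_cores = 1L,
                                 total_cores = parallel::detectCores()) {
  # detectCores() can return NA on some platforms
  if (is.na(total_cores)) total_cores <- 1L
  # Leave room for cv.cores workers inside each repeat
  rep_cores <- max(1L, as.integer(total_cores %/% cv_cores))
  # mclapply forking is unavailable on Windows, so fall back to future there
  backend <- if (.Platform$OS.type == "windows") "future" else "mclapply"
  list(rep_parallel = backend, rep.cores = rep_cores)
}

s <- suggest_rep_settings(cv_cores = 2L, total_cores = 8L)
s$rep.cores  # 4, i.e. 4 repeats x 2 cv.cores = 8 processes in total

# With nestedcv loaded, the settings could be passed on like this:
# res <- nestcv.glmnet(y, x, family = "multinomial", cv.cores = 2) |>
#   repeatcv(3, rep_parallel = s$rep_parallel, rep.cores = s$rep.cores)
```

When the helper returns "future", rep.cores is ignored, so the number of workers would instead be set with, for example, future::plan(future::multisession, workers = 4).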

Value

List of S3 class 'repeatcv' containing:

call

the model call

result

matrix of performance metrics

output

a matrix or data frame containing the outer CV predictions from each repeat

roc

(binary classification models only) a ROC curve object based on predictions across all repeats as returned in output, generated by pROC::roc()

fits

(if keep = TRUE) list of length n containing slimmed 'nestedcv' model objects for calculating variable importance or SHAP values
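The sketch below shows how these components might be accessed for a binary classification model, where roc is also returned. It assumes a two-class subset of iris (versicolor vs virginica), chosen only for illustration, and uses keep = TRUE so that fits is populated; pROC::auc() is applied to the pooled ROC curve.

```r
library(nestedcv)

# Two-class subset of iris so that an ROC curve is produced
dat2 <- droplevels(iris[iris$Species != "setosa", ])
y2 <- dat2$Species
x2 <- dat2[, 1:4]

res2 <- nestcv.glmnet(y2, x2, family = "binomial", alphaSet = 1,
                      n_outer_folds = 4) |>
  repeatcv(3, keep = TRUE)

res2$call              # the model call
res2$result            # performance metrics from the repeats
head(res2$output)      # outer CV predictions from each repeat
pROC::auc(res2$roc)    # AUC of the ROC curve pooled across all repeats
length(res2$fits)      # 3 slimmed models, because keep = TRUE
```

Note that keep = TRUE stores one fitted object per repeat, so the returned object grows with n.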

Examples


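library(nestedcv)

data("iris")
dat <- iris
y <- dat$Species
x <- dat[, 1:4]

## 3 repeats of nested CV with 4 outer folds, repeats run on 2 cores
## (forking via mclapply is not supported on Windows: use rep.cores = 1
## or rep_parallel = "future" there)
res <- nestcv.glmnet(y, x, family = "multinomial", alphaSet = 1,
                     n_outer_folds = 4) |>
       repeatcv(3, rep.cores = 2)
res
summary(res)

## set up fixed fold indices
## (L'Ecuyer-CMRG is the RNG kind used for reproducible parallel streams)
set.seed(123, "L'Ecuyer-CMRG")
folds <- repeatfolds(y, repeats = 3, n_outer_folds = 4)
res <- nestcv.glmnet(y, x, family = "multinomial", alphaSet = 1,
                     n_outer_folds = 4) |>
       repeatcv(3, repeat_folds = folds, rep.cores = 2)
res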


nestedcv documentation built on April 4, 2025, 2:21 a.m.