nestcv.SuperLearner: Outer cross-validation of SuperLearner model

View source: R/nestcv_SuperLearner.R

nestcv.SuperLearner    R Documentation

Outer cross-validation of SuperLearner model

Description

Provides a single loop of outer cross-validation to evaluate the performance of ensemble models from the SuperLearner package.

Usage

nestcv.SuperLearner(
  y,
  x,
  filterFUN = NULL,
  filter_options = NULL,
  weights = NULL,
  balance = NULL,
  balance_options = NULL,
  outer_method = c("cv", "LOOCV"),
  n_outer_folds = 10,
  outer_folds = NULL,
  cv.cores = 1,
  na.option = "pass",
  ...
)

Arguments

y

Response vector

x

Data frame or matrix of predictors. A matrix will be coerced to a data frame, as this is the default input type for SuperLearner.

filterFUN

Filter function, e.g. ttest_filter or relieff_filter. Any function can be provided and is passed y and x. Must return a character vector with names of filtered predictors. Not available if outercv is called with a formula.

filter_options

List of additional arguments passed to the filter function specified by filterFUN.

weights

Weights applied to each sample for models which can use weights. Note weights and balance cannot be used at the same time. Weights are not applied in filters.

balance

Specifies the method for dealing with imbalanced class data. Current options are "randomsample" or "smote". Not available if outercv is called with a formula. See randomsample() and smote().

balance_options

List of additional arguments passed to the balancing function

outer_method

String of either "cv" or "LOOCV" specifying whether to do k-fold CV or leave one out CV (LOOCV) for the outer folds

n_outer_folds

Number of outer CV folds

outer_folds

Optional list containing indices of test folds for outer CV. If supplied, n_outer_folds is ignored.

cv.cores

Number of cores for parallel processing of the outer loops. NOTE: this uses parallel::mclapply on Unix/macOS and parallel::parLapply on Windows.

na.option

Character value specifying how NAs are dealt with. "omit" is equivalent to na.action = na.omit. "omitcol" removes cases where 'y' is NA, but where 'x' contains NA the affected columns (predictors) are removed instead, so that cases are preserved. Any other value means that NAs are left in place (a message is given).

...

Additional arguments passed to SuperLearner::SuperLearner()
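As an illustration of the filterFUN contract described above (a function that is passed y and x and returns a character vector of predictor names), a minimal custom filter might look like the sketch below. The name cor_top_filter and its nfilter argument are hypothetical and not part of nestedcv; the sketch assumes a numeric response and numeric predictors.

```r
# Hypothetical filter (not part of nestedcv): keep the `nfilter` predictors
# with the largest absolute Pearson correlation with a numeric response.
cor_top_filter <- function(y, x, nfilter = 5, ...) {
  x <- as.data.frame(x)
  # absolute correlation of each predictor column with y
  score <- vapply(x, function(col) abs(stats::cor(col, y)), numeric(1))
  score[is.na(score)] <- 0  # constant columns give NA correlation
  keep <- order(score, decreasing = TRUE)[seq_len(min(nfilter, length(score)))]
  names(x)[keep]            # character vector of predictor names
}

# Simulated data: only V1 and V2 carry signal
set.seed(1)
x <- as.data.frame(matrix(rnorm(100 * 10), 100, 10))
y <- 2 * x$V1 - 3 * x$V2 + rnorm(100)
sel <- cor_top_filter(y, x, nfilter = 2)
sel
```

Going by the argument descriptions, such a function would be supplied as filterFUN = cor_top_filter with filter_options = list(nfilter = 2); the trailing ... is included only so that unexpected extra arguments do not cause an error.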

Details

This performs an outer CV on ensemble models from the SuperLearner package to measure performance, allowing balancing of imbalanced datasets as well as filtering of predictors. SuperLearner prefers data frames as inputs for the predictors. If x is a matrix, it will be coerced to a data frame and its variable names adjusted by make.names().
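The effect of the name adjustment can be seen with base R alone. The following is only a sketch of the kind of coercion described, not the package's internal code; the column names are made up for illustration.

```r
# A matrix whose column names are not syntactically valid R names
m <- matrix(rnorm(12), nrow = 4,
            dimnames = list(NULL, c("gene-1", "2abc", "a b")))

df <- as.data.frame(m)
# make.names() replaces invalid characters with "." and prefixes "X"
# to names that do not start with a letter or a dot
names(df) <- make.names(names(df))
names(df)
```

Because the names are changed, predictor names reported later, e.g. in final_vars, may differ from the original matrix column names.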

Value

An object with S3 class "nestcv.SuperLearner"

call

the matched call

output

Predictions on the left-out outer folds

outer_result

List object of results from each outer fold containing predictions on left-out outer folds, model result and number of filtered predictors at each fold.

dimx

vector of number of observations and number of predictors

y

original response vector

yfinal

final response vector (post-balancing)

outer_folds

List of indices of outer test folds

final_fit

Final fitted model on whole data

final_vars

Column names of filtered predictors entering final model

summary_vars

Summary statistics of filtered predictors

roc

ROC AUC for binary classification where available.

summary

Overall performance summary. Accuracy and balanced accuracy for classification. ROC AUC for binary classification. RMSE for regression.
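A minimal sketch of a call and of inspecting the returned components, on simulated regression data. The choice of learners (SL.mean, SL.glm), the fold construction and the data are illustrative assumptions; SL.library is a SuperLearner::SuperLearner() argument passed through ... . The call is guarded so that the sketch still runs when the packages are not installed.

```r
set.seed(42)
n <- 120
x <- as.data.frame(matrix(rnorm(n * 8), n, 8))
y <- x$V1 - 0.5 * x$V2 + rnorm(n)

# Custom outer folds: a list of test-set indices, one element per fold
k <- 4
outer_folds <- unname(split(sample(n), rep_len(seq_len(k), n)))

if (requireNamespace("nestedcv", quietly = TRUE) &&
    requireNamespace("SuperLearner", quietly = TRUE)) {
  library(SuperLearner)  # attach so wrappers such as SL.glm are found by name
  fit <- nestedcv::nestcv.SuperLearner(
    y = y, x = x,
    outer_folds = outer_folds,            # n_outer_folds is then ignored
    SL.library = c("SL.mean", "SL.glm")   # passed via ... to SuperLearner()
  )
  print(fit$summary)      # overall performance (RMSE for regression)
  print(head(fit$output)) # predictions on the left-out outer folds
  print(fit$final_vars)   # predictors entering the final model
}
```

Supplying outer_folds explicitly makes the split reproducible across different models fitted to the same data, which helps when comparing their outer CV performance.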

Note

Care should be taken with some SuperLearner models, e.g. SL.gbm, as some models have multicore processing enabled by default, which can lead to very large numbers of processes being spawned.

See Also

SuperLearner::SuperLearner()


nestedcv documentation built on Dec. 5, 2022, 5:25 p.m.