View source: R/nestcv_SuperLearner.R
nestcv.SuperLearner | R Documentation |
Provides a single loop of outer cross-validation to evaluate the performance
of ensemble models from the SuperLearner package.
nestcv.SuperLearner(
y,
x,
filterFUN = NULL,
filter_options = NULL,
weights = NULL,
balance = NULL,
balance_options = NULL,
modifyX = NULL,
modifyX_useY = FALSE,
modifyX_options = NULL,
outer_method = c("cv", "LOOCV"),
n_outer_folds = 10,
outer_folds = NULL,
parallel_mode = NULL,
cv.cores = 1,
final = TRUE,
na.option = "pass",
verbose = TRUE,
...
)
y |
Response vector |
x |
Dataframe or matrix of predictors. Matrix will be coerced to dataframe as this is the default for SuperLearner. |
filterFUN |
Filter function, e.g. ttest_filter or relieff_filter.
Any function can be provided and is passed |
filter_options |
List of additional arguments passed to the filter
function specified by filterFUN |
weights |
Weights applied to each sample for models which can use
weights. Note |
balance |
Specifies method for dealing with imbalanced class data.
Current options are |
balance_options |
List of additional arguments passed to the balancing function |
modifyX |
Character string specifying the name of a function to modify x |
modifyX_useY |
Logical value whether the |
modifyX_options |
List of additional arguments passed to the modifyX function |
outer_method |
String of either "cv" or "LOOCV" |
n_outer_folds |
Number of outer CV folds |
outer_folds |
Optional list containing indices of test folds for outer
CV. If supplied, |
parallel_mode |
Either "mclapply" or "snow". This determines which
parallel backend to use. The default is |
cv.cores |
Number of cores for parallel processing of the outer loops. |
final |
Logical whether to fit final model. |
na.option |
Character value specifying how |
verbose |
Logical whether to print messages and show progress |
... |
Additional arguments passed to SuperLearner::SuperLearner() |
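The outer_folds argument above takes a list of test-fold indices. As a minimal sketch in base R, assuming a simple unstratified split is acceptable, such a list could be built as follows. The values of n and k are purely illustrative.

```r
set.seed(42)
n <- 50   # number of observations (illustrative)
k <- 5    # number of outer folds (illustrative)

# Shuffle the row indices, then deal them into k groups of near-equal size.
shuffled <- sample(seq_len(n))
outer_folds <- unname(split(shuffled, rep(seq_len(k), length.out = n)))

# Each list element holds the row indices of one outer test fold.
lengths(outer_folds)
```

The resulting list could then be passed as outer_folds so that the same partition is reused across runs. For classification with imbalanced classes, a stratified split may be preferable to this plain random one.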
This performs an outer CV on SuperLearner package ensemble models to measure
performance, allowing balancing of imbalanced datasets as well as filtering
of predictors. SuperLearner prefers dataframes as inputs for the predictors.
If x is a matrix it will be coerced to a dataframe and variable names
adjusted by make.names().
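To illustrate the kind of renaming to expect, here is a small base R sketch of coercing a matrix to a dataframe and passing its column names through make.names(). It mimics the behaviour described above rather than reproducing the function's internal code, and the column names are made up for the example.

```r
# A matrix whose column names are not all syntactically valid in R.
m <- matrix(rnorm(12), nrow = 4,
            dimnames = list(NULL, c("gene-1", "2nd var", "ok")))

# Coerce to a dataframe, then make the names syntactically valid.
x_df <- as.data.frame(m)
colnames(x_df) <- make.names(colnames(x_df))

colnames(x_df)
```

Invalid characters are replaced by "." and names starting with a digit gain an "X" prefix, so any later lookups by the original column names need to use the adjusted names.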
Parallelisation of the outer CV folds is available on Linux/macOS, but not
on Windows. On Windows, snowSuperLearner() is called instead, so
that parallelisation is performed across each call to SuperLearner.
An object with S3 class "nestcv.SuperLearner"
call |
the matched call |
output |
Predictions on the left-out outer folds |
outer_result |
List object of results from each outer fold containing predictions on left-out outer folds, model result and number of filtered predictors at each fold. |
dimx |
vector of number of observations and number of predictors |
y |
original response vector |
yfinal |
final response vector (post-balancing) |
outer_folds |
List of indices of outer test folds |
final_fit |
Final fitted model on whole data |
final_vars |
Column names of filtered predictors entering final model |
summary_vars |
Summary statistics of filtered predictors |
roc |
ROC AUC for binary classification where available. |
summary |
Overall performance summary. Accuracy and balanced accuracy for classification. ROC AUC for binary classification. RMSE for regression. |
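The components listed above can be inspected directly on the returned object. The following is a minimal sketch on simulated regression data, assuming the function is exported by an installed version of the nestedcv package and that the SuperLearner package is installed. The data, the fold count and the choice of SL.library (passed through ... to SuperLearner) are illustrative only.

```r
set.seed(123)
n <- 100
x <- as.data.frame(matrix(rnorm(n * 10), nrow = n))  # columns V1..V10
y <- 2 * x$V1 - x$V2 + rnorm(n)                      # numeric response

# Only run the fit if both packages are available.
if (requireNamespace("nestedcv", quietly = TRUE) &&
    requireNamespace("SuperLearner", quietly = TRUE) &&
    "nestcv.SuperLearner" %in% getNamespaceExports("nestedcv")) {
  library(nestedcv)
  library(SuperLearner)  # attached so the SL.* wrappers can be found by name

  res <- nestcv.SuperLearner(y, x,
                             n_outer_folds = 5,
                             cv.cores = 1,
                             verbose = FALSE,
                             SL.library = c("SL.mean", "SL.glm"))

  print(res$summary)      # overall performance (RMSE for regression)
  str(res$output)         # predictions on the left-out outer folds
  print(res$final_vars)   # predictors entering the final model
}
```

Setting cv.cores = 1 keeps the sketch portable across operating systems; a larger value could be tried where parallelisation of the outer folds is supported.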
Care should be taken with some SuperLearner models, e.g. SL.gbm, as some
models have multicore enabled by default, which can lead to huge numbers of
processes being spawned.
SuperLearner::SuperLearner()