Lrnr_sl: The Super Learner Algorithm
In tlverse/sl3: Pipelines for Machine Learning and Super Learning

Description

A learner that encapsulates the Super Learner algorithm. It fits a metalearner on the cross-validated predictions of the candidate learners, then forms a pipeline that combines the candidate learners with the fitted metalearner.
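
The steps described above can be illustrated with a minimal base R sketch. This is not sl3's implementation: the two-model candidate library, the `mtcars` data, the fold assignment, and the plain least-squares metalearner are all assumptions chosen only to make the idea concrete.

```r
# Base R sketch of the super learner idea (illustrative, not sl3 internals).
set.seed(1)
dat <- mtcars
n <- nrow(dat)
V <- 4
# Random V-fold assignment (hypothetical; sl3 delegates folds to origami).
fold_id <- sample(rep(seq_len(V), length.out = n))

# Hypothetical candidate "library": two linear models with different covariates.
candidates <- list(
  small = mpg ~ wt,
  large = mpg ~ wt + hp + qsec
)

# Step 1: cross-validated predictions, one column per candidate.
cv_preds <- matrix(
  NA_real_, nrow = n, ncol = length(candidates),
  dimnames = list(NULL, names(candidates))
)
for (v in seq_len(V)) {
  valid <- fold_id == v
  for (k in names(candidates)) {
    fit <- lm(candidates[[k]], data = dat[!valid, ])
    cv_preds[valid, k] <- predict(fit, newdata = dat[valid, ])
  }
}

# Cross-validated mean squared error of each candidate.
cv_risk <- colMeans((cv_preds - dat$mpg)^2)

# Step 2: metalearner fit on the cross-validated predictions.
# Assumption: unconstrained least squares without an intercept.
weights <- lm.fit(x = cv_preds, y = dat$mpg)$coefficients

# Step 3: refit every candidate on the full data and combine the
# full-data predictions using the metalearner weights.
full_fits <- lapply(candidates, function(f) lm(f, data = dat))
full_preds <- sapply(full_fits, predict, newdata = dat)
sl_pred <- as.numeric(full_preds %*% weights)
```

Fitting the metalearner on cross-validated predictions rather than on in-sample predictions keeps the combination from simply rewarding the candidate that overfits the most.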

Format

An R6Class object inheriting from Lrnr_base.

Value

A learner object inheriting from Lrnr_base with methods for training and prediction. For a full list of learner functionality, see the complete documentation of Lrnr_base.

Parameters

learners: The "library" of user-specified algorithms for the super learner to consider as candidates.
metalearner = "default": The metalearner to be fit on the cross-validated predictions from the candidates. If "default", the default_metalearner is used to construct a metalearner based on the outcome_type of the training task.
cv_control = NULL: Optional list of arguments that will be used to define a specific cross-validation fold structure for fitting the super learner. Intended for use in a nested cross-validation scheme, such as cross-validated super learner (cv_sl) or when Lrnr_sl is considered in the list of candidate learners in another Lrnr_sl. Includes the arguments listed below, and any others to be passed to fold_funs:
- strata = NULL: Discrete covariate or outcome name to define stratified cross-validation folds. If NULL and if task$outcome_type$type is binary or categorical, then the default behavior is to consider stratified cross-validation, where the strata are defined with respect to the outcome. To override the default behavior, i.e., to not consider stratified cross-validation when strata = NULL and task$outcome_type$type is binary or categorical, set strata = "none".
- cluster_by_id = TRUE: Logical to specify a clustered cross-validation scheme according to id in task. Specifically, if task$nodes$id is not NULL and if cluster_by_id = TRUE (default), then task$nodes$id is used to define a clustered cross-validation scheme, so that dependent units are always kept together in either the training set or the validation set of a given fold. To override the default behavior, i.e., to not consider clustered cross-validation when task$nodes$id is not NULL, set cluster_by_id = FALSE.
- fold_fun = NULL: A function indicating the origami cross-validation scheme to use, such as folds_vfold for V-fold cross-validation. See fold_funs for a list of possibilities. If NULL (default) and if other cv_control arguments are specified, e.g., V, strata or cluster_by_id, then the default behavior is to set fold_fun = origami::folds_vfold.
- ...: Other arguments to be passed to fold_fun, such as V for fold_fun = folds_vfold. See fold_funs for a list of possible fold-function-specific arguments.
keep_extra = TRUE: Whether to store all sub-parts of the super learner computation. When FALSE, intermediary data structures are discarded, which significantly reduces the memory footprint of the resulting object.
verbose = NULL: Whether to print cv_control-related messages. Warnings and errors are always printed. When verbose = NULL, the verbosity specified by the sl3.verbose option is used, which defaults to FALSE. (Note: to turn on the sl3.verbose option, set options("sl3.verbose" = TRUE).)
...: Any additional parameters that can be considered by Lrnr_base.
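
To make the strata and cluster_by_id options of cv_control more concrete, the following base R sketch shows what clustered and stratified fold assignment mean. It is only an illustration under assumed toy data (the `toy` data frame and its `id` and `y` columns are hypothetical); it does not reproduce how origami or sl3 construct folds.

```r
# Base R illustration of clustered vs. stratified fold assignment.
set.seed(2)
V <- 5

# Hypothetical data: 20 subjects with 3 repeated measures each, binary outcome.
toy <- data.frame(
  id = rep(1:20, each = 3),
  y  = rbinom(60, 1, 0.3)
)

# Clustered folds: the fold is drawn once per id, so all rows that share an id
# land in the same validation set and are never split across training and
# validation within a fold.
ids <- unique(toy$id)
id_fold <- sample(rep(seq_len(V), length.out = length(ids)))
toy$cluster_fold <- id_fold[match(toy$id, ids)]

# Stratified folds: folds are drawn separately within each outcome level, so
# every fold holds roughly the same share of each level.
toy$strata_fold <- NA_integer_
for (lev in unique(toy$y)) {
  idx <- which(toy$y == lev)
  toy$strata_fold[idx] <- sample(rep(seq_len(V), length.out = length(idx)))
}

# Rows per outcome level in each stratified fold.
strata_table <- table(outcome = toy$y, fold = toy$strata_fold)
```

In this sketch the two schemes are built independently of each other, purely for comparison.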

Examples

## Not run: 
library(sl3)
data(cpp_imputed)
covs <- c("apgar1", "apgar5", "parity", "gagebrth", "mage", "meducyrs")
task <- sl3_Task$new(cpp_imputed, covariates = covs, outcome = "haz")
# this is just for illustrative purposes, not intended for real applications
# of the super learner!
glm_lrn <- Lrnr_glm$new()
ranger_lrn <- Lrnr_ranger$new()
lasso_lrn <- Lrnr_glmnet$new()
eSL <- Lrnr_sl$new(learners = list(glm_lrn, ranger_lrn, lasso_lrn))
eSL_fit <- eSL$train(task)
# example with cv_control, where an Lrnr_sl is included as a candidate
eSL_nested5folds <- Lrnr_sl$new(
  learners = list(glm_lrn, ranger_lrn, lasso_lrn),
  cv_control = list(V = 5),
  verbose = FALSE
)
dSL <- Lrnr_sl$new(
  learners = list(glm_lrn, ranger_lrn, lasso_lrn, eSL_nested5folds),
  metalearner = Lrnr_cv_selector$new(loss_squared_error)
)
dSL_fit <- dSL$train(task)
# example with cv_control, where we use the cross-validated super learner
cvSL_fit <- cv_sl(
  lrnr_sl = eSL_nested5folds, task = task, eval_fun = loss_squared_error
)

## End(Not run)

tlverse/sl3 documentation built on Nov. 18, 2024, 12:46 a.m.