fit.ModelStack: Fit Discrete SuperLearner


Description

Define and fit a discrete SuperLearner for longitudinal data. Model selection (scoring) can be based on the MSE evaluated on holdout observations (method = "holdout") or on the V-fold cross-validated MSE (method = "cv").

Usage

fit(...)

## S3 method for class 'ModelStack'
fit(models, method = c("none", "holdout", "cv",
  "origamiSL", "internalSL"), data, ID, t_name, x, y, nfolds = NULL,
  fold_column = NULL, hold_column = NULL, hold_random = FALSE,
  seed = NULL, refit = TRUE, fold_y_names = NULL,
  verbose = getOption("gridisl.verbose"), ...)

Arguments

...

Additional arguments that will be passed on directly to the fit_model function.

models

Parameters specifying the model(s) to fit. This must be the result of one or more defModel(...) calls combined with "+", as in defModel(...) + defModel(...). See defModel for additional information.

method

The model selection or model stacking procedure to use when fitting more than one model. Possible options are: "none" – no model selection; "holdout" – model selection based on a (possibly random) holdout validation sample; "cv" – discrete Super Learner, which selects the single best-performing model via internal V-fold cross-validation; "origamiSL" – convex (NNLS) Super Learner with external V-fold cross-validation (using the origami R package); "internalSL" – convex (NNLS) Super Learner with internal V-fold cross-validation (same CV as in method = "cv").
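The idea behind method = "cv" can be illustrated with a minimal base R sketch. This is not the package's implementation: the toy data, the fold column name, and the two lm() candidates are all assumptions made for illustration. Each candidate is scored by its V-fold cross-validated MSE, the candidate with the smallest score is selected, and (mirroring refit = TRUE) the winner is refit on the full data.

```r
set.seed(123)

# Hypothetical toy data: 30 subjects, 4 time-points each
n_id <- 30
dat <- data.frame(ID = rep(seq_len(n_id), each = 4), t = rep(1:4, times = n_id))
dat$y <- 1 + 0.5 * dat$t + rnorm(nrow(dat), sd = 0.3)

# Assign each subject (not each row) to one of 5 validation folds
fold_of_id <- sample(rep_len(1:5, n_id))
dat$fold <- fold_of_id[dat$ID]

# Candidate models (stand-ins for the models defined with defModel)
candidates <- list(
  intercept_only = y ~ 1,
  linear_in_t    = y ~ t
)

# V-fold cross-validated MSE for each candidate
cv_mse <- sapply(candidates, function(f) {
  sq_err <- unlist(lapply(1:5, function(v) {
    train <- dat[dat$fold != v, ]
    valid <- dat[dat$fold == v, ]
    (valid$y - predict(lm(f, data = train), newdata = valid))^2
  }))
  mean(sq_err)
})

# Discrete selection: keep the single candidate with the smallest CV-MSE,
# then refit it on the entire dataset
best <- names(which.min(cv_mse))
final_fit <- lm(candidates[[best]], data = dat)
```

A convex Super Learner ("origamiSL", "internalSL") would instead combine the candidates' cross-validated predictions with non-negative weights; that step is not shown here.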

data

Input dataset, can be a data.frame or a data.table.

ID

A character string name of the column that contains the unique subject identifiers.

t_name

A character string name of the column with integer-valued measurement time-points (in days, weeks, months, etc.).

x

A vector containing the names of predictor variables to use for modeling. If x is missing, then all columns except ID and y are used.

y

A character string name of the column that represents the response variable in the model.

nfolds

Number of folds to use in cross-validation.

fold_column

The name of the column in the input data that contains the cross-validation fold indicators (must be an ordered factor).
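As a rough illustration, a fold indicator column of the required type could be built in base R as follows. The column name "fold", the toy data, and the choice to assign folds by subject (so that all rows of a subject share a fold) are assumptions of this sketch, not requirements stated above.

```r
set.seed(42)

# Hypothetical toy data: 6 subjects, 3 time-points each
dat <- data.frame(ID = rep(1:6, each = 3), t = rep(1:3, times = 6))
nfolds <- 3

# Shuffle fold labels across subjects so every row of a subject gets the same fold
ids <- unique(dat$ID)
fold_of_id <- setNames(sample(rep_len(seq_len(nfolds), length(ids))), ids)

# Store the indicators as an ordered factor
dat$fold <- factor(fold_of_id[as.character(dat$ID)],
                   levels = seq_len(nfolds), ordered = TRUE)
```

The column name would then be supplied as fold_column = "fold".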

hold_column

The name of the column that contains the holdout observation indicators (TRUE/FALSE) in the input data. This holdout column must be defined and added to the input data prior to calling this function.

hold_random

Logical, specifying if the holdout observations should be selected at random. If FALSE then the last observation for each subject is selected as a holdout.
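The two holdout schemes can be sketched in base R. This is only an illustration of the behaviour described above, on hypothetical toy data with hypothetical column names; it is not the package's own code, and drawing exactly one random row per subject is an assumption of the sketch.

```r
# Hypothetical toy data: 3 subjects with 3, 2 and 4 measurements
dat <- data.frame(
  ID = rep(c("a", "b", "c"), times = c(3, 2, 4)),
  t  = c(1, 2, 3, 1, 2, 1, 2, 3, 4)
)

# Analogue of hold_random = FALSE: flag the last time-point of each subject
dat$hold_last <- dat$t == ave(dat$t, dat$ID, FUN = max)

# Analogue of hold_random = TRUE: flag one randomly chosen row per subject
set.seed(1)
dat$hold_rand <- as.logical(
  ave(seq_len(nrow(dat)), dat$ID, FUN = function(i) {
    seq_along(i) == sample.int(length(i), 1)
  })
)
```

A logical column built this way could be passed by name through hold_column.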

seed

Random number seed for selecting random holdouts or validation folds.

refit

Set to TRUE (default) to refit the best estimator using the entire dataset. When FALSE, it might be impossible to make predictions from this model fit.

fold_y_names

(ADVANCED FEATURE) The names of the columns in data containing the fold-specific outcomes. Can be used for constructing a split-specific (or by-fold) Super Learner with method = "origamiSL".

verbose

Set to TRUE to print messages on status and information to the console. Turn this on by default using options(gridisl.verbose=TRUE).

Value

An R6 object containing the model fit(s).


osofr/longGriDiSL documentation built on May 24, 2019, 4:56 p.m.