fit_growth.ModelStack: Fit Discrete Growth Curve SuperLearner


View source: R/modelingSL_main.R

Description

Define and fit a discrete SuperLearner for growth curve modeling. Model selection (scoring) can be based either on the MSE for a single holdout data point per subject, chosen at random or as the subject's last observation (method = "holdout"), or on the V-fold cross-validated MSE, which holds out entire subjects (entire growth curves) for model validation (method = "cv").
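The "cv" scoring rule can be sketched in base R. This is a minimal illustration under stated assumptions, not the package's implementation: plain lm() formulas stand in for learners defined with defModel, the data are synthetic, and the helper names fold_ids and cv_mse are hypothetical. Whole subjects are assigned to folds, so each validation fold contains complete growth curves, and the candidate with the smallest cross-validated MSE is kept.

```r
set.seed(1)
# Synthetic toy data (illustrative only): 20 subjects x 6 time points
n_id <- 20; n_t <- 6
dat <- data.frame(id = rep(seq_len(n_id), each = n_t),
                  t  = rep(seq_len(n_t), times = n_id))
dat$y <- 50 + 3 * dat$t - 0.5 * dat$t^2 + rnorm(nrow(dat))

# Candidate models: lm() formulas standing in for defModel() learners (assumption)
candidates <- list(linear = y ~ t, quadratic = y ~ t + I(t^2))

# Hypothetical helper: assign whole subjects to V folds,
# so every fold contains complete growth curves
fold_ids <- function(id, V) {
  ids <- unique(id)
  fold_of_id <- sample(rep_len(seq_len(V), length(ids)))
  fold_of_id[match(id, ids)]
}

# Hypothetical helper: cross-validated MSE of one candidate;
# the caller passes the same fold vector to every candidate
cv_mse <- function(formula, data, fold, y_col = "y") {
  sq_err <- numeric(nrow(data))
  for (v in unique(fold)) {
    test <- fold == v
    fit  <- lm(formula, data = data[!test, ])
    pred <- predict(fit, newdata = data[test, ])
    sq_err[test] <- (data[[y_col]][test] - pred)^2
  }
  mean(sq_err)
}

folds  <- fold_ids(dat$id, V = 5)
scores <- sapply(candidates, cv_mse, data = dat, fold = folds)
best   <- names(which.min(scores))  # discrete selection: lowest CV MSE wins
```

Using one shared fold vector keeps the comparison fair, because every candidate is scored on identical validation subjects. Refitting the winner on all rows, e.g. lm(candidates[[best]], data = dat), mirrors what refit = TRUE describes.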

Usage

fit_growth(...)

## S3 method for class 'ModelStack'
fit_growth(models, method = c("none", "cv", "holdout",
  "holdout_cv", "SL"), data, ID, t_name, x, y, nfolds = NULL,
  fold_column = NULL, hold_column = NULL, hold_random = FALSE,
  seed = NULL, use_new_features = FALSE, refit = TRUE,
  verbose = getOption("gridisl.verbose"), ...)

Arguments

...

Additional arguments that will be passed on to the gridisl::fit_model function.

models

Parameters specifying the model(s) to fit. This must be the result of combining calls to gridisl::defModel(...) with +, e.g., gridisl::defModel(...) + gridisl::defModel(...). See defModel for additional information.

method

The type of model selection procedure to use when fitting several models. Possible options are "none" (no model selection); "holdout" (model selection based on a validation holdout sample); "holdout_cv"; "cv" (model selection using V-fold cross-validation); and "SL" (model stacking, i.e., combining all models with Super Learner using V-fold cross-validation predictions).
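With "SL", the models' cross-validation predictions are combined instead of a single model being picked. The sketch below is a deliberate simplification, assuming only two learners and a convex combination w * p1 + (1 - w) * p2 fitted by least squares on their out-of-fold predictions; the package's actual metalearner and its handling of more models may differ, and stack_weight is a hypothetical name.

```r
# Hypothetical helper: least-squares convex weight for two learners'
# out-of-fold predictions p1 and p2 against the observed outcome y
stack_weight <- function(y, p1, p2) {
  d <- p1 - p2
  w <- sum((y - p2) * d) / sum(d^2)  # unconstrained least-squares weight
  min(max(w, 0), 1)                  # clip to [0, 1]; exact because the loss is a 1-D convex quadratic
}

set.seed(7)
p1 <- rnorm(50)  # stand-in for model 1's cross-validation predictions
p2 <- rnorm(50)  # stand-in for model 2's cross-validation predictions
y  <- 0.3 * p1 + 0.7 * p2
w  <- stack_weight(y, p1, p2)  # recovers 0.3 up to floating-point rounding
stacked <- w * p1 + (1 - w) * p2
```

In one dimension the constrained optimum is just the unconstrained one clipped to the interval, so no numerical optimizer is needed here.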

data

Input dataset, can be a data.frame or a data.table.

ID

A character string giving the name of the column that contains the unique subject identifiers.

t_name

A character string giving the name of the column with integer-valued measurement time points (in days, weeks, months, etc.).

x

A vector containing the names of predictor variables to use for modeling. If x is missing, all columns except ID and y are used.
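Read literally, the default for x is a set difference on column names, which means the time column t_name stays among the predictors. A one-line sketch of that rule; default_predictors is a hypothetical helper, not a package function.

```r
# Hypothetical helper: every column except the subject ID and the outcome
default_predictors <- function(data, ID, y) setdiff(names(data), c(ID, y))

dat <- data.frame(id = 1:2, t = 1:2, sex = c("F", "M"), y = c(3.1, 3.4))
default_predictors(dat, ID = "id", y = "y")  # returns c("t", "sex")
```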

y

A character string giving the name of the column that represents the response variable in the model.

nfolds

Number of folds to use in cross-validation.

fold_column

The name of the column in the input data that contains the cross-validation fold indicators (must be an ordered factor).
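A fold column that keeps each subject's whole growth curve in a single fold, stored as an ordered factor as required above, could be built as follows. This is a sketch under assumptions: make_fold_column is a hypothetical helper, and shuffling subjects round-robin into folds is only one reasonable assignment scheme.

```r
# Hypothetical helper: one fold per subject, returned as an ordered factor
make_fold_column <- function(data, ID, nfolds = 5, seed = NULL) {
  if (!is.null(seed)) set.seed(seed)
  ids <- unique(data[[ID]])
  # Deal subjects into folds round-robin, then shuffle the assignment
  fold_of_id <- sample(rep_len(seq_len(nfolds), length(ids)))
  # Every row of a subject inherits that subject's fold
  fold <- fold_of_id[match(data[[ID]], ids)]
  factor(fold, levels = seq_len(nfolds), ordered = TRUE)
}

dat <- data.frame(id = rep(1:10, each = 3), t = rep(1:3, times = 10))
dat$fold <- make_fold_column(dat, ID = "id", nfolds = 5, seed = 42)
```

The resulting column name ("fold" here) would then be passed as fold_column.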

hold_column

The name of the column that contains the holdout observation indicators (TRUE/FALSE) in the input data. This holdout column must be defined and added to the input data prior to calling this function.

hold_random

Logical, specifying whether the holdout observations should be selected at random. If FALSE, the last observation for each subject is selected as the holdout.

seed

Random number seed for selecting a random holdout.
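The holdout rule described by hold_random and seed (one holdout row per subject, either the latest time point or a random row) can be sketched as a logical column suitable for hold_column. This is an illustration under assumptions, not the package's code: make_hold_column is a hypothetical helper, and "last observation" is taken to mean the row with the largest value of t_name.

```r
# Hypothetical helper: flag exactly one holdout row per subject
make_hold_column <- function(data, ID, t_name, hold_random = FALSE, seed = NULL) {
  if (!is.null(seed)) set.seed(seed)
  rows_by_id <- split(seq_len(nrow(data)), data[[ID]])
  hold_rows <- vapply(rows_by_id, function(r) {
    if (hold_random) {
      r[sample.int(length(r), 1L)]     # one random row for this subject
    } else {
      r[which.max(data[[t_name]][r])]  # row with the latest time point
    }
  }, integer(1))
  hold <- logical(nrow(data))
  hold[hold_rows] <- TRUE
  hold
}

dat <- data.frame(id = rep(1:3, each = 4), t = rep(0:3, times = 3))
dat$hold_last <- make_hold_column(dat, ID = "id", t_name = "t")
dat$hold_rand <- make_hold_column(dat, ID = "id", t_name = "t",
                                  hold_random = TRUE, seed = 123)
```

Using sample.int on the row count, rather than sample on the row indices, avoids R's special case where sample(x) with a single number x draws from 1:x.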

use_new_features

Set to TRUE to use new features (predictors) defined by the growth curve feature-creator function define_features_drop. Note that define_features_drop is called automatically, but the features it defines are not used unless this is set to TRUE.

refit

Set to TRUE (default) to refit the best estimator using the entire dataset. When FALSE, it might be impossible to make predictions from this model fit.

verbose

Set to TRUE to print status and informational messages to the console. To turn this on by default, use options(gridisl.verbose = TRUE).

Value

An R6 object containing the model fit(s).


osofr/growthcurveSL documentation built on May 24, 2019, 4:56 p.m.