define_features_drop: Define curve-based features and summaries for training or...
In osofr/growthcurveSL: SuperLearner for Imputing Growth Curves

Defines additional features and summaries of the growth curve person-time observations. These are used for modeling and for defining the training and validation sets (e.g., random holdout and cross-validation). With train_set = TRUE and hold_column missing, the function defines features using all data points as a full training set (no holdouts; summaries use all person-time rows). With train_set = TRUE and hold_column supplied, the features are defined for non-holdout observations only (i.e., curve summaries are based on training points alone, with all holdout observations dropped). With train_set = FALSE, the function creates a validation dataset (e.g., for scoring with CV): the summaries and features for each data point (X_i,Y_i) are obtained by first dropping (X_i,Y_i) and then evaluating the summaries on the remaining observations. This is repeated in a loop for all person-time rows in the data.
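
The difference between the training-style and validation-style summaries can be illustrated with a small base R sketch. This is not the package's implementation: the column names (`id`, `t`, `y`) are hypothetical, and a per-subject mean stands in for whatever curve summaries the function actually computes.

```r
# Toy long-format growth data: one row per person-time observation.
# Column names (id, t, y) are hypothetical, chosen for illustration only.
dat <- data.frame(
  id = c(1, 1, 1, 2, 2),
  t  = c(1, 2, 3, 1, 2),
  y  = c(10, 12, 14, 20, 30)
)

# Training-style summary: per-subject mean of y using all of that subject's rows.
dat$mean_all <- ave(dat$y, dat$id, FUN = mean)

# Validation-style summary: for each row, drop that row and average the
# subject's remaining observations (NA when the subject has no other rows).
loo_mean <- function(y) {
  n <- length(y)
  if (n < 2) return(rep(NA_real_, n))
  (sum(y) - y) / (n - 1)
}
dat$mean_loo <- ave(dat$y, dat$id, FUN = loo_mean)

print(dat)
```

In the leave-one-out column, each row's own response never contributes to its own summary, which is what makes such features usable for scoring held-out points.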

define_features_drop(dataDT, ID, t_name, y, train_set = TRUE, hold_column,
  noNAs = FALSE, includeRLMIDind = FALSE,
  verbose = getOption("growthcurveSL.verbose"))

`dataDT`	Input data.table
`ID`	A character string name of the column that contains the unique subject identifiers.
`t_name`	A character string name of the column with integer-valued measurement time-points (in days, weeks, months, etc.).
`y`	A character string name of the column that represents the response variable in the model.
`train_set`	Set to `TRUE` to define growth curve features and summaries for training data. Set to `FALSE` to define the summaries for validation data. In the latter case the summaries are defined for observation (X_i,Y_i) by first dropping that observation and then evaluating the summaries for the remaining observations. This is repeated in a loop for all person-time rows in the data.
`hold_column`	A column with a logical flag for holdout rows / observations (`TRUE` indicates that the row is a holdout). When `train_set` is `TRUE` the resulting output data will contain all non-HOLDOUT observations (training data points). When `train_set` is `FALSE` the resulting output data will contain the HOLDOUT observations only (validation data points). To evaluate either training or validation data summaries FOR ALL observations this argument must be missing (in which case all observations from the input data are returned with their corresponding summaries).
`noNAs`	...
`includeRLMIDind`	...
`verbose`	Set to `TRUE` to print messages on status and information to the console. Turn this on by default using `options(growthcurveSL.verbose=TRUE)`.
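
The interaction between `train_set` and `hold_column` described above can be sketched in base R. This is a hedged illustration only, not the package's code: the column names (`id`, `t`, `y`, `hold`) are hypothetical, a per-subject mean stands in for the real curve summaries, and it assumes that a holdout row's summary is computed from that subject's non-holdout rows.

```r
# Toy data with a logical holdout flag (TRUE marks a holdout row).
# All column names are hypothetical.
dat <- data.frame(
  id   = c(1, 1, 1, 2, 2, 2),
  t    = c(1, 2, 3, 1, 2, 3),
  y    = c(10, 12, 14, 20, 30, 40),
  hold = c(FALSE, TRUE, FALSE, FALSE, FALSE, TRUE)
)

# Per-subject mean of y computed from training (non-holdout) rows only.
train_mean <- tapply(dat$y[!dat$hold], dat$id[!dat$hold], mean)
dat$mean_train <- as.numeric(train_mean[as.character(dat$id)])

# Analogue of train_set = TRUE with a holdout column: keep non-holdout rows.
train_rows <- dat[!dat$hold, ]

# Analogue of train_set = FALSE with a holdout column: keep holdout rows only,
# each carrying a summary that excludes its own response.
valid_rows <- dat[dat$hold, ]

print(train_rows)
print(valid_rows)
```

Leaving the holdout flag out entirely would correspond to returning every input row with its summary, as described for a missing `hold_column`.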