define_features_drop: Define curve-based features and summaries for training or...


View source: R/growthcurve_features.R

Description

Defines additional features and summaries of the growth curve person-time observations, for use in modeling and in defining the training and validation sets (e.g., for random holdout and cross-validation).

Setting train_set = TRUE with hold_column missing defines the features using all data points as one full training set: there are no holdouts, and the summaries use all person-time rows. Setting train_set = TRUE with hold_column supplied defines the features only for the non-holdout observations: the holdout rows are dropped, and the curve summaries are based on the training points only.

Setting train_set = FALSE creates a validation dataset (e.g., for scoring with CV). In this case the summaries and features for each row (X_i, Y_i) are obtained by first dropping (X_i, Y_i) and then evaluating the summaries for (X_i, Y_i) based on the remaining observations. This is repeated in a loop over all person-time rows in the data.
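The difference between training and validation summaries can be illustrated with a small base R sketch. It is not the package's implementation: subject_mean_summary is a hypothetical helper, the per-subject mean of y stands in for the package's actual curve summaries, "remaining observations" is assumed to mean the other rows of the same subject, and a plain data.frame is used instead of a data.table.

```r
# Hypothetical helper (not part of growthcurveSL): computes one illustrative
# summary, the per-subject mean of y, for every person-time row.
# train_set = TRUE : mean over all rows of the subject (training mode)
# train_set = FALSE: mean over the subject's rows with row i dropped (leave-one-out)
subject_mean_summary <- function(df, ID, y, train_set = TRUE) {
  n <- nrow(df)
  out <- numeric(n)
  for (i in seq_len(n)) {
    rows <- df[[ID]] == df[[ID]][i]    # all rows of the same subject as row i
    if (!train_set) rows[i] <- FALSE   # validation mode: drop row i itself
    vals <- df[[y]][rows]
    # a subject with a single row has nothing left after dropping it
    out[i] <- if (length(vals) > 0) mean(vals) else NA_real_
  }
  out
}

# Toy growth data: two subjects measured at integer time points
dat <- data.frame(
  subj = c(1, 1, 1, 2, 2),
  time = c(0, 1, 2, 0, 1),
  ht   = c(50, 56, 60, 48, 52)
)
dat$mean_train <- subject_mean_summary(dat, "subj", "ht", train_set = TRUE)
dat$mean_loo   <- subject_mean_summary(dat, "subj", "ht", train_set = FALSE)
print(dat)
```

In the leave-one-out column, the summary for each row never uses that row's own response, which is what makes the validation summaries suitable for scoring.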

Usage

define_features_drop(dataDT, ID, t_name, y, train_set = TRUE, hold_column,
  noNAs = FALSE, includeRLMIDind = FALSE,
  verbose = getOption("growthcurveSL.verbose"))

Arguments

dataDT

The input data.table.

ID

A character string name of the column that contains the unique subject identifiers.

t_name

A character string name of the column with integer-valued measurement time points (in days, weeks, months, etc.).

y

A character string name of the column that represents the response variable in the model.

train_set

Set to TRUE to define growth curve features and summaries for training data. Set to FALSE to define the summaries for validation data. In the latter case, the summaries for observation (X_i, Y_i) are defined by first dropping that observation and then evaluating the summaries based on the remaining observations. This is repeated in a loop over all person-time rows in the data.

hold_column

A column with a logical flag marking holdout rows / observations (TRUE indicates that the row is a holdout). When train_set is TRUE, the output data contain all non-holdout observations (the training data points). When train_set is FALSE, the output data contain only the holdout observations (the validation data points). To evaluate either training or validation summaries for all observations, leave this argument missing; in that case all observations from the input data are returned with their corresponding summaries.
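The row selection described for hold_column can be sketched in base R as follows. This is an illustration under assumptions, not the package's code: select_rows is a hypothetical helper, hold_column is taken to be the name of a logical column, and a data.frame stands in for the data.table. The last line shows a summary computed from the training points only, as described for train_set = TRUE with holdouts.

```r
# Hypothetical helper (not part of growthcurveSL) mirroring the documented rules:
# hold_column missing -> all rows are returned
# train_set = TRUE    -> only non-holdout (training) rows
# train_set = FALSE   -> only holdout (validation) rows
select_rows <- function(df, hold_column, train_set = TRUE) {
  if (missing(hold_column)) return(df)   # no holdouts: keep every row
  is_hold <- df[[hold_column]]           # assumed to be a logical column
  if (train_set) df[!is_hold, , drop = FALSE] else df[is_hold, , drop = FALSE]
}

dat <- data.frame(
  subj = c(1, 1, 1, 2, 2),
  ht   = c(50, 56, 60, 48, 52),
  hold = c(FALSE, TRUE, FALSE, FALSE, TRUE)
)

train    <- select_rows(dat, "hold", train_set = TRUE)   # training points
valid    <- select_rows(dat, "hold", train_set = FALSE)  # holdout points
all_rows <- select_rows(dat)                              # no hold_column given

# Per-subject summary based on training points only (holdouts dropped)
train_means <- tapply(train$ht, train$subj, mean)
```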

noNAs

...

includeRLMIDind

...

verbose

Set to TRUE to print status and information messages to the console. To turn this on by default, use options(growthcurveSL.verbose = TRUE).
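Because the default is written as verbose = getOption("growthcurveSL.verbose"), the session option is only consulted when verbose is not passed explicitly. The sketch below shows this with show_verbose, a hypothetical stand-in function that uses the same default expression as the Usage above; it does not call the package.

```r
# Hypothetical stand-in with the same default as define_features_drop's verbose argument
show_verbose <- function(verbose = getOption("growthcurveSL.verbose")) verbose

options(growthcurveSL.verbose = TRUE)   # session-wide default
show_verbose()                          # default picks up the option
show_verbose(verbose = FALSE)           # an explicit argument takes precedence
```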

Value

...


osofr/growthcurveSL documentation built on May 24, 2019, 4:56 p.m.