cohortData: Data Storage Class for cohort dataset

cohortData    R Documentation

Data Storage Class for cohort dataset

Description

Class that defines the standard format for the cohort dataset, i.e. the table that specifies for each subject in the cohort:

  1. a unique subject identifier,

  2. the date of study entry,

  3. the date of end of follow-up,

  4. the reason for end of follow-up (failure or right-censoring), and

  5. baseline measurements of time-dependent or time-independent covariates.

Format

R6Class object.

Value

cohortData object

Fields

data:

data.table containing the input cohort dataset to be wrapped and processed. The table must contain exactly one row for each subject in the cohort. It cannot have columns named 'IDvar', 'index_date', 'EOF_date', 'EOF_type', or 'L0'.
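The two constraints on data can be checked before the table is handed to the class. The following is a minimal sketch, assuming the data.table package is installed; the table and all of its column names (subject_id, entry_date, and so on) are hypothetical and chosen only for illustration.

```r
library(data.table)

# Hypothetical cohort table: one row per subject, illustrative column names.
cohort <- data.table(
  subject_id  = c("a01", "a02", "a03"),
  entry_date  = as.Date(c("2010-01-15", "2010-03-02", "2011-07-20")),
  exit_date   = as.Date(c("2012-06-30", "2011-11-12", "2013-01-05")),
  exit_reason = c("event", "censored", "censored"),
  age         = c(54, 61, NA),
  sex         = c("F", "M", NA)
)

# One row per subject: the identifier must not repeat.
one_row_per_subject <- anyDuplicated(cohort$subject_id) == 0L

# Column names that the documentation says are not allowed in `data`.
reserved <- c("IDvar", "index_date", "EOF_date", "EOF_type", "L0")
clashes  <- intersect(names(cohort), reserved)

stopifnot(one_row_per_subject, length(clashes) == 0L)
```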

IDvar:

character providing the name of the column of data that contains the unique subject identifier.

index_date:

character providing the name of the column of data that contains the date of study entry.

EOF_date:

character providing the name of the column of data that contains the date of end of follow-up. Subjects whose end of follow-up date equals their study entry date are ignored (i.e., excluded from the cohort).
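To see in advance which subjects would be ignored under this rule, the comparison can be reproduced by hand. This is a sketch of the documented rule, not the package's own code; the columns entry_date and exit_date are hypothetical stand-ins for the columns named by index_date and EOF_date.

```r
library(data.table)

# Hypothetical table; subject "a02" ends follow-up on the day of study entry.
cohort <- data.table(
  subject_id = c("a01", "a02", "a03"),
  entry_date = as.Date(c("2010-01-15", "2010-03-02", "2011-07-20")),
  exit_date  = as.Date(c("2012-06-30", "2010-03-02", "2013-01-05"))
)

# Subjects with zero-length follow-up, which the class is documented to ignore.
dropped <- cohort[exit_date == entry_date, subject_id]

# Rows that would remain in the cohort.
kept <- cohort[exit_date != entry_date]
```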

EOF_type:

character providing the name of the column of data corresponding to the reason for end of follow-up.

Y_name:

character or integer providing the unique value in column EOF_type that encodes the end of follow-up due to failure (i.e., occurrence of the outcome event of interest).

L0:

character vector providing the names of the columns of data that contain baseline covariate measurements. Covariate values must be encoded by a character or numeric vector (e.g., factors are not allowed).
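Because factors are not allowed, covariates read in as factors need converting first. The sketch below, which assumes the data.table package and uses hypothetical column names, flags covariates that are neither character nor numeric and converts the factor ones to character in place.

```r
library(data.table)

# Hypothetical table in which `sex` was read in as a factor.
cohort <- data.table(
  subject_id = c("a01", "a02"),
  age        = c(54, 61),
  sex        = factor(c("F", "M"))
)
L0_cols <- c("age", "sex")

# A covariate is acceptable if it is a character or numeric vector.
is_allowed <- vapply(L0_cols, function(nm) {
  x <- cohort[[nm]]
  is.character(x) || is.numeric(x)
}, logical(1))
not_allowed <- L0_cols[!is_allowed]

# Convert offending factor columns to character by reference.
for (nm in not_allowed) {
  if (is.factor(cohort[[nm]])) {
    set(cohort, j = nm, value = as.character(cohort[[nm]]))
  }
}
```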

L0_timeIndep:

named list specifying, for each time-independent covariate in L0, a sublist with only the following three named elements:

  1. categorical: logical indicating whether the time-independent covariate is continuous ('FALSE') or categorical ('TRUE'). Cannot be missing.

  2. impute: character specifying the imputation method for missing measurements of the time-independent covariate. Possible values are 'default', 'mean', 'mode', and 'median'. If missing, 'mean' is used for continuous covariates and 'mode' for categorical covariates. Imputation with 'mean', 'mode', or 'median' is based on measurements in data from subjects with observed covariate values; 'mean' and 'median' can only be used for continuous covariates, and 'mode' can only be used for categorical covariates. Imputation with 'default' replaces missing values with 0 if the covariate is numeric and with 'Unknown' otherwise.

  3. impute_default_level: character or numeric specifying the imputation value to be used when impute='default'. The value must be a length 1 character (resp. numeric) for a covariate encoded by a character (resp. numeric) vector. If missing, the default values 0 and 'Unknown' are used for numeric and character covariates, respectively.

Each element of the list L0_timeIndep must be named with the time-independent covariate in L0 to which the sublist information applies. L0_timeIndep can be missing if there is no time-independent covariate in data.
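The shape of L0_timeIndep and the imputation rules described above can be sketched in base R. The covariate names age and sex are hypothetical, and impute_sketch is an illustrative re-implementation of the documented rules, not the package's code; in particular, how the package breaks ties for 'mode' is not stated, so the sketch simply takes the first most frequent value.

```r
# Specification list in the documented shape; covariate names are hypothetical.
L0_timeIndep <- list(
  age = list(categorical = FALSE, impute = "median"),
  sex = list(categorical = TRUE, impute = "default",
             impute_default_level = "Unknown")
)

# Illustrative helper (NOT part of LtAtStructuR) applying the documented rules.
impute_sketch <- function(x, categorical, impute = NULL,
                          impute_default_level = NULL) {
  # Missing method: mean for continuous, mode for categorical covariates.
  if (is.null(impute)) impute <- if (categorical) "mode" else "mean"
  if (impute %in% c("mean", "median") && categorical)
    stop("'mean' and 'median' apply to continuous covariates only")
  if (impute == "mode" && !categorical)
    stop("'mode' applies to categorical covariates only")

  observed <- x[!is.na(x)]
  fill <- switch(impute,
    mean   = mean(observed),
    median = median(observed),
    mode   = {
      ux <- unique(observed)
      ux[which.max(tabulate(match(observed, ux)))]  # ties: first value wins
    },
    default = if (!is.null(impute_default_level)) {
      impute_default_level
    } else if (is.numeric(x)) 0 else "Unknown",
    stop("unknown imputation method: ", impute)
  )
  x[is.na(x)] <- fill
  x
}

age <- c(54, 61, NA, 70)
sex <- c("F", NA, "M")
age_filled <- do.call(impute_sketch, c(list(x = age), L0_timeIndep$age))
sex_filled <- do.call(impute_sketch, c(list(x = sex), L0_timeIndep$sex))
```

Here the sublists are passed straight through as arguments, so each sublist element name doubles as a parameter name of the helper.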

Methods

Public methods


Method new()

Usage
cohortData$new(
  data,
  IDvar,
  index_date,
  EOF_date,
  EOF_type,
  Y_name,
  L0,
  L0_timeIndep
)
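
A call might look like the sketch below. It assumes that the LtAtStructuR and data.table packages are installed, that cohortData is exported from the package namespace, and that elements described as optional may simply be left out of a sublist; none of this has been checked against the package. The table, its column names, and the value "event" are hypothetical.

```r
library(data.table)

# Hypothetical cohort table; dates stored as Date (an assumption).
cohort_dt <- data.table(
  subject_id  = c("a01", "a02", "a03"),
  entry_date  = as.Date(c("2010-01-15", "2010-03-02", "2011-07-20")),
  exit_date   = as.Date(c("2012-06-30", "2011-11-12", "2013-01-05")),
  exit_reason = c("event", "censored", "censored"),
  age         = c(54, 61, NA),
  sex         = c("F", "M", NA)
)

# One sublist per time-independent covariate, named after the covariate.
timeIndep_spec <- list(
  age = list(categorical = FALSE, impute = "median"),
  sex = list(categorical = TRUE, impute = "default",
             impute_default_level = "Unknown")
)

# Only attempt the call when the package is available.
if (requireNamespace("LtAtStructuR", quietly = TRUE)) {
  cohort <- LtAtStructuR::cohortData$new(
    data         = cohort_dt,
    IDvar        = "subject_id",
    index_date   = "entry_date",
    EOF_date     = "exit_date",
    EOF_type     = "exit_reason",
    Y_name       = "event",        # value of exit_reason that encodes failure
    L0           = c("age", "sex"),
    L0_timeIndep = timeIndep_spec
  )
}
```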

Method clone()

The objects of this class are cloneable with this method.

Usage
cohortData$clone(deep = FALSE)
Arguments
deep

Whether to make a deep clone.
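
The effect of deep is general R6 behavior and can be seen on a toy class. The sketch below assumes the R6 package is installed; the classes Inner and Outer are hypothetical and unrelated to cohortData. A shallow clone shares any R6 objects held in fields with the original, while a deep clone copies them.

```r
library(R6)

# Toy classes used only for illustration; not part of LtAtStructuR.
Inner <- R6Class("Inner", public = list(value = 1))
Outer <- R6Class("Outer", public = list(
  inner = NULL,
  initialize = function() {
    self$inner <- Inner$new()
  }
))

original <- Outer$new()
shallow  <- original$clone()             # fields holding R6 objects are shared
deep     <- original$clone(deep = TRUE)  # fields holding R6 objects are copied

# Changing the nested object through the original...
original$inner$value <- 2

shallow_value <- shallow$inner$value  # follows the change (shared object)
deep_value    <- deep$inner$value     # keeps the value from clone time
```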


romainkp/LtAtStructuR documentation built on Aug. 24, 2024, 3:38 p.m.