setCohort: Definition of cohort dataset

View source: R/dataConstruction_workflow.R

setCohort    R Documentation

Definition of cohort dataset

Description

The cohort dataset specifies for each subject in the cohort:

  1. a unique subject identifier,

  2. the date of study entry,

  3. the date of end of follow-up,

  4. the reason for end of follow-up (failure or right-censoring), and

  5. baseline measurements of time-dependent or time-independent covariates.

Usage

setCohort(
  data,
  IDvar,
  index_date,
  EOF_date,
  EOF_type,
  Y_name,
  L0,
  L0_timeIndep = NA
)

Arguments

data

data.table containing the input cohort dataset to be wrapped and processed. The table must contain exactly one row for each subject in the cohort. It cannot have columns named 'IDvar', 'index_date', 'EOF_date', 'EOF_type', or 'L0'.
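
The two requirements on data (one row per subject, no reserved column names) can be checked before calling setCohort. The following is a minimal sketch in base R, not part of the package; the table toy and its column names are hypothetical and chosen only for illustration.

```r
# Hypothetical toy cohort; the column names are illustrative only.
toy <- data.frame(
  ID        = c("s1", "s2", "s3"),
  IndexDate = as.Date(c("2010-01-01", "2010-02-15", "2010-03-01")),
  EOFDate   = as.Date(c("2011-01-01", "2010-08-30", "2012-03-01")),
  EOFtype   = c("AMI", "censored", "AMI"),
  ageEntry  = c(54, 61, NA),
  stringsAsFactors = FALSE
)

# Names that the documentation says may not appear as column names of `data`.
reserved <- c("IDvar", "index_date", "EOF_date", "EOF_type", "L0")

# anyDuplicated() returns 0 when every subject identifier is unique.
one_row_per_subject <- !anyDuplicated(toy$ID)
no_reserved_names   <- !any(names(toy) %in% reserved)

stopifnot(one_row_per_subject, no_reserved_names)
```

Because setCohort expects a data.table, a data.frame such as toy would first need to be converted, for example with data.table::as.data.table(toy).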

IDvar

character providing the name of the column of data that contains the unique subject identifier.

index_date

character providing the name of the column of data that contains the date of study entry.

EOF_date

character providing the name of the column of data that contains the date of end of follow-up. Subjects whose end of follow-up date equals their study entry date are ignored (i.e., excluded from the cohort).
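
To see which rows the exclusion rule above would affect, the dates can be compared directly. This is a sketch of the documented rule in base R under assumed column names (IndexDate, EOFDate); it does not reproduce the package's internal code.

```r
# Hypothetical toy data: subject s2 ends follow-up on the day of study entry.
toy <- data.frame(
  ID        = c("s1", "s2", "s3"),
  IndexDate = as.Date(c("2010-01-01", "2010-02-15", "2010-03-01")),
  EOFDate   = as.Date(c("2011-01-01", "2010-02-15", "2012-03-01")),
  stringsAsFactors = FALSE
)

# Flag subjects with no follow-up time (end of follow-up == study entry).
zero_followup <- toy$EOFDate == toy$IndexDate

# Rows that would remain once such subjects are ignored.
kept <- toy[!zero_followup, ]
print(kept$ID)
```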

EOF_type

character providing the name of the column of data corresponding to the reason for end of follow-up.

Y_name

character or integer providing the unique value in column EOF_type that encodes the end of follow-up due to failure (i.e., occurrence of the outcome event of interest).
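
As an illustration of how Y_name relates to the EOF_type column, the sketch below derives a failure indicator from a vector of hypothetical end of follow-up reasons. It assumes that every value other than Y_name denotes right-censoring, which is how the description above reads but is not confirmed for the package internals.

```r
# Hypothetical reasons for end of follow-up, one per subject.
eof_type <- c("AMI", "end of study", "AMI", "disenrollment")

# The single value that encodes failure (the outcome event of interest).
Y_name <- "AMI"

# TRUE for failure; every other value is treated here as right-censoring.
failed <- eof_type == Y_name

# Y_name should match at least one value actually present in the column.
stopifnot(Y_name %in% eof_type)
print(table(failed))
```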

L0

character vector providing the names of the columns of data that contain baseline covariate measurements.

L0_timeIndep

named list specifying, for each time-independent covariate in L0, a sublist with only the following three named elements:

  1. categorical: specifies whether the covariate is continuous ('FALSE') or categorical ('TRUE'). Cannot be missing.

  2. impute: specifies the imputation method for missing measurements: 'default', 'mean', 'mode', or 'median'. If missing, 'mean' imputation is used for continuous covariates and 'mode' imputation for categorical covariates. Imputation with 'mean', 'mode', or 'median' is based on measurements from subjects with observed covariate values in data. 'mean' and 'median' can only be used for continuous covariates. 'mode' can only be used for categorical covariates. Imputation with 'default' replaces missing values with 0 if the covariate is numeric and with 'Unknown' otherwise.

  3. impute_default_level: imputation value to be used when the imputation method is 'default'. The value must be a length 1 character (resp. numeric) for a covariate encoded by a character (resp. numeric) vector. If missing, the default values 0 and 'Unknown' are used for continuous and categorical covariates, respectively.

Each element of the list L0_timeIndep must be named with the time-independent covariate in L0 to which the sublist information applies.
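
The imputation rules described for L0_timeIndep can be summarized in a short base R sketch. The helper impute_sketch below is hypothetical and is not the package's implementation; it only mirrors the documented behavior. Two choices in it are assumptions: ties for 'mode' are broken by first occurrence, and the fallback for 'default' is chosen with is.numeric.

```r
# Hypothetical helper mirroring the documented rules; not part of LtAtStructuR.
impute_sketch <- function(x, categorical, impute = NA, impute_default_level = NA) {
  # A missing method falls back to 'mode' (categorical) or 'mean' (continuous).
  if (is.na(impute)) impute <- if (categorical) "mode" else "mean"
  stopifnot(impute %in% c("default", "mean", "mode", "median"))
  if (categorical && impute %in% c("mean", "median"))
    stop("'mean' and 'median' can only be used for continuous covariates")
  if (!categorical && impute == "mode")
    stop("'mode' can only be used for categorical covariates")

  # Summary statistics use only subjects with observed values.
  observed <- x[!is.na(x)]
  levels_seen <- unique(observed)
  fill <- switch(impute,
    mean    = mean(observed),
    median  = median(observed),
    # Most frequent observed value; ties go to the first one seen (assumption).
    mode    = levels_seen[which.max(tabulate(match(observed, levels_seen)))],
    default = if (!is.na(impute_default_level)) {
      impute_default_level
    } else if (is.numeric(x)) {
      0
    } else {
      "Unknown"
    }
  )
  x[is.na(x)] <- fill
  x
}

age  <- impute_sketch(c(50, NA, 70), categorical = FALSE)           # mean of observed
sex  <- impute_sketch(c("F", NA, "M", "F"), categorical = TRUE)     # mode of observed
race <- impute_sketch(c("A", NA), categorical = TRUE, impute = "default")
```

In this sketch, passing NA for impute and impute_default_level plays the role of a "missing" element, which matches how the example at the end of this page fills in the sublists.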

Value

cohortData object

See Also

cohortData

Examples

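# cohortDT is assumed to be a data.table with one row per subject.
cohort <- setCohort(
  data = cohortDT,
  IDvar = "ID",
  index_date = "IndexDate",
  EOF_date = "EOFDate",
  EOF_type = "EOFtype",
  Y_name = "AMI",  # value of EOFtype that encodes failure
  L0 = c("ageEntry", "sex", "race", "A1c", "eGFR"),
  # One sublist per time-independent covariate in L0; NA means "missing",
  # so the documented defaults apply.
  L0_timeIndep = list(
    ageEntry = list(categorical = FALSE, impute = NA, impute_default_level = NA),
    sex      = list(categorical = TRUE,  impute = NA, impute_default_level = NA),
    race     = list(categorical = TRUE,  impute = NA, impute_default_level = NA)
  )
)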

romainkp/LtAtStructuR documentation built on Aug. 24, 2024, 3:38 p.m.