setCohort: Definition of cohort dataset
In romainkp/LtAtStructuR: Structuring of Complex Longitudinal Data into Long Format



Description

The cohort dataset specifies for each subject in the cohort:

a unique subject identifier,
the date of study entry,
the date of end of follow-up,
the reason for end of follow-up (failure or right-censoring), and
baseline measurements of time-dependent or time-independent covariates.
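
As an illustration of this layout, the sketch below builds a toy cohort table with one row per subject and runs the basic checks implied by the argument descriptions on this page. The table name `toyCohort`, its column names, and all values are made up for the example; it assumes the `data.table` package is installed and does not call `setCohort` itself.

```r
library(data.table)

# Hypothetical toy cohort: every name and value below is illustrative only.
toyCohort <- data.table(
  ID        = c("s1", "s2", "s3"),                                  # unique subject identifier
  IndexDate = as.Date(c("2010-01-01", "2010-03-15", "2011-06-30")), # date of study entry
  EOFDate   = as.Date(c("2012-05-20", "2010-03-15", "2013-01-10")), # date of end of follow-up
  EOFtype   = c("AMI", "disenrolled", "adminCensoring"),            # reason for end of follow-up
  ageEntry  = c(54, 61, NA),                                        # baseline covariates
  sex       = c("F", "M", NA)
)

# One row per subject: the identifier must not repeat.
stopifnot(!anyDuplicated(toyCohort$ID))

# The input table cannot use these column names.
reserved <- c("IDvar", "index_date", "EOF_date", "EOF_type", "L0")
stopifnot(!any(reserved %in% names(toyCohort)))

# Subjects whose end of follow-up equals study entry are ignored by setCohort.
zeroFollowUp <- toyCohort[EOFDate == IndexDate, ID]
print(zeroFollowUp)
```

Running such checks before constructing the cohort makes it easier to see which subjects will be dropped and why.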

Usage

setCohort(
  data,
  IDvar,
  index_date,
  EOF_date,
  EOF_type,
  Y_name,
  L0,
  L0_timeIndep = NA
)

Arguments

`data`	`data.table` containing the input cohort dataset to be wrapped and processed. The table must contain exactly one row per subject in the cohort and cannot have columns named 'IDvar', 'index_date', 'EOF_date', 'EOF_type', or 'L0'.
`IDvar`	`character` providing the name of the column of `data` that contains the unique subject identifier.
`index_date`	`character` providing the name of the column of `data` that contains the date of study entry.
`EOF_date`	`character` providing the name of the column of `data` that contains the date of end of follow-up. Subjects whose end of follow-up date equals their study entry date are ignored (i.e., excluded from the cohort).
`EOF_type`	`character` providing the name of the column of `data` corresponding to the reason for end of follow-up.
`Y_name`	`character` or `integer` providing the unique value in column `EOF_type` that encodes the end of follow-up due to failure (i.e., occurrence of the outcome event of interest).
`L0`	vector of `character` providing the names of the columns of `data` that contain baseline covariate measurements.
`L0_timeIndep`	named list specifying, for each time-independent covariate in `L0`, a sublist with exactly the following three named elements. Each element of `L0_timeIndep` must be named after the time-independent covariate in `L0` to which its sublist applies.
	`categorical`: specifies whether the covariate is continuous (`FALSE`) or categorical (`TRUE`). Cannot be missing.
	`impute`: specifies the imputation method for missing measurements: 'default', 'mean', 'mode', or 'median'. If missing, 'mean' is used for continuous covariates and 'mode' for categorical covariates. Imputation with 'mean', 'mode', or 'median' is based on measurements from subjects with observed covariate values in `data`. 'mean' and 'median' can only be used for continuous covariates; 'mode' can only be used for categorical covariates. Imputation with 'default' replaces missing values with 0 if the covariate is numeric and with 'Unknown' otherwise.
	`impute_default_level`: imputation value to be used when the imputation method is 'default'. The value must be a length-1 `character` (resp. `numeric`) for a covariate encoded by a `character` (resp. `numeric`) vector. If missing, the default values 0 and 'Unknown' are used for continuous and categorical covariates, respectively.
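
To make the rules for `impute` and `impute_default_level` concrete, the sketch below applies them to a plain vector in base R. The function `impute_sketch` is hypothetical and is not part of LtAtStructuR; it only mirrors the behaviour described above, and the package's actual implementation may differ (for example in how ties for the mode are broken).

```r
# Hypothetical helper, NOT part of LtAtStructuR: mimics the documented
# imputation rules on a single covariate vector.
impute_sketch <- function(x, categorical, impute = NA, impute_default_level = NA) {
  # A missing 'impute' falls back to 'mean' (continuous) or 'mode' (categorical).
  if (is.na(impute)) impute <- if (categorical) "mode" else "mean"
  if (impute %in% c("mean", "median") && categorical)
    stop("'mean' and 'median' can only be used for continuous covariates")
  if (impute == "mode" && !categorical)
    stop("'mode' can only be used for categorical covariates")

  observed <- x[!is.na(x)]  # imputation is based on observed values only
  fill <- switch(impute,
    mean    = mean(observed),
    median  = median(observed),
    mode    = {
      # Most frequent observed value; ties go to the value seen first (assumption).
      ux <- unique(observed)
      ux[which.max(tabulate(match(observed, ux)))]
    },
    default = if (!is.na(impute_default_level)) impute_default_level
              else if (is.numeric(x)) 0 else "Unknown",
    stop("unknown imputation method: ", impute)
  )
  x[is.na(x)] <- fill
  x
}

age <- c(50, 60, NA, 70)
sex <- c("F", "F", NA, "M")
impute_sketch(age, categorical = FALSE)                     # fills with the observed mean
impute_sketch(sex, categorical = TRUE)                      # fills with the most frequent level
impute_sketch(sex, categorical = TRUE, impute = "default")  # fills with "Unknown"
```

Passing `impute = NA`, as in the example at the end of this page, corresponds to the first line of the function body: the method is chosen from the covariate type.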

Value

A `cohortData` object.

Examples

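# cohortDT: input data.table with one row per subject
cohort <- setCohort(data = cohortDT,
                    IDvar = "ID",
                    index_date = "IndexDate",
                    EOF_date = "EOFDate",
                    EOF_type = "EOFtype",
                    Y_name = "AMI",  # value of EOFtype that encodes failure
                    L0 = c("ageEntry", "sex", "race", "A1c", "eGFR"),
                    # impute = NA: use 'mean' (continuous) or 'mode' (categorical)
                    L0_timeIndep = list(
                      "ageEntry" = list("categorical" = FALSE,
                                        "impute" = NA,
                                        "impute_default_level" = NA),
                      "sex" = list("categorical" = TRUE,
                                   "impute" = NA,
                                   "impute_default_level" = NA),
                      "race" = list("categorical" = TRUE,
                                    "impute" = NA,
                                    "impute_default_level" = NA)))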

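When many time-independent covariates share the same settings, the nested list passed as `L0_timeIndep` can be generated rather than typed out. The sketch below is one possible way to do this in base R; `make_timeIndep_spec` is a hypothetical convenience function, not something the package provides, and it leaves `impute` and `impute_default_level` missing so that the documented fallbacks apply.

```r
# Hypothetical convenience function, not part of LtAtStructuR.
# Builds one sublist per covariate with the three required named elements.
make_timeIndep_spec <- function(continuous = character(0), categorical = character(0)) {
  one <- function(is_cat) {
    list(categorical = is_cat, impute = NA, impute_default_level = NA)
  }
  c(
    setNames(lapply(continuous,  function(v) one(FALSE)), continuous),
    setNames(lapply(categorical, function(v) one(TRUE)),  categorical)
  )
}

spec <- make_timeIndep_spec(continuous = "ageEntry",
                            categorical = c("sex", "race"))
str(spec)
```

The resulting `spec` has the same structure as the list written out in the example above and could be supplied as the `L0_timeIndep` argument.
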
romainkp/LtAtStructuR documentation built on Aug. 24, 2024, 3:38 p.m.