
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)
library(data.table)

case_data0 <- data.frame(
  id      = 1:3,
  content = c("intnum=t", 
              "2021-04-21",
              "2021-05-20"),
  start   = c("2021-04-21", 
              "2021-04-21",
              "2021-05-20"),
  end     = c("2021-05-20",
              NA,
              NA),
  group   = c(1,3,4),
  type = c("background","box","box"),
  style = rep("background-color: #C0C0C0; font-size: 8pt",3)
)

1. Introduction

The R package LtAtStructuR automates the process of transforming longitudinal data (e.g., electronic health records data) into a structured analytic data set usable for marginal structural modeling (MSM). This data set, which is generated as output from the package, is suitable for the evaluation of the effects of exposure regimens (e.g., treatment plans) on a survival outcome based on MSM with inverse probability weighting or targeted minimum loss-based estimation. To create the output data set, an input cohort data set, an input exposure data set, and optionally one or more input covariate data sets must be defined using setCohort(), setExposure(), and setCovariate(), respectively, and gathered into a single LtAtData object. This LtAtData object is then passed to construct() to create the final output data set.

The functionality below is designed to evaluate the effects defined by a single categorical variable at each time point.

2. Input data sets to be created by user

The creation of the following input data sets will often require substantial cleaning and manipulation of 'raw' data in studies based on electronic health records (EHR). For instance, when using drug dispensing data from the EHR, the preparation of an input data set that encodes a particular drug exposure will require the user to map fill dates and quantities dispensed into exposure periods that implement rules regarding, for example, the definition of gaps in medication possession and the use of a potential medication stockpile. Additionally, the input data sets must be of the class data.table.
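As an illustration of the kind of preprocessing described above, the following base R sketch collapses dispensing records for one subject into exposure episodes. The helper name `fills_to_episodes`, the `days_supply` input, and the `grace` rule are assumptions made for this example and are not part of LtAtStructuR; stockpiling is ignored for simplicity.

```r
# Hypothetical helper (not part of LtAtStructuR): turn the fills of ONE subject
# into exposure episodes. Assumptions: each fill covers `days_supply` days and
# gaps of at most `grace` days between covered periods are bridged.
fills_to_episodes <- function(fill_date, days_supply, grace = 30) {
  ord <- order(fill_date)
  start <- fill_date[ord]
  end <- start + days_supply[ord] - 1          # last covered day of each fill
  ep_start <- start[1]
  ep_end <- end[1]
  out_start <- as.Date(character(0))
  out_end <- as.Date(character(0))
  for (i in seq_along(start)[-1]) {
    if (start[i] <= ep_end + grace + 1) {
      ep_end <- max(ep_end, end[i])            # gap small enough: extend episode
    } else {
      out_start <- c(out_start, ep_start)      # gap too long: close episode
      out_end <- c(out_end, ep_end)
      ep_start <- start[i]
      ep_end <- end[i]
    }
  }
  data.frame(Exposure_start = c(out_start, ep_start),
             Exposure_end = c(out_end, ep_end))
}

# Three 30-day fills; the 10-day gap after the first fill is bridged
ep <- fills_to_episodes(as.Date(c("2009-01-01", "2009-02-10", "2009-06-01")),
                        days_supply = c(30, 30, 30), grace = 14)
ep
```

The result would still need an ID column and conversion to a data.table before it could serve as an input exposure data set.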

2.1 Cohort definition

The input cohort data set encodes:

1) baseline measurements,
2) dates of cohort entry and end of follow-up (eof), and
3) the reason for eof (i.e., occurrence of the outcome of interest or of a censoring event)

for all subjects in the cohort.

This data set should contain one row per subject and must include the following columns/variables (all other variables will be ignored):

Missing values must be coded with NA. Note that data from rows with missing values in any of the first 4 columns above (ID, index/eof dates, or reason for eof) will be ignored/discarded by the LtAtStructuR package. The LtAtStructuR package, however, will not ignore data from rows with missing values for any of the covariate measurements.
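Before building the cohort object, it can help to see which rows would be affected by the rule above. The sketch below uses base R on a toy data frame; the column names mirror the cohort example in this section, and the check itself is an illustration rather than the package's own validation code.

```r
# Toy cohort (illustrative values only)
toy_cohort <- data.frame(
  ID = c("0001", "0002", "0003"),
  Index_date = as.Date(c("2008-10-06", NA, "2006-03-21")),
  eof_dt = as.Date(c("2009-09-30", "2007-03-16", "2010-11-30")),
  EOF_reason = c("Lost_followup", "Outcome", "Study_end"),
  eGFR = c(NA, 48.2, 61.0),
  stringsAsFactors = FALSE
)

# The four columns in which a missing value causes the row to be discarded
required <- c("ID", "Index_date", "eof_dt", "EOF_reason")
incomplete <- !complete.cases(toy_cohort[, required])

# Subject 0002 has no index date; subject 0001 only lacks a covariate value
dropped_ids <- toy_cohort$ID[incomplete]
dropped_ids
```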

library(LtAtStructuR)
`%+%` <- function(a, b) paste0(a, b)
input.cohort <- data.table::data.table(ID=c("000"%+%1:5),
                                       Index_date=lubridate::mdy(c("10/06/2008","05/18/2005","03/21/2006","06/17/2007","01/28/2008")),
                                       eof_dt=lubridate::mdy(c("09/30/2009","03/16/2007","11/30/2010","12/31/2010","12/31/2010")),
                                       EOF_reason=c("Lost_followup","Outcome","Lost_followup","Study_end","Study_end"),
                                       Race=c("BA","WH",NA,"AS","BA"),
                                       Hypertension=c(0,0,0,1,1),
                                       eGFR=as.numeric(c(NA,"48.2","61.0","59.7","71.3")),
                                       Stroke=c(1,1,0,1,0),
                                       Hosp_stay=c(0,1,1,0,1))
knitr::kable(input.cohort, caption = "Input cohort data")

Note that the output data set generated by the LtAtStructuR package will contain data from all subjects in the cohort data set, i.e., no systematic exclusion criteria are applied by the package. Thus, if the inclusion of some subjects is not warranted to address the research question (e.g., subjects who experienced the exposure before the index date), then data from these subjects should be excluded from the cohort data set and all subsequent input data sets.
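A minimal base R sketch of such an exclusion step is given below, assuming that prior exposure is an exclusion criterion for the study at hand. The object names prefixed with `toy_` are hypothetical; the dates are taken from the example data in this document.

```r
toy_cohort <- data.frame(
  ID = c("0001", "0002", "0003"),
  Index_date = as.Date(c("2008-10-06", "2005-05-18", "2006-03-21")),
  stringsAsFactors = FALSE
)
toy_exposure <- data.frame(
  ID = c("0001", "0001", "0003"),
  Exposure_start = as.Date(c("2006-07-21", "2009-01-30", "2006-04-06")),
  stringsAsFactors = FALSE
)

# Index date of the subject on each exposure row
idx <- toy_cohort$Index_date[match(toy_exposure$ID, toy_cohort$ID)]

# Subjects with an exposure episode that starts before their index date
prior_ids <- unique(toy_exposure$ID[toy_exposure$Exposure_start < idx])

# Remove them from the cohort AND from every other input data set
cohort_kept <- toy_cohort[!toy_cohort$ID %in% prior_ids, ]
exposure_kept <- toy_exposure[!toy_exposure$ID %in% prior_ids, ]
```

The same filter would be applied to each covariate data set so that all inputs stay consistent.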

Creating the cohort LtAtData object

When creating the cohort LtAtData object, the user must specify the following arguments:

Define cohort object:

cohort <- setCohort(data = input.cohort, 
                    IDvar = "ID", 
                    index_date = "Index_date", 
                    EOF_date = "eof_dt", 
                    EOF_type = "EOF_reason", 
                    Y_name = "Outcome", 
                    L0 = c("Race","Hypertension","eGFR","Stroke","Hosp_stay"), 
                    L0_timeIndep = list("Race"=list("categorical"=TRUE,
                                                    "impute"=NA,
                                                    "impute_default_level"=NA)) )

2.2 Exposure definition

Two types of exposure data can be handled by the package: interval and instantaneous exposures. Interval exposures are experienced over intervals of time with a start and an end date. Instantaneous exposures occur on a single day.

2.2.1 Interval exposures

The input exposure data set encodes the exposure regimens for all subjects in the cohort by describing intervals of time during which subjects are exposed to an exposure level other than the reference exposure level chosen by the user (i.e., the output data set created by the package will be based on the assumption that each patient is exposed to the reference level except if encoded otherwise by the exposure data set). Thus, if a subject only experiences the reference exposure level during follow-up, there should be no record for this subject in this exposure data set. Otherwise, this data set can contain multiple rows for the subject and must include the following four (and sometimes only the first three) variables (all others are ignored):

The exposure episodes described by rows with the same ID must be non-overlapping. Missing values are not allowed in the exposure data set. All subject identifiers in the exposure data set must also be present in the cohort data set. In addition, while the exposure data set may contain measurements collected strictly before a subject's index date or strictly after a subject's end of follow-up date (both dates are specified in the cohort data set), all exposure measurements collected strictly before the index date or strictly after the eof date will be ignored/discarded by the package, i.e., the output data set created by the package will not incorporate these observations. The value that encodes the reference exposure level used in the output data set will be set to 0 if the fourth column of the input exposure data set described above is missing and, otherwise, it will be set to the specified value of the non-reference exposure level.
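The non-overlap requirement can be checked before calling setExposure(). The following base R sketch is one way to do so; `has_overlap` is a hypothetical helper, and treating two episodes that share a day as overlapping is an assumption of this example.

```r
# Hypothetical check: TRUE if any two episodes of the same subject overlap.
# After sorting by subject and start date, it is enough to compare each
# episode with the next one for the same subject.
has_overlap <- function(id, start, end) {
  ord <- order(id, start)
  id <- id[ord]
  start <- start[ord]
  end <- end[ord]
  n <- length(id)
  if (n < 2) return(FALSE)
  same_id <- id[-1] == id[-n]
  any(same_id & start[-1] <= end[-n])   # assumption: a shared day counts as overlap
}

ids <- c("0001", "0001", "0001", "0003", "0003")
starts <- as.Date(c("2006-07-21", "2009-01-30", "2009-08-14", "2006-04-06", "2008-03-30"))
ends <- as.Date(c("2008-11-30", "2009-05-16", "2009-10-13", "2008-02-17", "2010-06-18"))

ok_data <- has_overlap(ids, starts, ends)

# Move the second episode of subject 0001 so that it starts inside the first
bad_starts <- starts
bad_starts[2] <- as.Date("2008-11-01")
bad_data <- has_overlap(ids, bad_starts, ends)
```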

input.exposure <- data.table::data.table(ID=c("0001","0001","0001","0003","0003"),
                                         Exposure_start=lubridate::mdy(c("07/21/2006","01/30/2009","08/14/2009","04/06/2006","03/30/2008")),
                                         Exposure_end=lubridate::mdy(c("11/30/2008","05/16/2009","10/13/2009","02/17/2008","06/18/2010")))

input.exposure.cat <- data.table::data.table(ID=c("0001","0001","0001","0003","0003"),
                                             Exposure_start=lubridate::mdy(c("07/21/2006","01/30/2009","08/14/2009","04/06/2006","03/30/2008")),
                                             Exposure_end=lubridate::mdy(c("11/30/2008","05/16/2009","10/13/2009","02/17/2008","06/18/2010")),
                                             Exposure_level=c("metformin","insulin","insulin","sulfonylurea","met+sul"))

knitr::kables(list(knitr::kable(input.exposure, caption = "Input exposure data for a binary exposure"),
                   knitr::kable(input.exposure.cat, caption = "Input exposure data for a categorical exposure")
                   ))

Creating the exposure LtAtData object

When creating the exposure LtAtData object, the user must specify the following arguments:

Define exposure object:

## Binary exposure
exposure <- setExposure(data = input.exposure,
                        IDvar = "ID",
                        start_date = "Exposure_start",
                        end_date = "Exposure_end")

## Categorical exposure
exposure.cat <- setExposure(data = input.exposure.cat,
                        IDvar = "ID",
                        start_date = "Exposure_start",
                        end_date = "Exposure_end",
                        exp_level = "Exposure_level",
                        exp_ref = "None")

2.2.2 Instantaneous exposures

The input exposure data set encodes the exposure regimens for all subjects in the cohort by describing the single day during which the subjects are exposed to an exposure level other than the reference exposure level chosen by the user (i.e., the output data set created by the package will be based on the assumption that each patient is exposed to the reference level except if encoded otherwise by the exposure data set). Thus, if a subject only experiences the reference exposure level during follow-up, there should be no record for this subject in this exposure data set. Otherwise, this data set can contain multiple rows for the subject and must include the following three (and sometimes only the first two) variables (all others are ignored):

The exposure episodes described by rows with the same ID must be non-overlapping. Missing values are not allowed in the exposure data set. All subject identifiers in the exposure data set must also be present in the cohort data set. In addition, while the exposure data set may contain measurements collected strictly before a subject's index date or strictly after a subject's end of follow-up date (both dates are specified in the cohort data set), all exposure measurements collected strictly before the index date or strictly after the eof date will be ignored/discarded by the package, i.e. the output data set created by the package will not incorporate these observations. The value that encodes the reference exposure level used in the output data set will be set to '0' if the third column of the input exposure data set described above is missing and, otherwise, it will be set to '0' if that column is specified as a numeric variable and it will be set to 'not exposed' if that column is a character variable.

indexDate_001 <- input.cohort[ID=="0001",lubridate::as_date(Index_date)]
indexDate_003 <- input.cohort[ID=="0003",lubridate::as_date(Index_date)]

expDT <- setInstantExposure(
    rbind(data.table::data.table("ID"="0001","fill.date"=indexDate_001,"D.t"="analog insulin","Q.t"=15),
          data.table::data.table("ID"="0001","fill.date"=indexDate_001+10,"D.t"="analog insulin","Q.t"=90),
          data.table::data.table("ID"="0001","fill.date"=indexDate_001+10+80,"D.t"="analog insulin","Q.t"=90),
          data.table::data.table("ID"="0001","fill.date"=indexDate_001+10+90,"D.t"="human insulin","Q.t"=15),
          data.table::data.table("ID"="0001","fill.date"=indexDate_001+10+90+30,"D.t"="human insulin","Q.t"=180),
          data.table::data.table("ID"="0003","fill.date"=indexDate_003,"D.t"="analog insulin","Q.t"=15),
          data.table::data.table("ID"="0003","fill.date"=indexDate_003+1,"D.t"="analog insulin","Q.t"=90),
          data.table::data.table("ID"="0003","fill.date"=indexDate_003+2,"D.t"="human insulin","Q.t"=90)          
          ),
    "ID", "fill.date", c("D.t","Q.t"))

input_InstExp_bin <- expDT$data[,.(ID,fill.date)]
input_InstExp_cat <- expDT$data

knitr::kables(list(knitr::kable(input_InstExp_bin, caption = "Input exposure data for a binary exposure"),
                   knitr::kable(input_InstExp_cat, caption = "Input exposure data for a categorical exposure")
                   ))

Creating the exposure LtAtData object

When creating the exposure LtAtData object, the user must specify the following arguments:

Define exposure object:

## Binary exposure
exposure_instant_binary <- setInstantExposure(data = input_InstExp_bin,
                                              IDvar = "ID",
                                              exp_date = "fill.date")

## Categorical exposure
exposure_instant_categorical <- setInstantExposure(data = input_InstExp_cat,
                                                   IDvar = "ID",
                                                   exp_date = "fill.date",
                                                   exp_level = c("D.t","Q.t"))

2.3 Covariate definition

The input covariate data set(s) encode follow-up measurements strictly after baseline (i.e., index date) for all time-dependent variables (e.g., laboratory measurements, diagnosis, procedures, and drug prescriptions) other than the exposure, outcome and censoring variables. For each time-dependent covariate, a separate data set is used to store all follow-up measurements. This data set must include the following three (and sometimes only the first two) variables (all others are ignored):

Typically, each covariate data set will contain more than one row with the same 'ID', i.e., multiple measurements per subject, although some subjects may have only one follow-up measurement or none. However, each covariate data set must not contain more than one measurement per day for any given subject. In addition, while each covariate data set may contain measurements collected before a subject's index date or after a subject's eof date (both dates are specified in the cohort data set), all covariate measurements collected on or before the index date or after the eof date will be ignored/discarded by the package, i.e., the output data set created by the package will not incorporate these observations. Missing covariate information during follow-up (i.e., after the index date and before or on the eof date) must be encoded by the absence of a record in the covariate data set. In other words, there should not be any missing values for the required three (sometimes two) columns outlined above in the covariate data sets. Finally, all subject identifiers in each covariate data set must also be present in the cohort data set.
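The "at most one measurement per day per subject" requirement can be verified with a short base R check such as the one below. The toy data and the choice to resolve same-day duplicates by averaging are assumptions for illustration; the appropriate rule depends on the covariate.

```r
toy_cov <- data.frame(
  ID = c("0001", "0001", "0002", "0002"),
  Datevar = as.Date(c("2009-01-05", "2009-01-05", "2005-05-25", "2005-07-12")),
  eGFR = c(42.8, 43.6, 64.7, 55.4),
  stringsAsFactors = FALSE
)

# Flag every row that shares its (ID, date) pair with another row
key <- toy_cov[, c("ID", "Datevar")]
dup <- duplicated(key) | duplicated(key, fromLast = TRUE)
toy_cov[dup, ]

# One possible resolution (an assumption): keep the daily mean
daily <- aggregate(eGFR ~ ID + Datevar, data = toy_cov, FUN = mean)
daily
```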

The argument type must be populated with the value "binary monotone increasing", "interval", "sporadic", or "indicator" as described below:

2.3.1 Binary monotone increasing

input.covariate.behav.1 <- data.table::data.table(ID=c("0001","0002","0003"),
                                                    Datevar=lubridate::mdy(c("04/01/2009","12/05/2006","05/01/2008")),
                                                    Hypertension=c(1,1,1))
knitr::kable(input.covariate.behav.1, caption = "Input covariate data set of type binary monotone increasing")

Example output data set of type binary monotone increasing:

beh.1.eg1 <- data.table::data.table(ID=rep("EG.ID1",3),intnum=0:2,censor=c(0,0,1),Hypertension=c(1,1,1))
knitr::kable(beh.1.eg1, caption = "On at baseline")

Example output data set of type binary monotone increasing:

beh.1.eg2 <- data.table::data.table(ID=rep("EG.ID2",3),intnum=0:2,censor=c(0,0,1),Hypertension=c(0,0,0))
knitr::kable(beh.1.eg2, caption = "Never on")

Example output data set of type binary monotone increasing:

beh.1.eg3 <- data.table::data.table(ID=rep("EG.ID3",3),intnum=0:2,censor=c(0,0,1),Hypertension=c(0,1,1))
knitr::kable(beh.1.eg3, caption = "On during follow-up")

2.3.2 Interval

input.covariate.behav.2 <- data.table::data.table(ID=c(sort(rep("000"%+%1:5,2))),
                                                  Datevar=lubridate::mdy(c("12/12/2008","12/17/2008","01/01/2006","01/03/2006","07/15/2008","07/30/2008","05/04/2009","05/10/2009","02/01/2008","02/06/2008")),
                                                  Hosp_stay=c(1,0,1,0,1,0,1,0,1,0))
knitr::kable(input.covariate.behav.2, caption = "Input covariate data set of type interval")

Example output data set of type interval:

beh.2.eg1 <- data.table::data.table(ID=rep("EG.ID1",3),intnum=0:2,censor=c(0,0,1),Hosp_stay=c(1,1,1))
knitr::kable(beh.2.eg1, caption = "Always on")

Example output data set of type interval:

beh.2.eg2 <- data.table::data.table(ID=rep("EG.ID2",3),intnum=0:2,censor=c(0,0,1),Hosp_stay=c(0,0,0))
knitr::kable(beh.2.eg2, caption = "Never on")

Example output data set of type interval:

beh.2.eg3 <- data.table::data.table(ID=rep("EG.ID3",3),intnum=0:2,censor=c(0,0,1),Hosp_stay=c(1,0,1))
knitr::kable(beh.2.eg3, caption = "On and off")

2.3.3 Sporadic

input.covariate.behav.4 <- data.table::data.table(ID=c("0001","0001","0002","0002","0003","0003","0004","0004","0005","0005"),
                                                  Datevar=lubridate::mdy(c("01/05/2009","05/08/2009","05/25/2005","07/12/2005","05/18/2007","04/08/2007","01/02/2008","05/09/2010","06/12/2008","03/14/2010")),
                                                  eGFR=c(42.8,43.6,64.7,55.4,60.1,52.3,70.2,64.3,45.7,53.8))
knitr::kable(input.covariate.behav.4, caption = "Input covariate data set of type sporadic")

Example output data set of type sporadic:

beh.4.eg1 <- data.table::data.table(ID=rep("EG.ID1",6),intnum=0:5,censor=c(0,0,0,0,0,1),eGFR=c(30.8,40.2,44.4,30.4,NA,39.1),I.eGFR=c(0,0,0,0,1,0))
knitr::kable(beh.4.eg1, caption = "Frequent monitoring")

Example output data set of type sporadic:

beh.4.eg2 <- data.table::data.table(ID=rep("EG.ID2",6),intnum=0:5,censor=c(0,0,0,0,0,1),eGFR=c(40.6,43.2,NA,NA,NA,40.4),I.eGFR=c(0,0,1,1,1,0))
knitr::kable(beh.4.eg2, caption = "Less frequent monitoring")

Example output data set of type sporadic:

beh.4.eg3 <- data.table::data.table(ID=rep("EG.ID3",6),intnum=0:5,censor=c(0,0,0,0,0,1),eGFR=c(56.7,NA,NA,NA,NA,NA),I.eGFR=c(0,1,1,1,1,1))
knitr::kable(beh.4.eg3, caption = "No monitoring after baseline")

2.3.4 Indicator

input.covariate.behav.5 <- data.table::data.table(ID=c("0001","0002","0003","0003","0003","0004","0004","0004","0005"),
                                                  Datevar=lubridate::mdy(c("12/01/2008","02/15/2006","01/01/2007","02/02/2008","03/03/2009","11/10/2007","3/25/2009","10/31/2010","8/10/2010")),
                                                  Stroke=c(1,1,1,1,1,1,1,1,1))
knitr::kable(input.covariate.behav.5, caption = "Input covariate data set of type indicator")

Example output data set of type indicator:

beh.5.eg1 <- data.table::data.table(ID=rep("EG.ID1",6),intnum=0:5,censor=c(0,0,0,0,0,1),Stroke=c(0,0,0,0,0,0))
knitr::kable(beh.5.eg1, caption = "No events during follow up")

Example output data set of type indicator:

beh.5.eg2 <- data.table::data.table(ID=rep("EG.ID2",6),intnum=0:5,censor=c(0,0,0,0,0,1),Stroke=c(1,0,0,0,0,0))
knitr::kable(beh.5.eg2, caption = "One event during follow up")

Example output data set of type indicator:

beh.5.eg3 <- data.table::data.table(ID=rep("EG.ID3",6),intnum=0:5,censor=c(0,0,0,0,0,1),Stroke=c(1,0,1,0,1,1))
knitr::kable(beh.5.eg3, caption = "Multiple events during follow up")

Creating the covariate LtAtData objects

When creating the covariate LtAtData object(s), the user must specify the following arguments:

Define covariate objects:

hypertension.cov <- setCovariate(data = input.covariate.behav.1,
                                 type = "binary monotone increasing",
                                 IDvar = "ID",
                                 L_date = "Datevar",
                                 L_name = "Hypertension",
                                 categorical = TRUE,
                                 impute = NA,
                                 impute_default_level = NA,
                                 acute_change = FALSE)

hosp_stay.cov <- setCovariate(data = input.covariate.behav.2,
                              type = "interval",
                              IDvar = "ID",
                              L_date = "Datevar",
                              L_name = "Hosp_stay",
                              categorical = TRUE,
                              impute = NA,
                              impute_default_level = NA,
                              acute_change = FALSE)

egfr.cov <- setCovariate(data = input.covariate.behav.4,
                         type = "sporadic",
                         IDvar = "ID",
                         L_date = "Datevar",
                         L_name = "eGFR",
                         categorical = FALSE,
                         impute = NA,
                         impute_default_level = NA,
                         acute_change = FALSE)

stroke.cov <- setCovariate(data = input.covariate.behav.5,
                           type = "indicator",
                           IDvar = "ID",
                           L_date = "Datevar",
                           L_name = "Stroke",
                           categorical = TRUE,
                           impute = NA,
                           impute_default_level = NA,
                           acute_change = FALSE)

3. Construct definition

The final step, construct(), maps the input cohort, exposure, and covariate data sets into a structured analytic data set that encodes complex, discrete-time, longitudinal data. First, the input objects must be gathered into a single LtAtData object:

## Final LtAtData object using binary interval exposure
LtAt.data.binary.At <- cohort + exposure + hypertension.cov + hosp_stay.cov + egfr.cov + stroke.cov

## Final LtAtData object using categorical interval exposure
LtAt.data.categorical.At <- cohort + exposure.cat + hypertension.cov + hosp_stay.cov + egfr.cov + stroke.cov

## Final LtAtData object using binary instantaneous exposure
LtAt.data.binary.InstExp <- cohort + exposure_instant_binary + hypertension.cov + hosp_stay.cov + egfr.cov + stroke.cov

## Final LtAtData object using categorical instantaneous exposure
LtAt.data.categorical.InstExp <- cohort + exposure_instant_categorical + hypertension.cov + hosp_stay.cov + egfr.cov + stroke.cov

A unit of time, time_unit, has to be specified before running the construct function, and must be populated with the number of days that will serve as the analytic unit of time in the output data set. This unit of time is used to create discrete consecutive time intervals between the index date and end of follow-up.
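To make the role of time_unit concrete, the base R sketch below cuts one subject's follow-up into consecutive intervals. The exact boundary convention used by construct() is not stated here, so the convention in this example (interval k starts k * time_unit days after the index date, and the last interval is truncated at the eof date) is an assumption, as is the helper name `make_intervals`.

```r
# Hypothetical helper, not the package's code: one row per follow-up interval.
make_intervals <- function(index_date, eof_date, time_unit) {
  follow_up_days <- as.numeric(eof_date - index_date)
  n_int <- as.integer(floor(follow_up_days / time_unit)) + 1L
  intnum <- seq_len(n_int) - 1L
  intstart <- index_date + intnum * time_unit
  # assumption: the final interval ends at the end of follow-up
  intend <- pmin(intstart + time_unit - 1, eof_date)
  data.frame(intnum = intnum, intstart = intstart, intend = intend)
}

# Subject 0001 of the example cohort, with 30-day intervals
iv <- make_intervals(as.Date("2008-10-06"), as.Date("2009-09-30"), time_unit = 30)
head(iv, 3)
```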

The format argument must be populated with the value standard or MSM SAS macro. A value of MSM SAS macro indicates that the output data set created by LtAtStructuR should be formatted for direct use with the %MSM macro developed by the Harvard Causal Inference group. The %MSM macro automates MSM fitting with inverse probability weighting estimation in studies with survival outcomes. The %MSM macro code and its documentation can be downloaded at https://www.hsph.harvard.edu/causal/software/. A value of standard indicates that the output data set created by LtAtStructuR will not be directly compatible with the %MSM macro; instead, it will be compatible with either the ltmle R package developed at the University of California, Berkeley or the stremr R package developed at the Kaiser Permanente Northern California Division of Research. The ltmle and stremr packages automate the fitting of MSM and dynamic MSM with both inverse probability weighting estimation and targeted minimum loss-based estimation in studies with survival outcomes. ltmle can be downloaded at http://cran.r-project.org/web/packages/ltmle. stremr can be downloaded at http://cran.r-project.org/web/packages/stremr.

The first_exp_rule argument must be populated with the value 0 or 1. With this value, the user indicates to the package whether a subject should be deemed first exposed to a non-reference exposure level in the output data set when the subject experiences a non-reference exposure level for at least 1 day, or for at least a proportion exp_threshold of the days, of a time interval (recall that each follow-up interval is defined by the number of days specified by time_unit). The value 1 indicates that a subject is deemed first exposed to a non-reference exposure level during a time interval if the exposure data set indicates exposure to a non-reference exposure level for at least one day of the interval. The value 0 indicates that a subject is deemed first exposed to a non-reference exposure level during a time interval if the exposure data set indicates exposure to a non-reference exposure level for at least exp_threshold of the days of the interval. By default (i.e., if the exp_threshold argument is left unpopulated), the value of exp_threshold used by the package is 50%, but an alternate value can be specified by populating the exp_threshold argument with any value strictly greater than 0 and lower than or equal to 1.

The max_exp_var argument sets the limit for the maximum number of exposure variables that is expected when the exposure is defined using setInstantExposure. The max_cov_var argument sets the limit for the maximum number of variables that is expected to be created by the routine to encode the levels of each time-dependent covariate when the exposure is defined using setInstantExposure. The summary_cov_var argument indicates the coarsening method applied in each interval to summarize multiple measurements of a time-dependent covariate into a single summary measure when the exposure is defined using setInstantExposure.
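The first_exp_rule and exp_threshold logic described above can be sketched in a few lines of base R. The function `is_first_exposed` is a hypothetical illustration, not a function of the package, and it assumes that the proportion is computed relative to the full interval length time_unit.

```r
# Hypothetical helper: does an interval with `days_exposed` days at a
# non-reference level count as the first exposed interval?
is_first_exposed <- function(days_exposed, time_unit,
                             first_exp_rule = 1, exp_threshold = 0.5) {
  stopifnot(first_exp_rule %in% c(0, 1),
            exp_threshold > 0, exp_threshold <= 1)
  if (first_exp_rule == 1) {
    days_exposed >= 1                            # at least one exposed day
  } else {
    days_exposed / time_unit >= exp_threshold    # at least the given proportion
  }
}

r1 <- is_first_exposed(10, time_unit = 30, first_exp_rule = 1)  # one day suffices
r2 <- is_first_exposed(10, time_unit = 30, first_exp_rule = 0)  # 10/30 is below 0.5
r3 <- is_first_exposed(15, time_unit = 30, first_exp_rule = 0)  # 15/30 meets 0.5
```

With time_unit = 30, an exp_threshold of 0.03 (as used in the instantaneous-exposure calls in this section) corresponds to 0.9 days, so under this sketch a single exposed day would meet it.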

LtAt.data.bin.At <- construct(LtAtspec = LtAt.data.binary.At,
                              time_unit = 30,
                              first_exp_rule = 1,
                              exp_threshold = 0.5,
                              format = "standard",
                              dates = FALSE)

LtAt.data.cat.At <- construct(LtAtspec = LtAt.data.categorical.At,
                              time_unit = 30,
                              first_exp_rule = 1,
                              exp_threshold = 0.5,
                              format = "standard",
                              dates = FALSE)

## Instantaneous exposures
LtAt.data.bin.InstExp <- construct(LtAtspec = LtAt.data.binary.InstExp,
                                   time_unit = 30,
                                   first_exp_rule = 1,
                                   exp_threshold = 0.03,
                                   format = "standard",
                                   dates = FALSE)

LtAt.data.cat.InstExp <- construct(LtAtspec = LtAt.data.categorical.InstExp,
                                   time_unit = 30,
                                   first_exp_rule = 1,
                                   exp_threshold = 0.03,
                                   format = "standard",
                                   dates = FALSE)

## Output formatted for the %MSM SAS macro
LtAt.data.bin.At.harvard <- construct(LtAtspec = LtAt.data.binary.At,
                                      time_unit = 30,
                                      first_exp_rule = 1,
                                      exp_threshold = 0.5,
                                      format = "MSM SAS macro",
                                      dates = FALSE)

LtAt.data.bin.At.harvard.dates <- construct(LtAtspec = LtAt.data.binary.At,
                                            time_unit = 30,
                                            first_exp_rule = 1,
                                            exp_threshold = 0.5,
                                            format = "MSM SAS macro",
                                            dates = TRUE)

4. Output data set

The output data set produced by the LtAtStructuR package organizes the processed longitudinal data for each patient in the cohort into a structured format suitable for analyses by MSM. As described in detail in Section 5, each patient's follow-up time is first divided into intervals of constant length (i.e., time_unit). The various measurements in the input data sets are then mapped to these intervals. Each row of the resulting output data set encodes the measurements that characterize a given patient at one such interval. The output data set includes the following columns (the last two are only included when the exposure is categorical with more than two levels):

When construct(..., format = "standard") is used, the following two tables illustrate the encoding in the output data set of the longitudinal data from two patients who, respectively, experienced and did not experience the event during follow-up (exposure is binary):

cols <- names(LtAt.data.bin.At)
LtAt.data.bin.At[, (cols) := lapply(.SD, factor), .SDcols = cols]
Yt1 <- LtAt.data.bin.At[ID=="0002",][c(1:3)]
Yt1[3,eval(names(Yt1)):="..."]
Yt1 <- rbind(Yt1,LtAt.data.bin.At[ID=="0002",][c(.N-1,.N)])
Yt1.kable <- knitr::kable(Yt1, "html", caption = "EOF reason is failure")
kableExtra::add_header_above(Yt1.kable, c("$ID$", "$t$", "$\\Gamma$", "$Y(t)$", "$A_2(t)$", "$L_1(t)$", "$I.L_1(t)$", "$L_2(t)$", "$I.L_2(t)$", "$L_3(t)$", "$L_4(t)$", "$L_5(t)$", "$A_1(t)$"))
Yt0 <- LtAt.data.bin.At[ID=="0001",][c(1:3)]
Yt0[3,eval(names(Yt0)):="..."]
Yt0 <- rbind(Yt0,LtAt.data.bin.At[ID=="0001",][c(.N-1,.N)])
Yt0.kable <- knitr::kable(Yt0, "html", caption = "EOF reason is censoring")
kableExtra::add_header_above(Yt0.kable, c("$ID$", "$t$", "$\\Gamma$", "$Y(t)$", "$A_2(t)$", "$L_1(t)$", "$I.L_1(t)$", "$L_2(t)$", "$I.L_2(t)$", "$L_3(t)$", "$L_4(t)$", "$L_5(t)$", "$A_1(t)$"))

Note that each row of these tables contains the measurements of covariates $L_j(t)$ for $j=1,2,\ldots$, exposure $A_1(t)$, outcome $Y(t)$ and censoring variable $A_2(t)$ for a given follow-up interval $t$. The columns $t_{min}$ and $t_{max}$ (i.e., intstart and intend, respectively) contain the dates of each follow-up interval defined by the unit of time specified by the user of the package (i.e., time_unit). In particular, the value for intstart in the first row of each table contains the index date for the patient. Because measurements of covariates may not be collected at each time point in non-experimental studies, the tables contain separate columns, $I.L_1(t)$ and $I.L_2(t)$, that indicate whether the corresponding covariate is observed (i.e., the value '0' means that the covariate is observed). Tables such as the ones above are constructed for all patients in the cohort and stacked into a single data set that forms the output data set from the LtAtStructuR package.
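The relationship between a sporadically measured covariate and its observation indicator can be illustrated with base R. The sketch below is not the package's algorithm: the mapping of measurement dates to intervals and the choice to keep the last measurement in each interval are assumptions, and all values are made up.

```r
index_date <- as.Date("2008-10-06")
time_unit <- 30
n_int <- 6

# Toy follow-up measurements for one subject
meas <- data.frame(
  date = as.Date(c("2008-10-20", "2008-10-25", "2008-12-15", "2009-01-05")),
  eGFR = c(42.8, 41.0, 43.6, 44.1)
)

# Assumed mapping of each measurement date to an interval number
meas$intnum <- as.integer(floor(as.numeric(meas$date - index_date) / time_unit))

# Assumption: keep the last measurement within each interval
meas <- meas[order(meas$date), ]
last <- meas[!duplicated(meas$intnum, fromLast = TRUE), ]

out <- data.frame(intnum = 0:(n_int - 1), eGFR = NA_real_)
out$eGFR[match(last$intnum, out$intnum)] <- last$eGFR

# Indicator: 0 means the covariate is observed, 1 means it is not
out$I.eGFR <- as.integer(is.na(out$eGFR))
out
```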

The output data set just described cannot be used directly with the %MSM macro developed by the Harvard Causal Inference group to fit marginal structural models by inverse probability weighting estimation. For the output data set from the package LtAtStructuR to be directly usable by the %MSM macro, data from all patients who are censored during the first follow-up interval (i.e., at $t=0$) can be removed from the output data set, and, for all other patients, the values of the outcome ($Y(t)$) and censoring ($A_2(t)$) columns of their tables can be shifted up by one row, the resulting last row can be deleted, and the outcome value in the new last row can be set to missing when $A_2(t)$ is 1 in the new last row. These steps are automated by the LtAtStructuR package when construct(..., format = "MSM SAS macro") is used. The resulting encoding in the output data set of the longitudinal data from the same two patients described above is illustrated in the following two tables:

cols <- names(LtAt.data.bin.At.harvard)
LtAt.data.bin.At.harvard[, (cols) := lapply(.SD, factor), .SDcols = cols]
Yt1 <- LtAt.data.bin.At.harvard[ID=="0002",][c(1:3)]
Yt1[3,eval(names(Yt1)):="..."]
Yt1 <- rbind(Yt1,LtAt.data.bin.At.harvard[ID=="0002",][c(.N-1,.N)])
Yt1.kable <- knitr::kable(Yt1, "html", caption = "EOF reason is failure")
kableExtra::add_header_above(Yt1.kable, c("$ID$", "$t$", "$\\Gamma$", "$Y(t)$", "$A_2(t)$", "$L_1(t)$", "$I.L_1(t)$", "$L_2(t)$", "$I.L_2(t)$", "$L_3(t)$", "$L_4(t)$", "$L_5(t)$", "$A_1(t)$"))
Yt0 <- LtAt.data.bin.At.harvard[ID=="0001",][c(1:3)]
Yt0[3,eval(names(Yt0)):="..."]
Yt0 <- rbind(Yt0,LtAt.data.bin.At.harvard[ID=="0001",][c(.N-1,.N)])
Yt0.kable <- knitr::kable(Yt0, "html", caption = "EOF reason is censoring")
kableExtra::add_header_above(Yt0.kable, c("$ID$", "$t$", "$\\Gamma$", "$Y(t)$", "$A_2(t)$", "$L_1(t)$", "$I.L_1(t)$", "$L_2(t)$", "$I.L_2(t)$", "$L_3(t)$", "$L_4(t)$", "$L_5(t)$", "$A_1(t)$"))
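The reshaping steps described in this section for the MSM SAS macro format can be sketched in base R for a single subject. The function `to_msm_format` and the column names `Y` and `censor` are assumptions chosen for this illustration; construct() performs the actual conversion.

```r
# Hypothetical helper: `d` holds the rows of ONE subject, ordered by interval.
to_msm_format <- function(d) {
  if (d$censor[1] == 1) return(d[0, ])   # censored at t = 0: subject removed
  n <- nrow(d)
  stopifnot(n >= 2)                      # sketch only covers subjects with >= 2 rows
  out <- d[-n, ]                         # delete the (resulting) last row
  out$Y <- d$Y[-1]                       # shift outcome up by one row
  out$censor <- d$censor[-1]             # shift censoring up by one row
  m <- nrow(out)
  if (out$censor[m] == 1) out$Y[m] <- NA # outcome unknown once censored
  out
}

censored <- data.frame(ID = "A", intnum = 0:2, Y = c(0, 0, 0), censor = c(0, 0, 1))
failed <- data.frame(ID = "B", intnum = 0:2, Y = c(0, 0, 1), censor = c(0, 0, 0))

msm_censored <- to_msm_format(censored)
msm_failed <- to_msm_format(failed)
```

A full data set would be processed by applying such a function to each subject's rows and stacking the results.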

In addition, when construct(..., dates = TRUE) is used, the measurement dates for the covariates will be displayed:

cols <- names(LtAt.data.bin.At.harvard.dates)
LtAt.data.bin.At.harvard.dates[, (cols) := lapply(.SD, factor), .SDcols = cols]
Yt1 <- LtAt.data.bin.At.harvard.dates[ID=="0002",][c(1:3)]
Yt1[3,eval(names(Yt1)):="..."]
Yt1 <- rbind(Yt1,LtAt.data.bin.At.harvard.dates[ID=="0002",][c(.N-1,.N)])
Yt1.kable <- knitr::kable(Yt1[,.(ID,intnum,intstart,intend,eGFR,dteGFR,Hosp_stay,dtHosp_stay,Hypertension,dtHypertension,Stroke,dtStroke)], "html", caption = "Output with dates")
# Yt1.kable <- knitr::kable(LtAt.data.bin.At.harvard.dates[,.(ID,intnum,intstart,intend,eGFR,dteGFR,Hosp_stay,dtHosp_stay,Hypertension,dtHypertension,Stroke,dtStroke)],"html")
kableExtra::add_header_above(Yt1.kable, c("$ID$", "$t$", "$t_{min}$", "$t_{max}$", "$L_2(t)$", "$date.L_2(t)$", "$L_3(t)$", "$date.L_3(t)$", "$L_4(t)$", "$date.L_4(t)$", "$L_5(t)$","$date.L_5(t)$"))


romainkp/LtAtStructuR documentation built on Aug. 24, 2024, 3:38 p.m.