Getting-Started
In TrialEmulation: Causal Analysis of Observational Time-to-Event Data

knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)
old <- options("digits" = 3)
data.table::setDTthreads(2)

This package implements methods to analyse a sequence of target trials.

The steps to do this are:

create a data set with all the required variables
fit models for censoring and calculate inverse probability weights
expand the data set, with records for each eligible patient for each trial period
fit models for outcomes

The package provides two options for conducting the analysis: an all-in-one function and a set of more flexible functions.

Required Data

To get started a longitudinal dataset must be created containing:

time period
patient identifier
treatment indicator
outcome indicator
censoring indicator
eligibility indicator for a trial starting in each time period
other covariates relating to treatment, outcome, or informative censoring to be used in the models for weights or the outcome

An example data set is included to demonstrate the format:

library(TrialEmulation)
# Prepare the example data
data("trial_example")
# Set columns to factors as necessary
trial_example$catvarA <- as.factor(trial_example$catvarA)
trial_example$catvarB <- as.factor(trial_example$catvarB)

head(trial_example)

All-in-one analysis {#initiators}

There is an all-in-one function initators() which does all of the steps with one function call. This is similar to the INITIATORS SAS macro.

Call the initiators() function to run the complete analysis:

result <- initiators(
  data = trial_example,
  id = "id",
  period = "period",
  eligible = "eligible",
  treatment = "treatment",
  estimand_type = "ITT",
  outcome = "outcome",
  model_var = "assigned_treatment",
  outcome_cov = c("catvarA", "catvarB", "nvarA", "nvarB", "nvarC"),
  use_censor_weights = FALSE
)

summary(result)

By default, this returns the final glm model object and the results using the sandwich estimator.

summary(result$model)

Tidy summaries of the robust models are available.

print(result$robust$summary)

Also the sandwich robust variance-covariance matrix.

# only print the first columns
head(result$robust$matrix, c(17, 4))

Flexible Analysis {#flexible}

To gain complete control over the analysis and to inspect the intermediate objects, it can be useful to run the the data preparation and modelling steps separately.

This also allows for processing of very large data sets, by doing the data preparation in chunks and then sampling from the expanded trial data.

# for the purposes of the vignette, we use a temporary directory, however it may be useful to use a permanent
# location in order to inspect the outputs later
working_dir <- file.path(tempdir(TRUE), "trial_emu")
if (!dir.exists(working_dir)) dir.create(working_dir)

prep_data <- data_preparation(
  data = trial_example,
  id = "id",
  period = "period",
  eligible = "eligible",
  treatment = "treatment",
  outcome = "outcome",
  outcome_cov = ~ catvarA + catvarB + nvarA + nvarB + nvarC,
  data_dir = working_dir,
  save_weight_models = TRUE,
  estimand_type = "PP",
  pool_cense = "none",
  use_censor_weights = FALSE,
  chunk_size = 500,
  separate_files = TRUE,
  switch_n_cov = ~ nvarA + nvarB,
  quiet = TRUE
)

Use summary to get an overview of the result.

summary(prep_data)

For more information about the weighting models, we can inspect them individually

prep_data$switch_models$switch_n0

If save_weight_models = TRUE, the full model objects are saved in working_dir, so you can inspect the data used in those models and further investigate how well those models fit.

list.files(working_dir, "*.rds")

# The path is stored in the saved object
switch_n0 <- readRDS(prep_data$switch_models$switch_n0$path)
summary(switch_n0)
hist(switch_n0$fitted.values, main = "Histogram of weights from model switch_n0")

We also see the expanded trial files:

head(prep_data$data)

Each of these csv files contains the data for the trial starting at period _i. To create a manageable dataset for the final analysis we can sample from these trials.

Here we sample 10% of the patients without an event at each follow-up time in each trial. All observations with events are included.

sampled_data <- case_control_sampling_trials(prep_data, p_control = 0.1)
str(sampled_data)

Before proceeding with the modelling, it is possible to manipulate and derive new variables and adjust factor levels in the data.frame. You can also specify transformations in a formula to outcome_cov, include_followup_time or include_trial_period, such as ~ ns(trial_period) or ~ I(nvarA^2) + nvarC > 50 + catvarA:nvarC.

It is also possible to specify formulas in trial_msm() to include, for example, splines with ns().

Now we can fit the model with the trial_msm() function. Since we have sampled from the data, we should make use of the use_sample_weights option to get correct survival estimates.

model_result <- trial_msm(
  data = sampled_data,
  outcome_cov = c("catvarA", "catvarB", "nvarA", "nvarB", "nvarC"),
  model_var = "assigned_treatment",
  glm_function = "glm",
  use_sample_weights = TRUE
)

The result is the same type as the previous result from the simple initiators function, containing the glm object and the sandwich results.

summary(model_result)

summary(model_result$model)

We can also use this object to predict cumulative incidence curves and use these for treatment comparisons. Here we use the patients who were in the first trial as the target population. To get the data in the right format, we can use the data_template returned by data_preparation().

new_data <- data.table::fread(file.path(working_dir, "trial_1.csv"))
new_data <- rbind(data.table::as.data.table(prep_data$data_template), new_data)
model_preds <- predict(model_result, predict_times = c(0:40), newdata = new_data, type = "cum_inc")

The predict function returns a list of 3 data frames: predictions for assigned treatment 0, for assigned treatment 1 and for the difference. It possible to change the type of predicted values, either cumulative incidence or survival.

plot(
  model_preds$difference$followup_time,
  model_preds$difference$cum_inc_diff,
  ty = "l", ylab = "Cumulative Incidence Difference",
  xlab = "Follow-up Time",
  ylim = c(-0.15, 0.05)
)
lines(model_preds$difference$followup_time, model_preds$difference$`2.5%`, lty = 2)
lines(model_preds$difference$followup_time, model_preds$difference$`97.5%`, lty = 2)

# clean up
unlink(working_dir, recursive = TRUE)

options(old)
data.table::setDTthreads(NULL)

Any scripts or data that you put into this service are public.

TrialEmulation documentation built on Sept. 11, 2024, 9:06 p.m.

rdrr.io home R language documentation Run R code online

CRAN packages Bioconductor packages R-Forge packages GitHub packages

Note that we can't provide technical support on individual packages. You should contact the package authors for that.

TrialEmulation
Causal Analysis of Observational Time-to-Event Data

Getting-Started
In TrialEmulation: Causal Analysis of Observational Time-to-Event Data

Required Data

All-in-one analysis {#initiators}

Flexible Analysis {#flexible}

Try the TrialEmulation package in your browser

R Package Documentation

Browse R Packages

We want your feedback!

TrialEmulation Causal Analysis of Observational Time-to-Event Data

Getting-Started In TrialEmulation: Causal Analysis of Observational Time-to-Event Data

Required Data

All-in-one analysis {#initiators}

Flexible Analysis {#flexible}

Try the TrialEmulation package in your browser

R Package Documentation

Browse R Packages

We want your feedback!

TrialEmulation
Causal Analysis of Observational Time-to-Event Data

Getting-Started
In TrialEmulation: Causal Analysis of Observational Time-to-Event Data