knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>",
  fig.path = "man/figures/README-",
  out.width = "100%"
)

dynamicLM

The goal of dynamicLM is to provide a simple framework to make dynamic $w$-year risk predictions, allowing for competing risks, time-dependent covariates, and censored data.

What is landmarking and when is it used?

"Dynamic prediction" involves obtaining prediction probabilities at baseline and later points in time; it is essential for better-individualized treatment. Personalized risk is updated with new information and/or as time passes.

illustration of dynamic w-year predictions

An example is cancer treatment: we may want to predict a 5-year risk of recurrence whenever a patient's health information changes. For example, we can predict $w$-year risk of recurrence at baseline (time $=0$) given their initial covariates $Z(0)$ (e.g.,30 years old, on treatment), and we can then predict $w$-year risk at a later point $s$ given their current covariates $Z(s)$ (e.g., $30+s$ years old, off treatment). Note that here the predictions make use of the most recent covariate value of the patient.

The landmark model for survival data is a simple and powerful approach to dynamic prediction for many reasons:

Putter and Houwelingen describe landmarking extensively here and here.

The creation of the landmark model for survival data is built on the concept of risk assessment times (i.e., landmarks) that span risk prediction times of interest. In this approach, a training dataset of the study cohort is transformed into multiple censored datasets based on a prediction window of interest and the predefined landmarks. A model is fit on these stacked datasets (i.e., supermodel), and dynamic risk prediction is then performed by using the most up-to-date value of a patient's covariate values.

Installation

You can install the development version of dynamicLM from GitHub with:

# install.packages("devtools")
devtools::install_github("anyafries/dynamicLM")

Example

This is a basic example which shows you how to use dynamicLM to make dynamic 5-year predictions and check calibration and discrimination metrics.

Data

Data can come in various forms, with or without time-dependent covariates:

We illustrate the package using the long-form example data set given in the package. This gives the time-to-event of cancer relapse under 2 competing risks. 3 fixed patient bio-markers are given as well (age at baseline, stage of initial cancer, bmi, male). A time-dependent covariate treatment indicates if the treatment is on or off treatment and fup_time gives the time at which this patient entry was created.

library(dynamicLM)

data(relapse)
dim(relapse)

length(unique(relapse$ID)) # There are 171 patients with two entries, i.e., one after time 0

Build a super data set

We first note the outcome variables we are interested in, as well as which variables are fixed or landmark-varying. When there are no landmark-varying variables, set varying=NULL.

outcome = list(time="Time", status="event")
covars = list(fixed=c("ID","age.at.time.0","male","stage","bmi"),
              varying=c("treatment"))

We will produce 5-year dynamic predictions of relapse (w). Landmark time points (LMs) are set as every year between 0 and 3 years to train the model. This means we are only interested in prediction between 0 and 3 years.

We will consider linear and quadratic landmark interactions with the covariates (given in func_covars) and the landmarks (func_LMs). The covariates that should have these landmark interactions are given in pred.covars.

w = 5*12                  # risk prediction window (risk within time w)
LMs = seq(0,36,by=6)      # landmarks on which to build the model

# Covariate-landmark time interactions
func.covars <- list( function(t) t, function(t) t^2)
# let hazard depend on landmark time
func.LMs <- list( function(t) t, function(t) t^2)
# Choose covariates that will have time interaction
pred.covars <- c("age","male","stage","bmi","treatment") 

With this, we are ready to build the super data set that will train the model. We print intermediate steps for illustration.

There are three steps:

  1. cutLMsuper: stacks the landmark data sets
  2. An optional additional update for more complex columns that vary with landmark-times: For example, here we update the value of age.
  3. addLMtime: Landmark time interactions are added, note the additional columns created.

Note that these return an object of class LMdataframe. This has a component LMdata which contains the dataset itself. We illustrate the process in detail by printing the entries at each step for one individual, ID1029.

relapse[relapse$ID == "ID1029",]   

We first stack the datasets over the landmarks (see the new column 'LM') and update the treatment covariate. Note that one row is created for each landmark that the individual is still alive at. In this row, if time is greater time than the landmark time plus the window, it is censored at this value (this occurs in the first row, for example, censored at 0+60), and the most recent value all covariates is used (in our case, only treatment varies).

# Stack landmark datasets
LMdata <- cutLMsuper(relapse, outcome, LMs, w, covars, format="long", id="ID", rtime="fup_time", right=F)
data = LMdata$LMdata
print(data[data$ID == "ID1029",] )

We then (optionally) update more complex LM-varying covariates. Here we create an age covariate, based on age at time 0.

LMdata$LMdata$age <- LMdata$LMdata$age.at.time.0 + LMdata$LMdata$LM/12 # age is in years and LM is in months
data = LMdata$LMdata
print(data[data$ID == "ID1029",] )

Lastly, we add landmark time-interactions. The _1 refers to the first interaction in func.covars, _2 refers to the second interaction in func.covars, etc... Similarly, LM_1 and LM_2 are created from func.LM.

LMdata <- addLMtime(LMdata, pred.covars, func.covars, func.LMs) # we use pred.covars here, defined in the previous chunk
data = LMdata$LMdata
print(data[data$ID == "ID1029",] )

Fit the super model

Now we can fit the model. We fit a model with all the covariates created. Note that LMdata$allLMcovars returns a vector with all the covariates that have LM interactions and from pred.covars. Again, the _1 refers to the first interaction in func.covars, _2 refers to the second interaction in func.covars, etc... LM_1 and LM_2 are created from func.LM.

allLMcovars <- LMdata$allLMcovars
print(allLMcovars)

It is then easy to fit a landmark supermodel using fitLM. A formula, super dataset and method need to be provided. If the super dataset is not of class LMdataframe (i.e., is a self-created R dataframe), then additional parameters must be specified. In this case, see the details section of the documentation of addLMtime(...) for information on how the landmark interaction terms must be named.

formula <- "Hist(Time, event, LM) ~ age + male + stage + bmi + treatment + age_1 + age_2 + male_1 + male_2 + stage_1 + stage_2 + bmi_1 + bmi_2 + treatment_1 + treatment_2 + LM_1 + LM_2 + cluster(ID)"
supermodel <- fitLM(as.formula(formula), LMdata, "CSC") 
supermodel

Dynamic hazard ratios can be plotted, either log hazard ratio or hazard ratio using the argument logHR. Only certain plots can also be provided using the covars argument.

par(mfrow=c(2,3))
plot_dynamic_HR(supermodel)
# To create only two plots:
plot_dynamic_HR(supermodel, covars=c("age","male"))

Obtain predictions

For the training data

Predictions for the training data can easily be obtained. This provides $w$-year risk estimates for each individual at each of the training landmarks they are still alive.

p1 = predLMrisk(supermodel)

For new data

A prediction is made for an individual at a specific prediction time. Thus both a prediction ("landmark") time (e.g., at baseline, at 2 years, etc) and an individual (i.e., covariate values set at the landmark time-point) must be given. Note that the model creates the landmark time-interactions; the new data has the same form as in your original dataset. For example, we can prediction $w$-year risk from baseline using an entry from the very original data frame.

# Prediction time
landmark_times = c(0,0)
# Individuals with covariate values at 0
individuals = relapse[1:2,]
individuals$age = individuals$age.at.time.0
print(individuals)
p0 = predLMrisk(supermodel, individuals, landmark_times, cause=1)
p0$preds

Model evaluation

Calibration plots, which assess the agreement between predictions and observations in different percentiles of the predicted values, can be plotted for each of the landmarks used for prediction. Entering a named list of prediction objects from predLMrisk in the first argument allows for comparison between models.

par(mfrow=c(2,2),pty="s")
outlist = LMcalPlot(list("Model1"=p1), 
                    unit="month",            # for the titles
                    tLM=c(6,12,18,24),       # landmarks at which to provide calibration plots
                    method="quantile", q=10, # method for calibration plot
                    ylim=c(0,0.4), xlim=c(0,0.4))

Predictive performance can also be assessed using time-dependent dynamic area under the receiving operator curve (AUCt) or time-dependent dynamic Brier score (BSt).

scores = LMScore(list("Model1"=p1),
                 tLM=c(6,12,18,24), # landmarks at which to provide calibration plots
                 unit="month")      # for the print out
scores

Visualize individual dynamic risk trajectories

Individual risk score trajectories can be plotted. As with predLMrisk, the data input is in the form of the original data. For example, we can consider two individuals of similar age, bmi, and treatment status at baseline, but of different gender.

idx <- relapse$ID %in% c("ID2412","ID1007")
relapse[idx,]

We turn our data into long-form data to plot.

Note: we convert to long-form because of the age variable, wide-form data can be used too if there are no complex variables involved.

# Prediction time points
x = seq(0,36,by=6)

# Stack landmark datasets
dat <- cutLMsuper(relapse[idx,], outcome, x, w, covars, format="long", id="ID", rtime="fup_time", right=F)$LMdata
dat$age <- dat$age.at.time.0 + dat$LM/12 # age is in years and LM is in months

head(dat)
plotLMrisk(supermodel, dat,
            format="long", LM_col = "LM", id_col="ID", ylim=c(0, 0.7), x.legend="bottom", unit="month")

We can see that the male has a much higher and increasing 5-year risk of recurrence that peaks around 1 year, and then rapidly decreases. This can be explained by the dynamic hazard rate of being male. In comparison, the 5-year risk of recurrent for the female remains relatively constant.



anyafries/dynLM documentation built on July 26, 2022, 12:17 a.m.