knitr::opts_chunk$set( collapse = TRUE, comment = "#>", fig.path = "man/figures/README-", out.width = "100%" )
The goal of dynamicLM is to provide a simple framework to make dynamic $w$-year risk predictions, allowing for competing risks, time-dependent covariates, and censored data.
"Dynamic prediction" involves obtaining prediction probabilities at baseline and later points in time; it is essential for better-individualized treatment. Personalized risk is updated with new information and/or as time passes.
An example is cancer treatment: we may want to predict a 5-year risk of recurrence whenever a patient's health information changes. For example, we can predict $w$-year risk of recurrence at baseline (time $=0$) given their initial covariates $Z(0)$ (e.g.,30 years old, on treatment), and we can then predict $w$-year risk at a later point $s$ given their current covariates $Z(s)$ (e.g., $30+s$ years old, off treatment). Note that here the predictions make use of the most recent covariate value of the patient.
The landmark model for survival data is a simple and powerful approach to dynamic prediction for many reasons:
Putter and Houwelingen describe landmarking extensively here and here.
The creation of the landmark model for survival data is built on the concept of risk assessment times (i.e., landmarks) that span risk prediction times of interest. In this approach, a training dataset of the study cohort is transformed into multiple censored datasets based on a prediction window of interest and the predefined landmarks. A model is fit on these stacked datasets (i.e., supermodel), and dynamic risk prediction is then performed by using the most up-to-date value of a patient's covariate values.
You can install the development version of dynamicLM
from GitHub with:
# install.packages("devtools") devtools::install_github("anyafries/dynamicLM")
This is a basic example which shows you how to use dynamicLM
to make dynamic 5-year predictions and check calibration and discrimination metrics.
Data can come in various forms, with or without time-dependent covariates:
We illustrate the package using the long-form example data set given in the package. This gives the time-to-event of cancer relapse under 2 competing risks. 3 fixed patient bio-markers are given as well (age at baseline, stage of initial cancer, bmi, male). A time-dependent covariate treatment indicates if the treatment is on or off treatment and fup_time gives the time at which this patient entry was created.
library(dynamicLM) data(relapse)
dim(relapse) length(unique(relapse$ID)) # There are 171 patients with two entries, i.e., one after time 0
We first note the outcome variables we are interested in, as well as which variables are fixed or landmark-varying. When there are no landmark-varying variables, set varying=NULL
.
outcome = list(time="Time", status="event") covars = list(fixed=c("ID","age.at.time.0","male","stage","bmi"), varying=c("treatment"))
We will produce 5-year dynamic predictions of relapse (w
). Landmark time points (LMs
) are set as every year between 0 and 3 years to train the model. This means we are only interested in prediction between 0 and 3 years.
We will consider linear and quadratic landmark interactions with the covariates (given in func_covars
) and the landmarks (func_LMs
). The covariates that should have these landmark interactions are given in pred.covars
.
w = 5*12 # risk prediction window (risk within time w) LMs = seq(0,36,by=6) # landmarks on which to build the model # Covariate-landmark time interactions func.covars <- list( function(t) t, function(t) t^2) # let hazard depend on landmark time func.LMs <- list( function(t) t, function(t) t^2) # Choose covariates that will have time interaction pred.covars <- c("age","male","stage","bmi","treatment")
With this, we are ready to build the super data set that will train the model. We print intermediate steps for illustration.
There are three steps:
cutLMsuper
: stacks the landmark data setsaddLMtime
: Landmark time interactions are added, note the additional columns created.Note that these return an object of class LMdataframe
. This has a component LMdata
which contains the dataset itself.
We illustrate the process in detail by printing the entries at each step for one individual, ID1029.
relapse[relapse$ID == "ID1029",]
We first stack the datasets over the landmarks (see the new column 'LM') and update the treatment covariate. Note that one row is created for each landmark that the individual is still alive at. In this row, if time is greater time than the landmark time plus the window, it is censored at this value (this occurs in the first row, for example, censored at 0+60), and the most recent value all covariates is used (in our case, only treatment varies).
# Stack landmark datasets LMdata <- cutLMsuper(relapse, outcome, LMs, w, covars, format="long", id="ID", rtime="fup_time", right=F) data = LMdata$LMdata print(data[data$ID == "ID1029",] )
We then (optionally) update more complex LM-varying covariates. Here we create an age covariate, based on age at time 0.
LMdata$LMdata$age <- LMdata$LMdata$age.at.time.0 + LMdata$LMdata$LM/12 # age is in years and LM is in months data = LMdata$LMdata print(data[data$ID == "ID1029",] )
Lastly, we add landmark time-interactions. The _1
refers to the first interaction in func.covars
, _2
refers to the second interaction in func.covars
, etc... Similarly, LM_1
and LM_2
are created from func.LM
.
LMdata <- addLMtime(LMdata, pred.covars, func.covars, func.LMs) # we use pred.covars here, defined in the previous chunk data = LMdata$LMdata print(data[data$ID == "ID1029",] )
Now we can fit the model. We fit a model with all the covariates created. Note that LMdata$allLMcovars
returns a vector with all the covariates that have LM interactions and from pred.covars
. Again, the _1
refers to the first interaction in func.covars
, _2
refers to the second interaction in func.covars
, etc... LM_1
and LM_2
are created from func.LM
.
allLMcovars <- LMdata$allLMcovars print(allLMcovars)
It is then easy to fit a landmark supermodel using fitLM
. A formula, super dataset and method need to be provided. If the super dataset is not of class LMdataframe
(i.e., is a self-created R dataframe), then additional parameters must be specified. In this case, see the details section of the documentation of addLMtime(...)
for information on how the landmark interaction terms must be named.
formula <- "Hist(Time, event, LM) ~ age + male + stage + bmi + treatment + age_1 + age_2 + male_1 + male_2 + stage_1 + stage_2 + bmi_1 + bmi_2 + treatment_1 + treatment_2 + LM_1 + LM_2 + cluster(ID)" supermodel <- fitLM(as.formula(formula), LMdata, "CSC") supermodel
Dynamic hazard ratios can be plotted, either log hazard ratio or hazard ratio using the argument logHR
. Only certain plots can also be provided using the covars
argument.
par(mfrow=c(2,3)) plot_dynamic_HR(supermodel)
# To create only two plots: plot_dynamic_HR(supermodel, covars=c("age","male"))
Predictions for the training data can easily be obtained. This provides $w$-year risk estimates for each individual at each of the training landmarks they are still alive.
p1 = predLMrisk(supermodel)
A prediction is made for an individual at a specific prediction time. Thus both a prediction ("landmark") time (e.g., at baseline, at 2 years, etc) and an individual (i.e., covariate values set at the landmark time-point) must be given. Note that the model creates the landmark time-interactions; the new data has the same form as in your original dataset. For example, we can prediction $w$-year risk from baseline using an entry from the very original data frame.
# Prediction time landmark_times = c(0,0) # Individuals with covariate values at 0 individuals = relapse[1:2,] individuals$age = individuals$age.at.time.0 print(individuals)
p0 = predLMrisk(supermodel, individuals, landmark_times, cause=1) p0$preds
Calibration plots, which assess the agreement between predictions and observations in different percentiles of the predicted values, can be plotted for each of the landmarks used for prediction. Entering a named list of prediction objects from predLMrisk
in the first argument allows for comparison between models.
par(mfrow=c(2,2),pty="s") outlist = LMcalPlot(list("Model1"=p1), unit="month", # for the titles tLM=c(6,12,18,24), # landmarks at which to provide calibration plots method="quantile", q=10, # method for calibration plot ylim=c(0,0.4), xlim=c(0,0.4))
Predictive performance can also be assessed using time-dependent dynamic area under the receiving operator curve (AUCt) or time-dependent dynamic Brier score (BSt).
scores = LMScore(list("Model1"=p1), tLM=c(6,12,18,24), # landmarks at which to provide calibration plots unit="month") # for the print out scores
Individual risk score trajectories can be plotted. As with predLMrisk
, the data input is in the form of the original data. For example, we can consider two individuals of similar age, bmi, and treatment status at baseline, but of different gender.
idx <- relapse$ID %in% c("ID2412","ID1007") relapse[idx,]
We turn our data into long-form data to plot.
Note: we convert to long-form because of the age variable, wide-form data can be used too if there are no complex variables involved.
# Prediction time points x = seq(0,36,by=6) # Stack landmark datasets dat <- cutLMsuper(relapse[idx,], outcome, x, w, covars, format="long", id="ID", rtime="fup_time", right=F)$LMdata dat$age <- dat$age.at.time.0 + dat$LM/12 # age is in years and LM is in months head(dat)
plotLMrisk(supermodel, dat, format="long", LM_col = "LM", id_col="ID", ylim=c(0, 0.7), x.legend="bottom", unit="month")
We can see that the male has a much higher and increasing 5-year risk of recurrence that peaks around 1 year, and then rapidly decreases. This can be explained by the dynamic hazard rate of being male. In comparison, the 5-year risk of recurrent for the female remains relatively constant.
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.