The goal of dynamicLM is to provide a simple framework to make dynamic w-year risk predictions, allowing for competing risks, time-dependent covariates, and censored data.
“Dynamic prediction” involves obtaining prediction probabilities at baseline and later points in time; it is essential for better-individualized treatment. Personalized risk is updated with new information and/or as time passes.
An example is cancer treatment: we may want to predict a 5-year risk of recurrence whenever a patient’s health information changes. For example, we can predict w-year risk of recurrence at baseline (time = 0) given their initial covariates Z(0) (e.g.,30 years old, on treatment), and we can then predict w-year risk at a later point s given their current covariates Z(s) (e.g., 30 + s years old, off treatment). Note that here the predictions make use of the most recent covariate value of the patient.
The landmark model for survival data is a simple and powerful approach to dynamic prediction for many reasons:
Putter and Houwelingen describe landmarking extensively here and here.
The creation of the landmark model for survival data is built on the concept of risk assessment times (i.e., landmarks) that span risk prediction times of interest. In this approach, a training dataset of the study cohort is transformed into multiple censored datasets based on a prediction window of interest and the predefined landmarks. A model is fit on these stacked datasets (i.e., supermodel), and dynamic risk prediction is then performed by using the most up-to-date value of a patient’s covariate values.
You can install the development version of dynamicLM
from
GitHub with:
# install.packages("devtools")
devtools::install_github("anyafries/dynamicLM")
This is a basic example which shows you how to use dynamicLM
to make
dynamic 5-year predictions and check calibration and discrimination
metrics.
Data can come in various forms, with or without time-dependent covariates:
We illustrate the package using the long-form example data set given in the package. This gives the time-to-event of cancer relapse under 2 competing risks. 3 fixed patient bio-markers are given as well (age at baseline, stage of initial cancer, bmi, male). A time-dependent covariate treatment indicates if the treatment is on or off treatment and fup_time gives the time at which this patient entry was created.
library(dynamicLM)
#> Loading required package: prodlim
#> Loading required package: survival
data(relapse)
dim(relapse)
#> [1] 989 9
length(unique(relapse$ID)) # There are 171 patients with two entries, i.e., one after time 0
#> [1] 818
We first note the outcome variables we are interested in, as well as
which variables are fixed or landmark-varying. When there are no
landmark-varying variables, set varying=NULL
.
outcome = list(time="Time", status="event")
covars = list(fixed=c("ID","age.at.time.0","male","stage","bmi"),
varying=c("treatment"))
We will produce 5-year dynamic predictions of relapse (w
). Landmark
time points (LMs
) are set as every year between 0 and 3 years to train
the model. This means we are only interested in prediction between 0 and
3 years.
We will consider linear and quadratic landmark interactions with the
covariates (given in func_covars
) and the landmarks (func_LMs
). The
covariates that should have these landmark interactions are given in
pred.covars
.
w = 5*12 # risk prediction window (risk within time w)
LMs = seq(0,36,by=6) # landmarks on which to build the model
# Covariate-landmark time interactions
func.covars <- list( function(t) t, function(t) t^2)
# let hazard depend on landmark time
func.LMs <- list( function(t) t, function(t) t^2)
# Choose covariates that will have time interaction
pred.covars <- c("age","male","stage","bmi","treatment")
With this, we are ready to build the super data set that will train the model. We print intermediate steps for illustration.
There are three steps:
cutLMsuper
: stacks the landmark data setsaddLMtime
: Landmark time interactions are added, note the
additional columns created.Note that these return an object of class LMdataframe
. This has a
component LMdata
which contains the dataset itself. We illustrate the
process in detail by printing the entries at each step for one
individual, ID1029.
relapse[relapse$ID == "ID1029",]
#> ID Time event age.at.time.0 male stage bmi treatment fup_time
#> 7 ID1029 60.03288 0 62.25753 0 0 26.8 0 0.00
#> 8 ID1029 60.03288 0 62.25753 0 0 26.8 1 12.96
We first stack the datasets over the landmarks (see the new column ‘LM’) and update the treatment covariate. Note that one row is created for each landmark that the individual is still alive at. In this row, if time is greater time than the landmark time plus the window, it is censored at this value (this occurs in the first row, for example, censored at 0+60), and the most recent value all covariates is used (in our case, only treatment varies).
# Stack landmark datasets
LMdata <- cutLMsuper(relapse, outcome, LMs, w, covars, format="long", id="ID", rtime="fup_time", right=F)
data = LMdata$LMdata
print(data[data$ID == "ID1029",] )
#> ID Time event ID.1 age.at.time.0 male stage bmi treatment
#> 7 ID1029 60.00000 0 ID1029 62.25753 0 0 26.8 0
#> 73 ID1029 60.03288 0 ID1029 62.25753 0 0 26.8 0
#> 733 ID1029 60.03288 0 ID1029 62.25753 0 0 26.8 0
#> 736 ID1029 60.03288 0 ID1029 62.25753 0 0 26.8 1
#> 751 ID1029 60.03288 0 ID1029 62.25753 0 0 26.8 1
#> 786 ID1029 60.03288 0 ID1029 62.25753 0 0 26.8 1
#> 788 ID1029 60.03288 0 ID1029 62.25753 0 0 26.8 1
#> fup_time LM
#> 7 0.00 0
#> 73 0.00 6
#> 733 0.00 12
#> 736 12.96 18
#> 751 12.96 24
#> 786 12.96 30
#> 788 12.96 36
We then (optionally) update more complex LM-varying covariates. Here we create an age covariate, based on age at time 0.
LMdata$LMdata$age <- LMdata$LMdata$age.at.time.0 + LMdata$LMdata$LM/12 # age is in years and LM is in months
data = LMdata$LMdata
print(data[data$ID == "ID1029",] )
#> ID Time event ID.1 age.at.time.0 male stage bmi treatment
#> 7 ID1029 60.00000 0 ID1029 62.25753 0 0 26.8 0
#> 73 ID1029 60.03288 0 ID1029 62.25753 0 0 26.8 0
#> 733 ID1029 60.03288 0 ID1029 62.25753 0 0 26.8 0
#> 736 ID1029 60.03288 0 ID1029 62.25753 0 0 26.8 1
#> 751 ID1029 60.03288 0 ID1029 62.25753 0 0 26.8 1
#> 786 ID1029 60.03288 0 ID1029 62.25753 0 0 26.8 1
#> 788 ID1029 60.03288 0 ID1029 62.25753 0 0 26.8 1
#> fup_time LM age
#> 7 0.00 0 62.25753
#> 73 0.00 6 62.75753
#> 733 0.00 12 63.25753
#> 736 12.96 18 63.75753
#> 751 12.96 24 64.25753
#> 786 12.96 30 64.75753
#> 788 12.96 36 65.25753
Lastly, we add landmark time-interactions. The _1
refers to the first
interaction in func.covars
, _2
refers to the second interaction in
func.covars
, etc… Similarly, LM_1
and LM_2
are created from
func.LM
.
LMdata <- addLMtime(LMdata, pred.covars, func.covars, func.LMs) # we use pred.covars here, defined in the previous chunk
data = LMdata$LMdata
print(data[data$ID == "ID1029",] )
#> ID Time event ID.1 age.at.time.0 male stage bmi treatment
#> 7 ID1029 60.00000 0 ID1029 62.25753 0 0 26.8 0
#> 73 ID1029 60.03288 0 ID1029 62.25753 0 0 26.8 0
#> 733 ID1029 60.03288 0 ID1029 62.25753 0 0 26.8 0
#> 736 ID1029 60.03288 0 ID1029 62.25753 0 0 26.8 1
#> 751 ID1029 60.03288 0 ID1029 62.25753 0 0 26.8 1
#> 786 ID1029 60.03288 0 ID1029 62.25753 0 0 26.8 1
#> 788 ID1029 60.03288 0 ID1029 62.25753 0 0 26.8 1
#> fup_time LM age age_1 age_2 male_1 male_2 stage_1 stage_2
#> 7 0.00 0 62.25753 0.0000 0.000 0 0 0 0
#> 73 0.00 6 62.75753 376.5452 2259.271 0 0 0 0
#> 733 0.00 12 63.25753 759.0904 9109.085 0 0 0 0
#> 736 12.96 18 63.75753 1147.6356 20657.441 0 0 0 0
#> 751 12.96 24 64.25753 1542.1808 37012.340 0 0 0 0
#> 786 12.96 30 64.75753 1942.7260 58281.781 0 0 0 0
#> 788 12.96 36 65.25753 2349.2712 84573.764 0 0 0 0
#> bmi_1 bmi_2 treatment_1 treatment_2 LM_1 LM_2
#> 7 0.0 0.0 0 0 0 0
#> 73 160.8 964.8 0 0 6 36
#> 733 321.6 3859.2 0 0 12 144
#> 736 482.4 8683.2 18 324 18 324
#> 751 643.2 15436.8 24 576 24 576
#> 786 804.0 24120.0 30 900 30 900
#> 788 964.8 34732.8 36 1296 36 1296
Now we can fit the model. We fit a model with all the covariates
created. Note that LMdata$allLMcovars
returns a vector with all the
covariates that have LM interactions and from pred.covars
. Again, the
_1
refers to the first interaction in func.covars
, _2
refers to
the second interaction in func.covars
, etc… LM_1
and LM_2
are
created from func.LM
.
allLMcovars <- LMdata$allLMcovars
print(allLMcovars)
#> [1] "age" "male" "stage" "bmi" "treatment"
#> [6] "age_1" "age_2" "male_1" "male_2" "stage_1"
#> [11] "stage_2" "bmi_1" "bmi_2" "treatment_1" "treatment_2"
#> [16] "LM_1" "LM_2"
It is then easy to fit a landmark supermodel using fitLM
. A formula,
super dataset and method need to be provided. If the super dataset is
not of class LMdataframe
(i.e., is a self-created R dataframe), then
additional parameters must be specified. In this case, see the details
section of the documentation of addLMtime(...)
for information on how
the landmark interaction terms must be named.
formula <- "Hist(Time, event, LM) ~ age + male + stage + bmi + treatment + age_1 + age_2 + male_1 + male_2 + stage_1 + stage_2 + bmi_1 + bmi_2 + treatment_1 + treatment_2 + LM_1 + LM_2 + cluster(ID)"
supermodel <- fitLM(as.formula(formula), LMdata, "CSC")
#> Warning in .recacheSubclasses(def@className, def, env): undefined subclass
#> "numericVector" of class "Mnumeric"; definition not updated
#> Warning in agreg.fit(X, Y, istrat, offset, init, control, weights = weights, :
#> Loglik converged before variable 8,9 ; beta may be infinite.
supermodel
#>
#> Landmark cause-specific cox super model fit for dynamic 60-year prediction:
#>
#> $model
#> ----------> Cause: 1
#> coef exp(coef) se(coef) robust se z p
#> age 2.896e-02 1.029e+00 3.093e-02 3.433e-02 0.844 0.39893
#> male 1.632e+00 5.112e+00 5.271e-01 5.254e-01 3.105 0.00190
#> stage 8.954e-01 2.448e+00 2.685e-01 2.881e-01 3.108 0.00189
#> bmi 2.262e-03 1.002e+00 2.403e-02 2.511e-02 0.090 0.92821
#> treatment -1.472e+00 2.295e-01 1.287e+00 1.345e+00 -1.094 0.27384
#> age_1 4.573e-04 1.000e+00 4.384e-03 2.606e-03 0.175 0.86071
#> age_2 -8.016e-05 9.999e-01 1.238e-04 7.297e-05 -1.098 0.27201
#> male_1 1.453e-01 1.156e+00 1.083e-01 2.397e-02 6.064 1.33e-09
#> male_2 -8.921e-03 9.911e-01 4.959e-03 8.159e-04 -10.934 < 2e-16
#> stage_1 1.067e-02 1.011e+00 3.860e-02 2.052e-02 0.520 0.60302
#> stage_2 -1.056e-03 9.989e-01 1.121e-03 5.840e-04 -1.807 0.07069
#> bmi_1 1.340e-03 1.001e+00 3.385e-03 1.535e-03 0.873 0.38258
#> bmi_2 -6.102e-05 9.999e-01 9.713e-05 3.813e-05 -1.600 0.10950
#> treatment_1 1.503e-01 1.162e+00 1.150e-01 9.866e-02 1.524 0.12760
#> treatment_2 -2.950e-03 9.971e-01 2.456e-03 1.885e-03 -1.565 0.11759
#> LM_1 -7.468e-02 9.280e-01 2.784e-01 1.708e-01 -0.437 0.66195
#> LM_2 6.981e-03 1.007e+00 7.837e-03 4.848e-03 1.440 0.14991
#>
#> Likelihood ratio test=63.51 on 17 df, p=2.741e-07
#> n= 2787, number of events= 251
#>
#>
#> ----------> Cause: 2
#> coef exp(coef) se(coef) robust se z p
#> age 3.047e-02 1.031e+00 9.707e-03 1.053e-02 2.894 0.003799
#> male -7.321e-03 9.927e-01 2.926e-01 3.284e-01 -0.022 0.982215
#> stage -1.346e-01 8.741e-01 9.742e-02 9.975e-02 -1.349 0.177310
#> bmi -4.412e-03 9.956e-01 8.547e-03 9.176e-03 -0.481 0.630639
#> treatment 6.513e-01 1.918e+00 4.589e-01 4.592e-01 1.418 0.156087
#> age_1 1.776e-03 1.002e+00 2.051e-03 1.884e-03 0.942 0.345945
#> age_2 -3.555e-05 1.000e+00 6.811e-05 5.581e-05 -0.637 0.524153
#> male_1 7.982e-01 2.222e+00 4.063e+01 2.136e-01 3.736 0.000187
#> male_2 -1.592e-01 8.528e-01 6.771e+00 1.779e-02 -8.949 < 2e-16
#> stage_1 1.550e-02 1.016e+00 2.038e-02 1.755e-02 0.883 0.377385
#> stage_2 -9.016e-04 9.991e-01 6.987e-04 6.136e-04 -1.469 0.141745
#> bmi_1 -1.082e-03 9.989e-01 1.725e-03 1.308e-03 -0.827 0.408164
#> bmi_2 -1.624e-06 1.000e+00 5.694e-05 3.950e-05 -0.041 0.967194
#> treatment_1 -5.332e-02 9.481e-01 4.928e-02 4.293e-02 -1.242 0.214307
#> treatment_2 1.182e-03 1.001e+00 1.174e-03 9.039e-04 1.307 0.191124
#> LM_1 -8.247e-02 9.208e-01 1.342e-01 1.175e-01 -0.702 0.482726
#> LM_2 2.226e-03 1.002e+00 4.466e-03 3.534e-03 0.630 0.528756
#>
#> Likelihood ratio test=75.45 on 17 df, p=2.438e-09
#> n= 2787, number of events= 1120
#>
#>
#> $func_covars
#> $func_covars$[[1]]
#> function(t) t
#> <environment: 0x7fe63ca4c9d8>
#>
#> $func_covars$[[2]]
#> function(t) t^2
#> <environment: 0x7fe63ca4c9d8>
#>
#> $func_LMs
#> $func_LMs$[[1]]
#> function(t) t
#> <environment: 0x7fe63ca4c9d8>
#>
#> $func_LMs$[[2]]
#> function(t) t^2
#> <environment: 0x7fe63ca4c9d8>
#>
#> $w
#> [1] 60
#>
#> $end_time
#> [1] 36
#>
#> $type
#> [1] "CSC"
Dynamic hazard ratios can be plotted, either log hazard ratio or hazard
ratio using the argument logHR
. Only certain plots can also be
provided using the covars
argument.
par(mfrow=c(2,3))
plot_dynamic_HR(supermodel)
# To create only two plots:
plot_dynamic_HR(supermodel, covars=c("age","male"))
Predictions for the training data can easily be obtained. This provides w-year risk estimates for each individual at each of the training landmarks they are still alive.
p1 = predLMrisk(supermodel)
A prediction is made for an individual at a specific prediction time. Thus both a prediction (“landmark”) time (e.g., at baseline, at 2 years, etc) and an individual (i.e., covariate values set at the landmark time-point) must be given. Note that the model creates the landmark time-interactions; the new data has the same form as in your original dataset. For example, we can prediction w-year risk from baseline using an entry from the very original data frame.
# Prediction time
landmark_times = c(0,0)
# Individuals with covariate values at 0
individuals = relapse[1:2,]
individuals$age = individuals$age.at.time.0
print(individuals)
#> ID Time event age.at.time.0 male stage bmi treatment fup_time
#> 1 ID1007 62.6849315 0 60.25936 0 1 25.9 0 0
#> 2 ID101 0.6575342 1 59.97808 0 0 29.3 0 0
#> age
#> 1 60.25936
#> 2 59.97808
p0 = predLMrisk(supermodel, individuals, landmark_times, cause=1)
p0$preds
#> LM risk
#> 1 0 0.11514265
#> 2 0 0.04641678
Calibration plots, which assess the agreement between predictions and
observations in different percentiles of the predicted values, can be
plotted for each of the landmarks used for prediction. Entering a named
list of prediction objects from predLMrisk
in the first argument
allows for comparison between models.
par(mfrow=c(2,2),pty="s")
outlist = LMcalPlot(list("Model1"=p1),
unit="month", # for the titles
tLM=c(6,12,18,24), # landmarks at which to provide calibration plots
method="quantile", q=10, # method for calibration plot
ylim=c(0,0.4), xlim=c(0,0.4))
Predictive performance can also be assessed using time-dependent dynamic area under the receiving operator curve (AUCt) or time-dependent dynamic Brier score (BSt).
scores = LMScore(list("Model1"=p1),
tLM=c(6,12,18,24), # landmarks at which to provide calibration plots
unit="month") # for the print out
scores
#>
#> Metric: Time-dependent AUC for 60-month risk prediction
#>
#> Results by model:
#> tLM model AUC lower upper
#> 1: 6 Model1 61.554 55.203 67.905
#> 2: 12 Model1 62.136 56.204 68.068
#> 3: 18 Model1 62.263 56.820 67.706
#> 4: 24 Model1 63.373 58.017 68.729
#> NOTE: Values are multiplied by 100 and given in %.
#> NOTE: The higher AUC the better.
#> NOTE: Predictions are made at time tLM for 60-year risk
#>
#> Metric: Brier Score for 60-month risk prediction
#>
#> Results by model:
#> tLM model Brier lower upper
#> 1: 6 Null model 8.140 5.922 10.358
#> 2: 6 Model1 7.586 5.458 9.715
#> 3: 12 Null model 10.362 7.576 13.148
#> 4: 12 Model1 9.562 6.848 12.277
#> 5: 18 Null model 10.601 7.552 13.651
#> 6: 18 Model1 10.032 7.063 13.002
#> 7: 24 Null model 10.443 7.099 13.788
#> 8: 24 Model1 10.216 6.909 13.523
#> NOTE: Values are multiplied by 100 and given in %.
#> NOTE: The lower Brier the better.
#> NOTE: Predictions are made at time tLM for 60-year risk
Individual risk score trajectories can be plotted. As with predLMrisk
,
the data input is in the form of the original data. For example, we can
consider two individuals of similar age, bmi, and treatment status at
baseline, but of different gender.
idx <- relapse$ID %in% c("ID2412","ID1007")
relapse[idx,]
#> ID Time event age.at.time.0 male stage bmi treatment fup_time
#> 1 ID1007 62.68493 0 60.25936 0 1 25.9 0 0.00
#> 442 ID2412 43.35342 0 60.09132 1 0 24.1 0 0.00
#> 443 ID2412 43.35342 0 60.09132 1 0 24.1 1 39.04
We turn our data into long-form data to plot.
Note: we convert to long-form because of the age variable, wide-form data can be used too if there are no complex variables involved.
# Prediction time points
x = seq(0,36,by=6)
# Stack landmark datasets
dat <- cutLMsuper(relapse[idx,], outcome, x, w, covars, format="long", id="ID", rtime="fup_time", right=F)$LMdata
dat$age <- dat$age.at.time.0 + dat$LM/12 # age is in years and LM is in months
head(dat)
#> ID Time event ID.1 age.at.time.0 male stage bmi treatment
#> 1 ID1007 60.00000 0 ID1007 60.25936 0 1 25.9 0
#> 442 ID2412 43.35342 0 ID2412 60.09132 1 0 24.1 0
#> 11 ID1007 62.68493 0 ID1007 60.25936 0 1 25.9 0
#> 4421 ID2412 43.35342 0 ID2412 60.09132 1 0 24.1 0
#> 12 ID1007 62.68493 0 ID1007 60.25936 0 1 25.9 0
#> 4422 ID2412 43.35342 0 ID2412 60.09132 1 0 24.1 0
#> fup_time LM age
#> 1 0 0 60.25936
#> 442 0 0 60.09132
#> 11 0 6 60.75936
#> 4421 0 6 60.59132
#> 12 0 12 61.25936
#> 4422 0 12 61.09132
plotLMrisk(supermodel, dat,
format="long", LM_col = "LM", id_col="ID", ylim=c(0, 0.7), x.legend="bottom", unit="month")
We can see that the male has a much higher and increasing 5-year risk of recurrence that peaks around 1 year, and then rapidly decreases. This can be explained by the dynamic hazard rate of being male. In comparison, the 5-year risk of recurrent for the female remains relatively constant.
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.