In DenisRustand/INLAjoint: Multivariate Joint Modeling for Longitudinal and Time-to-Event Outcomes with 'INLA'

knitr::opts_chunk$set(warning = FALSE, message = FALSE)

library(INLA)
library(INLAjoint)
library(ggplot2)
library(JM) # This package contains the dataset
options(width=120)

\newpage

In this vignette we show how to fit various models with the joint() function of the INLAjoint package. The package is providing a unified framework to fit various longitudinal, survival and joint models. This vignette illustrates some of these models and thus takes a long time to compute which is why the vignette is not compiled when installing INLAjoint but instead we provide a pre-compiled pdf.

Contact: INLAjoint@gmail.com

Datasets for illustrations

Mayo Clinic Primary Biliary Cirrhosis Data (pbc2)

We use the data of the famous randomized clinical trial of Primary Biliary Cholangitis (PBC) patients where 312 PBC patients were followed at the Mayo Clinic between 1974 and 1988 and received either a placebo or D-penicillamine. These data are publicly available in several software including the R package "JM". During the follow-up, 140 patients died and 29 patients received a liver transplantation which we consider here as a competing event of death. In addition, repeated measures of various longitudinal markers potentially associated with the disease progression were collected. It is used to illustrate multiple models of the vignette (1, 2, 3, 4, 5, 6, 7, 13, 14 and 16).

data(pbc2) # dataset
# extract some variable of interest without missing values
Longi <- na.omit(pbc2[, c("id", "years", "status","drug","age", 
                          "sex","year","serBilir","SGOT", "albumin", "edema",
                          "platelets", "alkaline","spiders", "ascites")])
Surv <- Longi[c(which(diff(as.numeric(Longi[,which(colnames(Longi)=="id")]))==1),
                length(Longi[,which(colnames(Longi)=="id")])),-c(7:10, 12:16)]
Surv$death <- ifelse(Surv$status=="dead",1,0) # competing event 1
Surv$trans <- ifelse(Surv$status=="transplanted",1,0) # competing event 2

Bone marrow transplant data (bmt)

Bone marrow transplant study which is available in the package "smcure", it is used to illustrate the mixture cure model (model 10 of the vignette).

Kidney catheter data (kidney)

Recurrence times to infection for kidney patients using portable dialysis equipment, available in package "survival", it is used to illustrate the frailty model (model 11 of the vignette).

Rehospitalization colorectal cancer (readmission)

Rehospitalization times after surgery in patients diagnosed with colorectal cancer, it is available in the package "frailtypack" and it is used to illustrate the joint frailty model for recurrent events and a terminal event (model 12 of the vignette).

FFCD 2000-05 phase III metastatic colorectal cancer clinical trial (colorectal and colorectalLongi)

There are two complementary datasets available in the package "frailtypack". colorectal includes times of new lesions appearance and time of death of 150 randomly chosen patients from the FFCD 2000-05 multicenter clinical trial. colorectalLongi contains the corresponding longitudinal measurements of tumors size. It is used to illustrate the two-part joint model for a longitudinal semicontinuous biomarker, recurrent events and a terminal event (model 15 of the vignette).

Model 1: Mixed effects regression for a single longitudinal marker

This first model shows how to call the joint() function for a simple linear mixed effects model for a longitudinal marker, it gives the basic structure of the function. The required arguments are:

formLong: formula for the model with the lme4 structure (including random effects in the formula as: (NAME | ID)).
dataLong: Dataset that must contain the variables given in the formula.
id: Name of the variable for grouping (e.g., individuals).
timeVar: Name of the time variable.
family: Distribution of the outcome (e.g., gaussian, poisson, binomial).

The model structure is given by the following equation:

$$\log(serBilir_{ij}) = \beta_0 + b_{i0} + \beta_1year_{ij} + \beta_2drug_{i} + \varepsilon_{ij} \hspace{1cm} \text{(L1)}$$

where $\beta$ are the fixed effects, $b_{i0}$ is an individual random intercept and $\varepsilon_{ij}$ is the residual error term.

M1 <- joint(formLong = serBilir ~ year + drug + (1|id),
            dataLong = Longi, id = "id", timeVar = "year", 
            family = "lognormal")

The summary statistics are available from the summary function:

summary(M1)

If one wishes to get the standard deviations instead of variance parameters, it is possible to switch with the sdcor argument of the summary function:

summary(M1, sdcor=TRUE)

The log marginal-likelihood, the Deviance Information Criterion (DIC) and the Widely Applicable Bayesian Information Criterion (WAIC) are provided in the summary statistics.

The control argument in the joint function has the following components:

int.strategy allows to choose the strategy for the numerical integration used to approximate the marginal posterior distributions of the latent field. Available options are "ccd" (default), "grid" or "eb" (empirical Bayes). The empirical Bayes uses only the mode of the approximations for the integration, which speed up and simplifies computations.
priorFixed allows to set the mean and standard deviation of the Gaussian prior for the fixed effects.
priorAssoc allows to set the mean and standard deviation of the Gaussian prior for the association parameters between the longitudinal and survival submodels.
cpo set to TRUE to compute the Conditional Predictive Ordinate.

An useful function to learn about the priors used in a fitted model is inla.priors.used, applied to an object fitted with the joint function. For example the default priors for fixed effects are Gaussian with mean zero and variance 100 (i.e., precision 0.01).

inla.priors.used(M1)

The full list of the arguments is available in the help documentation of the joint function which can be accessed by running ?joint.

Model 2: Multiple longitudinal markers with different distributions

The following code fits a joint model with 3 longitudinal markers including fixed effects for covariates such as sex, drug and interactions with time. We assume random intercept and random slope for each longitudinal trajectory. Note that the formLong argument is now a list of formulas, one for each longitudinal marker and the length of family must match the number of markers.

The model structure is given by the following equation:

$$\left{ \begin{array}{ l @{{}{}} l @{{}{}} r }\log(serBilir_{ij}) &= \beta_{10} + b_{i10} + (\beta_{11} + b_{i11})year_{ij} + \beta_{12}drug_{i} + \beta_{13}sex_i + \beta_{14}year_{ij}drug_{i} + \varepsilon_{ij1} & \text{(L1)}\ \log(E[platelets_{ij}]) &= \beta_{20} + b_{i20} + (\beta_{21} + b_{i21})year_{ij} + \beta_{22}sex_i + \beta_{23}drug_{i} + \beta_{24}year_{ij}sex_{i} & \text{(L2)}\ \text{logit}(E[spiders_{ij}]) &= \beta_{30} + b_{i30} + (\beta_{31} + b_{i31})year_{ij} + \beta_{32}drug_{i} & \text{(L3)} \end{array} \right.$$

M2 <- joint(formLong = list(serBilir ~ year * drug + sex  +  (1+year|id),
                            platelets ~ year * sex  +  drug + (1+year|id),
                            spiders ~ year + drug + (1+year| id)),
            dataLong = Longi, id = "id", timeVar="year", corLong=TRUE,
            family = c("lognormal", "poisson", "binomial"), control=list(int.strategy="eb"))
summary(M2)

The additional boolean argument corLong is set to TRUE in order to have correlation between the random effects across the longitudinal markers. Therefore by switching this argument to TRUE, instead of having 3 sets of two correlated random effects, we have 1 set of 6 correlated random effects.

We can also get the standard deviation and correlation of random parameters instead of variance and covariance by adding sdcor=TRUE to the summary function call:

summary(M2, sdcor=TRUE)

The link functions between the linear predictors and the longitudinal outcomes are set to default, it is however possible to switch to alternative ones using the link argument, e.g., to switch from logit to probit for the binary marker:

M2 <- joint(formLong = list(serBilir ~ year * drug + sex  +  (1+year|id),
                            platelets ~ year * sex  +  drug + (1+year|id),
                            spiders ~ year + drug + (1+year| id)),
            dataLong = Longi, id = "id", timeVar="year", corLong=TRUE,
            family = c("lognormal", "poisson", "binomial"), 
            link = c("default", "default", "probit"), control=list(int.strategy="eb"))
summary(M2)

Model 3: Proportional hazards survival model

Some additional arguments are introduced to fit a survival model:

formSurv: formula for the time-to-event outcome, with the response given as an inla.surv() object.
dataSurv: dataset that must contain the variables given in the formula. When fitting a joint model with a longitudinal component, if dataSurv is not provided, the longitudinal dataset is used to get the covariates values included in the time-to-event formula.
basRisk: the baseline risk of event. It can be defined as parametric with either "exponentialsurv" for exponential baseline or "weibullsurv" for Weibull baseline (note that there are two formulations of the Weibull distribution, see inla.doc("weibull") for more details, default is variant = 0). Alternatively, there are two options to avoid parametric assumptions on the shape of the baseline risk: "rw1" for random walks of order one prior that corresponds to a smooth spline function based on first order differences. The second option "rw2" assigns a random walk order two prior that corresponds to a smooth spline function based on second order differences. This second option provides a smoother spline compared to order one since the smoothing is then done on the second order. The number of bins that define the intervals for the baseline risk can be specified with NbasRisk (default 15 bins).

The model is defined as: $$\lambda_{i}(t)=\lambda_{0}(t)\ \textrm{exp}\left(\gamma_1drug_i + \gamma_2sex_i +\gamma_3drug_isex_i\right)$$

M3 <- joint(formSurv = inla.surv(time = years, event = death) ~ drug * sex, dataSurv = Surv)
summary(M3)

Note that the parameter associated to the baseline risk of event "Baseline risk (variance)" corresponds to the variance parameter associated to the random walk to capture the evolution over time. Summary statistics of the random walks baseline risks are available in the "BaselineValues" attribute of an INLAjoint summary object.

summary(M3)$BaselineValues

It is possible to convert summary statistics to have hazards ratios instead of the mean parameter values in the linear predictor (does not affect baseline parameters).

summary(M3, hazr=TRUE)

Results are similar to the coxph function fit from survival:

survival::coxph(Surv(years, death) ~ drug * sex, data = Surv)

While the default baseline risk is set to first order random walk with 15 intervals, we can easily switch to parametric Weibull:

M3_wei <- joint(formSurv = inla.surv(time = years, event = death) ~ drug * sex, 
                dataSurv = Surv, basRisk = "weibullsurv")
summary(M3_wei, hazr=T)

Model 4: Longitudinal - survival joint model

An additional argument is required to set up the association between the longitudinal and survival parts:

assoc: a character string that specifies the association between the longitudinal and survival components. The available options are "CV" for sharing the current value of the linear predictor, "CS" for the current slope, "CV_CS" for the current value and the current slope, "SRE" for shared random effects (i.e., sharing the individual deviation from the mean at time t as defined by the random effects), "SRE_ind" for shared random effect independent (each random effect's individual deviation is associated to an association parameter in the survival submodel) and "" (empty string) for no association.

The model structure is given by the following equation:

$$\left{ \begin{array}{ c @{{}{}} l @{{}{}} r }log(serBilir_{ij}) &= \eta_{i}(t_{ij}) + \varepsilon_{ij} & \text{(L1)}\ &= \beta_0 + \beta_1year_{ij} + (\beta_2 + b_{i2})year_{ij}^2 \ & \ \ \ + (\beta_3 + b_{i3})year_{ij}^3 + \beta_4drug_{i} + \beta_5year_{ij}drug_{i}\ & \ \ \ + \beta_6year_{ij}^2drug_{i} + \beta_7year_{ij}^3drug_{i} + \varepsilon_{ij}\ \lambda_{i1}(t)&=\lambda_{01}(t)\ \textrm{exp}\left(\gamma_1drug_i + \varphi_1\eta_{i}(t) + \varphi_2\eta_{i}'(t)\right) & \text{(S1)} \end{array} \right.$$

where $\gamma$ denotes fixed effects of the survival part and $\varphi$ the association parameters.

# Set up quadratic and cubic functions of time
f1 <- function(x) x^2
f2 <- function(x) x^3

M4 <- joint(formSurv = inla.surv(years, death) ~ drug,
            formLong = serBilir ~ (1 + year + f1(year) + f2(year))*drug +
                       (f1(year) + f2(year) |id), family = "lognormal",
            dataLong = Longi, dataSurv=Surv, id = "id", timeVar = "year", 
            assoc = "CV_CS", basRisk = "rw2", NbasRisk=25, 
            control=list(int.strategy="eb"))
summary(M4)

In case some functions of time should be included, they must be set as illustrated in the above example, i.e., create a univariate function of x named f1, f2, ..., fN, and use this function in the formula (where N=20 is the maximum number of functions available at the moment but could be increased if needed.). This is important to be able to compute the value of the linear predictor at any time t, particularly for the time-dependent association structures. A numerical approximation of the derivative of the function is automatically computed in case the current slope of the linear predictor is shared in the survival submodel.

We can plot the posterior distribution for all the parameters with the plot function

plotM4 <- plot(M4, sdcor=T)

The plot function returns multiple plots for each component of the model. First the plots for the longitudinal outcome(s) parameters:

plotM4$Outcomes$L1

Then the parameters of the survival outcome(s):

plotM4$Outcomes$S1

The variance-covariance of the random-effects (converted to standard deviations and correlations when argument sdcor=TRUE is added to the call of the plot function):

plotM4$Covariances

The posterior distributions of the association parameters:

plotM4$Associations

And finally the curve for the baseline risk functions:

plotM4$Baseline

The model for the baseline risk is a random walk with number of bins given by argument NbasRisk, the curve plotted is constant between the bins but converges towards a smooth spline when the number of bins increase. Sometimes the scale for the baseline hazard risk may require to have a log10 y-axis, this can easily be done using to the ggplot2 framework. Moreover, the data associated to each plot is available in the object that contains the result of the plot function call (i.e., plotM4 in the example).

plotM4$Baseline + scale_y_log10()

Model 5: Comparison with MCMC

We can make a comparison of INLAjoint with Bayesian estimations with MCMC implemented in alternative R packages such as JMBayes2 (C++ implementation of MCMC) or rstanarm (Stan).

We propose a comparison for a simple joint model with one longitudinal and one survival component:

$$\left{ \begin{array}{ c @{{}{}} l @{{}{}} r }albumin_{ij} &= \eta_{i}(t_{ij}) + \varepsilon_{ij} & \text{(L1)}\ &= \beta_0 + b_{i0} + (\beta_1 + b_{i1})year_{ij} + \beta_2drug_{i} + \beta_3year_{ij}drug_{i} + \varepsilon_{ij}\ \lambda_{i}(t)&=\lambda_{0}(t)\ \textrm{exp}\left(\gamma_{1}sex_i + \gamma_{2}drug_i + \varphi\eta_{i}(t)\right) & \text{(S1)} \end{array} \right.$$

# INLAjoint
M5 <- joint(formSurv = inla.surv(years, death) ~ sex + drug,
            formLong = albumin ~ (1 + year)*drug + (1 + year |id),
            dataLong = Longi, dataSurv=Surv, id = "id", timeVar = "year",
            assoc = "CV", control=list(priorFixed=list(mean=0, prec=0.16,
                                  mean.intercept=0, prec.intercept=0.16),
                                  priorAssoc=list(mean=0, prec=0.16)))
summary(M5)

Here the prior distributions of the fixed effects and association parameters are changed to have precision 0.16 (i.e., variance 6.25 instead of the default value of 100), in order to match the default prior distributions of rstanarm for the fixed effects and association parameters.

# JMBayes2
library(JMbayes2)
M5JMB2_lme <- lme(albumin ~ (1 + year)*drug,
                 random = ~ 1 + year |id, data = Longi)
M5JMB2_cox <- coxph(Surv(Surv$years, Surv$death) ~ sex + drug,
                   data = Surv, x = TRUE)
JMpr = list(mean_alphas=list(0), Tau_alphas = list(matrix(0.16)))
M5JMB2 <- try(jm(M5JMB2_cox, list(M5JMB2_lme), time_var = "year", priors=JMpr))
# Computation time in the table includes LME + Cox + JM
summary(M5JMB2)

# rstanarm
library(rstanarm)
library(survival)
options(mc.cores = parallel::detectCores())
M5rstanarm <- stan_jm(
  formulaLong = list(albumin ~ (1 + year)*drug + (1 + year |id)),
  formulaEvent = Surv(years, death) ~ sex + drug,
  dataLong = Longi, dataEvent = Surv,
  time_var = "year",
  priorLong_intercept = normal(0, 2.5, autoscale=TRUE),
  priorLong = normal(0, 2.5),
  priorEvent_assoc = normal(0, 2.5),
  seed = 12345)
summary(M5rstanarm)

tabl <- "
|  Package   | INLAjoint | JMbayes2  | rstanarm  | 
|------------|-----------|-----------|-----------|
| algorithm  |    INLA   | MCMC C++  | MCMC Stan |
| comp. time |   7 sec.  | 1736 sec. |  596 sec. |
"
cat(tabl) # output the table in a format good for HTML/PDF/docx conversion

A more detailed comparison between INLA, MCEM and MCMC is available at https://arxiv.org/abs/2203.06256 and a comparison between INLA and Levenberg-Marquardt algorithm (Newton-Raphson like that performs MLE) is available at https://arxiv.org/abs/2010.13704

Model 6: Joint with one longitudinal and competing risks of event

In order to account for competing risks of event, the formSurv argument is given as a list with one element for each risk submodel. Moreover, the basRisk argument must be a vector with the same number of elements as the number of survival submodels.

The model structure is given by the following equation:

$$\left{ \begin{array}{ c @{{}{}} l @{{}{}} r }\log(serBilir_{ij}) &= \eta_{i}(t_{ij}) + \varepsilon_{ij} & \text{(L1)}\ &= \beta_0 + b_{i0} + (\beta_1 + b_{i1})year_{ij} + \beta_2drug_{i} + \beta_3sex_i \ & \ \ \ + \beta_4year_{ij}drug_{i} + \beta_5year_{ij}sex_{i} + \varepsilon_{ij}\ \lambda_{i1}^{death}(t)&=\lambda_{01}(t)\ \textrm{exp}\left(\gamma_{11}sex_i + \gamma_{12}drug_i + \varphi_{11}(b_{i0} + b_{i1}t)\right) & \text{(S1)}\ \lambda_{i2}^{transpl.}(t)&=\lambda_{02}(t)\ \textrm{exp}\left(\gamma_{21}edema_no_i + \gamma_{22}edema_de_i + \gamma_{23}sex_i \right.\ &\left. \ \ + \gamma_{24}edema_no_isex_i + \gamma_{25}edema_de_isex_i + \varphi_{21}b_{i0} + \varphi_{22}b_{i1}\right) & \text{(S2)} \end{array} \right.$$

M6 <- joint(formLong = serBilir ~ year * (drug + sex) + (1+year|id), dataLong = Longi, 
            formSurv = list(inla.surv(years, death) ~ sex + drug,
                            inla.surv(years, trans) ~ edema * sex),
            id = "id", timeVar = "year", family = "lognormal", basRisk = c("rw1", "rw1"), 
            assoc = c("SRE", "SRE_ind"), dataSurv=Surv, control=list(int.strategy="eb"))
summary(M6)

Model 7: Joint with three longitudinal markers and competing risks of event

When multiple longitudinal submodels and survival submodels are included, the arguments formSurv and formLong are both given as lists. The assoc parameter should then be a list with one element for each longitudinal submodel and each element is a vector for the association with each survival submodel.

The model structure is given by the following equation:

$$\left{ \begin{array}{ c @{{}{}} l @{{}{}} r }\log(serBilir_{ij}) &= \eta_{i1}(t_{ij}) + \varepsilon_{ij1} = \beta_{10} + b_{i10} + (\beta_{11} + b_{i11})year_{ij} + \beta_{12}drug_{i} + \beta_{13}sex_i & \text{(L1)} \ & \ \ \ + \beta_{14}year_{ij}drug_{i} + \varepsilon_{ij1}\ \log(E[platelets_{ij}]) &= \eta_{i2}(t_{ij}) = \beta_{20} + b_{i20} + (\beta_{21} + b_{i21})year_{ij} + \beta_{22}sex_i + \beta_{23}drug_{i} + \beta_{24}year_{ij}sex_{i} & \text{(L2)}\ \text{logit}(E[spiders_{ij}]) &= \eta_{i3}(t_{ij}) = \beta_{30} + b_{i30} + (\beta_{31} + b_{i31})year_{ij} + \beta_{32}drug_{i} + \beta_{33}year_{ij}drug_{i} & \text{(L3)}\ \lambda_{i1}(t)&=\lambda_{01}(t)\ \textrm{exp}\left(\gamma_{11}drug_i + \varphi_{11}\eta_{i1}(t) + \varphi_{12}(b_{i20} + b_{i21} t) + \varphi_{13}\eta_{i3}(t) + \varphi_{14}\eta_{i3}'(t)\right) & \text{(S1)}\ \lambda_{i2}(t)&=\lambda_{02}(t)\ \textrm{exp}\left(\gamma_{21}drug_i + \varphi_{21}\eta_{i1}(t) + \varphi_{22}\eta_{i3}'(t)\right) & \text{(S2)} \end{array} \right.$$

M7 <- joint(formLong = list(serBilir ~ year * drug + sex  +  (1|id),
                            platelets ~ year + f1(year) + drug  +  sex + (1|id),
                            albumin ~ year + f1(year) + f2(year) + drug + (1|id)),
            formSurv = list(inla.surv(years, death) ~ sex + drug,
                            inla.surv(years, trans) ~ edema * sex),
            dataLong = Longi, dataSurv=Surv, id = "id", corLong=TRUE, timeVar = "year", 
            family = c("lognormal", "poisson", "gaussian"), basRisk = c("rw1", "rw1"),
            assoc = list(c("CV", "CV"), c("SRE", ""), c("CV_CS", "CS")),
            control=list(int.strategy="eb"))
summary(M7)

The longitudinal markers are assumed correlated but it is also possible to set corLong to FALSE to have independent random effects across markers and reduce the number of covariance parameters.

Model 8: Multi-state model

For the multi-state model, we need to define a survival model for each transition. Here for example in the case of illness-death we have three possible transitions: 1->2, 1->3 and 2->3.

The model structure is given by the following equation:

$$\left{ \begin{array}{ c @{{}{}} l @{{}{}} r } \lambda_{i,12}(t)&=\lambda_{0,12}(t)\ \textrm{exp}\left(\gamma_{12}X_i\right) & \text{(S1)}\ \lambda_{i,13}(t)&=\lambda_{0,13}(t)\ \textrm{exp}\left(\gamma_{13}X_i\right) & \text{(S2)}\ \lambda_{i,23}(t)&=\lambda_{0,23}(t)\ \textrm{exp}\left(\gamma_{23}X_i\right) & \text{(S3)} \end{array} \right.$$

Note that the baseline risk function is transition-specific (one baseline risk for each survival model). The dataset for this application is simulated and included in INLAjoint.

# set up outcomes for each transition
data(SurvMS) # load small simulated dataset for multi-state
E12 <- inla.surv(time = SurvMS[[1]]$Tstop, event = SurvMS[[1]]$status) # transition 1->2
E13 <- inla.surv(time = SurvMS[[2]]$Tstop, event = SurvMS[[2]]$status) # transition 1->3
E23 <- inla.surv(time = SurvMS[[3]]$Tstop, truncation=SurvMS[[3]]$Tstart, 
                 event =SurvMS[[3]]$status) # transition 2->3

M8 <- joint(formSurv=list(E12 ~ X, E13 ~ X, E23 ~ X), 
            basRisk = c("rw2", "rw1", "exponentialsurv"), dataSurv = SurvMS)
summary(M8)

Model 9: Joint longitudinal and multi-state model

We can extend the previous model to joint longitudinal and multi-state. The model structure is given by the following equation:

$$\left{ \begin{array}{ c @{{}{}} l @{{}{}} r } Y_{ij} &= \eta_{i}(t_{ij}) + \varepsilon_{ij} = \beta_{0} + b_{i0} + (\beta_{1} + b_{i1})time_{ij} + \beta_{12}X_{i} + \varepsilon_{ij} & \text{(L1)} \ \lambda_{i,12}(t)&=\lambda_{0,12}(t)\ \textrm{exp}\left(\gamma_{12}X_i + \varphi_{12}\eta_{i}(t)\right) & \text{(S1)}\ \lambda_{i,13}(t)&=\lambda_{0,13}(t)\ \textrm{exp}\left(\gamma_{13}X_i + \varphi_{13}\eta_{i}(t)\right) & \text{(S2)}\ \lambda_{i,23}(t)&=\lambda_{0,23}(t)\ \textrm{exp}\left(\gamma_{23}X_i + \varphi_{23}\eta_{i}(t)\right) & \text{(S3)} \end{array} \right.$$

data(LongMS) # load longitudinal data for joint longitudinal and multi-state
M9 <- joint(formSurv=list(E12 ~ X, E13 ~ X, E23 ~ X), 
            formLong=list(y ~ time + X + (1+time|id)),
            basRisk = c("rw2", "rw1", "exponentialsurv"), timeVar = "time", 
            assoc = list(c("CV", "CV", "CV")), id="id", 
            dataSurv = SurvMS, dataLong = LongMS)
summary(M9)

Model 10: Mixture cure model

The mixture cure model is used to analyze survival data with a cure fraction. The cured patients are subject to no excess risk and the uncured patients are subject to excess risk modeled using a parametric survival distribution. Let p(C=1) denote the probability of being cured and $\lambda_{i}(t|C=0)$ the survival for the part of the population that is not cured, the mixture cure model is defined by:

$$\left{ \begin{array}{ c @{{}{}} l @{{}{}} r } \textrm{Logit}(p(C=1)) &= \beta_{0} + \beta_{1}X_{i} & \text{(probability of being cured)} \ \lambda_{i}(t|C=0)&=\lambda_{0}(t)\ \textrm{exp}\left(\gamma X_i\right) & \text{(survival for non-cured)} \end{array} \right.$$

\textit{Note: the survival function for cured patients is always equal to 1 and the limit of the population survival $S(t)$ converges to $p(C=1)$ when $t\rightarrow\infty$.}

The mixture cure model can be added to any survival model and can deal with left, right and interval censoring. However, only parametric baseline risks can be defined at the moment for mixture cure models and the covariates that affect the probability of being cured can be associated to fixed effects only (i.e., no random effects in the logistic part).

data("bmt", package="smcure")
M10 <- joint(formSurv=inla.surv(time = bmt$Time, event = bmt$Status, 
                                cure=cbind("Int"=1, "TRT"=bmt$TRT)) ~ TRT, 
             basRisk = "weibullsurv", dataSurv = bmt)
summary(M10)

Model 11: Shared frailty model for recurrent events

The frailty model can fit recurrent events with a gaussian distribution for the frailty term (i.e., lognormal frailty model).

$$\lambda_{i}(t)=\lambda_{0}(t)\ \textrm{exp}\left(\gamma_{1}Sex_i + \gamma_{2}Age_i + \omega_i\right)$$ $$\omega_i \sim \mathcal{N}(\mu, \sigma^2)$$

library(survival)
M11 <- joint(formSurv=inla.surv(time, status) ~ sex + age + (1|id), id="id", dataSurv = kidney)
summary(M11)

We can compare the results with the coxph function from survival package:

M11s <- coxph(Surv(time, status) ~ sex + age + frailty.gaussian(id), data = kidney)
summary(M11s)

Model 12: Joint model for recurrent events and a terminal event

The joint frailty model deals with recurrent events with a frailty model and shares the frailty term in a proportional hazards model for a terminal event. The parameter $\varphi_1$ is scaling the frailty term in the terminal event submodel.

$$\left{\begin{array}{ c @{{}{}} l @{{}{}} r }\lambda_{i1}(t)=\lambda_{01}(t)\ \textrm{exp}\left(\gamma_{11}Chemo_i + \omega_i \right) & \text{(S1)}\ \lambda_{i2}(t)=\lambda_{02}(t) \textrm{exp}\left(\gamma_{12}Chemo_i + \varphi_{1}\omega_i \right) & \text{(S2)}\ \end{array}\right. $$

$$\omega_i \sim log\mathcal{N}(\mu, \sigma^2)$$

data(readmission, package="frailtypack")
terminalData <- readmission[readmission$event==0,]
terminalEVENT <- inla.surv(time = terminalData$t.stop, event = terminalData$death)
recurrentEVENT <- inla.surv(time = readmission$t.stop, event = readmission$event)
M12 <- joint(formSurv=list(recurrentEVENT ~ chemo + (1|id),
                           terminalEVENT ~ chemo), id="id",
             basRisk=c("rw1", "rw1"), assocSurv=TRUE,
             dataSurv = list(readmission,terminalData))
summary(M12)

The parameter assocSurv indicates that the frailty term from the recurrent event model is shared in the terminal event model.

Model 13: Two-part model for a longitudinal semicontinuous outcome

The two-part model is used to fit a semicontinuous outcome, usually zero-inflated but a point mass other than zero can also be handled. It includes a logistic mixed effects model for a binary outcome (zero vs. positive) and a linear mixed effects model for the positive-only values (can be a Poisson model for count or any other distribution available). We illustrate with an example based on PBC by making a fake zero-inflated outcome based on the albumin outcome.

The model structure is given by:

$$\left{ \begin{array}{ c @{{}{}} l @{{}{}} r }\textrm{Logit}(Prob(albumin_{ij}>0)) &= \beta_{10} + b_{i10} + (\beta_{11} + b_{i11})year_{ij} + \beta_{12}drug_{i} + \beta_{13}year_{ij}drug_{i} & \text{(L1)}\ \textrm{E}[albumin_{ij}|albumin_{ij}>0] &= \beta_{20} + b_{i20} + (\beta_{21} + b_{i21})year_{ij} + \beta_{22}drug_{i} + \beta_{23}year_{ij}drug_{i} + \varepsilon_{ij} & \text{(L2)}\ \end{array} \right.$$

# make a zero-inflated continuous outcome for illustration
Longi$binary <- ifelse(Longi$albumin<3, 0, 1)
# dataset with only positives for the continuous part
LongiPositive <- Longi[which(Longi$albumin>=3),]
M13 <- joint(formLong = list(binary ~ year * drug +  (1+year|id),
                             albumin ~ year * drug + (1+year|id)),
             dataLong = list(Longi, LongiPositive), timeVar="year", 
             id = "id", family = c("binomial", "gaussian"), corLong=TRUE)
summary(M13)

Model 14: Two-part joint model for a longitudinal semicontinuous outcome and a terminal event

We can fit a joint two-part model by adding a survival component to the previous model. The association is based on shared random effects (each random effect is shared in the survival submodel and associated to a scaling parameter). Simulation studies for this model are available at https://arxiv.org/abs/2010.13704

The model structure is given by:

M14 <- joint(formLong = list(binary ~ year * drug +  (1+year|id),
                             albumin ~ year * drug + (1+year|id)),
             formSurv = inla.surv(years, death) ~ drug, dataSurv=Surv,
             dataLong = list(Longi, LongiPositive), timeVar="year", 
             id = "id", corLong=TRUE, assoc = c("SRE_ind", "SRE_ind"),
             family = c("binomial", "gaussian"))
summary(M14)

Model 15: Two-part joint model for a longitudinal semicontinuous outcome, recurrent events and a terminal event

The model includes a two-part model for a semicontinuous longitudinal biomarker, a frailty model for recurrent events and a proportional hazards model for a terminal event. The random effects from the two mixed effects submodels for the longitudinal semicontinuous outcome are shared and scaled in the frailty model for recurrent events and the proportional hazards model for a terminal event. The frailty term is shared and scaled in the proportional hazards model. We use the colorectal and colorectalLongi datasets where the longitudinal semicontinuous biomarker is the tumor size (i.e., SLD of target lesions) in a colorectal cancer clinical trial, the recurrent events are new tumors and the terminal event is death.

$$\left{\begin{array}{ c @{{}{}} l @{{}{}} r }\textrm{Logit}(Prob(Y_{ij}>0)) &= \beta_{10} + b_{i10} + \beta_{11}year_{ij} + \beta_{12}drug_{i} + \beta_{13}year_{ij}drug_{i} & \text{(L1)}\ \textrm{E}[Y_{ij}|Y_{ij}>0] &= \beta_{20} + b_{i20} + (\beta_{21} + b_{i21})year_{ij} + \beta_{22}drug_{i} + \beta_{23}year_{ij}drug_{i} + \varepsilon_{ij} & \text{(L2)}\ \lambda_{i1}(t) &=\lambda_{01}(t)\ \textrm{exp}\left(\gamma_{1}Chemo_i + \varphi_{11}b_{i10} + \varphi_{12}b_{i20} + \varphi_{13}b_{i21} + \omega_i \right) & \text{(S1)}\ \lambda_{i2}(t) &=\lambda_{02}(t)\ \textrm{exp}\left(\gamma_{2}Chemo_i + \varphi_{21}b_{i10} + \varphi_{22}b_{i20} + \varphi_{23}b_{i21} + \varphi_{24}\omega_i\right) & \text{(S2)}\ \end{array}\right. $$

$$\omega_i \sim \mathcal{N}(\mu, \sigma^2)$$

data(colorectal, package="frailtypack")
data(colorectalLongi, package="frailtypack")
summary(colorectalLongi)
# reverse Box-Cox transformation to have original
# tumor size measurements (zero inflated and right skewed)
# because tumor.size in the dataset is transformed
colorectalLongi$CONTINUOUS <- round((colorectalLongi$tumor.size*0.3+1)^(1/0.3), 5)
# positive value indicator (binary outcome)
colorectalLongi$BINARY <- ifelse(colorectalLongi$CONTINUOUS==0,0,1)
# extract terminal event data
colorectalSurv <- subset(colorectal, new.lesions == 0)
# extract longitudinal positive data
colorectalLongiPositive <- colorectalLongi[colorectalLongi$CONTINUOUS>0,]

RECURRENT <- inla.surv(time=colorectal$gap.time, event=colorectal$new.lesions)
TERMINAL <- inla.surv(time=colorectalSurv$time1, event=colorectalSurv$state)

M15 <- joint(formSurv=list(RECURRENT ~ treatment + (1|id),
                           TERMINAL ~ treatment),
             formLong=list(BINARY ~ year * treatment + (1|id),
                           CONTINUOUS ~ year * treatment + (1+year|id)),
             dataSurv = list(colorectal, colorectalSurv),
             dataLong = list(colorectalLongi, colorectalLongiPositive),
             id="id", timeVar="year", basRisk=c("rw2", "rw2"), corLong=TRUE,
             family=c("binomial", "lognormal"), assocSurv=TRUE,
             assoc=list(c("SRE_ind", "SRE_ind"), c("SRE_ind", "SRE_ind")))
summary(M15)

Model 16: Joint model for multivariate longitudinal data and competing risks of event (application section of https://arxiv.org/abs/2203.06256)

The model structure is given by the following equation:

$$\left{ \begin{array}{ l @{{}{}} l @{{}{}} r } \log(serBilir_{ij}) &= \eta_{i1}(t_{ij}) + \varepsilon_{ij1} & \text{(L1)}\ &=(\beta_{10} + b_{i10})+\beta_{11}X_{i}+(\beta_{12}+b_{i11})\textrm{NS}1(t{ij})+(\beta_{13}+b_{i12})\textrm{NS}2(t{ij})\ & \ \ \ + (\beta_{14}+b_{i13})\textrm{NS}3(t{ij})+\beta_{15}X_{i}\textrm{NS}1(t{ij})+\beta_{16}X_{i}\textrm{NS}2(t{ij})+\beta_{17}X_{i}\textrm{NS}3(t{ij}) + \varepsilon_{ij1} \ \log(SGOT_{ij}) &= \eta_{i2}(t_{ij}) + \varepsilon_{ij2} & \text{(L2)}\ &= (\beta_{20} + b_{i20})+\beta_{21}X_{i}+(\beta_{22}+b_{i21})\textrm{NS}1(t{ij})+(\beta_{23}+b_{i22})\textrm{NS}2(t{ij})\ & \ \ \ + (\beta_{24}+b_{i23})\textrm{NS}3(t{ij})+\beta_{25}X_{i}\textrm{NS}1(t{ij})+\beta_{26}X_{i}\textrm{NS}2(t{ij})+\beta_{27}X_{i}\textrm{NS}3(t{ij}) + \varepsilon_{ij2} \ albumin_{ij}&=\eta_{i3}(t_{ij}) + \varepsilon_{ij3} & \text{(L3)}\ &= (\beta_{30} + b_{i30})+\beta_{31}X_{i}+(\beta_{32}+b_{i31})t+\beta_{33}X_{i}t+ \varepsilon_{ij3}\ \log(\textrm{E}[platelets_{ij}])&=\eta_{i4}(t_{ij}) & \text{(L4)}\ &= (\beta_{40} + b_{i40})+\beta_{41}X_{i}+(\beta_{42}+b_{i41})\textrm{NS}1(t{ij})+(\beta_{43}+b_{i42})\textrm{NS}2(t{ij})\ & \ \ \ + (\beta_{44}+b_{i43})\textrm{NS}3(t{ij})+\beta_{45}X_{i}\textrm{NS}1(t{ij})+\beta_{46}X_{i}\textrm{NS}2(t{ij})+\beta_{47}X_{i}\textrm{NS}3(t{ij}) \ \textrm{logit}(\textrm{E}[spiders_{ij}])&=\eta_{i5}(t_{ij}) & \text{(L5)}\ &= (\beta_{50} +b_{i50})+\beta_{51}X_{i}+(\beta_{52}+b_{i51})t+\beta_{53}X_{i}t \ \lambda_{i1}(t)&=\lambda_{01}(t)\ \textrm{exp}\left(\gamma_1 X_i + \varphi_1\eta_{i1}(t) + \varphi_3\eta_{i1}'(t) + \varphi_4\eta_{i2}(t) \right. & \text{(S1)}\ & \ \ \ + \left.\varphi_5\eta_{i3}(t) + \varphi_7\eta_{i4}(t) + \varphi_9\eta_{i5}(t) \right) \ \lambda_{i2}(t)&=\lambda_{02}(t)\ \textrm{exp}\left(\gamma_2 X_i + \varphi_2\eta_{i1}(t) + \varphi_6\eta_{i3}(t) + \varphi_8\eta_{i4}(t) \right) & \text{(S2)} \end{array} \right.$$

where $\textrm{NS}_1(t), \textrm{NS}_2(t), \textrm{NS}_3(t)$ are the natural cubic splines with internal knots at 1 and 4 years. We assume independent random effects between longitudinal markers. For computational convenience, here we fix the variance and covariance of all the random effects, so the model can run in a reasonable computation time on any computer. This is done by the use of the control argument 'initVC' to set initial values for variance and covariance terms (which can also be replaced by 'initSD' to set initial values for standard deviations and correlations instead) and the control argument 'fixRE' which is a boolean that allows to fix the values of random effects variances and covariances (or standard deviations and correlations). Note that removing 'initVC' and 'initRE' from control arguments will estimate the full model.

# set up natural cubic splines for longitudinal markers's trajectories
Nsplines <- ns(Longi$year, knots=c(1,4))
f1 <- function(x) predict(Nsplines, x)[,1]
f2 <- function(x) predict(Nsplines, x)[,2]
f3 <- function(x) predict(Nsplines, x)[,3]

initVC <- list(c(1.04, 1.58, 2.43, 1.80, 0.32, 0.63, 0.63, 1.76, 0.89, 1.42),
              c(0.17, 0.21, 0.31, 0.22, -0.02, 0.02, 0.01, 0.17, 0.09, 0.15),
              c(0.13, 0.01, 0),
              c(0.15, 1.31, 4.99, 16.32, -0.04, -0.08, -0.06, -1.83, -3.70, 8.62),
              c(5.89, 0.14, -0.25))
fixRE <- list(T,T,T,T,T)

M16 <-joint(formSurv = list(inla.surv(years, death) ~ drug, 
                            inla.surv(years, trans) ~ drug),
            formLong = list(serBilir ~ (1 + f1(year) + f2(year) + f3(year)) * drug +
                              (1 + f1(year) + f2(year) + f3(year) | id),
                            SGOT ~ (1 + f1(year) + f2(year) + f3(year)) * drug +
                              (1 + f1(year) + f2(year) + f3(year) | id),
                            albumin ~ (1 + year) * drug + (1 + year | id),
                            platelets ~ (1 + f1(year) + f2(year) + f3(year)) * drug +
                              (1 + f1(year) + f2(year) + f3(year) | id),
                            spiders ~ (1 + year) * drug + (1 + year | id)),
            dataLong = Longi, dataSurv=Surv, id = "id", timeVar = "year",
            family = c("lognormal", "lognormal", "gaussian", "poisson", "binomial"),
            basRisk = c("rw2", "rw1"), NbasRisk = 15, assoc = list(c("CV_CS", "CV"),
                            c("CV", ""), c("CV", "CV"), c("CV", "CV"), c("CV", "")),
            control=list(int.strategy="eb", initVC=initVC, fixRE=fixRE))
summary(M16)

DenisRustand/INLAjoint documentation built on April 13, 2025, 1:50 p.m.

rdrr.io home R language documentation Run R code online

CRAN packages Bioconductor packages R-Forge packages GitHub packages

Note that we can't provide technical support on individual packages. You should contact the package authors for that.

DenisRustand/INLAjoint
Multivariate Joint Modeling for Longitudinal and Time-to-Event Outcomes with 'INLA'

In DenisRustand/INLAjoint: Multivariate Joint Modeling for Longitudinal and Time-to-Event Outcomes with 'INLA'

Datasets for illustrations

Mayo Clinic Primary Biliary Cirrhosis Data (pbc2)

Bone marrow transplant data (bmt)

Kidney catheter data (kidney)

Rehospitalization colorectal cancer (readmission)

FFCD 2000-05 phase III metastatic colorectal cancer clinical trial (colorectal and colorectalLongi)

Model 1: Mixed effects regression for a single longitudinal marker

Model 2: Multiple longitudinal markers with different distributions

Model 3: Proportional hazards survival model

Model 4: Longitudinal - survival joint model

Model 5: Comparison with MCMC

Model 6: Joint with one longitudinal and competing risks of event

Model 7: Joint with three longitudinal markers and competing risks of event

Model 8: Multi-state model

Model 9: Joint longitudinal and multi-state model

Model 10: Mixture cure model

Model 11: Shared frailty model for recurrent events

Model 12: Joint model for recurrent events and a terminal event

Model 13: Two-part model for a longitudinal semicontinuous outcome

Model 14: Two-part joint model for a longitudinal semicontinuous outcome and a terminal event

Model 15: Two-part joint model for a longitudinal semicontinuous outcome, recurrent events and a terminal event

Model 16: Joint model for multivariate longitudinal data and competing risks of event (application section of https://arxiv.org/abs/2203.06256)

R Package Documentation

Browse R Packages

We want your feedback!

DenisRustand/INLAjoint Multivariate Joint Modeling for Longitudinal and Time-to-Event Outcomes with 'INLA'

In DenisRustand/INLAjoint: Multivariate Joint Modeling for Longitudinal and Time-to-Event Outcomes with 'INLA'

Datasets for illustrations

Mayo Clinic Primary Biliary Cirrhosis Data (pbc2)

Bone marrow transplant data (bmt)

Kidney catheter data (kidney)

Rehospitalization colorectal cancer (readmission)

FFCD 2000-05 phase III metastatic colorectal cancer clinical trial (colorectal and colorectalLongi)

Model 1: Mixed effects regression for a single longitudinal marker

Model 2: Multiple longitudinal markers with different distributions

Model 3: Proportional hazards survival model

Model 4: Longitudinal - survival joint model

Model 5: Comparison with MCMC

Model 6: Joint with one longitudinal and competing risks of event

Model 7: Joint with three longitudinal markers and competing risks of event

Model 8: Multi-state model

Model 9: Joint longitudinal and multi-state model

Model 10: Mixture cure model

Model 11: Shared frailty model for recurrent events

Model 12: Joint model for recurrent events and a terminal event

Model 13: Two-part model for a longitudinal semicontinuous outcome

Model 14: Two-part joint model for a longitudinal semicontinuous outcome and a terminal event

Model 15: Two-part joint model for a longitudinal semicontinuous outcome, recurrent events and a terminal event

Model 16: Joint model for multivariate longitudinal data and competing risks of event (application section of https://arxiv.org/abs/2203.06256)

R Package Documentation

Browse R Packages

We want your feedback!

DenisRustand/INLAjoint
Multivariate Joint Modeling for Longitudinal and Time-to-Event Outcomes with 'INLA'