knitr::opts_chunk$set(warning = FALSE, message = FALSE)
library(INLA) library(INLAjoint) library(ggplot2) library(JM) # This package contains the dataset options(width=120)
\newpage
In this vignette we show how to fit various models with the joint() function of the INLAjoint
package. The package is providing a unified framework to fit various longitudinal, survival and joint models. This vignette illustrates some of these models and thus takes a long time to compute which is why the vignette is not compiled when installing INLAjoint but instead we provide a pre-compiled pdf.
Contact: INLAjoint@gmail.com
We use the data of the famous randomized clinical trial of Primary Biliary Cholangitis (PBC) patients where 312 PBC patients were followed at the Mayo Clinic between 1974 and 1988 and received either a placebo or D-penicillamine. These data are publicly available in several software including the R package "JM". During the follow-up, 140 patients died and 29 patients received a liver transplantation which we consider here as a competing event of death. In addition, repeated measures of various longitudinal markers potentially associated with the disease progression were collected. It is used to illustrate multiple models of the vignette (1, 2, 3, 4, 5, 6, 7, 13, 14 and 16).
data(pbc2) # dataset # extract some variable of interest without missing values Longi <- na.omit(pbc2[, c("id", "years", "status","drug","age", "sex","year","serBilir","SGOT", "albumin", "edema", "platelets", "alkaline","spiders", "ascites")]) Surv <- Longi[c(which(diff(as.numeric(Longi[,which(colnames(Longi)=="id")]))==1), length(Longi[,which(colnames(Longi)=="id")])),-c(7:10, 12:16)] Surv$death <- ifelse(Surv$status=="dead",1,0) # competing event 1 Surv$trans <- ifelse(Surv$status=="transplanted",1,0) # competing event 2
Bone marrow transplant study which is available in the package "smcure", it is used to illustrate the mixture cure model (model 10 of the vignette).
Recurrence times to infection for kidney patients using portable dialysis equipment, available in package "survival", it is used to illustrate the frailty model (model 11 of the vignette).
Rehospitalization times after surgery in patients diagnosed with colorectal cancer, it is available in the package "frailtypack" and it is used to illustrate the joint frailty model for recurrent events and a terminal event (model 12 of the vignette).
There are two complementary datasets available in the package "frailtypack". colorectal
includes times of new lesions appearance and time of death of 150 randomly chosen patients from the FFCD 2000-05 multicenter clinical trial. colorectalLongi
contains the corresponding longitudinal measurements of tumors size. It is used to illustrate the two-part joint model for a longitudinal semicontinuous biomarker, recurrent events and a terminal event (model 15 of the vignette).
This first model shows how to call the joint() function for a simple linear mixed effects model for a longitudinal marker, it gives the basic structure of the function. The required arguments are:
formLong
: formula for the model with the lme4 structure (including random effects in the formula as: (NAME | ID)).
dataLong
: Dataset that must contain the variables given in the formula.
id
: Name of the variable for grouping (e.g., individuals).
timeVar
: Name of the time variable.
family
: Distribution of the outcome (e.g., gaussian, poisson, binomial).
The model structure is given by the following equation:
$$\log(serBilir_{ij}) = \beta_0 + b_{i0} + \beta_1year_{ij} + \beta_2drug_{i} + \varepsilon_{ij} \hspace{1cm} \text{(L1)}$$
where $\beta$ are the fixed effects, $b_{i0}$ is an individual random intercept and $\varepsilon_{ij}$ is the residual error term.
M1 <- joint(formLong = serBilir ~ year + drug + (1|id), dataLong = Longi, id = "id", timeVar = "year", family = "lognormal")
The summary statistics are available from the summary
function:
summary(M1)
If one wishes to get the standard deviations instead of variance parameters, it is possible to switch with the sdcor
argument of the summary
function:
summary(M1, sdcor=TRUE)
The log marginal-likelihood, the Deviance Information Criterion (DIC) and the Widely Applicable Bayesian Information Criterion (WAIC) are provided in the summary statistics.
The control
argument in the joint
function has the following components:
int.strategy
allows to choose the strategy for the numerical integration used to approximate the marginal posterior distributions of the latent field. Available options are "ccd" (default), "grid" or "eb" (empirical Bayes). The empirical Bayes uses only the mode of the approximations for the integration, which speed up and simplifies computations.
priorFixed
allows to set the mean and standard deviation of the Gaussian prior for the fixed effects.
priorAssoc
allows to set the mean and standard deviation of the Gaussian prior for the association parameters between the longitudinal and survival submodels.
cpo
set to TRUE to compute the Conditional Predictive Ordinate.
An useful function to learn about the priors used in a fitted model is inla.priors.used
, applied to an object fitted with the joint
function. For example the default priors for fixed effects are Gaussian with mean zero and variance 100 (i.e., precision 0.01).
inla.priors.used(M1)
The full list of the arguments is available in the help documentation of the joint
function which can be accessed by running ?joint
.
The following code fits a joint model with 3 longitudinal markers including fixed effects for covariates such as sex, drug and interactions with time. We assume random intercept and random slope for each longitudinal trajectory. Note that the formLong
argument is now a list of formulas, one for each longitudinal marker and the length of family must match the number of markers.
The model structure is given by the following equation:
$$\left{ \begin{array}{ l @{{}{}} l @{{}{}} r }\log(serBilir_{ij}) &= \beta_{10} + b_{i10} + (\beta_{11} + b_{i11})year_{ij} + \beta_{12}drug_{i} + \beta_{13}sex_i + \beta_{14}year_{ij}drug_{i} + \varepsilon_{ij1} & \text{(L1)}\ \log(E[platelets_{ij}]) &= \beta_{20} + b_{i20} + (\beta_{21} + b_{i21})year_{ij} + \beta_{22}sex_i + \beta_{23}drug_{i} + \beta_{24}year_{ij}sex_{i} & \text{(L2)}\ \text{logit}(E[spiders_{ij}]) &= \beta_{30} + b_{i30} + (\beta_{31} + b_{i31})year_{ij} + \beta_{32}drug_{i} & \text{(L3)} \end{array} \right.$$
M2 <- joint(formLong = list(serBilir ~ year * drug + sex + (1+year|id), platelets ~ year * sex + drug + (1+year|id), spiders ~ year + drug + (1+year| id)), dataLong = Longi, id = "id", timeVar="year", corLong=TRUE, family = c("lognormal", "poisson", "binomial"), control=list(int.strategy="eb")) summary(M2)
The additional boolean argument corLong
is set to TRUE in order to have correlation between the random effects across the longitudinal markers. Therefore by switching this argument to TRUE, instead of having 3 sets of two correlated random effects, we have 1 set of 6 correlated random effects.
We can also get the standard deviation and correlation of random parameters instead of variance and covariance by adding sdcor=TRUE
to the summary function call:
summary(M2, sdcor=TRUE)
The link functions between the linear predictors and the longitudinal outcomes are set to default, it is however possible to switch to alternative ones using the link
argument, e.g., to switch from logit
to probit
for the binary marker:
M2 <- joint(formLong = list(serBilir ~ year * drug + sex + (1+year|id), platelets ~ year * sex + drug + (1+year|id), spiders ~ year + drug + (1+year| id)), dataLong = Longi, id = "id", timeVar="year", corLong=TRUE, family = c("lognormal", "poisson", "binomial"), link = c("default", "default", "probit"), control=list(int.strategy="eb")) summary(M2)
Some additional arguments are introduced to fit a survival model:
formSurv
: formula for the time-to-event outcome, with the response given as an inla.surv() object.
dataSurv
: dataset that must contain the variables given in the formula. When fitting a joint model with a longitudinal component, if dataSurv is not provided, the longitudinal dataset is used to get the covariates values included in the time-to-event formula.
basRisk
: the baseline risk of event. It can be defined as parametric with either "exponentialsurv" for exponential baseline or "weibullsurv" for Weibull baseline (note that there are two formulations of the Weibull distribution, see inla.doc("weibull")
for more details, default is variant = 0). Alternatively, there are two options to avoid parametric assumptions on the shape of the baseline risk: "rw1" for random walks of order one prior that corresponds to a smooth spline function based on first order differences. The second option "rw2" assigns a random walk order two prior that corresponds to a smooth spline function based on second order differences. This second option provides a smoother spline compared to order one since the smoothing is then done on the second order. The number of bins that define the intervals for the baseline risk can be specified with NbasRisk
(default 15 bins).
The model is defined as: $$\lambda_{i}(t)=\lambda_{0}(t)\ \textrm{exp}\left(\gamma_1drug_i + \gamma_2sex_i +\gamma_3drug_isex_i\right)$$
M3 <- joint(formSurv = inla.surv(time = years, event = death) ~ drug * sex, dataSurv = Surv) summary(M3)
Note that the parameter associated to the baseline risk of event "Baseline risk (variance)" corresponds to the variance parameter associated to the random walk to capture the evolution over time. Summary statistics of the random walks baseline risks are available in the "BaselineValues" attribute of an INLAjoint summary object.
summary(M3)$BaselineValues
It is possible to convert summary statistics to have hazards ratios instead of the mean parameter values in the linear predictor (does not affect baseline parameters).
summary(M3, hazr=TRUE)
Results are similar to the coxph function fit from survival:
survival::coxph(Surv(years, death) ~ drug * sex, data = Surv)
While the default baseline risk is set to first order random walk with 15 intervals, we can easily switch to parametric Weibull:
M3_wei <- joint(formSurv = inla.surv(time = years, event = death) ~ drug * sex, dataSurv = Surv, basRisk = "weibullsurv") summary(M3_wei, hazr=T)
An additional argument is required to set up the association between the longitudinal and survival parts:
assoc
: a character string that specifies the association between the longitudinal and survival components. The available options are "CV" for sharing the current value of the linear predictor, "CS" for the current slope, "CV_CS" for the current value and the current slope, "SRE" for shared random effects (i.e., sharing the individual deviation from the mean at time t as defined by the random effects), "SRE_ind" for shared random effect independent (each random effect's individual deviation is associated to an association parameter in the survival submodel) and "" (empty string) for no association.The model structure is given by the following equation:
$$\left{ \begin{array}{ c @{{}{}} l @{{}{}} r }log(serBilir_{ij}) &= \eta_{i}(t_{ij}) + \varepsilon_{ij} & \text{(L1)}\ &= \beta_0 + \beta_1year_{ij} + (\beta_2 + b_{i2})year_{ij}^2 \ & \ \ \ + (\beta_3 + b_{i3})year_{ij}^3 + \beta_4drug_{i} + \beta_5year_{ij}drug_{i}\ & \ \ \ + \beta_6year_{ij}^2drug_{i} + \beta_7year_{ij}^3drug_{i} + \varepsilon_{ij}\ \lambda_{i1}(t)&=\lambda_{01}(t)\ \textrm{exp}\left(\gamma_1drug_i + \varphi_1\eta_{i}(t) + \varphi_2\eta_{i}'(t)\right) & \text{(S1)} \end{array} \right.$$
where $\gamma$ denotes fixed effects of the survival part and $\varphi$ the association parameters.
# Set up quadratic and cubic functions of time f1 <- function(x) x^2 f2 <- function(x) x^3 M4 <- joint(formSurv = inla.surv(years, death) ~ drug, formLong = serBilir ~ (1 + year + f1(year) + f2(year))*drug + (f1(year) + f2(year) |id), family = "lognormal", dataLong = Longi, dataSurv=Surv, id = "id", timeVar = "year", assoc = "CV_CS", basRisk = "rw2", NbasRisk=25, control=list(int.strategy="eb")) summary(M4)
In case some functions of time should be included, they must be set as illustrated in the above example, i.e., create a univariate function of x named f1, f2, ..., fN, and use this function in the formula (where N=20 is the maximum number of functions available at the moment but could be increased if needed.). This is important to be able to compute the value of the linear predictor at any time t, particularly for the time-dependent association structures. A numerical approximation of the derivative of the function is automatically computed in case the current slope of the linear predictor is shared in the survival submodel.
We can plot the posterior distribution for all the parameters with the plot
function
plotM4 <- plot(M4, sdcor=T)
The plot function returns multiple plots for each component of the model. First the plots for the longitudinal outcome(s) parameters:
plotM4$Outcomes$L1
Then the parameters of the survival outcome(s):
plotM4$Outcomes$S1
The variance-covariance of the random-effects (converted to standard deviations and correlations when argument sdcor=TRUE
is added to the call of the plot function):
plotM4$Covariances
The posterior distributions of the association parameters:
plotM4$Associations
And finally the curve for the baseline risk functions:
plotM4$Baseline
The model for the baseline risk is a random walk with number of bins given by argument NbasRisk
, the curve plotted is constant between the bins but converges towards a smooth spline when the number of bins increase. Sometimes the scale for the baseline hazard risk may require to have a log10 y-axis, this can easily be done using to the ggplot2
framework. Moreover, the data associated to each plot is available in the object that contains the result of the plot function call (i.e., plotM4
in the example).
plotM4$Baseline + scale_y_log10()
We can make a comparison of INLAjoint
with Bayesian estimations with MCMC implemented in alternative R packages such as JMBayes2
(C++ implementation of MCMC) or rstanarm
(Stan).
We propose a comparison for a simple joint model with one longitudinal and one survival component:
$$\left{ \begin{array}{ c @{{}{}} l @{{}{}} r }albumin_{ij} &= \eta_{i}(t_{ij}) + \varepsilon_{ij} & \text{(L1)}\ &= \beta_0 + b_{i0} + (\beta_1 + b_{i1})year_{ij} + \beta_2drug_{i} + \beta_3year_{ij}drug_{i} + \varepsilon_{ij}\ \lambda_{i}(t)&=\lambda_{0}(t)\ \textrm{exp}\left(\gamma_{1}sex_i + \gamma_{2}drug_i + \varphi\eta_{i}(t)\right) & \text{(S1)} \end{array} \right.$$
# INLAjoint M5 <- joint(formSurv = inla.surv(years, death) ~ sex + drug, formLong = albumin ~ (1 + year)*drug + (1 + year |id), dataLong = Longi, dataSurv=Surv, id = "id", timeVar = "year", assoc = "CV", control=list(priorFixed=list(mean=0, prec=0.16, mean.intercept=0, prec.intercept=0.16), priorAssoc=list(mean=0, prec=0.16))) summary(M5)
Here the prior distributions of the fixed effects and association parameters are changed to have precision 0.16 (i.e., variance 6.25 instead of the default value of 100), in order to match the default prior distributions of rstanarm
for the fixed effects and association parameters.
# JMBayes2 library(JMbayes2) M5JMB2_lme <- lme(albumin ~ (1 + year)*drug, random = ~ 1 + year |id, data = Longi) M5JMB2_cox <- coxph(Surv(Surv$years, Surv$death) ~ sex + drug, data = Surv, x = TRUE) JMpr = list(mean_alphas=list(0), Tau_alphas = list(matrix(0.16))) M5JMB2 <- try(jm(M5JMB2_cox, list(M5JMB2_lme), time_var = "year", priors=JMpr)) # Computation time in the table includes LME + Cox + JM summary(M5JMB2)
# rstanarm library(rstanarm) library(survival) options(mc.cores = parallel::detectCores()) M5rstanarm <- stan_jm( formulaLong = list(albumin ~ (1 + year)*drug + (1 + year |id)), formulaEvent = Surv(years, death) ~ sex + drug, dataLong = Longi, dataEvent = Surv, time_var = "year", priorLong_intercept = normal(0, 2.5, autoscale=TRUE), priorLong = normal(0, 2.5), priorEvent_assoc = normal(0, 2.5), seed = 12345) summary(M5rstanarm)
tabl <- " | Package | INLAjoint | JMbayes2 | rstanarm | |------------|-----------|-----------|-----------| | algorithm | INLA | MCMC C++ | MCMC Stan | | comp. time | 7 sec. | 1736 sec. | 596 sec. | " cat(tabl) # output the table in a format good for HTML/PDF/docx conversion
A more detailed comparison between INLA, MCEM and MCMC is available at https://arxiv.org/abs/2203.06256 and a comparison between INLA and Levenberg-Marquardt algorithm (Newton-Raphson like that performs MLE) is available at https://arxiv.org/abs/2010.13704
In order to account for competing risks of event, the formSurv
argument is given as a list with one element for each risk submodel. Moreover, the basRisk
argument must be a vector with the same number of elements as the number of survival submodels.
The model structure is given by the following equation:
$$\left{ \begin{array}{ c @{{}{}} l @{{}{}} r }\log(serBilir_{ij}) &= \eta_{i}(t_{ij}) + \varepsilon_{ij} & \text{(L1)}\ &= \beta_0 + b_{i0} + (\beta_1 + b_{i1})year_{ij} + \beta_2drug_{i} + \beta_3sex_i \ & \ \ \ + \beta_4year_{ij}drug_{i} + \beta_5year_{ij}sex_{i} + \varepsilon_{ij}\ \lambda_{i1}^{death}(t)&=\lambda_{01}(t)\ \textrm{exp}\left(\gamma_{11}sex_i + \gamma_{12}drug_i + \varphi_{11}(b_{i0} + b_{i1}t)\right) & \text{(S1)}\ \lambda_{i2}^{transpl.}(t)&=\lambda_{02}(t)\ \textrm{exp}\left(\gamma_{21}edema_no_i + \gamma_{22}edema_de_i + \gamma_{23}sex_i \right.\ &\left. \ \ + \gamma_{24}edema_no_isex_i + \gamma_{25}edema_de_isex_i + \varphi_{21}b_{i0} + \varphi_{22}b_{i1}\right) & \text{(S2)} \end{array} \right.$$
M6 <- joint(formLong = serBilir ~ year * (drug + sex) + (1+year|id), dataLong = Longi, formSurv = list(inla.surv(years, death) ~ sex + drug, inla.surv(years, trans) ~ edema * sex), id = "id", timeVar = "year", family = "lognormal", basRisk = c("rw1", "rw1"), assoc = c("SRE", "SRE_ind"), dataSurv=Surv, control=list(int.strategy="eb")) summary(M6)
When multiple longitudinal submodels and survival submodels are included, the arguments formSurv
and formLong
are both given as lists. The assoc
parameter should then be a list with one element for each longitudinal submodel and each element is a vector for the association with each survival submodel.
The model structure is given by the following equation:
$$\left{ \begin{array}{ c @{{}{}} l @{{}{}} r }\log(serBilir_{ij}) &= \eta_{i1}(t_{ij}) + \varepsilon_{ij1} = \beta_{10} + b_{i10} + (\beta_{11} + b_{i11})year_{ij} + \beta_{12}drug_{i} + \beta_{13}sex_i & \text{(L1)} \ & \ \ \ + \beta_{14}year_{ij}drug_{i} + \varepsilon_{ij1}\ \log(E[platelets_{ij}]) &= \eta_{i2}(t_{ij}) = \beta_{20} + b_{i20} + (\beta_{21} + b_{i21})year_{ij} + \beta_{22}sex_i + \beta_{23}drug_{i} + \beta_{24}year_{ij}sex_{i} & \text{(L2)}\ \text{logit}(E[spiders_{ij}]) &= \eta_{i3}(t_{ij}) = \beta_{30} + b_{i30} + (\beta_{31} + b_{i31})year_{ij} + \beta_{32}drug_{i} + \beta_{33}year_{ij}drug_{i} & \text{(L3)}\ \lambda_{i1}(t)&=\lambda_{01}(t)\ \textrm{exp}\left(\gamma_{11}drug_i + \varphi_{11}\eta_{i1}(t) + \varphi_{12}(b_{i20} + b_{i21} t) + \varphi_{13}\eta_{i3}(t) + \varphi_{14}\eta_{i3}'(t)\right) & \text{(S1)}\ \lambda_{i2}(t)&=\lambda_{02}(t)\ \textrm{exp}\left(\gamma_{21}drug_i + \varphi_{21}\eta_{i1}(t) + \varphi_{22}\eta_{i3}'(t)\right) & \text{(S2)} \end{array} \right.$$
M7 <- joint(formLong = list(serBilir ~ year * drug + sex + (1|id), platelets ~ year + f1(year) + drug + sex + (1|id), albumin ~ year + f1(year) + f2(year) + drug + (1|id)), formSurv = list(inla.surv(years, death) ~ sex + drug, inla.surv(years, trans) ~ edema * sex), dataLong = Longi, dataSurv=Surv, id = "id", corLong=TRUE, timeVar = "year", family = c("lognormal", "poisson", "gaussian"), basRisk = c("rw1", "rw1"), assoc = list(c("CV", "CV"), c("SRE", ""), c("CV_CS", "CS")), control=list(int.strategy="eb")) summary(M7)
The longitudinal markers are assumed correlated but it is also possible to set corLong
to FALSE to have independent random effects across markers and reduce the number of covariance parameters.
For the multi-state model, we need to define a survival model for each transition. Here for example in the case of illness-death we have three possible transitions: 1->2, 1->3 and 2->3.
The model structure is given by the following equation:
$$\left{ \begin{array}{ c @{{}{}} l @{{}{}} r } \lambda_{i,12}(t)&=\lambda_{0,12}(t)\ \textrm{exp}\left(\gamma_{12}X_i\right) & \text{(S1)}\ \lambda_{i,13}(t)&=\lambda_{0,13}(t)\ \textrm{exp}\left(\gamma_{13}X_i\right) & \text{(S2)}\ \lambda_{i,23}(t)&=\lambda_{0,23}(t)\ \textrm{exp}\left(\gamma_{23}X_i\right) & \text{(S3)} \end{array} \right.$$
Note that the baseline risk function is transition-specific (one baseline risk for each survival model). The dataset for this application is simulated and included in INLAjoint.
# set up outcomes for each transition data(SurvMS) # load small simulated dataset for multi-state E12 <- inla.surv(time = SurvMS[[1]]$Tstop, event = SurvMS[[1]]$status) # transition 1->2 E13 <- inla.surv(time = SurvMS[[2]]$Tstop, event = SurvMS[[2]]$status) # transition 1->3 E23 <- inla.surv(time = SurvMS[[3]]$Tstop, truncation=SurvMS[[3]]$Tstart, event =SurvMS[[3]]$status) # transition 2->3 M8 <- joint(formSurv=list(E12 ~ X, E13 ~ X, E23 ~ X), basRisk = c("rw2", "rw1", "exponentialsurv"), dataSurv = SurvMS) summary(M8)
We can extend the previous model to joint longitudinal and multi-state. The model structure is given by the following equation:
$$\left{ \begin{array}{ c @{{}{}} l @{{}{}} r } Y_{ij} &= \eta_{i}(t_{ij}) + \varepsilon_{ij} = \beta_{0} + b_{i0} + (\beta_{1} + b_{i1})time_{ij} + \beta_{12}X_{i} + \varepsilon_{ij} & \text{(L1)} \ \lambda_{i,12}(t)&=\lambda_{0,12}(t)\ \textrm{exp}\left(\gamma_{12}X_i + \varphi_{12}\eta_{i}(t)\right) & \text{(S1)}\ \lambda_{i,13}(t)&=\lambda_{0,13}(t)\ \textrm{exp}\left(\gamma_{13}X_i + \varphi_{13}\eta_{i}(t)\right) & \text{(S2)}\ \lambda_{i,23}(t)&=\lambda_{0,23}(t)\ \textrm{exp}\left(\gamma_{23}X_i + \varphi_{23}\eta_{i}(t)\right) & \text{(S3)} \end{array} \right.$$
data(LongMS) # load longitudinal data for joint longitudinal and multi-state M9 <- joint(formSurv=list(E12 ~ X, E13 ~ X, E23 ~ X), formLong=list(y ~ time + X + (1+time|id)), basRisk = c("rw2", "rw1", "exponentialsurv"), timeVar = "time", assoc = list(c("CV", "CV", "CV")), id="id", dataSurv = SurvMS, dataLong = LongMS) summary(M9)
The mixture cure model is used to analyze survival data with a cure fraction. The cured patients are subject to no excess risk and the uncured patients are subject to excess risk modeled using a parametric survival distribution. Let p(C=1) denote the probability of being cured and $\lambda_{i}(t|C=0)$ the survival for the part of the population that is not cured, the mixture cure model is defined by:
$$\left{ \begin{array}{ c @{{}{}} l @{{}{}} r } \textrm{Logit}(p(C=1)) &= \beta_{0} + \beta_{1}X_{i} & \text{(probability of being cured)} \ \lambda_{i}(t|C=0)&=\lambda_{0}(t)\ \textrm{exp}\left(\gamma X_i\right) & \text{(survival for non-cured)} \end{array} \right.$$
\textit{Note: the survival function for cured patients is always equal to 1 and the limit of the population survival $S(t)$ converges to $p(C=1)$ when $t\rightarrow\infty$.}
The mixture cure model can be added to any survival model and can deal with left, right and interval censoring. However, only parametric baseline risks can be defined at the moment for mixture cure models and the covariates that affect the probability of being cured can be associated to fixed effects only (i.e., no random effects in the logistic part).
data("bmt", package="smcure") M10 <- joint(formSurv=inla.surv(time = bmt$Time, event = bmt$Status, cure=cbind("Int"=1, "TRT"=bmt$TRT)) ~ TRT, basRisk = "weibullsurv", dataSurv = bmt) summary(M10)
The frailty model can fit recurrent events with a gaussian distribution for the frailty term (i.e., lognormal frailty model).
$$\lambda_{i}(t)=\lambda_{0}(t)\ \textrm{exp}\left(\gamma_{1}Sex_i + \gamma_{2}Age_i + \omega_i\right)$$ $$\omega_i \sim \mathcal{N}(\mu, \sigma^2)$$
library(survival) M11 <- joint(formSurv=inla.surv(time, status) ~ sex + age + (1|id), id="id", dataSurv = kidney) summary(M11)
We can compare the results with the coxph function from survival package:
M11s <- coxph(Surv(time, status) ~ sex + age + frailty.gaussian(id), data = kidney) summary(M11s)
The joint frailty model deals with recurrent events with a frailty model and shares the frailty term in a proportional hazards model for a terminal event. The parameter $\varphi_1$ is scaling the frailty term in the terminal event submodel.
$$\left{\begin{array}{ c @{{}{}} l @{{}{}} r }\lambda_{i1}(t)=\lambda_{01}(t)\ \textrm{exp}\left(\gamma_{11}Chemo_i + \omega_i \right) & \text{(S1)}\ \lambda_{i2}(t)=\lambda_{02}(t) \textrm{exp}\left(\gamma_{12}Chemo_i + \varphi_{1}\omega_i \right) & \text{(S2)}\ \end{array}\right. $$
$$\omega_i \sim log\mathcal{N}(\mu, \sigma^2)$$
data(readmission, package="frailtypack") terminalData <- readmission[readmission$event==0,] terminalEVENT <- inla.surv(time = terminalData$t.stop, event = terminalData$death) recurrentEVENT <- inla.surv(time = readmission$t.stop, event = readmission$event) M12 <- joint(formSurv=list(recurrentEVENT ~ chemo + (1|id), terminalEVENT ~ chemo), id="id", basRisk=c("rw1", "rw1"), assocSurv=TRUE, dataSurv = list(readmission,terminalData)) summary(M12)
The parameter assocSurv
indicates that the frailty term from the recurrent event model is shared in the terminal event model.
The two-part model is used to fit a semicontinuous outcome, usually zero-inflated but a point mass other than zero can also be handled. It includes a logistic mixed effects model for a binary outcome (zero vs. positive) and a linear mixed effects model for the positive-only values (can be a Poisson model for count or any other distribution available). We illustrate with an example based on PBC by making a fake zero-inflated outcome based on the albumin outcome.
The model structure is given by:
$$\left{ \begin{array}{ c @{{}{}} l @{{}{}} r }\textrm{Logit}(Prob(albumin_{ij}>0)) &= \beta_{10} + b_{i10} + (\beta_{11} + b_{i11})year_{ij} + \beta_{12}drug_{i} + \beta_{13}year_{ij}drug_{i} & \text{(L1)}\ \textrm{E}[albumin_{ij}|albumin_{ij}>0] &= \beta_{20} + b_{i20} + (\beta_{21} + b_{i21})year_{ij} + \beta_{22}drug_{i} + \beta_{23}year_{ij}drug_{i} + \varepsilon_{ij} & \text{(L2)}\ \end{array} \right.$$
# make a zero-inflated continuous outcome for illustration Longi$binary <- ifelse(Longi$albumin<3, 0, 1) # dataset with only positives for the continuous part LongiPositive <- Longi[which(Longi$albumin>=3),] M13 <- joint(formLong = list(binary ~ year * drug + (1+year|id), albumin ~ year * drug + (1+year|id)), dataLong = list(Longi, LongiPositive), timeVar="year", id = "id", family = c("binomial", "gaussian"), corLong=TRUE) summary(M13)
We can fit a joint two-part model by adding a survival component to the previous model. The association is based on shared random effects (each random effect is shared in the survival submodel and associated to a scaling parameter). Simulation studies for this model are available at https://arxiv.org/abs/2010.13704
The model structure is given by:
$$\left{ \begin{array}{ c @{{}{}} l @{{}{}} r }\textrm{Logit}(Prob(albumin_{ij}>0)) &= \beta_{10} + b_{i10} + (\beta_{11} + b_{i11})year_{ij} + \beta_{12}drug_{i} + \beta_{13}year_{ij}drug_{i} & \text{(L1)}\ \textrm{E}[albumin_{ij}|albumin_{ij}>0] &= \beta_{20} + b_{i20} + (\beta_{21} + b_{i21})year_{ij} + \beta_{22}drug_{i} + \beta_{23}year_{ij}drug_{i} + \varepsilon_{ij} & \text{(L2)}\ \lambda_{i}(t)&=\lambda_{0}(t)\ \textrm{exp}\left(\gamma_{1}drug_i + \varphi_1b_{i10} + \varphi_2b_{i11} + \varphi_3b_{i20} + \varphi_4b_{i21}\right) & \text{(S1)} \end{array} \right.$$
M14 <- joint(formLong = list(binary ~ year * drug + (1+year|id), albumin ~ year * drug + (1+year|id)), formSurv = inla.surv(years, death) ~ drug, dataSurv=Surv, dataLong = list(Longi, LongiPositive), timeVar="year", id = "id", corLong=TRUE, assoc = c("SRE_ind", "SRE_ind"), family = c("binomial", "gaussian")) summary(M14)
The model includes a two-part model for a semicontinuous longitudinal biomarker, a frailty model for recurrent events and a proportional hazards model for a terminal event. The random effects from the two mixed effects submodels for the longitudinal semicontinuous outcome are shared and scaled in the frailty model for recurrent events and the proportional hazards model for a terminal event. The frailty term is shared and scaled in the proportional hazards model. We use the colorectal
and colorectalLongi
datasets where the longitudinal semicontinuous biomarker is the tumor size (i.e., SLD of target lesions) in a colorectal cancer clinical trial, the recurrent events are new tumors and the terminal event is death.
$$\left{\begin{array}{ c @{{}{}} l @{{}{}} r }\textrm{Logit}(Prob(Y_{ij}>0)) &= \beta_{10} + b_{i10} + \beta_{11}year_{ij} + \beta_{12}drug_{i} + \beta_{13}year_{ij}drug_{i} & \text{(L1)}\ \textrm{E}[Y_{ij}|Y_{ij}>0] &= \beta_{20} + b_{i20} + (\beta_{21} + b_{i21})year_{ij} + \beta_{22}drug_{i} + \beta_{23}year_{ij}drug_{i} + \varepsilon_{ij} & \text{(L2)}\ \lambda_{i1}(t) &=\lambda_{01}(t)\ \textrm{exp}\left(\gamma_{1}Chemo_i + \varphi_{11}b_{i10} + \varphi_{12}b_{i20} + \varphi_{13}b_{i21} + \omega_i \right) & \text{(S1)}\ \lambda_{i2}(t) &=\lambda_{02}(t)\ \textrm{exp}\left(\gamma_{2}Chemo_i + \varphi_{21}b_{i10} + \varphi_{22}b_{i20} + \varphi_{23}b_{i21} + \varphi_{24}\omega_i\right) & \text{(S2)}\ \end{array}\right. $$
$$\omega_i \sim \mathcal{N}(\mu, \sigma^2)$$
data(colorectal, package="frailtypack") data(colorectalLongi, package="frailtypack") summary(colorectalLongi) # reverse Box-Cox transformation to have original # tumor size measurements (zero inflated and right skewed) # because tumor.size in the dataset is transformed colorectalLongi$CONTINUOUS <- round((colorectalLongi$tumor.size*0.3+1)^(1/0.3), 5) # positive value indicator (binary outcome) colorectalLongi$BINARY <- ifelse(colorectalLongi$CONTINUOUS==0,0,1) # extract terminal event data colorectalSurv <- subset(colorectal, new.lesions == 0) # extract longitudinal positive data colorectalLongiPositive <- colorectalLongi[colorectalLongi$CONTINUOUS>0,] RECURRENT <- inla.surv(time=colorectal$gap.time, event=colorectal$new.lesions) TERMINAL <- inla.surv(time=colorectalSurv$time1, event=colorectalSurv$state) M15 <- joint(formSurv=list(RECURRENT ~ treatment + (1|id), TERMINAL ~ treatment), formLong=list(BINARY ~ year * treatment + (1|id), CONTINUOUS ~ year * treatment + (1+year|id)), dataSurv = list(colorectal, colorectalSurv), dataLong = list(colorectalLongi, colorectalLongiPositive), id="id", timeVar="year", basRisk=c("rw2", "rw2"), corLong=TRUE, family=c("binomial", "lognormal"), assocSurv=TRUE, assoc=list(c("SRE_ind", "SRE_ind"), c("SRE_ind", "SRE_ind"))) summary(M15)
The model structure is given by the following equation:
$$\left{ \begin{array}{ l @{{}{}} l @{{}{}} r } \log(serBilir_{ij}) &= \eta_{i1}(t_{ij}) + \varepsilon_{ij1} & \text{(L1)}\ &=(\beta_{10} + b_{i10})+\beta_{11}X_{i}+(\beta_{12}+b_{i11})\textrm{NS}1(t{ij})+(\beta_{13}+b_{i12})\textrm{NS}2(t{ij})\ & \ \ \ + (\beta_{14}+b_{i13})\textrm{NS}3(t{ij})+\beta_{15}X_{i}\textrm{NS}1(t{ij})+\beta_{16}X_{i}\textrm{NS}2(t{ij})+\beta_{17}X_{i}\textrm{NS}3(t{ij}) + \varepsilon_{ij1} \ \log(SGOT_{ij}) &= \eta_{i2}(t_{ij}) + \varepsilon_{ij2} & \text{(L2)}\ &= (\beta_{20} + b_{i20})+\beta_{21}X_{i}+(\beta_{22}+b_{i21})\textrm{NS}1(t{ij})+(\beta_{23}+b_{i22})\textrm{NS}2(t{ij})\ & \ \ \ + (\beta_{24}+b_{i23})\textrm{NS}3(t{ij})+\beta_{25}X_{i}\textrm{NS}1(t{ij})+\beta_{26}X_{i}\textrm{NS}2(t{ij})+\beta_{27}X_{i}\textrm{NS}3(t{ij}) + \varepsilon_{ij2} \ albumin_{ij}&=\eta_{i3}(t_{ij}) + \varepsilon_{ij3} & \text{(L3)}\ &= (\beta_{30} + b_{i30})+\beta_{31}X_{i}+(\beta_{32}+b_{i31})t+\beta_{33}X_{i}t+ \varepsilon_{ij3}\ \log(\textrm{E}[platelets_{ij}])&=\eta_{i4}(t_{ij}) & \text{(L4)}\ &= (\beta_{40} + b_{i40})+\beta_{41}X_{i}+(\beta_{42}+b_{i41})\textrm{NS}1(t{ij})+(\beta_{43}+b_{i42})\textrm{NS}2(t{ij})\ & \ \ \ + (\beta_{44}+b_{i43})\textrm{NS}3(t{ij})+\beta_{45}X_{i}\textrm{NS}1(t{ij})+\beta_{46}X_{i}\textrm{NS}2(t{ij})+\beta_{47}X_{i}\textrm{NS}3(t{ij}) \ \textrm{logit}(\textrm{E}[spiders_{ij}])&=\eta_{i5}(t_{ij}) & \text{(L5)}\ &= (\beta_{50} +b_{i50})+\beta_{51}X_{i}+(\beta_{52}+b_{i51})t+\beta_{53}X_{i}t \ \lambda_{i1}(t)&=\lambda_{01}(t)\ \textrm{exp}\left(\gamma_1 X_i + \varphi_1\eta_{i1}(t) + \varphi_3\eta_{i1}'(t) + \varphi_4\eta_{i2}(t) \right. & \text{(S1)}\ & \ \ \ + \left.\varphi_5\eta_{i3}(t) + \varphi_7\eta_{i4}(t) + \varphi_9\eta_{i5}(t) \right) \ \lambda_{i2}(t)&=\lambda_{02}(t)\ \textrm{exp}\left(\gamma_2 X_i + \varphi_2\eta_{i1}(t) + \varphi_6\eta_{i3}(t) + \varphi_8\eta_{i4}(t) \right) & \text{(S2)} \end{array} \right.$$
where $\textrm{NS}_1(t), \textrm{NS}_2(t), \textrm{NS}_3(t)$ are the natural cubic splines with internal knots at 1 and 4 years. We assume independent random effects between longitudinal markers. For computational convenience, here we fix the variance and covariance of all the random effects, so the model can run in a reasonable computation time on any computer. This is done by the use of the control argument 'initVC' to set initial values for variance and covariance terms (which can also be replaced by 'initSD' to set initial values for standard deviations and correlations instead) and the control argument 'fixRE' which is a boolean that allows to fix the values of random effects variances and covariances (or standard deviations and correlations). Noe that removing 'initVC' and 'initRE' from control arguments will estimate the full model.
# set up natural cubic splines for longitudinal markers's trajectories Nsplines <- ns(Longi$year, knots=c(1,4)) f1 <- function(x) predict(Nsplines, x)[,1] f2 <- function(x) predict(Nsplines, x)[,2] f3 <- function(x) predict(Nsplines, x)[,3] initVC <- list(c(1.04, 1.58, 2.43, 1.80, 0.32, 0.63, 0.63, 1.76, 0.89, 1.42), c(0.17, 0.21, 0.31, 0.22, -0.02, 0.02, 0.01, 0.17, 0.09, 0.15), c(0.13, 0.01, 0), c(0.15, 1.31, 4.99, 16.32, -0.04, -0.08, -0.06, -1.83, -3.70, 8.62), c(5.89, 0.14, -0.25)) fixRE <- list(T,T,T,T,T) M16 <-joint(formSurv = list(inla.surv(years, death) ~ drug, inla.surv(years, trans) ~ drug), formLong = list(serBilir ~ (1 + f1(year) + f2(year) + f3(year)) * drug + (1 + f1(year) + f2(year) + f3(year) | id), SGOT ~ (1 + f1(year) + f2(year) + f3(year)) * drug + (1 + f1(year) + f2(year) + f3(year) | id), albumin ~ (1 + year) * drug + (1 + year | id), platelets ~ (1 + f1(year) + f2(year) + f3(year)) * drug + (1 + f1(year) + f2(year) + f3(year) | id), spiders ~ (1 + year) * drug + (1 + year | id)), dataLong = Longi, dataSurv=Surv, id = "id", timeVar = "year", family = c("lognormal", "lognormal", "gaussian", "poisson", "binomial"), basRisk = c("rw2", "rw1"), NbasRisk = 15, assoc = list(c("CV_CS", "CV"), c("CV", ""), c("CV", "CV"), c("CV", "CV"), c("CV", "")), control=list(int.strategy="eb", initVC=initVC, fixRE=fixRE)) summary(M16)
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.