stan_jm | R Documentation |
Fits a shared parameter joint model for longitudinal and time-to-event (e.g. survival) data under a Bayesian framework using Stan.
stan_jm(
formulaLong,
dataLong,
formulaEvent,
dataEvent,
time_var,
id_var,
family = gaussian,
assoc = "etavalue",
lag_assoc = 0,
grp_assoc,
scale_assoc = NULL,
epsilon = 1e-05,
basehaz = c("bs", "weibull", "piecewise"),
basehaz_ops,
qnodes = 15,
init = "prefit",
weights,
priorLong = normal(autoscale = TRUE),
priorLong_intercept = normal(autoscale = TRUE),
priorLong_aux = cauchy(0, 5, autoscale = TRUE),
priorEvent = normal(autoscale = TRUE),
priorEvent_intercept = normal(autoscale = TRUE),
priorEvent_aux = cauchy(autoscale = TRUE),
priorEvent_assoc = normal(autoscale = TRUE),
prior_covariance = lkj(autoscale = TRUE),
prior_PD = FALSE,
algorithm = c("sampling", "meanfield", "fullrank"),
adapt_delta = NULL,
max_treedepth = 10L,
QR = FALSE,
sparse = FALSE,
...
)
formulaLong |
A two-sided linear formula object describing both the
fixed-effects and random-effects parts of the longitudinal submodel,
similar in vein to formula specification in the lme4 package
(see | |||||||||
dataLong |
A data frame containing the variables specified in
| |||||||||
formulaEvent |
A two-sided formula object describing the event
submodel. The left hand side of the formula should be a | |||||||||
dataEvent |
A data frame containing the variables specified in
| |||||||||
time_var |
A character string specifying the name of the variable
in | |||||||||
id_var |
A character string specifying the name of the variable in
| |||||||||
family |
The family (and possibly also the link function) for the
longitudinal submodel(s). See | |||||||||
assoc |
A character string or character vector specifying the joint
model association structure. Possible association structures that can
be used include: "etavalue" (the default); "etaslope"; "etaauc";
"muvalue"; "muslope"; "muauc"; "shared_b"; "shared_coef"; or "null".
These are described in the Details section below. For a multivariate
joint model, different association structures can optionally be used for
each longitudinal submodel by specifying a list of character
vectors, with each element of the list specifying the desired association
structure for one of the longitudinal submodels. Specifying | |||||||||
lag_assoc |
A non-negative scalar specifying the time lag that should be
used for the association structure. That is, the hazard of the event at
time t will be assumed to be associated with the value/slope/auc of
the longitudinal marker at time t-u, where u is the time lag.
If fitting a multivariate joint model, then a different time lag can be used
for each longitudinal marker by providing a numeric vector of lags, otherwise
if a scalar is provided then the specified time lag will be used for all
longitudinal markers. Note however that only one time lag can be specified
for linking each longitudinal marker to the
event, and that that time lag will be used for all association structure
types (e.g. | |||||||||
grp_assoc |
Character string specifying the method for combining information
across lower level units clustered within an individual when forming the
association structure. This is only relevant when a grouping factor is
specified in | |||||||||
scale_assoc |
A non-zero numeric value specifying an optional scaling
parameter for the association structure. This multiplicatively scales the
value/slope/auc of the longitudinal marker by | |||||||||
epsilon |
The half-width of the central difference used to numerically
calculate the derivate when the | |||||||||
basehaz |
A character string indicating which baseline hazard to use
for the event submodel. Options are a B-splines approximation estimated
for the log baseline hazard ( | |||||||||
basehaz_ops |
A named list specifying options related to the baseline
hazard. Currently this can include:
| |||||||||
qnodes |
The number of nodes to use for the Gauss-Kronrod quadrature that is used to evaluate the cumulative hazard in the likelihood function. Options are 15 (the default), 11 or 7. | |||||||||
init |
The method for generating the initial values for the MCMC.
The default is | |||||||||
weights |
Experimental and should be used with caution. The user can optionally supply a 2-column data frame containing a set of 'prior weights' to be used in the estimation process. The data frame should contain two columns: the first containing the IDs for each individual, and the second containing the corresponding weights. The data frame should only have one row for each individual; that is, weights should be constant within individuals. | |||||||||
priorLong , priorEvent , priorEvent_assoc |
The prior distributions for the regression coefficients in the longitudinal submodel(s), event submodel, and the association parameter(s). Can be a call to one of the various functions provided by rstanarm for specifying priors. The subset of these functions that can be used for the prior on the coefficients can be grouped into several "families":
See the priors help page for details on the families and
how to specify the arguments for all of the functions in the table above.
To omit a prior —i.e., to use a flat (improper) uniform prior—
Note: Unless | |||||||||
priorLong_intercept , priorEvent_intercept |
The prior distributions
for the intercepts in the longitudinal submodel(s) and event submodel.
Can be a call to Note: The prior distribution for the intercept is set so it applies to the value when all predictors are centered. Moreover, note that a prior is only placed on the intercept for the event submodel when a Weibull baseline hazard has been specified. For the B-splines and piecewise constant baseline hazards there is not intercept parameter that is given a prior distribution; an intercept parameter will be shown in the output for the fitted model, but this just corresponds to the necessary post-estimation adjustment in the linear predictor due to the centering of the predictiors in the event submodel. | |||||||||
priorLong_aux |
The prior distribution for the "auxiliary" parameters
in the longitudinal submodels (if applicable).
The "auxiliary" parameter refers to a different parameter
depending on the
If fitting a multivariate joint model, you have the option to specify a list of prior distributions, however the elements of the list that correspond to any longitudinal submodel which does not have an auxiliary parameter will be ignored. | |||||||||
priorEvent_aux |
The prior distribution for the "auxiliary" parameters
in the event submodel. The "auxiliary" parameters refers to different
parameters depending on the baseline hazard. For | |||||||||
prior_covariance |
Cannot be | |||||||||
prior_PD |
A logical scalar (defaulting to | |||||||||
algorithm |
A string (possibly abbreviated) indicating the
estimation approach to use. Can be | |||||||||
adapt_delta |
Only relevant if | |||||||||
max_treedepth |
A positive integer specifying the maximum treedepth
for the non-U-turn sampler. See the | |||||||||
QR |
A logical scalar defaulting to | |||||||||
sparse |
A logical scalar (defaulting to | |||||||||
... |
Further arguments passed to the function in the rstan
package ( Another useful argument that can be passed to rstan via |
The stan_jm
function can be used to fit a joint model (also
known as a shared parameter model) for longitudinal and time-to-event data
under a Bayesian framework. The underlying
estimation is carried out using the Bayesian C++ package Stan
(https://mc-stan.org/).
The joint model may be univariate (with only one longitudinal submodel) or
multivariate (with more than one longitudinal submodel).
For the longitudinal submodel a (possibly multivariate) generalised linear
mixed model is assumed with any of the family
choices
allowed by glmer
. If a multivariate joint model is specified
(by providing a list of formulas in the formulaLong
argument), then
the multivariate longitudinal submodel consists of a multivariate generalized
linear model (GLM) with group-specific terms that are assumed to be correlated
across the different GLM submodels. That is, within
a grouping factor (for example, patient ID) the group-specific terms are
assumed to be correlated across the different GLM submodels. It is
possible to specify a different outcome type (for example a different
family and/or link function) for each of the GLM submodels, by providing
a list of family
objects in the family
argument. Multi-level
clustered data are allowed, and that additional clustering can occur at a
level higher than the individual-level (e.g. patients clustered within
clinics), or at a level lower than the individual-level (e.g. tumor lesions
clustered within patients). If the clustering occurs at a level lower than
the individual, then the user needs to indicate how the lower level
clusters should be handled when forming the association structure between
the longitudinal and event submodels (see the grp_assoc
argument
described above).
For the event submodel a parametric
proportional hazards model is assumed. The baseline hazard can be estimated
using either a cubic B-splines approximation (basehaz = "bs"
, the
default), a Weibull distribution (basehaz = "weibull"
), or a
piecewise constant baseline hazard (basehaz = "piecewise"
).
If the B-spline or piecewise constant baseline hazards are used,
then the degrees of freedom or the internal knot locations can be
(optionally) specified. If
the degrees of freedom are specified (through the df
argument) then
the knot locations are automatically generated based on the
distribution of the observed event times (not including censoring times).
Otherwise internal knot locations can be specified
directly through the knots
argument. If neither df
or
knots
is specified, then the default is to set df
equal to 6.
It is not possible to specify both df
and knots
.
Time-varying covariates are allowed in both the
longitudinal and event submodels. These should be specified in the data
in the same way as they normally would when fitting a separate
longitudinal model using lmer
or a separate
time-to-event model using coxph
. These time-varying
covariates should be exogenous in nature, otherwise they would perhaps
be better specified as an additional outcome (i.e. by including them as an
additional longitudinal outcome in the joint model).
Bayesian estimation of the joint model is performed via MCMC. The Bayesian
model includes independent priors on the
regression coefficients for both the longitudinal and event submodels,
including the association parameter(s) (in much the same way as the
regression parameters in stan_glm
) and
priors on the terms of a decomposition of the covariance matrices of the
group-specific parameters.
See priors
for more information about the priors distributions
that are available.
Gauss-Kronrod quadrature is used to numerically evaluate the integral
over the cumulative hazard in the likelihood function for the event submodel.
The accuracy of the numerical approximation can be controlled using the
number of quadrature nodes, specified through the qnodes
argument. Using a higher number of quadrature nodes will result in a more
accurate approximation.
The association structure for the joint model can be based on any of the following parameterisations:
current value of the linear predictor in the
longitudinal submodel ("etavalue"
)
first derivative (slope) of the linear predictor in the
longitudinal submodel ("etaslope"
)
the area under the curve of the linear predictor in the
longitudinal submodel ("etaauc"
)
current expected value of the longitudinal submodel
("muvalue"
)
the area under the curve of the expected value from the
longitudinal submodel ("muauc"
)
shared individual-level random effects ("shared_b"
)
shared individual-level random effects which also incorporate
the corresponding fixed effect as well as any corresponding
random effects for clustering levels higher than the individual)
("shared_coef"
)
interactions between association terms and observed data/covariates
("etavalue_data"
, "etaslope_data"
, "muvalue_data"
,
"muslope_data"
). These are described further below.
interactions between association terms corresponding to different
longitudinal outcomes in a multivariate joint model
("etavalue_etavalue(#)"
, "etavalue_muvalue(#)"
,
"muvalue_etavalue(#)"
, "muvalue_muvalue(#)"
). These
are described further below.
no association structure (equivalent to fitting separate
longitudinal and event models) ("null"
or NULL
)
More than one association structure can be specified, however,
not all possible combinations are allowed.
Note that for the lagged association structures baseline values (time = 0)
are used for the instances
where the time lag results in a time prior to baseline. When using the
"etaauc"
or "muauc"
association structures, the area under
the curve is evaluated using Gauss-Kronrod quadrature with 15 quadrature
nodes. By default, "shared_b"
and "shared_coef"
contribute
all random effects to the association structure; however, a subset of the
random effects can be chosen by specifying their indices between parentheses
as a suffix, for example, "shared_b(1)"
or "shared_b(1:3)"
or
"shared_b(1,2,4)"
, and so on.
In addition, several association terms ("etavalue"
, "etaslope"
,
"muvalue"
, "muslope"
) can be interacted with observed
data/covariates. To do this, use the association term's main handle plus a
suffix of "_data"
then followed by the model matrix formula in
parentheses. For example if we had a variable in our dataset for gender
named sex
then we might want to obtain different estimates for the
association between the current slope of the marker and the risk of the
event for each gender. To do this we would specify
assoc = c("etaslope", "etaslope_data(~ sex)")
.
It is also possible, when fitting a multivariate joint model, to include
interaction terms between the association terms themselves (this only
applies for interacting "etavalue"
or "muvalue"
). For example,
if we had a joint model with two longitudinal markers, we could specify
assoc = list(c("etavalue", "etavalue_etavalue(2)"), "etavalue")
.
The first element of list says we want to use the value of the linear
predictor for the first marker, as well as it's interaction with the
value of the linear predictor for the second marker. The second element of
the list says we want to also include the expected value of the second marker
(i.e. as a "main effect"). Therefore, the linear predictor for the event
submodel would include the "main effects" for each marker as well as their
interaction.
There are additional examples in the Examples section below.
A stanjm object is returned.
stanreg-objects
, stanmvreg-methods
,
print.stanmvreg
, summary.stanmvreg
,
posterior_traj
, posterior_survfit
,
posterior_predict
, posterior_interval
,
pp_check
, ps_check
, stan_mvmer
.
if (.Platform$OS.type != "windows" || .Platform$r_arch !="i386") {
#####
# Univariate joint model, with association structure based on the
# current value of the linear predictor
f1 <- stan_jm(formulaLong = logBili ~ year + (1 | id),
dataLong = pbcLong,
formulaEvent = Surv(futimeYears, death) ~ sex + trt,
dataEvent = pbcSurv,
time_var = "year",
# this next line is only to keep the example small in size!
chains = 1, cores = 1, seed = 12345, iter = 1000)
print(f1)
summary(f1)
#####
# Univariate joint model, with association structure based on the
# current value and slope of the linear predictor
f2 <- stan_jm(formulaLong = logBili ~ year + (year | id),
dataLong = pbcLong,
formulaEvent = Surv(futimeYears, death) ~ sex + trt,
dataEvent = pbcSurv,
assoc = c("etavalue", "etaslope"),
time_var = "year",
chains = 1, cores = 1, seed = 12345, iter = 1000)
print(f2)
#####
# Univariate joint model, with association structure based on the
# lagged value of the linear predictor, where the lag is 2 time
# units (i.e. 2 years in this example)
f3 <- stan_jm(formulaLong = logBili ~ year + (1 | id),
dataLong = pbcLong,
formulaEvent = Surv(futimeYears, death) ~ sex + trt,
dataEvent = pbcSurv,
time_var = "year",
assoc = "etavalue", lag_assoc = 2,
chains = 1, cores = 1, seed = 12345, iter = 1000)
print(f3)
#####
# Univariate joint model, where the association structure includes
# interactions with observed data. Here we specify that we want to use
# an association structure based on the current value of the linear
# predictor from the longitudinal submodel (i.e. "etavalue"), but we
# also want to interact this with the treatment covariate (trt) from
# pbcLong data frame, so that we can estimate a different association
# parameter (i.e. estimated effect of log serum bilirubin on the log
# hazard of death) for each treatment group
f4 <- stan_jm(formulaLong = logBili ~ year + (1 | id),
dataLong = pbcLong,
formulaEvent = Surv(futimeYears, death) ~ sex + trt,
dataEvent = pbcSurv,
time_var = "year",
assoc = c("etavalue", "etavalue_data(~ trt)"),
chains = 1, cores = 1, seed = 12345, iter = 1000)
print(f4)
######
# Multivariate joint model, with association structure based
# on the current value and slope of the linear predictor in the
# first longitudinal submodel and the area under the marker
# trajectory for the second longitudinal submodel
mv1 <- stan_jm(
formulaLong = list(
logBili ~ year + (1 | id),
albumin ~ sex + year + (year | id)),
dataLong = pbcLong,
formulaEvent = Surv(futimeYears, death) ~ sex + trt,
dataEvent = pbcSurv,
assoc = list(c("etavalue", "etaslope"), "etaauc"),
time_var = "year",
chains = 1, cores = 1, seed = 12345, iter = 100)
print(mv1)
#####
# Multivariate joint model, where the association structure is formed by
# including the expected value of each longitudinal marker (logBili and
# albumin) in the linear predictor of the event submodel, as well as their
# interaction effect (i.e. the interaction between the two "etavalue" terms).
# Note that whether such an association structure based on a marker by
# marker interaction term makes sense will depend on the context of your
# application -- here we just show it for demostration purposes).
mv2 <- stan_jm(
formulaLong = list(
logBili ~ year + (1 | id),
albumin ~ sex + year + (year | id)),
dataLong = pbcLong,
formulaEvent = Surv(futimeYears, death) ~ sex + trt,
dataEvent = pbcSurv,
assoc = list(c("etavalue", "etavalue_etavalue(2)"), "etavalue"),
time_var = "year",
chains = 1, cores = 1, seed = 12345, iter = 100)
#####
# Multivariate joint model, with one bernoulli marker and one
# Gaussian marker. We will artificially create the bernoulli
# marker by dichotomising log serum bilirubin
pbcLong$ybern <- as.integer(pbcLong$logBili >= mean(pbcLong$logBili))
mv3 <- stan_jm(
formulaLong = list(
ybern ~ year + (1 | id),
albumin ~ sex + year + (year | id)),
dataLong = pbcLong,
formulaEvent = Surv(futimeYears, death) ~ sex + trt,
dataEvent = pbcSurv,
family = list(binomial, gaussian),
time_var = "year",
chains = 1, cores = 1, seed = 12345, iter = 1000)
}
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.