joint | R Documentation |
Fit a joint model to time-to-event and multivariate longitudinal data
joint(
long.formulas,
surv.formula,
data,
family,
disp.formulas = NULL,
control = list()
)
long.formulas |
A list of formula objects specifying the |
surv.formula |
A formula specifying the time-to-event sub-model. Must be usable by
|
data |
A |
family |
A list of length |
disp.formulas |
An optional list of length |
control |
A list of control values:
|
Function joint
fits a joint model to time-to-event data and multivariate
longitudinal data. The longitudinal data can be specified by numerous models encompassing
a fairly wide range of data. This joint model fit is achieved by the use of an approximate
EM algorithm first proposed in Bernhardt et al. (2015), and later used in the 'classic'
multivariate joint model in Murray and Philipson (2022). Each longitudinal response is
modelled by
h_k(E[Y_{ik}|b_{ik};\Omega]) = X_{ik}\beta_k + Z_{ik}b_{ik}
where h_k
is a known, monotonic link function. An association is induced between the
K
th response and the hazard \lambda_i(t)
by:
\lambda_i(t)=\lambda_0(t)\exp\{S_i^T\zeta + \sum_{k=1}^K\gamma_kW_k(t)^Tb_{ik}\}
where \gamma_k
is the association parameter and W_k(t)
is the vector function of
time imposed on the K
th random effects structure (i.e. intercept-and-slope; spline).
An object with class joint
. See joint.object
for information.
Currently, five families are available for implementation, spanning continuous, binary and count data types:
'gaussian'
Normally distributed. The identity link is used. A term
\sigma_k
will be estimated, denoting the variance of this response
'binomial'
For binary data types, a logit link is used.
'poisson'
For count data types where dispersion is either non-consequential or ignored. A log link is used.
'genpois'
For count data types where dispersion is at least of some
secondary interest. A log link is used. A term \sigma_k
is estimated, denoting
the dispersion, \varphi
of the response. This follows interpretation of Zamani &
Ismail (2012): \varphi>0
: Over-dispersion; \varphi<0
: Under-dispersion.
Var[Y]=(1+\varphi)^2\mu
.
'Gamma'
For continuous data where a Gamma distribution might be sensible.
The log link is used. A term \sigma_k
is be estimated, denoting the (log) shape of
the distribution, which is then reported as \varphi_k=\exp\{\sigma_k\}
.
"negbin"
For count data types where overdispersion is modelled. A log link
is used. A term \sigma_k
is estimated, which is then reported as
\varphi_k=\exp\{\sigma_k\}
which is the overdispersion. The variance of the response
is Var[Y]=\mu+\mu^2/\varphi
.
For families "negbin"
, "Gamma"
, "genpois"
, the user can define the
dispersion model desired in disp.formulas
. For the "negbin"
and "Gamma"
cases, we define \varphi_i=\exp\{W_i\sigma_i\}
(i.e. the exponent of the linear
predictor of the dispersion model; and for "genpois"
the identity of the linear
is used.
The disp.formulas
in the function call allows the user to model the dispersion for
a given sub-model if wanted. The default value disp.formulas = NULL
simply imposes
an 'intercept only' model. If the k
th item in disp.formulas
corresponds to
a longitudinal sub-model with no dispersion term, then it is simply ignored. With this in
mind then, if a dispersion model is only required for, say, one sub-model, then the
corresponding item in this list of models should be specified as such, with the others set to
~1
.
We follow the approximation of the observed empirical information matrix detailed by
Mclachlan and Krishnan (2008), and later used in joineRML
(Hickey et al., 2018).
These are only calculated if post.process=TRUE
. Generally, these SEs are well-behaved,
but their reliability will depend on multiple factors: Sample size; number of events;
collinearity of REs of responses; number of observed times, and so on. Some more discussion/
references are given in vcov.joint
.
A few convergence criteria (specified by control$conv
) are available:
abs
Convergence reached when maximum absolute change in parameter estimates
is <tol.abs
.
rel
Convergence reached when maximum absolute relative change in parameter
estimates is <tol.rel
. A small amount (tol.den
) is added to the denominator
to eschew numerical issues if parameters are nearly zero.
either
Convergence is reached when either abs
or rel
are met.
sas
Assess convergence for parameters |\Omega_a|
<tol.thr
by the
abs
criterion, else rel
. This is the default.
Note that the baseline hazard is updated at each EM iteration, but is not monitored for convergence.
James Murray (j.murray7@ncl.ac.uk).
Bernhardt PW, Zhang D and Wang HJ. A fast EM Algorithm for Fitting Joint Models of a Binary Response to Multiple Longitudinal Covariates Subject to Detection Limits. Computational Statistics and Data Analysis 2015; 85; 37–53
Hickey GL, Philipson P, Jorgensen A, Kolamunnage-Dona R. joineRML
: a joint model and
software package for time-to-event and multivariate longitudinal outcomes.
BMC Med. Res. Methodol. 2018; 50
McLachlan GJ, Krishnan T. The EM Algorithm and Extensions. Second Edition. Wiley-Interscience; 2008.
Murray, J and Philipson P. A fast approximate EM algorithm for joint models of survival and multivariate longitudinal data.Computational Statistics and Data Analysis 2022; 170; 107438
Zamani H and Ismail N. Functional Form for the Generalized Poisson Regression Model, Communications in Statistics - Theory and Methods 2012; 41(20); 3666-3675.
summary.joint
, logLik.joint
, boot.joint
,
extractAIC.joint
, fixef.joint
, ranef.joint
,
vcov.joint
, joint.object
and xtable.joint
. For
data simulation see simData
.
# 1) Fit on simulated bivariate data, (1x gaussian, 1x poisson) --------
beta <- do.call(rbind, replicate(2, c(2, -0.1, 0.1, -0.2), simplify = FALSE))
gamma <- c(0.3, -0.3)
D <- diag(c(0.25, 0.09, 0.25, 0.05))
family <- list('gaussian', 'poisson')
data <- simData(ntms = 10, beta = beta, D = D, n = 100,
family = family, zeta = c(0, -0.2),
sigma = list(0.16, 0), gamma = gamma)$data
# Specify formulae and target families
long.formulas <- list(
Y.1 ~ time + cont + bin + (1 + time|id), # Gaussian
Y.2 ~ time + cont + bin + (1 + time|id) # Poisson
)
surv.formula <- Surv(survtime, status) ~ bin
fit <- joint(long.formulas, surv.formula, data, family)
# 2) Fit on PBC data -----------------------------------------------------
data(PBC)
# Subset data and remove NAs
PBC <- subset(PBC, select = c('id', 'survtime', 'status', 'drug', 'time',
'serBilir', 'albumin', 'spiders', 'platelets'))
PBC <- na.omit(PBC)
# Specify GLMM sub-models, including interaction and quadratic time terms
long.formulas <- list(
log(serBilir) ~ drug * (time + I(time^2)) + (1 + time + I(time^2)|id),
albumin ~ drug * time + (1 + time|id),
platelets ~ drug * time + (1 + time|id),
spiders ~ drug * time + (1|id)
)
surv.formula <- Surv(survtime, status) ~ drug
fit <- joint(long.formulas, surv.formula, PBC,
family = list("gaussian", "gaussian", "poisson", "binomial"),
control = list(verbose = TRUE))
fit
# 3) Fit with dispersion models ----------------------------------------
beta <- do.call(rbind, replicate(2, c(2, -0.1, 0.1, -0.2), simplify = FALSE))
gamma <- c(0.3, -0.3)
D <- diag(c(0.25, 0.09, 0.25, 0.05))
family <- list('negbin', 'poisson') # As an example; only requires one dispersion model.
sigma <- list(c(1, 0.2), 0) # Need to specify the model in simData call too.
disp.formulas = list(~time, ~1) # Even though poisson doesn't model dispersion, need to
# populate this element in disp.formulas!
# Simulate some data
data <- simData(ntms = 10, beta = beta, D = D, n = 500,
family = family, zeta = c(0, -0.2), sigma = sigma,
disp.formulas = disp.formulas, gamma = gamma)$data
# Now fit using joint
long.formulas <- list(
Y.1 ~ time + cont + bin + (1+time|id),
Y.2 ~ time + cont + bin + (1+time|id)
)
fit <- joint(
long.formulas, Surv(survtime, status) ~ bin,
data, family, disp.formulas = disp.formulas
)
fit
summary(fit)
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.