joint: Fit a joint model to time-to-event and multivariate...

View source: R/joint.R

jointR Documentation

Fit a joint model to time-to-event and multivariate longitudinal data

Description

Fit a joint model to time-to-event and multivariate longitudinal data

Usage

joint(
  long.formulas,
  surv.formula,
  data,
  family,
  disp.formulas = NULL,
  control = list()
)

Arguments

long.formulas

A list of formula objects specifying the K responses. Each must be usable by glmmTMB. A restriction is that unique identifiers must be named id, and increment in intervals of at exactly one. The variable for time must be named time.

surv.formula

A formula specifying the time-to-event sub-model. Must be usable by coxph.

data

A data.frame containing all covariates and responses.

family

A list of length K containing strings denoting the exponential families for each longitudinal sub-model, corresponding in order to long.formulas. For choices of family, see details.

disp.formulas

An optional list of length K specifying the dispersion models wanted for each longitudinal sub-model, corresponding in order to long.formulas. Defaults to disp.formulas = NULL. See details for more information.

control

A list of control values:

verbose

Logical: If TRUE, at each iteration parameter information will be printed to console. Default is verbose=FALSE.

conv

Character: Convergence criterion, see details.

tol.abs

Numeric: Tolerance value used to assess convergence, see details. Default is tol.abs=1e-3.

tol.rel

Numeric: Tolerance value used to assess convergence, see details. Default is tol.rel=1e-2.

tol.den

Numeric: Tolerance value used to assess convergence, see details. Default is tol.den=1e-3.

tol.thr

Numeric: Threshold used when conv = 'sas', see details. Default is tol.thr=1e-1.

grad.eps

Numeric: Step size for numerical differentiation routines used to calculate the gradient in updates to dispersion parameters. This defaults to the cube root of machine tolerance. If a different step size is wanted for each response, a list can also be provided, with each of its elements corresponding to each longitudinal response (even if not fitted with a dispersion model).

hess.eps

Numeric: Step size for numerical differentiation routines used to calculate the hessian in updates to dispersion parameters. This defaults to the fourth root of machine tolerance. Behaves in same way as grad.eps for more information.

inits

List: list of initial conditions, any/all of the following can be specified (largely for bootstrapping purposes). Accepts elements named: D, which should be an appropriately-dimensioned variance-covariance matrix; beta, a vector containing all fixed effects; sigma a list containing all dispersion parameters, with non-applicable elements set to zero; gamma a vector containing all association parameters; zeta a vector containing the time-invariant survival coefficients.

maxit

Integer: Maximum number of EM iterations to carry out before exiting the algorithm. Defaults to maxit=200L, which is usually sufficient.

correlated

Logical: Should covariance parameters between responses be estimated and used in determination of model convergence? Default is correlated=TRUE. A choice of correlated=FALSE is equivalent to imposing the belief that deviations in longitudinal trajectories are not correlated across responses, but can decrease computation time, particularly for large K.

gh.nodes

Integer: Number of weights and abscissae to use in gauss-hermite quadrature. Defaults to gh.nodes=3, which is usually sufficient.

gh.sigma

Numeric: Standard deviation for gauss-hermite approximation of normal distribution. Defaults to gh.sigma=1. This should rarely (if ever) need altering.

return.dmats

Logical: Should data matrices be returned? Defaults to return.dmats=TRUE. Note that some S3 methods for joint.objects require the returned object to include these data matrices.

return.inits

Logical: Should a list of inital conditons be returned? Defaults to return.inits=FALSE.

center.ph

Logical: Should the survival covariates be mean-centered? Defaults to center.ph=TRUE.

post.process

Logical: Should model post-processing be carried out (assumes that the model has converged). Defaults to post.process = TRUE which then returns posterior modes and their variance for the random effects, as well as approximated standard error. This is largely for internal use (i.e. if bootstrapping to obtain SEs instead).

Details

Function joint fits a joint model to time-to-event data and multivariate longitudinal data. The longitudinal data can be specified by numerous models encompassing a fairly wide range of data. This joint model fit is achieved by the use of an approximate EM algorithm first proposed in Bernhardt et al. (2015), and later used in the 'classic' multivariate joint model in Murray and Philipson (2022). Each longitudinal response is modelled by

h_k(E[Y_{ik}|b_{ik};\Omega]) = X_{ik}\beta_k + Z_{ik}b_{ik}

where h_k is a known, monotonic link function. An association is induced between the Kth response and the hazard \lambda_i(t) by:

\lambda_i(t)=\lambda_0(t)\exp\{S_i^T\zeta + \sum_{k=1}^K\gamma_kW_k(t)^Tb_{ik}\}

where \gamma_k is the association parameter and W_k(t) is the vector function of time imposed on the Kth random effects structure (i.e. intercept-and-slope; spline).

Value

An object with class joint. See joint.object for information.

Family specification

Currently, five families are available for implementation, spanning continuous, binary and count data types:

'gaussian'

Normally distributed. The identity link is used. A term \sigma_k will be estimated, denoting the variance of this response

'binomial'

For binary data types, a logit link is used.

'poisson'

For count data types where dispersion is either non-consequential or ignored. A log link is used.

'genpois'

For count data types where dispersion is at least of some secondary interest. A log link is used. A term \sigma_k is estimated, denoting the dispersion, \varphi of the response. This follows interpretation of Zamani & Ismail (2012): \varphi>0: Over-dispersion; \varphi<0: Under-dispersion. Var[Y]=(1+\varphi)^2\mu.

'Gamma'

For continuous data where a Gamma distribution might be sensible. The log link is used. A term \sigma_k is be estimated, denoting the (log) shape of the distribution, which is then reported as \varphi_k=\exp\{\sigma_k\}.

"negbin"

For count data types where overdispersion is modelled. A log link is used. A term \sigma_k is estimated, which is then reported as \varphi_k=\exp\{\sigma_k\} which is the overdispersion. The variance of the response is Var[Y]=\mu+\mu^2/\varphi.

For families "negbin", "Gamma", "genpois", the user can define the dispersion model desired in disp.formulas. For the "negbin" and "Gamma" cases, we define \varphi_i=\exp\{W_i\sigma_i\} (i.e. the exponent of the linear predictor of the dispersion model; and for "genpois" the identity of the linear is used.

Dispersion models

The disp.formulas in the function call allows the user to model the dispersion for a given sub-model if wanted. The default value disp.formulas = NULL simply imposes an 'intercept only' model. If the kth item in disp.formulas corresponds to a longitudinal sub-model with no dispersion term, then it is simply ignored. With this in mind then, if a dispersion model is only required for, say, one sub-model, then the corresponding item in this list of models should be specified as such, with the others set to ~1.

Standard error estimation

We follow the approximation of the observed empirical information matrix detailed by Mclachlan and Krishnan (2008), and later used in joineRML (Hickey et al., 2018). These are only calculated if post.process=TRUE. Generally, these SEs are well-behaved, but their reliability will depend on multiple factors: Sample size; number of events; collinearity of REs of responses; number of observed times, and so on. Some more discussion/ references are given in vcov.joint.

Convergence of the algorithm

A few convergence criteria (specified by control$conv) are available:

abs

Convergence reached when maximum absolute change in parameter estimates is <tol.abs.

rel

Convergence reached when maximum absolute relative change in parameter estimates is <tol.rel. A small amount (tol.den) is added to the denominator to eschew numerical issues if parameters are nearly zero.

either

Convergence is reached when either abs or rel are met.

sas

Assess convergence for parameters |\Omega_a|<tol.thr by the abs criterion, else rel. This is the default.

Note that the baseline hazard is updated at each EM iteration, but is not monitored for convergence.

Author(s)

James Murray (j.murray7@ncl.ac.uk).

References

Bernhardt PW, Zhang D and Wang HJ. A fast EM Algorithm for Fitting Joint Models of a Binary Response to Multiple Longitudinal Covariates Subject to Detection Limits. Computational Statistics and Data Analysis 2015; 85; 37–53

Hickey GL, Philipson P, Jorgensen A, Kolamunnage-Dona R. joineRML: a joint model and software package for time-to-event and multivariate longitudinal outcomes. BMC Med. Res. Methodol. 2018; 50

McLachlan GJ, Krishnan T. The EM Algorithm and Extensions. Second Edition. Wiley-Interscience; 2008.

Murray, J and Philipson P. A fast approximate EM algorithm for joint models of survival and multivariate longitudinal data.Computational Statistics and Data Analysis 2022; 170; 107438

Zamani H and Ismail N. Functional Form for the Generalized Poisson Regression Model, Communications in Statistics - Theory and Methods 2012; 41(20); 3666-3675.

See Also

summary.joint, logLik.joint, boot.joint, extractAIC.joint, fixef.joint, ranef.joint, vcov.joint, joint.object and xtable.joint. For data simulation see simData.

Examples


# 1) Fit on simulated bivariate data, (1x gaussian, 1x poisson) --------
beta <- do.call(rbind, replicate(2, c(2, -0.1, 0.1, -0.2), simplify = FALSE))
gamma <- c(0.3, -0.3)
D <- diag(c(0.25, 0.09, 0.25, 0.05))
family <- list('gaussian', 'poisson')
data <- simData(ntms = 10, beta = beta, D = D, n = 100,
                family = family, zeta = c(0, -0.2),
                sigma = list(0.16, 0), gamma = gamma)$data

# Specify formulae and target families
long.formulas <- list(
  Y.1 ~ time + cont + bin + (1 + time|id),  # Gaussian
  Y.2 ~ time + cont + bin + (1 + time|id)   # Poisson
)
surv.formula <- Surv(survtime, status) ~ bin

fit <- joint(long.formulas, surv.formula, data, family)


# 2) Fit on PBC data -----------------------------------------------------
data(PBC)
# Subset data and remove NAs
PBC <- subset(PBC, select = c('id', 'survtime', 'status', 'drug', 'time',
                              'serBilir', 'albumin', 'spiders', 'platelets'))
PBC <- na.omit(PBC) 

# Specify GLMM sub-models, including interaction and quadratic time terms
long.formulas <- list(
  log(serBilir) ~ drug * (time + I(time^2)) + (1 + time + I(time^2)|id),
  albumin ~ drug * time + (1 + time|id),
  platelets ~ drug * time + (1 + time|id),
  spiders ~ drug * time + (1|id)
)
surv.formula <- Surv(survtime, status) ~ drug

fit <-  joint(long.formulas, surv.formula, PBC, 
              family = list("gaussian", "gaussian", "poisson", "binomial"),
              control = list(verbose = TRUE))
fit


# 3) Fit with dispersion models ----------------------------------------
beta <- do.call(rbind, replicate(2, c(2, -0.1, 0.1, -0.2), simplify = FALSE))
gamma <- c(0.3, -0.3)
D <- diag(c(0.25, 0.09, 0.25, 0.05))
family <- list('negbin', 'poisson')   # As an example; only requires one dispersion model.
sigma <- list(c(1, 0.2), 0)           # Need to specify the model in simData call too.
disp.formulas = list(~time, ~1)       # Even though poisson doesn't model dispersion, need to
                                      # populate this element in disp.formulas!
# Simulate some data
data <- simData(ntms = 10, beta = beta, D = D, n = 500,
                family = family, zeta = c(0, -0.2), sigma = sigma,
                disp.formulas = disp.formulas, gamma = gamma)$data

# Now fit using joint
long.formulas <- list(
  Y.1 ~ time + cont + bin + (1+time|id),
  Y.2 ~ time + cont + bin + (1+time|id)
)
fit <- joint(
  long.formulas, Surv(survtime, status) ~ bin,
  data, family, disp.formulas = disp.formulas
)
fit
summary(fit)


gmvjoint documentation built on Oct. 6, 2024, 1:07 a.m.