# R/simjoint.R In joineR: Joint Modelling of Repeated Measurements and Time-to-Event Data

#### Documented in simjoint

#' Simulate data from a joint model
#'
#' @description This function simulates longitudinal and time-to-event data from
#'   a joint model.
#'
#' @param n the number of subjects to simulate data for.
#' @inheritParams joint
#' @param ntms the maximum number of (discrete) time points to simulate repeated
#'   longitudinal measurements at.
#' @param b1 a vector specifying the coefficients of the fixed effects in the
#'   longitudinal sub-model. The order in each row is intercept, a continuous
#'   covariate, covariate, a binary covariate, the measurement time.
#' @param b2 a vector of \code{length = 2} specifying the coefficients for the
#'   time-to-event baseline covariates, in the order of a continuous covariate
#'   and a binary covariate.
#' @param gamma a vector of specifying the latent association parameter(s) for
#'   the longitudinal outcome. It must be of length 1 if \code{sepassoc =
#'   FALSE}.
#' @param sigu a positive-definite matrix specifying the variance-covariance
#'   matrix. If \code{model = "int"}, the matrix has dimension \code{dim = c(1,
#'   1)}; if \code{model = "intslope"}, the matrix has dimension \code{dim =
#'   c(2, 2)}; else if \code{model = "quad"}, the matrix has dimension \code{dim
#'   = c(3, 3)}. If \code{D = NULL} (default), an identity matrix is assumed.
#' @param vare a numeric value specifying the residual standard error.
#' @param theta0,theta1 parameters controlling the failure rate. See Details.
#' @param censoring logical: if \code{TRUE}, includes an independent censoring
#'   time.
#' @param censlam a scale (\eqn{>0}) parameter for an exponential distribution
#'   used to simulate random censoring times for when \code{censoring = TRUE}.
#' @param truncation logical: if \code{TRUE}, adds a truncation time for a
#'   maximum event time in the case of \code{model = "int"} or \code{model =
#'   "intslope"}.
#' @param trunctime a truncation time for use when \code{truncation = TRUE}. For
#'   \code{model = "quad"}, \code{trunctime} is required, and defaults to
#'   \code{max(ntms)} if not specified.
#' @param gridstep the step-size for the grid used to simulate event times when
#'   \code{model = "quad"}. Default is \code{gridstep = 0.01}. See Details.
#'
#' @details The function \code{simjoint} simulates data from a joint model,
#'   similar to that performed in Henderson et al. (2000). It works by first
#'   simulating longitudinal data for all possible follow-up times using random
#'   draws for the multivariate Gaussian random effects and residual error
#'   terms. Data can be simulated assuming either random-intercepts only
#'   (\code{model = "int"}) in each of the longitudinal sub-models;
#'   random-intercepts and random-slopes (\code{model = "intslope"}); or
#'   times are simulated from proportional hazards time-to-event models, using
#'   the following methodologies:
#'
#'   \describe{
#'
#'   \item{\code{model = "int"}}{The baseline hazard function is specified to be
#'   an exponential distribution with
#'
#'   \deqn{\lambda_0(t) = \exp{\theta_0}.}
#'
#'   Simulation is conditional on known time-independent effects, and the
#'   methodology of Bender et al. (2005) is used to simulate the failure time.}
#'
#'   \item{\code{model = "intslope"}}{The baseline hazard function is specified
#'   to be a Gompertz distribution with
#'
#'   \deqn{\lambda_0(t) = \exp{\theta_0 + \theta_1 t}.}
#'
#'   In the usual representation of the Gompertz distribution, \eqn{\theta_1} is
#'   the shape parameter, and the scale parameter is equivalent to
#'   \eqn{\exp(\theta_0)}. Simulation is conditional on on a predictable
#'   (linear) time-varying process, and the methodology of Austin (2012) is used
#'   to simulate the failure time.}
#'
#'   \item{\code{model="quad"}}{The baseline hazard function is specified as per
#'   \code{model="intslope"}. The integration technique used for the above two
#'   cases is complex in quadratic (and higher order) models, therefore we use a
#'   different approach. We note that hazard function can be written as
#'
#'   \deqn{\lim_{dt \rightarrow 0} \lambda(t) dt = \lim_{dt \rightarrow 0}
#'   P[t \le T \le t + dt | T \ge t].}
#'
#'   In the simulation routine the parameter \code{gridstep} acts as \eqn{dt} in
#'   that we choose a nominally small value, which multiplies the hazard and
#'   this scaled hazard is equivalent to the probability of having an event in
#'   the interval \eqn{(t, t + dt)}, or equivalently \eqn{(t, t +
#'   }\code{gridstep}\eqn{)}. A vector of possible times is set up for each
#'   individual, ranging from 0 to \code{trunctime} in increments of \eqn{dt} (or
#'   \code{gridstep}). The failure probability at each time is compared to an
#'   independent \eqn{U(0, 1)} draw, and if the probability does not exceed the
#'   random draw then the survival time is set as \code{trunctime}, otherwise it
#'   is the generated time from the vector of candidate times. The minimum of
#'   these candidate times (i.e. when we deem the event to have first happened)
#'   is taken as the survival time.}
#'
#'   }
#'
#' @author Pete Philipson
#' @keywords datagen survival
#'
#' @references
#'
#' Austin PC. Generating survival times to simulate Cox proportional hazards
#' models with time-varying covariates. \emph{Stat Med.} 2012; \strong{31(29)}:
#' 3946-3958.
#'
#' Bender R, Augustin T, Blettner M. Generating survival times to simulate Cox
#' proportional hazards models. \emph{Stat Med.} 2005; \strong{24}: 1713-1723.
#'
#' Henderson R, Diggle PJ, Dobson A. Joint modelling of longitudinal
#' measurements and event time data. \emph{Biostatistics.} 2000; \strong{1(4)}:
#' 465-480.
#'
#' @return A list of 2 \code{data.frame}s: one recording the requisite
#'   longitudinal outcomes data, and one recording the time-to-event data.
#' @export
#'
#' @examples
#' simjoint(10, sepassoc = TRUE)
simjoint <- function(n = 500, model = c("intslope", "int", "quad"), sepassoc = FALSE,
ntms = 5, b1 = c(1, 1, 1, 1), b2 = c(1, 1), gamma = c(1, 0.1),
sigu, vare = 0.01, theta0 = -3, theta1 = 1, censoring = TRUE,
censlam = exp(-3), truncation = FALSE, trunctime = max(ntms),
gridstep = 0.01) {

model <- match.arg(model)
if (model != "intslope" && model != "int" && model != "quad") {
stop(paste("Unknown model:", model))
}
ran <- 2 # default: model = "intslope"
if (model == "int") {
ran <- 1
} else if (model == "quad") {
ran <- 3
}
lat <- ran
if (!sepassoc) {
lat <- 1
}
if (length(gamma) != lat) {
warning("Number of association parameters do not match model choice\n")
}
gamma <- rep(gamma, length = ran)
if (missing(sigu)) {
sigu <- diag(ran)
}
if (length(sigu) != ran^2) {
warning("Dimension of covariance matrix does not match chosen model\n")
if(length(sigu) > ran^2) {
sigu <- sigu[1:ran, 1:ran]
} else {
sigu <- diag(ran) * sigu[1]
}
}
if (model == "int") {
if(sigu < 0) {
stop("Variance must be positive")
}
} else {
if (!isSymmetric(sigu)) {
stop("Covariance matrix is not symmetric")
}
if (any(eigen(sigu)$values < 0) || (det(sigu) <= 0)) { stop("Covariance matrix must be positive semi-definite") } } sim <- simdat(n, model, sepassoc, ntms, ran, b1, b2, gamma, sigu, vare, theta0, theta1, censoring, censlam, truncation, trunctime, gridstep) list(longitudinal = sim$longdat, survival = sim\$survdat)

}


## Try the joineR package in your browser

Any scripts or data that you put into this service are public.

joineR documentation built on Jan. 23, 2023, 5:39 p.m.