simData | R Documentation |
Simulate multivariate longitudinal and survival data from a joint model
specification, with potential mixture of response families. Implementation is similar
to existing packages (e.g. joineR
, joineRML
).
simData(
n = 250,
ntms = 10,
fup = 5,
family = list("gaussian", "gaussian"),
sigma = list(0.16, 0.16),
beta = rbind(c(1, 0.1, 0.33, -0.5), c(1, 0.1, 0.33, -0.5)),
D = NULL,
gamma = c(0.5, -0.5),
zeta = c(0.05, -0.3),
theta = c(-4, 0.2),
cens.rate = exp(-3.5),
regular.times = TRUE,
dof = Inf,
random.formulas = NULL,
disp.formulas = NULL,
return.ranefs = FALSE
)
n |
the number of subjects |
ntms |
the number of time points |
fup |
the maximum follow-up time, such that t = [0, ..., fup] with length |
family |
a |
sigma |
a |
beta |
a |
D |
a positive-definite matrix specifying the variance-covariance matrix for the random effects. If not supplied an identity matrix is assumed. |
gamma |
a |
zeta |
a vector of length 2 specifying the coefficients for the baseline covariates in the survival sub-model, in the order of continuous and binary. |
theta |
parameters to control the failure rate, see baseline hazard. |
cens.rate |
parameter for |
regular.times |
logical, if |
dof |
integer, specifies the degrees of freedom of the multivariate t-distribution
used to generate the random effects. If specified, this t-distribution is used. If left
at the default |
random.formulas |
allows user to specify if an intercept-and-slope ( |
disp.formulas |
allows user to specify the dispersion model simulated. Intended use is
to allow swapping between intercept only (the default) and a time-varying one ( |
return.ranefs |
a logical determining whether the true random effects should be
returned. This is largely for internal/simulation use. Default |
simData
simulates data from a multivariate joint model with a mixture of
families for each k=1,\dots,K
response. The specification of family
changes
requisite dispersion parameter sigma
, if applicable. The family
list can
(currently) contain:
"gaussian"
Simulated with identity link, corresponding item in sigma
will be the variance.
"poisson"
Simulated with log link, corresponding dispersion in sigma
can be anything, as it doesn't impact simulation.
"binomial"
Simulated with logit link, corresponding dispersion in sigma
can be anything, as it doesn't impact simulation.
"negbin"
Simulated with a log link, corresponding item in sigma
will be
the overdispersion defined on the log scale. Simulated variance is
\mu+\mu^2/\varphi
.
"genpois"
Simulated with a log link, corresponding item in sigma
will be
the dispersion. Values < 0 correspond to under-dispersion, and values > 0 over-
dispersion. See rgenpois
for more information. Simulated variance is
(1+\varphi)^2\mu
.
"Gamma"
Simulated with a log link, corresponding item in sigma
will be
the shape parameter, defined on the log-scale.
Therefore, for families "negbin"
, "Gamma"
, "genpois"
, the user can
define the dispersion model desired in disp.formulas
, which creates a data matrix
W
. For the "negbin"
and "Gamma"
cases, we define
\varphi_i=\exp\{W_i\sigma_i\}
(i.e. the exponent of the linear predictor of the
dispersion model); and for "genpois"
the identity of the linear is used.
A list of two data.frame
s: One with the full longitudinal data, and another
with only survival data. If return.ranefs=TRUE
, a matrix of the true b
values is
also returned. By default (i.e. no arguments provided), a bivariate Gaussian set of joint
data is returned.
When simulating the survival time, the baseline hazard is a Gompertz distribution controlled
by theta=c(x,y)
:
\lambda_0(t) = \exp{x + yt}
where y
is the shape parameter, and the scale parameter is \exp{x}
.
James Murray (j.murray7@ncl.ac.uk).
Austin PC. Generating survival times to simulate Cox proportional hazards models with time-varying covariates. Stat Med. 2012; 31(29): 3946-3958.
joint
# 1) A set of univariate data ------------------------------------------
beta <- c(2.0, 0.33, -0.25, 0.15)
# Note that by default arguments are bivariate, so need to specify the univariate case
univ.data <- simData(beta = beta,
gamma = 0.15, sigma = list(0.2), family = list("gaussian"),
D = diag(c(0.25, 0.05)))
# 2) Univariate data, with failure rate controlled ---------------------
# In reality, many facets contribute to the simulated failure rate, in
# this example, we'll just atler the baseline hazard via 'theta'.
univ.data.highfail <- simData(beta = beta,
gamma = 0.15, sigma = list(0.0), family = list("poisson"),
D = diag(c(0.40, 0.08)), theta = c(-2, 0.1))
# 3) Trivariate (K = 3) mixture of families with dispersion parameters -
beta <- do.call(rbind, replicate(3, c(2, -0.1, 0.1, -0.2), simplify = FALSE))
gamma <- c(0.3, -0.3, 0.3)
D <- diag(c(0.25, 0.09, 0.25, 0.05, 0.25, 0.09))
family <- list('gaussian', 'genpois', 'negbin')
sigma <- list(.16, 1.5, log(1.5))
triv.data <- simData(ntms=15, family = family, sigma = sigma, beta = beta, D = D,
gamma = gamma, theta = c(-3, 0.2), zeta = c(0,-.2))
# 4) K = 4 mixture of families with/out dispersion ---------------------
beta <- do.call(rbind, replicate(4, c(2, -0.1, 0.1, -0.2), simplify = FALSE))
gamma <- c(-0.75, 0.3, -0.6, 0.5)
D <- diag(c(0.25, 0.09, 0.25, 0.05, 0.25, 0.09, 0.16, 0.02))
family <- list('gaussian', 'poisson', 'binomial', 'gaussian')
sigma <- list(.16, 0, 0, .05) # 0 can be anything here, as it is ignored internally.
mix.data <- simData(ntms=15, family = family, sigma = sigma, beta = beta, D = D, gamma = gamma,
theta = c(-3, 0.2), zeta = c(0,-.2))
# 5) Bivariate joint model with two dispersion models. -----------------
disp.formulas <- list(~time, ~time) # Two time-varying dispersion models
sigma <- list(c(0.00, -0.10), c(0.10, 0.15)) # specified in form of intercept, slope
D <- diag(c(.25, 0.04, 0.50, 0.10))
disp.data <- simData(family = list("genpois", "negbin"), sigma = sigma, D = D,
beta = rbind(c(0, 0.05, -0.15, 0.00), 1 + c(0, 0.25, 0.15, -0.20)),
gamma = c(1.5, 1.5),
disp.formulas = disp.formulas, fup = 5)
# 6) Trivariate joint model with mixture of random effects models ------
# It can be hard to e.g. fit a binomial model on an intercept and slope, since e.g.
# glmmTMB might struggle to accurately fit it (singular fits, etc.). To that end, could
# swap the corresponding random effects specification to be an intercept-only.
family <- list("gaussian", "binomial", "gaussian")
# A list of formulae, even though we want to change the second sub-model's specification
# we need to specify the rest of the items, too (same as disp.formulas, sigma).
random.formulas <- list(~time, ~1, ~time)
beta <- rbind(c(2, -0.2, 0.5, -0.25), c(0, 0.5, 1, -1), c(-2, 0.2, -0.5, 0.25))
# NOTE that the specification of RE matrix will need to match.
D <- diag(c(0.25, 0.09, 1, 0.33, 0.05))
# Simulate data, and return REs as a sanity check...
mix.REspec.data <- simData(beta = beta, D = D, family = family,
gamma = c(-0.5, 1, 0.5), sigma = list(0.15, 0, 0.15),
random.formulas = random.formulas, return.ranefs = TRUE)
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.