simData | R Documentation |

Simulate multivariate longitudinal and survival data from a joint model
specification, with potential mixture of response families. Implementation is similar
to existing packages (e.g. `joineR`

, `joineRML`

).

```
simData(
n = 250,
ntms = 10,
fup = 5,
family = list("gaussian", "gaussian"),
sigma = list(0.16, 0.16),
beta = rbind(c(1, 0.1, 0.33, -0.5), c(1, 0.1, 0.33, -0.5)),
D = NULL,
gamma = c(0.5, -0.5),
zeta = c(0.05, -0.3),
theta = c(-4, 0.2),
cens.rate = exp(-3.5),
regular.times = TRUE,
dof = Inf,
random.formulas = NULL,
disp.formulas = NULL,
return.ranefs = FALSE
)
```

`n` |
the number of subjects |

`ntms` |
the number of time points |

`fup` |
the maximum follow-up time, such that t = [0, ..., fup] with length |

`family` |
a |

`sigma` |
a |

`beta` |
a |

`D` |
a positive-definite matrix specifying the variance-covariance matrix for the random effects. If not supplied an identity matrix is assumed. |

`gamma` |
a |

`zeta` |
a vector of length 2 specifying the coefficients for the baseline covariates in the survival sub-model, in the order of continuous and binary. |

`theta` |
parameters to control the failure rate, see |

`cens.rate` |
parameter for |

`regular.times` |
logical, if |

`dof` |
integer, specifies the degrees of freedom of the multivariate t-distribution
used to generate the random effects. If specified, this t-distribution is used. If left
at the default |

`random.formulas` |
allows user to specify if an intercept-and-slope ( |

`disp.formulas` |
allows user to specify the dispersion model simulated. Intended use is
to allow swapping between intercept only (the default) and a time-varying one ( |

`return.ranefs` |
a logical determining whether the |

`simData`

simulates data from a multivariate joint model with a mixture of
families for each `k=1,\dots,K`

response. The specification of `family`

changes
requisite dispersion parameter `sigma`

, if applicable. The `family`

list can
(currently) contain:

`"gaussian"`

Simulated with identity link, corresponding item in

`sigma`

will be the**variance**.`"poisson"`

Simulated with log link, corresponding dispersion in

`sigma`

can be anything, as it doesn't impact simulation.`"binomial"`

Simulated with logit link, corresponding dispersion in

`sigma`

can be anything, as it doesn't impact simulation.`"negbin"`

Simulated with a log link, corresponding item in

`sigma`

will be the**overdispersion**defined on the log scale. Simulated variance is`\mu+\mu^2/\varphi`

.`"genpois"`

Simulated with a log link, corresponding item in

`sigma`

will be the**dispersion**. Values < 0 correspond to under-dispersion, and values > 0 over- dispersion. See`rgenpois`

for more information. Simulated variance is`(1+\varphi)^2\mu`

.`"Gamma"`

Simulated with a log link, corresponding item in

`sigma`

will be the**shape**parameter, defined on the log-scale.

Therefore, for families `"negbin"`

, `"Gamma"`

, `"genpois"`

, the user can
define the dispersion model desired in `disp.formulas`

, which creates a data matrix
`W`

. For the `"negbin"`

and `"Gamma"`

cases, we define
`\varphi_i=\exp\{W_i\sigma_i\}`

(i.e. the exponent of the linear predictor of the
dispersion model); and for `"genpois"`

the identity of the linear is used.

A list of two `data.frame`

s: One with the full longitudinal data, and another
with only survival data. If `return.ranefs=TRUE`

, a matrix of the true `b`

values is
also returned. By default (i.e. no arguments provided), a bivariate Gaussian set of joint
data is returned.

When simulating the survival time, the baseline hazard is a Gompertz distribution controlled
by `theta=c(x,y)`

:

`\lambda_0(t) = \exp{x + yt}`

where `y`

is the shape parameter, and the scale parameter is `\exp{x}`

.

James Murray (j.murray7@ncl.ac.uk).

Austin PC. Generating survival times to simulate Cox proportional hazards
models with time-varying covariates. *Stat Med.* 2012; **31(29)**:
3946-3958.

`joint`

```
# 1) A set of univariate data ------------------------------------------
beta <- c(2.0, 0.33, -0.25, 0.15)
# Note that by default arguments are bivariate, so need to specify the univariate case
univ.data <- simData(beta = beta,
gamma = 0.15, sigma = list(0.2), family = list("gaussian"),
D = diag(c(0.25, 0.05)))
# 2) Univariate data, with failure rate controlled ---------------------
# In reality, many facets contribute to the simulated failure rate, in
# this example, we'll just atler the baseline hazard via 'theta'.
univ.data.highfail <- simData(beta = beta,
gamma = 0.15, sigma = list(0.0), family = list("poisson"),
D = diag(c(0.40, 0.08)), theta = c(-2, 0.1))
# 3) Trivariate (K = 3) mixture of families with dispersion parameters -
beta <- do.call(rbind, replicate(3, c(2, -0.1, 0.1, -0.2), simplify = FALSE))
gamma <- c(0.3, -0.3, 0.3)
D <- diag(c(0.25, 0.09, 0.25, 0.05, 0.25, 0.09))
family <- list('gaussian', 'genpois', 'negbin')
sigma <- list(.16, 1.5, log(1.5))
triv.data <- simData(ntms=15, family = family, sigma = sigma, beta = beta, D = D,
gamma = gamma, theta = c(-3, 0.2), zeta = c(0,-.2))
# 4) K = 4 mixture of families with/out dispersion ---------------------
beta <- do.call(rbind, replicate(4, c(2, -0.1, 0.1, -0.2), simplify = FALSE))
gamma <- c(-0.75, 0.3, -0.6, 0.5)
D <- diag(c(0.25, 0.09, 0.25, 0.05, 0.25, 0.09, 0.16, 0.02))
family <- list('gaussian', 'poisson', 'binomial', 'gaussian')
sigma <- list(.16, 0, 0, .05) # 0 can be anything here, as it is ignored internally.
mix.data <- simData(ntms=15, family = family, sigma = sigma, beta = beta, D = D, gamma = gamma,
theta = c(-3, 0.2), zeta = c(0,-.2))
# 5) Bivariate joint model with two dispersion models. -----------------
disp.formulas <- list(~time, ~time) # Two time-varying dispersion models
sigma <- list(c(0.00, -0.10), c(0.10, 0.15)) # specified in form of intercept, slope
D <- diag(c(.25, 0.04, 0.50, 0.10))
disp.data <- simData(family = list("genpois", "negbin"), sigma = sigma, D = D,
beta = rbind(c(0, 0.05, -0.15, 0.00), 1 + c(0, 0.25, 0.15, -0.20)),
gamma = c(1.5, 1.5),
disp.formulas = disp.formulas, fup = 5)
# 6) Trivariate joint model with mixture of random effects models ------
# It can be hard to e.g. fit a binomial model on an intercept and slope, since e.g.
# glmmTMB might struggle to accurately fit it (singular fits, etc.). To that end, could
# swap the corresponding random effects specification to be an intercept-only.
family <- list("gaussian", "binomial", "gaussian")
# A list of formulae, even though we want to change the second sub-model's specification
# we need to specify the rest of the items, too (same as disp.formulas, sigma).
random.formulas <- list(~time, ~1, ~time)
beta <- rbind(c(2, -0.2, 0.5, -0.25), c(0, 0.5, 1, -1), c(-2, 0.2, -0.5, 0.25))
# NOTE that the specification of RE matrix will need to match.
D <- diag(c(0.25, 0.09, 1, 0.33, 0.05))
# Simulate data, and return REs as a sanity check...
mix.REspec.data <- simData(beta = beta, D = D, family = family,
gamma = c(-0.5, 1, 0.5), sigma = list(0.15, 0, 0.15),
random.formulas = random.formulas, return.ranefs = TRUE)
```

Embedding an R snippet on your website

Add the following code to your website.

For more information on customizing the embed code, read Embedding Snippets.