simjm: Simulate data for a univariate or multivariate joint model
In sambrilleman/simjm: Simulate Joint Longitudinal and Survival Data

Description Usage Arguments Details Value Examples

Returns a list of data frames containing data simulated from a joint model for longitudinal and time-to-event data.

simjm(n = 200, M = 1, fixed_trajectory = "cubic",
  random_trajectory = "linear", assoc = "etavalue",
  basehaz = c("weibull"), betaLong_intercept = 10, betaLong_binary = -1,
  betaLong_continuous = 1, betaLong_linear = -0.25,
  betaLong_quadratic = 0.03, betaLong_cubic = -0.0015, betaLong_aux = 0.5,
  betaEvent_intercept = -7.5, betaEvent_binary = -0.5,
  betaEvent_continuous = 0.5, betaEvent_assoc = 0.5, betaEvent_aux = 1.2,
  b_sd = c(1.5, 0.07), b_rho = -0.2, prob_Z1 = 0.5, mean_Z2 = 0,
  sd_Z2 = 1, max_yobs = 10, max_fuptime = 20, balanced = FALSE,
  family = gaussian, clust_control = list(), return_eta = FALSE,
  seed = sample.int(.Machine$integer.max, 1), interval = c(1e-08, 200))

`n`	Number of individuals
`M`	Number of longitudinal markers
`fixed_trajectory`	The desired type of trajectory in the fixed effects part of the longitudinal model. Can be `"none"` (intercept only), `"linear"` (intercept and linear slope), `"quadratic"` (intercept, linear slope, and quadratic term), or `"cubic"` (intercept, linear slope, quadratic term, and cubic term). Can be a single character string, or a character vector of length M (if a different trajectory type is to be used for each longitudinal submodel). Note that in addition to these time effects, two baseline covariates (one binary and one continuous) are always included in the longitudinal submodel as well.
`random_trajectory`	The desired type of trajectory in the random effects part of the longitudinal model. Can be `"none"` (random intercept only), `"linear"` (random intercept and linear slope), `"quadratic"` (random intercept, linear slope, and quadratic term), or `"cubic"` (random intercept, linear slope, quadratic term, and cubic term). Can be a single character string, or a character vector of length M (if a different trajectory type is to be used for each longitudinal submodel). Note that the corresponding `fixed_trajectory` argument must be at least as complex as the random effect structure; for example, you cannot specify `fixed_trajectory = "linear"` and `random_trajectory = "cubic"`, because there would be no corresponding fixed effects parameters for the quadratic and cubic random terms in the model.
`assoc`	A character string, or a character vector of length M, specifying the desired type of association structure for linking each longitudinal outcome to the hazard of the event. Only one association structure type can be specified for each longitudinal outcome. The options are: `"null"`, `"etavalue"`, `"etaslope"`, `"etaauc"`, `"muvalue"`, `"shared_b(1)"`, `"shared_coef(1)"`, `"shared_b(2)"`, `"shared_coef(2)"`.
`basehaz`	The desired baseline hazard for the event submodel. Can be `"weibull"`.
`betaLong_intercept`	True intercept in the longitudinal submodel. Can be a scalar or a vector of length M.
`betaLong_binary`	True coefficient for the binary covariate in the longitudinal submodel. Can be a scalar or a vector of length M.
`betaLong_continuous`	True coefficient for the continuous covariate in the longitudinal submodel. Can be a scalar or a vector of length M.
`betaLong_linear`	True coefficient for the fixed effect linear term in the longitudinal submodel when `fixed_trajectory = "linear"`, `fixed_trajectory = "quadratic"` or `fixed_trajectory = "cubic"`. Can be a scalar or a vector of length M.
`betaLong_quadratic`	True coefficient for the fixed effect quadratic term in the longitudinal submodel when `fixed_trajectory = "quadratic"` or `fixed_trajectory = "cubic"`. Can be a scalar or a vector of length M.
`betaLong_cubic`	True coefficient for the fixed effect cubic term in the longitudinal submodel when `fixed_trajectory = "cubic"`. Can be a scalar or a vector of length M.
`betaLong_aux`	True parameter value for the auxiliary parameter in the longitudinal submodel (sigma for Gaussian models, number of trials for binomial models, size for negative binomial models, shape for Gamma models, lambda for inverse Gaussian models). Can be a scalar or a vector of length M.
`betaEvent_intercept`	True intercept term (log hazard scale) in the event submodel. Only required for Weibull models.
`betaEvent_binary`	True coefficient (log hazard ratio) for the binary covariate in the event submodel.
`betaEvent_continuous`	True coefficient (log hazard ratio) for the continuous covariate in the event submodel.
`betaEvent_assoc`	True association parameter (log hazard ratio) in the event submodel. Can be a scalar or a vector of length M.
`betaEvent_aux`	True parameter value(s) for the auxiliary parameter(s) in the event submodel (shape parameter for Weibull models). Can be a scalar or a vector, depending on how many auxiliary parameters are required for specifying the baseline hazard.
`b_sd`	A list, with each element of the list containing the vector of standard deviations for the individual-level random effects.
`b_rho`	Correlation between the individual-level random effects. This is only relevant when there is a total of >1 individual-level random effects in the joint model. This can be specified as a scalar correlation term, which assumes the same (true) correlation between each of the individual-level random effects, or it can be a correlation matrix for the correlation between the individual-level random effects. When simulating data for a multivariate joint model, the structure of the correlation matrix is such that the first K_1 columns/rows correspond to the individual-level random effects for the first longitudinal submodel, and the next K_2 columns/rows correspond to the individual-level random effects for the second longitudinal submodel, and so on.
`prob_Z1`	Probability for the binary covariate included in each of the submodels.
`mean_Z2`	Mean of the (normally distributed) continuous covariate included in each of the submodels.
`sd_Z2`	Standard deviation of the (normally distributed) continuous covariate included in each of the submodels.
`max_yobs`	The maximum allowed number of longitudinal measurements for each biomarker. The actual number of observed measurements will depend on the individuals event time. Every individual is forced to have at least a baseline measurement for each biomarker (i.e. a biomarker measurement at time 0). The remaining biomarker measurement times will be uniformly distributed between 0 and `max_fuptime` if `balanced = FALSE`, or evenly spaced between 0 and `max_fuptime` if `balanced = TRUE`.
`max_fuptime`	The maximum follow up time in whatever the desired time units are. This time will also be used as the censoring time (i.e. for individuals who have a simulated survival time that is after `max_fuptime`).
`balanced`	A logical, specifying whether the timings of the longitudinal measurements should be balanced across individuals. If `FALSE` (the default), then each individual will have a baseline longitudinal measurement and the remaining measurement times will be chosen randomly from a uniform distribution on the range `[0,max_fuptime]`. If `TRUE`, then each individual will have a baseline longitudinal measurement and the remaining measurement times will be at common times for each individual, chosen to be evenly spaced between `[0,max_fuptime]`.
`family`	A family for the the longitudinal submodel, or for a multivariate joint model this can be a list of families. See `glm` and `family`.
`clust_control`	A named list providing the arguments that are necessary for simulating data where the first longitudinal submodel has lower level clustering within individuals. The named list should contain the following elements: L Integer specifying the maximum number of lower level units clustered within an individual. The actual number of lower level units clustered within each individual is taken to be a random uniform (truncated to an integer) on the range `[1,L+1]`. assoc Character string specifying the method for combining information across the lower level units clustered within an individual when forming the association structure. Can be `"sum"` for specifying which indicates the association structure should be based on a summation across the lower level units clustered within an individual. Can be `"mean"` which indicates that the association structure should be based on the mean (i.e. average) taken across the lower level units clustered within an individual. u_sd Numeric vector providing the standard deviations of the random effects at the cluster level. u_rho A scalar or a correlation matrix providing the correlation between the random effects for the lower level clustering factor. This is only relevant if there are >1 random effects for the lower level clustering factor. This is specified in a similar way as the `b_rho` argument described above. random_trajectory The desired type of trajectory in the random effects part of the longitudinal model at the cluster level. Can be `"none"` to only include a random intercept at the cluster level, or otherwise `"linear"` to include a random intercept and linear slope term.
`return_eta`	A logical, if `return_eta = TRUE`, then the simulated data will also include the value of longitudinal submodel's linear predictor evaluated at the various measurement times. These will be stored in the variables named `Xij_1`, `Xij_2`, etc. (Note that if `family = "gaussian"` then these are equivalent to the error-free values of the biomarker at the measurement times).
`seed`	An optional `seed`.
`interval`	The interval over which to search for the `uniroot` corresponding to each simulated event time.

The simjm function returns data simulated under a joint longitudinal and time-to-event model. The joint model can be univariate (i.e. one longitudinal outcome) or multivariate (i.e. more than one longitudinal outcome). If more than one longitudinal outcome is specified then the longitudinal outcomes are assumed to be correlated via a joint multivariate normal distribution for the individual-level random effects. Each longitudinal outcome may have a different family and link function. The time-to-event model is assumed to be a parametric proportional hazards model with the baseline hazard specified via the basehaz argument. The (log) hazard of the event is assumed to be related to each longitudinal outcome via the association structure specified in the assoc argument. A different association may be specified for linking each longitudinal outcome to the hazard of the event.

Note that by default simjm will simulate data for one longitudinal outcome (i.e. a univariate joint model). If we want to simulate data for a multivariate joint model, or we wish to use one of the non-default association structures (for example "etaslope"), then we may need to adjust the "true" parameters accordingly so that we still get realistic survival times (for example, by changing the "true" association parameter value).

A list of data frames, one for each longitudinal biomarker (in long format) and one for the event time data. The returned object also has a number of attributes, including a class attribute simjm and a number of other attributes that record the arguments that were specified in the simjm call; for example the "true" parameter values, the number of individuals, the number of longitudinal markers, and so on.

# Note that throughout the examples we specify 'n = 30' to
# ensure a small sample size so that the examples run quickly

# Simulate one longitudinal marker (we can just use the defaults):
simdat1 <- simjm(n = 30)

# Simulate two longitudinal markers and then check the
# true parameter values (stored as an attribute):
simdat2 <- simjm(M = 2, n = 30, betaEvent_assoc = 0.1)
attr(simdat2, "params")

# Simulate one longitudinal marker, using "etaslope"
# association structure:
simdat3 <- simjm(n = 30, assoc = "etaslope", betaEvent_assoc = 0.8)

# For one longitudinal marker, with a bernoulli outcome.
# (Note that 'betaLong_aux' specifies the number of trials
# for the binomial outcome).
simdat4 <- simjm(M = 1, n = 30,
                 betaLong_intercept = 1.3,
                 betaLong_binary = -0.6,
                 betaLong_continuous = -0.03,
                 betaLong_linear = 0.05,
                 b_sd = c(1, 0.05),
                 betaEvent_intercept = -9,
                 betaEvent_assoc = 0.3,
                 family = binomial(),
                 betaLong_aux = 1)

# For one longitudinal marker, with lower level clustering
# within individuals. The model includes only a random intercept
# at the individual-level, and then a random intercept and
# random slope at the lower cluster level. The association
# structure is based on a summation of the expected values for
# each of the lower level units.
simdat5 <- simjm(M = 1, n = 30,
                 fixed_trajectory = "linear",
                 random_trajectory = "none",
                 betaLong_intercept = -1,
                 b_sd = 1,
                 clust_control = list(
                   L = 6,
                   assoc = "sum",
                   random_trajectory = "linear",
                   u_sd = c(1,1),
                   u_rho = 0.2
                 ))

# Simulate two longitudinal markers, where the first longitudinal
# marker has a linear trajectory, and the second marker has
# quadratic trajectory. We will specify a correlation matrix for
# the individual-level random effects (via the 'b_rho' argument).
corrmat <- matrix(c(
  1.0, 0.2, 0.5, 0.0, 0.0,
  0.2, 1.0, 0.0, 0.2, 0.0,
  0.5, 0.0, 1.0, 0.3, 0.0,
  0.0, 0.2, 0.3, 1.0, -.1,
  0.0, 0.0, 0.0, -.1, 1.0), ncol = 5)
simdat6 <- simjm(M = 2, n = 30,
                 fixed_trajectory = c("linear", "quadratic"),
                 random_trajectory = c("linear", "quadratic"),
                 b_sd = c(2, 1, 2, 1, 0.5),
                 b_rho = corrmat,
                 betaEvent_assoc = c(0.1, 0.2))