simulate_data: Generate data
In rbmi: Reference Based Multiple Imputation

Description

Generate data for a two-arm clinical trial with a longitudinal continuous outcome and two intercurrent events (ICEs). ICE1 may be thought of as discontinuation from study treatment due to study drug or condition related (SDCR) reasons. ICE2 may be thought of as discontinuation from study treatment due to uninformative study drop-out, i.e. for reasons that are not study drug or condition related (NSDCR). Outcome data after ICE2 are always missing.

Usage

simulate_data(pars_c, pars_t, post_ice1_traj, strategies = getStrategies())

Arguments

`pars_c`	A `simul_pars` object as generated by `set_simul_pars()`. It specifies the simulation parameters of the control arm.
`pars_t`	A `simul_pars` object as generated by `set_simul_pars()`. It specifies the simulation parameters of the treatment arm.
`post_ice1_traj`	A string which specifies how observed outcomes occurring after ICE1 are simulated. Must target a function included in `strategies`. Possible choices are: Missing At Random `"MAR"`, Jump to Reference `"JR"`, Copy Reference `"CR"`, Copy Increments in Reference `"CIR"`, Last Mean Carried Forward `"LMCF"`. User-defined strategies can also be added. See `getStrategies()` for details.
`strategies`	A named list of functions. Default equal to `getStrategies()`. See `getStrategies()` for details.
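
As an illustration of how the arguments fit together, the following is a minimal sketch of a call, not a definitive recipe. It assumes that `set_simul_pars()` accepts arguments named after the fields referenced in the Details section (`mu`, `sigma`, `prob_ice1`, `or_outcome_ice1`, `prob_post_ice1_dropout`, `prob_ice2`, `prob_miss`) plus a per-arm sample size `n`. All parameter values are arbitrary and chosen only for demonstration.

```r
# Sketch only: runs when the rbmi package is installed; values are arbitrary.
if (requireNamespace("rbmi", quietly = TRUE)) {
  set.seed(123)

  # Baseline plus three post-baseline visits; both arms share the baseline mean.
  mu_c <- c(50, 51, 52, 53)
  mu_t <- c(50, 52, 54, 56)

  # Compound-symmetry covariance: variance 25, covariance 12.5 (assumed values).
  sigma <- matrix(12.5, nrow = 4, ncol = 4)
  diag(sigma) <- 25

  # Assumed signature: set_simul_pars(mu, sigma, n, prob_ice1, ...).
  pars_c <- rbmi::set_simul_pars(
    mu = mu_c, sigma = sigma, n = 100,
    prob_ice1 = 0.1, or_outcome_ice1 = 1.5,
    prob_post_ice1_dropout = 0.5,
    prob_ice2 = 0.05, prob_miss = 0.02
  )
  pars_t <- rbmi::set_simul_pars(
    mu = mu_t, sigma = sigma, n = 100,
    prob_ice1 = 0.1, or_outcome_ice1 = 1.5,
    prob_post_ice1_dropout = 0.5,
    prob_ice2 = 0.05, prob_miss = 0.02
  )

  # Post-ICE1 outcomes in the treatment arm follow Jump to Reference.
  sim <- rbmi::simulate_data(pars_c, pars_t, post_ice1_traj = "JR")
  str(sim)
}
```

The `strategies` argument is left at its default here; a user-defined strategy would be supplied by adding a named function to the list returned by `getStrategies()` and passing that name as `post_ice1_traj`.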

Details

The data generation works as follows:

1. Generate outcome data for all visits (including baseline) from a multivariate normal distribution with parameters `pars_c$mu` and `pars_c$sigma` for the control arm and `pars_t$mu` and `pars_t$sigma` for the treatment arm. Note that in a randomized trial, outcomes have the same distribution at baseline in both treatment groups, i.e. one should set `pars_c$mu[1] = pars_t$mu[1]` and `pars_c$sigma[1,1] = pars_t$sigma[1,1]`.
2. Simulate whether ICE1 (study treatment discontinuation due to SDCR reasons) occurs after each visit according to parameters `pars_c$prob_ice1` and `pars_c$or_outcome_ice1` for the control arm and `pars_t$prob_ice1` and `pars_t$or_outcome_ice1` for the treatment arm.
3. Simulate drop-out following ICE1 according to `pars_c$prob_post_ice1_dropout` and `pars_t$prob_post_ice1_dropout`.
4. Simulate an additional uninformative study drop-out with probabilities `pars_c$prob_ice2` and `pars_t$prob_ice2` at each visit. This generates a second intercurrent event, ICE2, which may be thought of as treatment discontinuation due to NSDCR reasons with subsequent drop-out. The simulated time of drop-out is the subject's first visit that is affected by drop-out; data from this visit and all subsequent visits are set to missing. If both ICE1 and ICE2 are simulated to occur for a subject, only the earlier of the two counts. If both are simulated to occur at the same time, ICE1 counts. A single subject can therefore experience either ICE1 or ICE2, but not both.
5. Adjust trajectories after ICE1 according to the assumption specified by the `post_ice1_traj` argument. Note that only post-ICE1 outcomes in the treatment arm can be adjusted. Post-ICE1 outcomes in the control arm are not adjusted.
6. Simulate additional intermittent missing outcome data as per arguments `pars_c$prob_miss` and `pars_t$prob_miss`.
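
Two of the steps above, the multivariate normal draw and the rule that only the earlier of ICE1 and ICE2 counts, can be sketched in base R. This is an illustrative sketch under stated assumptions and not the package's internal implementation: the ICE times `first_ice1` and `first_ice2` are hypothetical hand-picked values (with `Inf` meaning the event never occurs), and all other names and numbers are made up for the example.

```r
set.seed(1)

# Draw outcomes for visits 0..3 from a multivariate normal distribution.
mu <- c(50, 51, 52, 53)
sigma <- matrix(12.5, nrow = 4, ncol = 4)
diag(sigma) <- 25
n <- 5
z <- matrix(rnorm(n * length(mu)), nrow = n)
# chol() returns R with t(R) %*% R == sigma, so z %*% R has covariance sigma.
outcome_noICE <- sweep(z %*% chol(sigma), 2, mu, "+")

# Hypothetical first visit affected by each ICE (Inf = never occurs).
first_ice1 <- c(2, Inf, 3, 1, Inf)
first_ice2 <- c(3, 2, 3, Inf, Inf)

# Only the earlier ICE counts; ties are resolved in favour of ICE1.
ice1_counts <- is.finite(first_ice1) & first_ice1 <= first_ice2
ice2_counts <- is.finite(first_ice2) & first_ice2 < first_ice1

# Data from the ICE2 visit onwards are set to missing.
visits <- 0:3
outcome <- outcome_noICE
for (i in seq_len(n)) {
  if (ice2_counts[i]) {
    outcome[i, visits >= first_ice2[i]] <- NA
  }
}
```

In this toy input, subject 3 has both events at visit 3, so ICE1 takes precedence and no data are set to missing by ICE2 for that subject.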

The probability of ICE1 after each visit is modeled according to the following logistic regression model: `~ 1 + I(visit == 0) + ... + I(visit == n_visits-1) + I((x-alpha))`, where:

n_visits is the number of visits (including baseline).
alpha is the baseline outcome mean.
The term I((x-alpha)) specifies the dependency of the probability of the ICE on the current outcome value x.

The corresponding regression coefficients of the logistic model are defined as follows: the intercept is set to 0; the coefficients corresponding to discontinuation after each visit for a subject with outcome equal to the mean at baseline are set according to parameters pars_c$prob_ice1 (pars_t$prob_ice1); and the regression coefficient associated with the covariate I((x-alpha)) is set to log(pars_c$or_outcome_ice1) (log(pars_t$or_outcome_ice1)).
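
One reading of this model can be sketched in base R as follows. The helper `ice1_prob()` is hypothetical and not part of rbmi. It assumes that the visit-specific coefficient is the logit of `prob_ice1`, so that a subject whose current outcome equals the baseline mean has exactly that discontinuation probability, and that each one-unit increase in the outcome multiplies the odds by `or_outcome_ice1`.

```r
# Hypothetical helper: probability of ICE1 after a visit, given the current
# outcome x, the baseline mean alpha, and the two simulation parameters.
ice1_prob <- function(x, alpha, prob_ice1, or_outcome_ice1) {
  # Linear predictor: logit(prob_ice1) + log(OR) * (x - alpha).
  plogis(qlogis(prob_ice1) + log(or_outcome_ice1) * (x - alpha))
}

# At x == alpha the probability reduces to prob_ice1.
p_at_mean <- ice1_prob(x = 50, alpha = 50, prob_ice1 = 0.1, or_outcome_ice1 = 1.5)

# One unit above the baseline mean, the odds are multiplied by or_outcome_ice1.
p_plus_one <- ice1_prob(x = 51, alpha = 50, prob_ice1 = 0.1, or_outcome_ice1 = 1.5)
odds_ratio <- (p_plus_one / (1 - p_plus_one)) / (p_at_mean / (1 - p_at_mean))
```

With `or_outcome_ice1 = 1`, the probability of ICE1 no longer depends on the outcome and equals `prob_ice1` at every visit.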

Please note that the baseline outcome can neither be missing nor be affected by any ICEs.

Value

A data.frame containing the simulated data. It includes the following variables:

id: Factor variable that specifies the id of each subject.
visit: Factor variable that specifies the visit of each assessment. Visit 0 denotes the baseline visit.
group: Factor variable that specifies which treatment group each subject belongs to.
outcome_bl: Numeric variable that specifies the baseline outcome.
outcome_noICE: Numeric variable that specifies the longitudinal outcome assuming no ICEs.
ind_ice1: Binary variable that takes value 1 if the corresponding visit is affected by ICE1 and 0 otherwise.
dropout_ice1: Binary variable that takes value 1 if the corresponding visit is affected by the drop-out following ICE1 and 0 otherwise.
ind_ice2: Binary variable that takes value 1 if the corresponding visit is affected by ICE2 and 0 otherwise.
outcome: Numeric variable that specifies the longitudinal outcome including ICE1, ICE2 and the intermittent missing values.