mvrnorm_sim: Simulate Microbiome Longitudinal Data from Multivariate...
In microbiomeDASim: Microbiome Differential Abundance Simulation


This function is called by `gen_norm_microbiome` when the user specifies the method as `mvrnorm`.

```r
mvrnorm_sim(
  n_control,
  n_treat,
  control_mean,
  sigma,
  num_timepoints,
  t_interval,
  rho,
  corr_str = c("ar1", "compound", "ind"),
  func_form = c("linear", "quadratic", "cubic", "M", "W", "L_up", "L_down"),
  beta,
  IP = NULL,
  missing_pct,
  missing_per_subject,
  miss_val = NA,
  dis_plot = FALSE,
  plot_trend = FALSE,
  zero_trunc = TRUE,
  asynch_time = FALSE
)
```

`n_control`	integer value specifying the number of control individuals.
`n_treat`	integer value specifying the number of treated individuals.
`control_mean`	numeric value specifying the mean value for control subjects. All control subjects are assumed to have the same population mean value.
`sigma`	numeric value specifying the global population standard deviation for both control and treated individuals.
`num_timepoints`	either an integer value specifying the number of timepoints per subject or a vector of timepoints for each subject. If supplying a vector, its length must equal the total number of subjects.
`t_interval`	numeric vector of length two specifying the time interval [t_1, t_q] from which to draw observations. Timepoints are assumed to be equally spaced over the interval unless `asynch_time` is set to TRUE.
`rho`	value for the correlation parameter. Must be in [0, 1]. See `mvrnorm_corr_gen` for details.
`corr_str`	correlation structure selected: "ar1", "compound", or "ind". See `mvrnorm_corr_gen` for details.
`func_form`	character value specifying the functional form for the longitudinal mean trend. See `mean_trend` for details.
`beta`	vector value specifying the parameters for the differential abundance function. See `mean_trend` for details.
`IP`	vector specifying any inflection points; which ones are needed depends on the functional form specified. See `mean_trend` for details. By default this is set to NULL.
`missing_pct`	numeric value in [0, 1] that specifies the proportion of individuals who will have missing values.
`missing_per_subject`	integer value specifying how many observations per subject should be dropped. All individuals are assumed to have a baseline value, so the maximum value of `missing_per_subject` is `num_timepoints` - 1.
`miss_val`	value used to represent induced missingness in the simulated data. By default missing values are NA, but another common choice is 0.
`dis_plot`	logical argument on whether to plot the simulated data. By default plotting is turned off.
`plot_trend`	logical value specifying whether to plot the true mean trend. See `mean_trend` for details.
`zero_trunc`	logical indicator designating whether simulated outcomes should be zero truncated. Default is TRUE.
`asynch_time`	logical indicator; if TRUE, timepoints are randomly sampled over `t_interval` instead of being equally spaced. Default is FALSE.
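The rules for `missing_pct` and `missing_per_subject` (a share of subjects gets missing values, the baseline is always kept) can be illustrated with a small base-R sketch. The helper `sketch_missing()` and its `miss_time` column are hypothetical and not part of microbiomeDASim; the sketch assumes that `round(missing_pct * n_subjects)` subjects are chosen at random and that `missing_per_subject` distinct non-baseline timepoints are dropped for each of them, which may differ from how the package actually selects observations.

```r
# Hypothetical helper (not part of microbiomeDASim): choose which subjects and
# timepoint indices become missing, keeping every baseline (index 1) observed.
sketch_missing <- function(n_subjects, num_timepoints, missing_pct,
                           missing_per_subject) {
  stopifnot(missing_pct >= 0, missing_pct <= 1,
            missing_per_subject <= num_timepoints - 1)
  # Assumption: a fixed, rounded share of subjects receives missing values
  n_miss <- round(missing_pct * n_subjects)
  miss_id <- sort(sample.int(n_subjects, n_miss))
  non_baseline <- seq_len(num_timepoints)[-1]
  rows <- lapply(miss_id, function(id) {
    # Drop missing_per_subject distinct non-baseline timepoints for this subject
    picked <- non_baseline[sample.int(length(non_baseline), missing_per_subject)]
    data.frame(miss_id = rep(id, missing_per_subject), miss_time = sort(picked))
  })
  do.call(rbind, rows)
}

set.seed(1)
miss <- sketch_missing(n_subjects = 40, num_timepoints = 5,
                       missing_pct = 0.6, missing_per_subject = 2)
length(unique(miss$miss_id)) / 40  # 24 of 40 subjects, i.e. 0.6
```

Indexing `non_baseline` through `sample.int()` avoids the `sample(x)` pitfall where a length-one `x` is treated as `1:x`.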

This function returns a list with the following objects:

df - data.frame with the complete outcome Y, subject ID, time, group, and the outcome with missing data

Y - vector of the complete outcome

Mu - vector of the mean values for the complete data used during simulation

Sigma - block-diagonal symmetric covariance matrix for the complete data used during simulation

N - total number of observations

miss_data - data.frame listing which IDs and timepoints were randomly selected to induce missingness

Y_obs - vector of outcome with induced missingness
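The block-diagonal `Sigma` can be pictured as one covariance block per subject, repeated down the diagonal. The sketch below assumes each block is `sigma^2` times a correlation matrix, with `ar1` using `rho^|j - k|` over timepoint indices, `compound` using a constant off-diagonal `rho`, and `ind` using the identity; `corr_block()` is a hypothetical helper, and `mvrnorm_corr_gen` documents the package's actual construction.

```r
# Hypothetical helper (not the package's mvrnorm_corr_gen): covariance block for
# one subject, assuming covariance = sigma^2 * correlation matrix.
corr_block <- function(num_timepoints, sigma, rho,
                       corr_str = c("ar1", "compound", "ind")) {
  corr_str <- match.arg(corr_str)
  idx <- seq_len(num_timepoints)
  R <- switch(corr_str,
    ar1 = rho^abs(outer(idx, idx, "-")),  # correlation decays with lag |j - k|
    compound = {                          # equal correlation at every lag
      m <- matrix(rho, num_timepoints, num_timepoints)
      diag(m) <- 1
      m
    },
    ind = diag(num_timepoints))           # no within-subject correlation
  sigma^2 * R
}

# 40 subjects x 5 timepoints, matching the example settings on this page
block <- corr_block(num_timepoints = 5, sigma = 1, rho = 0.95, corr_str = "ar1")
# kronecker() with an identity matrix places one copy of `block` per subject
Sigma_sketch <- kronecker(diag(40), block)
dim(Sigma_sketch)  # 200 200, i.e. N = 2 * 20 * 5
```

Entries linking different subjects are zero, which is what makes the matrix block diagonal.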

```r
library(microbiomeDASim)

num_subjects_per_group <- 20
sim_obj <- mvrnorm_sim(n_control = num_subjects_per_group,
                       n_treat = num_subjects_per_group,
                       control_mean = 5, sigma = 1, num_timepoints = 5,
                       t_interval = c(0, 4), rho = 0.95, corr_str = "ar1",
                       func_form = "linear", beta = c(0, 0.25),
                       missing_pct = 0.6, missing_per_subject = 2)
# check the output
head(sim_obj$df)

# total number of observations is 2 * num_subjects_per_group * num_timepoints (200 here)
sim_obj$N

# approximately 60% of the IDs should have missing observations
length(unique(sim_obj$miss_data$miss_id)) / length(unique(sim_obj$df$ID))

# check the covariance structure for the first subject
sim_obj$Sigma[seq_len(5), seq_len(5)]
```
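Since the complete outcome `Y` is generated from the mean `Mu` and covariance `Sigma`, a single subject's draw can be sketched in base R through the Cholesky factor of its covariance block. This assumes an AR(1) block of the form `rho^|j - k|` with `sigma = 1`, a constant mean equal to `control_mean = 5` for a control subject, and it omits zero truncation (`zero_trunc`); `draw_subject()` is a hypothetical helper, not the package's implementation.

```r
# Hypothetical helper (not microbiomeDASim internals): draw one subject's outcome
# vector from a multivariate normal N(mu, S) via the Cholesky factor of S.
draw_subject <- function(mu, S) {
  L <- t(chol(S))          # chol() returns upper-triangular R with S = t(R) %*% R
  z <- rnorm(length(mu))   # independent standard normal draws
  as.vector(mu + L %*% z)  # correlated draw with covariance L %*% t(L) = S
}

set.seed(42)
idx <- seq_len(5)
S <- 0.95^abs(outer(idx, idx, "-"))      # assumed AR(1) block, sigma = 1, rho = 0.95
y <- draw_subject(mu = rep(5, 5), S = S)  # control subject with control_mean = 5
y
```

`MASS::mvrnorm()` performs the same kind of draw using an eigendecomposition instead of a Cholesky factor.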