mvrnorm_sim: Simulate Microbiome Longitudinal Data from Multivariate...

View source: R/mvrnorm_sim.R

Description

This function is used in the gen_norm_microbiome call when the user specifies the method as mvrnorm.

Usage

mvrnorm_sim(
  n_control,
  n_treat,
  control_mean,
  sigma,
  num_timepoints,
  t_interval,
  rho,
  corr_str = c("ar1", "compound", "ind"),
  func_form = c("linear", "quadratic", "cubic", "M", "W", "L_up", "L_down"),
  beta,
  IP = NULL,
  missing_pct,
  missing_per_subject,
  miss_val = NA,
  dis_plot = FALSE,
  plot_trend = FALSE,
  zero_trunc = TRUE,
  asynch_time = FALSE
)

Arguments

n_control

integer value specifying the number of control individuals

n_treat

integer value specifying the number of treated individuals

control_mean

numeric value specifying the mean value for control subjects. all control subjects are assumed to have the same population mean value.

sigma

numeric value specifying the global population standard deviation for both control and treated individuals.

num_timepoints

either an integer value specifying the number of timepoints per subject or a vector of timepoints for each subject. if supplying a vector, the length of the vector must equal the total number of subjects.

t_interval

numeric vector of length two specifying the interval of time [t_1, t_q] from which to draw observations. timepoints are assumed to be equally spaced over the interval unless asynch_time is set to TRUE.

rho

value for the correlation parameter. must lie in the interval [0, 1]. see mvrnorm_corr_gen for details.

corr_str

correlation structure selected. see mvrnorm_corr_gen for details.

func_form

character value specifying the functional form for the longitudinal mean trend. see mean_trend for details.

beta

vector value specifying the parameters for the differential abundance function. see mean_trend for details.

IP

vector specifying any inflection points. depends on the type of functional form specified. see mean_trend for details. by default this is set to NULL.

missing_pct

numeric value in the interval [0, 1] that specifies the proportion of individuals that will have missing values.

missing_per_subject

integer value specifying how many observations per subject should be dropped. note that all individuals are assumed to have a baseline value, so the maximum value of missing_per_subject is num_timepoints - 1.

miss_val

value used to mark induced missingness in the simulated data. by default missing values are assumed to be NA, but other common choices include 0.

dis_plot

logical argument on whether to plot the simulated data or not. by default plotting is turned off.

plot_trend

specifies whether to plot the true mean trend. see mean_trend for details.

zero_trunc

logical indicator designating whether simulated outcomes should be zero truncated. default is set to TRUE.

asynch_time

logical indicator; if set to TRUE, timepoints are randomly sampled over the specified interval instead of being equally spaced. default is FALSE.
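
The rho, corr_str, and sigma arguments together determine the within-subject covariance. As a rough sketch (not the package's mvrnorm_corr_gen implementation), the three structures can be written in base R from their textbook definitions: AR(1) with entries rho^|i-j|, compound symmetry with a common off-diagonal rho, and independence with an identity matrix. The helper name toy_corr is hypothetical, and it is an assumption that the AR(1) decay is indexed by timepoint position and not by elapsed time.

```r
# Hypothetical helper: textbook correlation matrices for q timepoints.
# Illustration only; this is not the package's mvrnorm_corr_gen().
toy_corr <- function(q, rho, corr_str = c("ar1", "compound", "ind")) {
  corr_str <- match.arg(corr_str)
  stopifnot(rho >= 0, rho <= 1)
  lag <- abs(outer(seq_len(q), seq_len(q), "-"))  # |i - j| for each pair
  switch(corr_str,
    ar1      = rho^lag,                   # correlation decays with lag
    compound = ifelse(lag == 0, 1, rho),  # constant off-diagonal correlation
    ind      = diag(q)                    # no within-subject correlation
  )
}

sigma <- 1
R_ar1 <- toy_corr(5, rho = 0.95, corr_str = "ar1")
V_subject <- sigma^2 * R_ar1              # covariance block for one subject

# Stack identical blocks for 3 subjects into a block diagonal matrix
V_all <- kronecker(diag(3), V_subject)
dim(V_all)
```

Entries of V_all that pair observations from different subjects are zero, which is what a block diagonal covariance expresses: observations are correlated within a subject and independent across subjects.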

Value

This function returns a list with the following objects:

df - data.frame object with complete outcome Y, subject ID, time, group, and outcome with missing data

Y - vector of complete outcome

Mu - vector of complete mean specifications used during simulation

Sigma - block diagonal symmetric covariance matrix for the complete data used during simulation

N - total number of observations

miss_data - data.frame object that lists which IDs and timepoints were randomly selected to induce missingness

Y_obs - vector of outcome with induced missingness
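
The relationship between missing_pct, missing_per_subject, and the baseline constraint can be sketched in base R. This is an illustration under stated assumptions, not the package's internal code: it picks a share of subject IDs at random, then blanks a fixed number of non-baseline timepoints for each. The object names (toy, miss_ids) are hypothetical, and the rounding rule for the number of affected subjects is assumed.

```r
set.seed(1)
n_subjects <- 10; num_timepoints <- 5
missing_pct <- 0.6; missing_per_subject <- 2; miss_val <- NA

# Toy long-format data: one row per subject and timepoint
toy <- data.frame(
  ID   = rep(seq_len(n_subjects), each = num_timepoints),
  time = rep(seq(0, 4, length.out = num_timepoints), times = n_subjects),
  Y    = rnorm(n_subjects * num_timepoints, mean = 5, sd = 1)
)

# Every subject keeps the baseline, so at most num_timepoints - 1 can be dropped
stopifnot(missing_per_subject <= num_timepoints - 1)

# Choose which subjects lose observations (rounding is an assumption)
miss_ids <- sample(seq_len(n_subjects), size = round(missing_pct * n_subjects))

toy$Y_obs <- toy$Y
for (id in miss_ids) {
  rows <- which(toy$ID == id)
  candidates <- rows[-1]  # skip the first row, which is the baseline
  drop <- candidates[sample.int(length(candidates), missing_per_subject)]
  toy$Y_obs[drop] <- miss_val
}

# Share of subjects with at least one missing value
mean(tapply(is.na(toy$Y_obs), toy$ID, any))
```

With miss_val set to 0 instead of NA, the final check would need to compare Y_obs against Y instead of calling is.na.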

Examples

num_subjects_per_group <- 20
sim_obj <- mvrnorm_sim(n_control=num_subjects_per_group,
                       n_treat=num_subjects_per_group,
                       control_mean=5, sigma=1, num_timepoints=5,
                       t_interval=c(0, 4), rho=0.95, corr_str='ar1',
                       func_form='linear', beta=c(0, 0.25),
                       missing_pct=0.6, missing_per_subject=2)
#checking the output
head(sim_obj$df)

#total number of observations is 2(num_subjects_per_group)(num_timepoints)
sim_obj$N

#there should be approximately 60% of the IDs with missing observations
length(unique(sim_obj$miss_data$miss_id))/length(unique(sim_obj$df$ID))

#checking the subject covariance structure
sim_obj$Sigma[seq_len(5), seq_len(5)]

microbiomeDASim documentation built on Nov. 8, 2020, 10:58 p.m.