data_sim: Simulate data for binary outcome with multiple treatments
In CIMTx: Causal Inference for Multiple Treatments with a Binary Outcome

data_sim

R Documentation

Description

The function data_sim simulates data for a binary outcome with multiple treatments. Users can adjust the following seven design factors: (1) sample size, (2) ratio of units across treatment groups, (3) whether the treatment assignment model and the outcome generating model are linear or nonlinear, (4) whether the covariates that best predict the treatment also predict the outcome well, (5) whether the response surfaces are parallel across treatment groups, (6) outcome prevalence, and (7) degree of covariate overlap.

Usage

data_sim(
  sample_size,
  n_trt = 3,
  x = "rnorm(0, 1)",
  lp_y = rep("x1", 3),
  nlp_y = NULL,
  align = TRUE,
  tau = c(0, 0, 0),
  delta = c(0, 0),
  psi = 1,
  lp_w,
  nlp_w
)

Arguments

`sample_size`	A numeric value indicating the total number of units.
`n_trt`	A numeric value indicating the number of treatments. The default is set to 3.
`x`	A vector of characters representing covariates, with each covariate being generated from one of the standard probability `distributions` in the `stats` package. The default is set to "rnorm(0, 1)".
`lp_y`	A vector of characters of length `n_trt`, representing the linear effects in the outcome generating model. The default is set to rep("x1", 3).
`nlp_y`	A vector of characters of length `n_trt`, representing the nonlinear effects in the outcome generating model. The default is set to NULL.
`align`	A logical indicating whether the predictors in the treatment assignment model are the same as the predictors in the outcome generating model. The default is `TRUE`. If the argument is set to `FALSE`, users need to specify two additional arguments, `lp_w` and `nlp_w`.
`tau`	A numeric vector of length `n_trt` inducing different outcome event probabilities across treatment groups. Higher values mean higher outcome event probability for the treatment group; lower values mean lower outcome event probability for the treatment group. The default is set to c(0, 0, 0), which corresponds to an approximately equal outcome event probability across three treatment groups.
`delta`	A numeric vector of length `n_trt` - 1 inducing different ratios of units across treatment groups. Higher values mean a higher proportion of units in the treatment group; lower values mean a lower proportion. The default is set to c(0, 0), which corresponds to approximately equal sample sizes across three treatment groups.
`psi`	A numeric value for the parameter governing the sparsity of covariate overlap. Higher values mean weaker covariate overlap; lower values mean stronger covariate overlap. The default is set to 1, which corresponds to moderate covariate overlap.
`lp_w`	A vector of characters of length `n_trt` - 1, representing the linear effects in the treatment assignment model. Used when `align` is `FALSE`.
`nlp_w`	A vector of characters of length `n_trt` - 1, representing the nonlinear effects in the treatment assignment model. Used when `align` is `FALSE`.
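
To make the roles of `delta` and `psi` more concrete, the base R sketch below assumes a multinomial logistic treatment assignment model with the last treatment as the reference group. The helper `gps()`, the coefficients, and the exact way `delta` and `psi` enter the linear predictors are illustrative assumptions and are not taken from the CIMTx source code.

```r
set.seed(42)
n <- 2000
x1 <- rnorm(n)
x2 <- rnorm(n)

# Hypothetical generalized propensity scores (GPS) for three treatments.
# Assumption: delta acts as a group-specific intercept and psi scales the
# covariate effects; treatment 3 is the reference category.
gps <- function(delta, psi) {
  eta1 <- delta[1] + psi * (0.4 * x1 + 0.1 * x2)
  eta2 <- delta[2] + psi * (0.2 * x1 - 0.2 * x2)
  denom <- 1 + exp(eta1) + exp(eta2)
  cbind(w1 = exp(eta1), w2 = exp(eta2), w3 = 1) / denom
}

p_equal <- gps(delta = c(0, 0), psi = 1)     # baseline
p_shift <- gps(delta = c(0.5, 0.5), psi = 1) # larger delta
p_sharp <- gps(delta = c(0, 0), psi = 3)     # larger psi

# Average assignment probabilities per treatment group
round(colMeans(p_equal), 2)
round(colMeans(p_shift), 2)

# Spread of the GPS for treatment 1: a wider spread means units are
# more clearly sorted into groups, that is, weaker covariate overlap
sd(p_equal[, "w1"])
sd(p_sharp[, "w1"])
```

Under these assumptions, raising `delta` moves units from the reference group into treatments 1 and 2, and raising `psi` pushes the GPS toward 0 or 1.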

Value

A list with 7 elements of simulated data. It contains

`covariates:`	x matrix
`w:`	treatment indicators
`y:`	observed binary outcomes
`y_prev:`	outcome prevalence rates
`ratio_of_units:`	the proportions of units in each treatment group
`overlap_fig:`	the visualization of covariate overlap via boxplots of the distributions of true GPS
`y_true:`	simulated true outcome in each treatment group
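
As a rough illustration of what summaries such as `ratio_of_units` and `y_prev` describe, the base R sketch below computes analogous quantities from a treatment vector and a binary outcome vector. The data are generated on the spot with an assumed logistic outcome model in which `tau` shifts each group's intercept; this is not CIMTx's internal code, and the object names simply mirror the list elements above.

```r
set.seed(2022)
n <- 3000
x1 <- rnorm(n)

# Assumed setup: equal assignment probabilities and a logistic outcome
# model in which tau shifts the intercept of each treatment group.
w <- sample(1:3, n, replace = TRUE)
tau <- c(-1.5, 0, 1.5)
y <- rbinom(n, size = 1, prob = plogis(tau[w] + 0.2 * x1))

# Analogue of ratio_of_units: proportion of units in each treatment group
ratio_of_units <- prop.table(table(w))

# Analogue of y_prev: outcome prevalence by treatment group and overall
y_prev <- c(tapply(y, w, mean), overall = mean(y))

round(ratio_of_units, 2)
round(y_prev, 2)
```

With these assumed values of `tau`, the prevalence rises from the first to the third treatment group, which matches the description of `tau` in the Arguments section.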

Examples

library(CIMTx)
lp_w_all <-
  c(
    ".4*x1 + .1*x2  - .1*x4 + .1*x5", # w = 1
    ".2 * x1 + .2 * x2  - .2 * x4 - .3 * x5" # w = 2
  )
nlp_w_all <-
  c(
    "-.5*x1*x4  - .1*x2*x5", # w = 1
    "-.3*x1*x4 + .2*x2*x5" # w = 2
  )
lp_y_all <- rep(".2*x1 + .3*x2 - .1*x3 - .1*x4 - .2*x5", 3)
nlp_y_all <- rep(".7*x1*x1  - .1*x2*x3", 3)
X_all <- c(
  "rnorm(0, 0.5)", # x1
  "rbeta(2,0.4)", # x2
  "runif(0, 0.5)", # x3
  "rweibull(1,2)", # x4
  "rbinom(1,0.4)" # x5
)

set.seed(111111)
data <- data_sim(
  sample_size = 300,
  n_trt = 3,
  x = X_all,
  lp_y = lp_y_all,
  nlp_y = nlp_y_all,
  align = FALSE,
  lp_w = lp_w_all,
  nlp_w = nlp_w_all,
  tau = c(-1.5, 0, 1.5),
  delta = c(0.5, 0.5),
  psi = 1
)
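
The covariate strings in `X_all` above, such as "rnorm(0, 0.5)", leave out the number of draws that the `stats` random generators normally take as their first argument. One plausible reading is that the sample size is supplied for each covariate when the data are generated. The base R sketch below shows how such a string could be expanded and evaluated; `expand_spec()` is a hypothetical helper written for this illustration and is not a CIMTx function.

```r
# Hypothetical helper (not part of CIMTx): insert the sample size as the
# first argument of the call, e.g. "rnorm(0, 0.5)" -> "rnorm(5, 0, 0.5)",
# then evaluate the resulting expression.
expand_spec <- function(spec, n) {
  call_text <- sub("\\(", paste0("(", n, ", "), spec)
  eval(parse(text = call_text))
}

set.seed(1)
specs <- c("rnorm(0, 0.5)", "rbeta(2, 0.4)", "rbinom(1, 0.4)")

# One column per covariate, one row per unit
covariates <- sapply(specs, expand_spec, n = 5)
colnames(covariates) <- paste0("x", seq_along(specs))

dim(covariates)
head(covariates)
```

The columns are named `x1`, `x2`, and so on, which is the naming that the `lp_y`, `nlp_y`, `lp_w`, and `nlp_w` formulas in the example rely on.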

CIMTx documentation built on June 24, 2022, 9:07 a.m.