generate_causal_data: Generate causal forest data

View source: R/dgps.R

Description

The following DGPs are available for benchmarking purposes:

  • "simple": tau = max(X1, 0), e = 0.4 + 0.2 * 1{X1 > 0}.

  • "aw1": equation (27) of https://arxiv.org/pdf/1510.04342.pdf

  • "aw2": equation (28) of https://arxiv.org/pdf/1510.04342.pdf

  • "aw3": confounding is from "aw1" and tau is from "aw2"

  • "aw3reverse": same as "aw3", but with the heterogeneous treatment effects (HTEs) anticorrelated with the baseline

  • "ai1": "Setup 1" from Section 6 of https://arxiv.org/pdf/1504.01132.pdf

  • "ai2": "Setup 2" from Section 6 of https://arxiv.org/pdf/1504.01132.pdf

  • "kunzel": "Simulation 1" from A.1 in https://arxiv.org/pdf/1706.03461.pdf

  • "nw1": "Setup A" from Section 4 of https://arxiv.org/pdf/1712.04912.pdf

  • "nw2": "Setup B" from Section 4 of https://arxiv.org/pdf/1712.04912.pdf

  • "nw3": "Setup C" from Section 4 of https://arxiv.org/pdf/1712.04912.pdf

  • "nw4": "Setup D" from Section 4 of https://arxiv.org/pdf/1712.04912.pdf
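As a rough illustration of the "simple" entry above, the base-R sketch below evaluates tau and e from the first covariate. It is not the grf implementation. The helper name simple_tau_e() is hypothetical. It also assumes i.i.d. standard normal covariates, because this page does not state how X is drawn for this DGP.

```r
# Sketch of tau and e for the "simple" DGP, following the formulas in Description.
# Assumption: covariates are i.i.d. standard normal (not stated on this page).
# simple_tau_e() is a hypothetical helper, not part of grf.
simple_tau_e <- function(n, p) {
  X <- matrix(rnorm(n * p), n, p)
  tau <- pmax(X[, 1], 0)          # tau = max(X1, 0)
  e <- 0.4 + 0.2 * (X[, 1] > 0)   # e = 0.4 + 0.2 * 1{X1 > 0}
  list(X = X, tau = tau, e = e)
}

set.seed(1)
sim <- simple_tau_e(1000, 3)
summary(sim$tau)
table(round(sim$e, 1))
```

In this DGP the propensity score takes only two values, and it is higher exactly where the treatment effect is positive. That overlap between tau and e is what makes the setting confounded.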

Usage

generate_causal_data(
  n,
  p,
  sigma.m = 1,
  sigma.tau = 0.1,
  sigma.noise = 1,
  dgp = c("simple", "aw1", "aw2", "aw3", "aw3reverse", "ai1", "ai2", "kunzel", "nw1",
    "nw2", "nw3", "nw4")
)

Arguments

n

The number of observations.

p

The number of covariates (note: the minimum varies by DGP).

sigma.m

The standard deviation of m, the conditional mean of Y. Default is 1.

sigma.tau

The standard deviation of the treatment effect. Default is 0.1.

sigma.noise

The noise scale: the conditional variance V of Y is rescaled to have mean sigma.noise^2. Default is 1.

dgp

The DGP to simulate from, one of the names listed in Description. Default is "simple".

Details

Each DGP is parameterized by X (observables), m (conditional mean of Y), tau (treatment effect), e (propensity score), and V (conditional variance of Y).

The returned data are rescaled as follows, and W and Y are then drawn from the rescaled quantities:

  • m = m / sd(m) * sigma.m

  • tau = tau / sd(tau) * sigma.tau

  • V = V / mean(V) * sigma.noise^2

  • W = rbinom(n, 1, e)

  • Y = m + (W - e) * tau + sqrt(V) * rnorm(n)
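The base-R sketch below illustrates the rescaling and sampling step. It is written for illustration and is not the grf source. The function name rescale_and_draw() is hypothetical, and the inputs in the usage lines are arbitrary choices. The noise is drawn as sqrt(V) * rnorm(n), so V is the conditional variance of Y. The sketch also assumes sd(m) > 0 and sd(tau) > 0; otherwise the division would produce NaN.

```r
# Sketch of the rescaling and sampling step described in Details.
# rescale_and_draw() is a hypothetical helper, not part of grf.
# Assumes sd(m) > 0 and sd(tau) > 0.
rescale_and_draw <- function(m, tau, e, V,
                             sigma.m = 1, sigma.tau = 0.1, sigma.noise = 1) {
  n <- length(m)
  m <- m / sd(m) * sigma.m            # sample sd of m becomes sigma.m
  tau <- tau / sd(tau) * sigma.tau    # sample sd of tau becomes sigma.tau
  V <- V / mean(V) * sigma.noise^2    # mean of V becomes sigma.noise^2
  W <- rbinom(n, 1, e)                # treatment drawn with probability e
  Y <- m + (W - e) * tau + sqrt(V) * rnorm(n)  # noise has variance V
  list(Y = Y, W = W, tau = tau, m = m, e = e, V = V)
}

# Usage with arbitrary illustrative inputs
set.seed(42)
n <- 500
X <- matrix(runif(n * 2), n, 2)
out <- rescale_and_draw(m = 2 * X[, 1] - 1, tau = X[, 2],
                        e = rep(0.5, n), V = 1 + X[, 1],
                        sigma.m = 2, sigma.tau = 0.5, sigma.noise = 1)
c(sd_m = sd(out$m), sd_tau = sd(out$tau), mean_V = mean(out$V))
```

Centering the treatment term as (W - e) * tau keeps E[Y | X] equal to m. That is why m can be described as the conditional mean of Y regardless of the treatment effect.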

Value

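A list with elements X (covariates), Y (outcomes), W (treatment assignments), tau (treatment effects), m (conditional means of Y), e (propensity scores), and dgp (the name of the DGP used).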
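The list contains the true propensity scores e and treatment effects tau, so it can serve as an oracle when benchmarking estimators. The sketch below computes an inverse-propensity-weighted (IPW) estimate of the average treatment effect from the true e and compares it with mean(tau). It runs without grf by simulating a mock list that has the same element names. The mock inputs, the "mock" dgp label, and the helper ipw_ate() are all hypothetical. With grf installed, the list could instead come from generate_causal_data().

```r
# Oracle IPW estimate of the average treatment effect, using the true
# propensity scores e stored in the returned list.
# ipw_ate() is a hypothetical helper, not part of grf.
ipw_ate <- function(data) {
  with(data, mean(W * Y / e - (1 - W) * Y / (1 - e)))
}

# Mock list with the documented element names (assumed inputs, not grf output)
set.seed(7)
n <- 1e5
X <- matrix(rnorm(n * 3), n, 3)
e <- 0.4 + 0.2 * (X[, 1] > 0)
tau <- pmax(X[, 1], 0)
m <- X[, 2]
W <- rbinom(n, 1, e)
Y <- m + (W - e) * tau + rnorm(n)
data <- list(X = X, Y = Y, W = W, tau = tau, m = m, e = e, dgp = "mock")

c(ipw = ipw_ate(data), true_ate = mean(data$tau))
```

Because Y(1) - Y(0) = tau under this outcome model, the IPW estimate should agree with mean(tau) up to sampling error.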

Examples


library(grf)

# Generate simple benchmark data
data <- generate_causal_data(100, 5, dgp = "simple")
# Generate data from Wager and Athey (2018)
data <- generate_causal_data(100, 5, dgp = "aw1")
data2 <- generate_causal_data(100, 5, dgp = "aw2")


grf documentation built on Oct. 1, 2023, 1:07 a.m.