causalExp: Simulate a Causal Experiment
In forestry-labs/causalToolbox: Toolbox for Causal Inference with emphasize on Heterogeneous Treatment Effect Estimator

Description

simulate_correlation_matrix uses the C-vine method to simulate correlation matrices. See Lewandowski, Kurowicka, and Joe (2009) in the References for details.
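
As a rough illustration of the C-vine idea, the sketch below draws partial correlations from a symmetric Beta distribution rescaled to (-1, 1) and converts them to ordinary correlations. It is a base-R sketch of the general method, not the package's own code: the name `cvine_corr` and the parameter `beta_param` are hypothetical. One mapping that would be consistent with the documented behaviour of `alpha` is `beta_param = 1 / alpha`, but that is an assumption.

```r
# Hypothetical helper illustrating the C-vine construction; this is NOT
# the implementation used by causalToolbox.
# d:          dimension of the correlation matrix
# beta_param: shape of the symmetric Beta distribution for the partial
#             correlations (larger values concentrate them near 0)
cvine_corr <- function(d, beta_param) {
  P <- matrix(0, d, d)  # partial correlations, stored in the upper triangle
  S <- diag(d)          # resulting correlation matrix
  for (k in seq_len(d - 1)) {
    for (i in (k + 1):d) {
      # Sample a partial correlation on (-1, 1).
      P[k, i] <- 2 * rbeta(1, beta_param, beta_param) - 1
      # Convert the partial correlation into a raw correlation by
      # stepping back through the variables already conditioned on.
      p <- P[k, i]
      for (l in rev(seq_len(k - 1))) {
        p <- p * sqrt((1 - P[l, i]^2) * (1 - P[l, k]^2)) + P[l, i] * P[l, k]
      }
      S[k, i] <- p
      S[i, k] <- p
    }
  }
  S
}

set.seed(1)
S <- cvine_corr(4, beta_param = 10)
round(S, 2)
```

Larger values of `beta_param` give matrices closer to the identity, which corresponds to nearly independent features.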

simulate_causal_experiment simulates data from a randomized controlled trial (RCT) or an observational study for causal effect estimation. It is mainly used to test different heterogeneous treatment effect estimation strategies.

Usage

simulate_correlation_matrix(dim, alpha)

simulate_causal_experiment(
  ntrain = nrow(given_features),
  ntest = nrow(given_features),
  dim = ncol(given_features),
  alpha = 0.1,
  feat_distribution = "normal",
  given_features = NULL,
  pscore = "rct5",
  mu0 = "sparseLinearStrong",
  tau = "sparseLinearWeak",
  testseed = NULL,
  trainseed = NULL,
  noEffect = FALSE
)

pscores.simulate_causal_experiment

mu0.simulate_causal_experiment

tau.simulate_causal_experiment

Arguments

`dim`	Dimension of the data set.
`alpha`	Only used if `given_features` is not set and `feat_distribution` is "normal". It controls how strongly the features can be correlated: if alpha = 0, the features are independent; the larger alpha is, the more strongly correlated they can be. Use `simulate_correlation_matrix` to get a better understanding of the impact of alpha.
`ntrain`	Number of training examples.
`ntest`	Number of test examples.
`feat_distribution`	Only used if `given_features` is not specified. Either "normal" or "unif". It specifies the distribution of the features.
`given_features`	This is used if we already have features and want to test the performance of different estimators for a particular set of features.
`pscore, mu0, tau`	Names that select the propensity score, the response function for the control units, and the treatment effect function (tau), respectively. The available options can be listed with `names(pscores.simulate_causal_experiment)`, `names(mu0.simulate_causal_experiment)`, and `names(tau.simulate_causal_experiment)`. Options are selected by name because this lets the user easily loop through the different settings.
`testseed`	The seed used to generate the test data. If NULL, then the seed of the main session is used.
`trainseed`	The seed used to generate the training data. If NULL, then the seed of the main session is used.
`noEffect`	Boolean flag specifying whether the experiment has no true effect. If TRUE, the U_0 function is used for both control and treated observations. If FALSE, the U_0 function is used for control observations and the U_1 function is used for treated observations. Default is FALSE.
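
To make the roles of `feat_distribution`, `alpha`, and the seed arguments more concrete, the following base-R sketch draws features either from a correlated normal distribution or from independent uniforms. It is only an illustration under stated assumptions: `draw_features` is a hypothetical helper, the uniform range [0, 1] and the column names `x1`, `x2`, ... are assumptions, and the package may generate its features differently.

```r
# Hypothetical helper, not part of causalToolbox.
# n:     number of rows to draw
# corr:  correlation matrix (only used for the "normal" case)
# seed:  optional seed; if NULL, the current session RNG state is used
draw_features <- function(n, corr,
                          feat_distribution = c("normal", "unif"),
                          seed = NULL) {
  feat_distribution <- match.arg(feat_distribution)
  if (!is.null(seed)) set.seed(seed)
  d <- ncol(corr)
  if (feat_distribution == "normal") {
    # chol() returns an upper-triangular R with t(R) %*% R == corr, so the
    # rows of z %*% R have covariance corr when z has i.i.d. N(0, 1) entries.
    z <- matrix(rnorm(n * d), nrow = n, ncol = d)
    x <- z %*% chol(corr)
  } else {
    # Assumed range [0, 1]; features are independent in this case.
    x <- matrix(runif(n * d), nrow = n, ncol = d)
  }
  feat <- as.data.frame(x)
  names(feat) <- paste0("x", seq_len(d))
  feat
}

corr <- matrix(c(1, 0.8, 0.8, 1), nrow = 2)
feat <- draw_features(5000, corr, "normal", seed = 1)
cor(feat$x1, feat$x2)
```

Passing the same `seed` twice reproduces the same draw, which mirrors the purpose of `trainseed` and `testseed`.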

Details

The function simulates causal experiments by generating the features, treatment assignment, observed Y values, and CATE for a test set and a training set. pscore, mu0, and tau define the response functions and the propensity score. For example, pscore = "osSparse1Linear" specifies that

e(x) = max(0.05, min(0.95, x1 / 2 + 1 / 4))

and mu0 = "sparseLinearWeak" specifies that the response function for the control units is given by the simple linear function

mu0(x) = 3 x1 + 5 x2.
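
The two formulas above can be written as plain R functions and combined into a toy version of the data-generating steps. This is a sketch under assumptions rather than the package's code: `pscore_fn`, `mu0_fn`, and `tau_fn` are hypothetical names, the treatment effect `tau_fn` is invented for illustration, and the additive standard normal noise is an assumption.

```r
# Functions written directly from the two formulas above; `feat` is assumed
# to be a data frame with columns x1 and x2.
pscore_fn <- function(feat) pmax(0.05, pmin(0.95, feat$x1 / 2 + 1 / 4))
mu0_fn <- function(feat) 3 * feat$x1 + 5 * feat$x2
# Hypothetical treatment effect, chosen only for illustration.
tau_fn <- function(feat) 2 * feat$x1

feat <- data.frame(x1 = c(-1, 0, 1, 2), x2 = c(0, 1, 0, -1))
pscore_fn(feat)  # 0.05 0.25 0.75 0.95
mu0_fn(feat)     # -3 5 3 1

# Assumed data-generating steps: Bernoulli treatment assignment with
# probability e(x), then outcome = response function + N(0, 1) noise.
set.seed(1)
n <- nrow(feat)
W <- rbinom(n, size = 1, prob = pscore_fn(feat))
Yobs <- mu0_fn(feat) + W * tau_fn(feat) + rnorm(n)
sim <- data.frame(feat, W = W, tau = tau_fn(feat), Yobs = Yobs)
sim
```

The clipping to [0.05, 0.95] keeps every unit's treatment probability away from 0 and 1, so both treated and control outcomes can be observed anywhere in the feature space.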

Value

simulate_correlation_matrix returns a correlation matrix.

simulate_causal_experiment returns a list with the following elements:

`setup_name`	Name of the setup.
`m_t_truth`	Function containing the response function of the treated units.
`m_c_truth`	Function containing the response function of the control units.
`propscore`	Propensity score function.
`alpha`	Chosen alpha.
`feat_te`	Data.frame containing the features of the test samples.
`W_te`	Numeric vector containing the treatment assignment of the test samples.
`tau_te`	Numeric vector containing the true conditional average treatment effects of the test samples.
`Yobs_te`	Numeric vector containing the observed Y values of the test samples.
`feat_tr`	Data.frame containing the features of the training samples.
`W_tr`	Numeric vector containing the treatment assignment of the training samples.
`tau_tr`	Numeric vector containing the true conditional average treatment effects of the training samples.
`Yobs_tr`	Numeric vector containing the observed Y values of the training samples.

References

Daniel Lewandowski, Dorota Kurowicka, and Harry Joe (2009). Generating Random Correlation Matrices Based on Vines and Extended Onion Method.
Sören Künzel, Jasjeet Sekhon, Peter Bickel, and Bin Yu (2017). Meta-learners for Estimating Heterogeneous Treatment Effects Using Machine Learning.

Examples

library(causalToolbox)

ce_sim <- simulate_causal_experiment(
  ntrain = 20,
  ntest = 20,
  dim = 7
)

ce_sim

## Not run: 
estimators <- list(
  S_RF = S_RF,
  T_RF = T_RF,
  X_RF = X_RF,
  S_BART = S_BART,
  T_BART = T_BART,
  X_BART = X_BART)

performance <- data.frame()
for(tau_n in names(tau.simulate_causal_experiment)){
  for(mu0_n in names(mu0.simulate_causal_experiment)) {
    ce <- simulate_causal_experiment(
      given_features = iris,
      pscore = "rct5",
      mu0 = mu0_n,
      tau = tau_n)

    for(estimator_n in names(estimators)) {
      print(paste(tau_n, mu0_n, estimator_n))

      trained_e <- estimators[[estimator_n]](ce$feat_tr, ce$W_tr, ce$Yobs_tr)
      performance <-
        rbind(performance,
              data.frame(
                mu0 = mu0_n,
                tau = tau_n,
                estimator = estimator_n,
                MSE = mean((EstimateCate(trained_e, ce$feat_te) -
                            ce$tau_te)^2)))
    }
  }
}

# One row per (mu0, tau) setup, one column of MSE values per estimator
reshape2::dcast(data = performance, mu0 + tau ~ estimator, value.var = "MSE")

## End(Not run)

forestry-labs/causalToolbox documentation built on Feb. 6, 2023, 11:27 p.m.