Mlearners: M-Learners
In forestry-labs/causalToolbox: Toolbox for Causal Inference with emphasize on Heterogeneous Treatment Effect Estimator

M-Learners

Description

M_RF is an implementation of the Modified Outcome Estimator with Random Forest (Breiman 2001) as the base learner.

M_BART is an implementation of the Modified Outcome Estimator with Bayesian Additive Regression Trees (Chipman et al. 2010) as the base learner.

Usage

M_RF(
  feat,
  tr,
  yobs,
  nthread = 0,
  verbose = FALSE,
  mu.forestry = list(relevant.Variable = 1:ncol(feat), ntree = 1000, replace = TRUE,
    sample.fraction = 0.8, mtry = round(ncol(feat) * 13/20), nodesizeSpl = 2, nodesizeAvg
    = 1, splitratio = 1, middleSplit = TRUE),
  e.forestry = list(relevant.Variable = 1:ncol(feat), ntree = 500, replace = TRUE,
    sample.fraction = 0.5, mtry = ncol(feat), nodesizeSpl = 11, nodesizeAvg = 33,
    splitratio = 0.5, middleSplit = FALSE),
  tau.forestry = list(relevant.Variable = 1:ncol(feat), ntree = 1000, replace = TRUE,
    sample.fraction = 0.7, mtry = round(ncol(feat) * 17/20), nodesizeSpl = 5, nodesizeAvg
    = 6, splitratio = 0.8, middleSplit = TRUE)
)

M_BART(
  feat,
  tr,
  yobs,
  ndpost = 1200,
  ntree = 200,
  nthread = 1,
  mu.BART = list(sparse = FALSE, theta = 0, omega = 1, a = 0.5, b = 1, augment = FALSE,
    rho = NULL, usequants = FALSE, cont = FALSE, sigest = NA, sigdf = 3, sigquant = 0.9,
    k = 2, power = 2, base = 0.95, sigmaf = NA, lambda = NA, numcut = 100L, nskip = 100L),
  e.BART = list(sparse = FALSE, theta = 0, omega = 1, a = 0.5, b = 1, augment = FALSE,
    rho = NULL, usequants = FALSE, cont = FALSE, sigest = NA, sigdf = 3, sigquant = 0.9,
    k = 2, power = 2, base = 0.95, sigmaf = NA, lambda = NA, numcut = 100L, nskip = 100L),
  tau.BART = list(sparse = FALSE, theta = 0, omega = 1, a = 0.5, b = 1, augment = FALSE,
    rho = NULL, usequants = FALSE, cont = FALSE, sigest = NA, sigdf = 3, sigquant = 0.9,
    k = 2, power = 2, base = 0.95, sigmaf = NA, lambda = NA, numcut = 100L, nskip = 100L)
)

Arguments

`feat`	A data frame containing the features.
`tr`	A numeric vector with 0 for control and 1 for treated units.
`yobs`	A numeric vector containing the observed outcomes.
`nthread`	Number of threads which should be used to work in parallel.
`verbose`	TRUE for detailed output, FALSE for no output.
`mu.forestry, tau.forestry, e.forestry`	Lists of hyperparameters for the `forestry` package, used to estimate the response functions, the CATE, and the propensity score, respectively. These hyperparameters are passed to the `forestry` package. (Please refer to the `forestry` package for more detailed documentation of the hyperparameters.) Each list contains:
	`relevant.Variable`	Variables that are only used in the first stage.
	`ntree`	Number of trees used in the first stage.
	`replace`	Sample with or without replacement in the first stage.
	`sample.fraction`	Size of the total sample drawn for the training data in the first stage.
	`mtry`	Number of variables randomly selected at each splitting point.
	`nodesizeSpl`	Minimum node size in the first stage for the observations in the splitting set. (See the details of the `forestry` package.)
	`nodesizeAvg`	Minimum node size in the first stage for the observations in the averaging set.
	`splitratio`	Proportion of the training data used as the splitting dataset in the first stage.
	`middleSplit`	If TRUE, the split value is exactly in the middle of two observations. Otherwise, it is drawn from a uniform distribution between the two observations.
`ndpost`	Number of posterior draws.
`ntree`	Number of trees.
`mu.BART, e.BART, tau.BART`	Hyperparameters of the BART functions for the control and treated group. (Use `?BART::mc.wbart` for a detailed explanation of their effects.)

Details

The M-Learner estimates the CATE in two steps:

1. Estimate the response functions and the propensity score,

   μ_0(x) = E[Y(0) | X = x]

   μ_1(x) = E[Y(1) | X = x]

   e(x) = E[W | X = x]

   using the base learner, and denote the estimates as \hat μ_0, \hat μ_1, and \hat e. Here W is the treatment indicator (the `tr` argument).

2. Define the adjusted modified outcomes as

   R_i = (W_i - \hat e(x_i)) / (\hat e(x_i) [1 - \hat e(x_i)]) * (Y_i - \hat μ_1(x_i) [1 - \hat e(x_i)] - \hat μ_0(x_i) \hat e(x_i)).

   Now employ the base learner to estimate

   τ(x) = E[R | X = x].

   The result is the CATE estimator.
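
The two steps above can be sketched in base R. This is only an illustration under stated assumptions, not the package's implementation: ordinary least squares (`lm`) and logistic regression (`glm`) stand in for the random forest or BART base learners, the data are simulated with a hypothetical true CATE of `1 + x`, and all variable names are made up for the example.

```r
# Illustrative M-Learner sketch in base R (assumption: linear and logistic
# models replace the RF/BART base learners used by M_RF and M_BART).
set.seed(1)
n <- 2000
x <- runif(n, -1, 1)                 # a single hypothetical feature
w <- rbinom(n, 1, 0.5)               # treatment indicator, randomized
tau_true <- 1 + x                    # hypothetical true CATE
y <- x + w * tau_true + rnorm(n, sd = 0.5)
dat <- data.frame(x = x, w = w, y = y)

# Step 1: estimate the response functions and the propensity score.
mu0_fit <- lm(y ~ x, data = dat[dat$w == 0, ])
mu1_fit <- lm(y ~ x, data = dat[dat$w == 1, ])
e_fit   <- glm(w ~ x, family = binomial(), data = dat)

mu0_hat <- predict(mu0_fit, newdata = dat)
mu1_hat <- predict(mu1_fit, newdata = dat)
e_hat   <- predict(e_fit, newdata = dat, type = "response")

# Step 2: build the adjusted modified outcome and regress it on the features.
dat$r <- (dat$w - e_hat) / (e_hat * (1 - e_hat)) *
  (dat$y - mu1_hat * (1 - e_hat) - mu0_hat * e_hat)

tau_fit <- lm(r ~ x, data = dat)
tau_hat <- predict(tau_fit, newdata = dat)

# Mean squared error against the simulated truth.
mse <- mean((tau_hat - tau_true)^2)
```

When the first-stage estimates are accurate, the conditional mean of the modified outcome given the features equals the treatment effect, which is why regressing `r` on `x` in the second step targets the CATE.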

Value

An object of a class that contains the CATEestimator class. It should be used with one of the following functions: EstimateCate, CateCI, and CateBIAS. The object has at least the following slots:

`feature_train`	A copy of feat.
`tr_train`	A copy of tr.
`yobs_train`	A copy of yobs.
`creator`	Function call that creates the CATE estimator. This is used for different bootstrap procedures.

Author(s)

Soeren R. Kuenzel

References

Sören Künzel, Jasjeet Sekhon, Peter Bickel, and Bin Yu (2017). MetaLearners for Estimating Heterogeneous Treatment Effects Using Machine Learning. https://www.pnas.org/content/116/10/4156
Sören Künzel, Simon Walter, and Jasjeet Sekhon (2018). Causaltoolbox—Estimator Stability for Heterogeneous Treatment Effects. https://arxiv.org/pdf/1811.02833.pdf
Daniel Rubin and Mark J van der Laan (2007). A Doubly Robust Censoring Unbiased Transformation. https://www.ncbi.nlm.nih.gov/pubmed/22550646

Examples

library(causalToolbox)

# create example data set
simulated_experiment <- simulate_causal_experiment(
  ntrain = 1000,
  ntest = 1000,
  dim = 10
)
feat <- simulated_experiment$feat_tr
tr <- simulated_experiment$W_tr
yobs <- simulated_experiment$Yobs_tr
feature_test <- simulated_experiment$feat_te

# create CATE estimators using Random Forests (RF) and BART
xl_rf <- X_RF(feat = feat, tr = tr, yobs = yobs)
tl_rf <- T_RF(feat = feat, tr = tr, yobs = yobs)
sl_rf <- S_RF(feat = feat, tr = tr, yobs = yobs)
ml_rf <- M_RF(feat = feat, tr = tr, yobs = yobs)
xl_bt <- X_BART(feat = feat, tr = tr, yobs = yobs)
tl_bt <- T_BART(feat = feat, tr = tr, yobs = yobs)
sl_bt <- S_BART(feat = feat, tr = tr, yobs = yobs)
ml_bt <- M_BART(feat = feat, tr = tr, yobs = yobs)
  
cate_esti_xrf <- EstimateCate(xl_rf, feature_test)

# evaluate the performance (mean squared error against the true CATE)
cate_true <- simulated_experiment$tau_te
mean((cate_esti_xrf - cate_true) ^ 2)
## Not run: 
# create confidence intervals via bootstrapping. 
xl_ci_rf <- CateCI(xl_rf, feature_test, B = 500)

## End(Not run)

forestry-labs/causalToolbox documentation built on Feb. 6, 2023, 11:27 p.m.