DRleaners: DR-Learners

DR-Learner    R Documentation

DR-Learners

Description

DR_RF is an implementation of the DR-learner with Random Forests (Breiman 2001) as the base learners.
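For orientation, the general two-stage DR-learner idea can be sketched in base R. This is a minimal illustration under stated assumptions, not the code DR_RF runs: it swaps Random Forests for lm and glm, omits sample splitting, uses simulated data, and all variable names are hypothetical.

```r
# Illustrative DR-learner sketch with linear base learners (NOT Rforestry,
# and not the internals of DR_RF). All names below are hypothetical.
set.seed(1)
n <- 500
x <- rnorm(n)
w <- rbinom(n, 1, plogis(0.5 * x))      # treatment indicator
y <- x + w * (1 + x) + rnorm(n)         # true CATE is 1 + x
dat <- data.frame(x = x, w = w, y = y)

# Stage 1: estimate the propensity score and both response functions.
e_hat <- predict(glm(w ~ x, family = binomial, data = dat), type = "response")
e_hat <- pmin(pmax(e_hat, 0.02), 0.98)  # clip scores away from 0 and 1
mu0_hat <- predict(lm(y ~ x, data = dat[dat$w == 0, ]), newdata = dat)
mu1_hat <- predict(lm(y ~ x, data = dat[dat$w == 1, ]), newdata = dat)

# Doubly robust pseudo-outcome for every observation.
mu_w <- ifelse(dat$w == 1, mu1_hat, mu0_hat)
pseudo <- (dat$w - e_hat) / (e_hat * (1 - e_hat)) * (dat$y - mu_w) +
  mu1_hat - mu0_hat

# Stage 2: regress the pseudo-outcome on the features to estimate the CATE.
tau_fit <- lm(pseudo ~ x, data = dat)
tau_hat <- predict(tau_fit, newdata = dat)
head(tau_hat)
```

In this reading, prop.forestry, mu.forestry, and pseu.forestry would play the roles of the glm fit, the two lm fits, and the final regression, respectively; that mapping is an assumption based on the argument names.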

Usage

DR_RF(
  feat,
  tr,
  yobs,
  predmode = "propmean",
  nthread = 0,
  verbose = FALSE,
  trunc_level = 0.02,
  prop.forestry = list(relevant.Variable = 1:ncol(feat), ntree = 500, replace = TRUE,
    sample.fraction = 0.5, mtry = ncol(feat), nodesizeSpl = 11, nodesizeAvg = 33,
    nodesizeStrictSpl = 2, nodesizeStrictAvg = 1, splitratio = 1, middleSplit = FALSE,
    OOBhonest = TRUE),
  tau.forestry = list(relevant.Variable = 1:ncol(feat), ntree = 1000, replace = TRUE,
    sample.fraction = 0.7, mtry = round(ncol(feat) * 17/20), nodesizeSpl = 5, nodesizeAvg
    = 6, nodesizeStrictSpl = 3, nodesizeStrictAvg = 1, splitratio = 1, middleSplit =
    TRUE, OOBhonest = TRUE),
  mu.forestry = list(relevant.Variable = 1:ncol(feat), ntree = 1000, replace = TRUE,
    sample.fraction = 0.7, mtry = round(ncol(feat) * 17/20), nodesizeSpl = 5, nodesizeAvg
    = 6, nodesizeStrictSpl = 3, nodesizeStrictAvg = 1, splitratio = 1, middleSplit =
    TRUE, OOBhonest = TRUE),
  pseu.forestry = list(relevant.Variable = 1:ncol(feat), ntree = 1000, replace = TRUE,
    sample.fraction = 0.7, mtry = round(ncol(feat) * 17/20), nodesizeSpl = 5, nodesizeAvg
    = 6, nodesizeStrictSpl = 3, nodesizeStrictAvg = 1, splitratio = 1, middleSplit =
    TRUE, OOBhonest = TRUE)
)

Arguments

feat

A data frame containing the features.

tr

A numeric vector with 0 for control units and 1 for treated units.

yobs

A numeric vector containing the observed outcomes.

predmode

Specifies how the two estimators of the second stage should be aggregated. Possible values are "propmean", "control", and "treated". The default is "propmean", which refers to propensity score weighting.

nthread

Number of threads to use for parallel computation.

verbose

TRUE for detailed output, FALSE for no output.

trunc_level

Level at which to truncate the estimated propensity scores. This ensures that the predicted propensity scores are bounded: trunc_level < p_score < 1 - trunc_level. Default is 0.02.
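As a minimal sketch of what this truncation amounts to, assuming it is a plain clipping of the estimated scores (the helper truncate_pscore is hypothetical and not part of causalToolbox):

```r
# Hypothetical helper: clip estimated propensity scores to the interval
# [trunc_level, 1 - trunc_level]. Not a causalToolbox function.
truncate_pscore <- function(p_score, trunc_level = 0.02) {
  stopifnot(trunc_level >= 0, trunc_level < 0.5)
  pmin(pmax(p_score, trunc_level), 1 - trunc_level)
}

p_raw <- c(0.001, 0.30, 0.995)
p_trunc <- truncate_pscore(p_raw)
print(p_trunc)
```

Scores already inside the interval are unchanged; only extreme values are pulled to the boundary, which keeps inverse-propensity weights from becoming very large.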

prop.forestry, tau.forestry, mu.forestry, pseu.forestry

Lists containing the hyperparameters for the Rforestry package that are used for estimating the response functions, the CATE, and the propensity score. These hyperparameters are passed to the Rforestry package. (Please refer to the Rforestry package for more detailed documentation of the hyperparameters.)

  • relevant.Variable Variables that are only used in the first stage.

  • ntree Number of trees used in the first stage.

  • replace Sample with or without replacement in the first stage.

  • sample.fraction Fraction of the total sample to draw as training data for each tree in the first stage.

  • mtry The number of variables randomly selected in each splitting point.

  • nodesizeSpl Minimum nodesize in the first stage for the observations in the splitting set. (See the details of the forestry package)

  • nodesizeAvg Minimum nodesize in the first stage for the observations in the averaging set.

  • nodesizeStrictSpl Minimum nodesize in the first stage for the observations in the splitting set. (See the details of the forestry package)

  • nodesizeStrictAvg Minimum nodesize in the first stage for the observations in the averaging set.

  • splitratio Proportion of the training data used as the splitting dataset in the first stage.

  • middleSplit If true, the split value will be exactly in the middle of two observations. Otherwise, it will take a point based on a uniform distribution between the two observations.

  • OOBhonest If true, the forestry object uses the out-of-bag honesty implemented in the Rforestry package.
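Because each of these arguments is a complete list, passing a shorter list replaces the whole default instead of merging with it. One way to change only a few entries is to rebuild the defaults from the Usage section and override them with base R's modifyList. The sketch below does this for mu.forestry; the toy feat data frame and the override values are illustrative assumptions, and the DR_RF call is left commented out because it requires causalToolbox.

```r
# Toy feature data frame (illustrative only).
feat <- data.frame(x1 = rnorm(20), x2 = rnorm(20), x3 = rnorm(20))

# Defaults for mu.forestry, copied from the Usage section.
mu_defaults <- list(
  relevant.Variable = 1:ncol(feat), ntree = 1000, replace = TRUE,
  sample.fraction = 0.7, mtry = round(ncol(feat) * 17 / 20),
  nodesizeSpl = 5, nodesizeAvg = 6, nodesizeStrictSpl = 3,
  nodesizeStrictAvg = 1, splitratio = 1, middleSplit = TRUE,
  OOBhonest = TRUE
)

# Override a subset; entries not named here keep their default values.
mu_custom <- modifyList(mu_defaults, list(ntree = 200, mtry = 2))
str(mu_custom)

# With causalToolbox loaded and tr, yobs defined, the list could be passed as:
# dr_rf <- DR_RF(feat = feat, tr = tr, yobs = yobs, mu.forestry = mu_custom)
```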

Value

An object from a class that contains the CATEestimator class. It should be used with one of the following functions: EstimateCate, CateCI, and CateBIAS. The object has at least the following slots:

feature_train

A copy of feat.

tr_train

A copy of tr.

yobs_train

A copy of yobs.

creator

Function call that creates the CATE estimator. This is used for different bootstrap procedures.

Author(s)

Soeren R. Kuenzel

References

See Also

Other metalearners: M-Learner, S-Learner, T-Learner, X-Learner

Examples

require(causalToolbox)

# create example data set
simulated_experiment <- simulate_causal_experiment(
  ntrain = 1000,
  ntest = 1000,
  dim = 10
)
feat <- simulated_experiment$feat_tr
tr <- simulated_experiment$W_tr
yobs <- simulated_experiment$Yobs_tr
feature_test <- simulated_experiment$feat_te

# create the CATE estimators using Random Forests (RF) and BART
xl_rf <- X_RF(feat = feat, tr = tr, yobs = yobs)
tl_rf <- T_RF(feat = feat, tr = tr, yobs = yobs)
sl_rf <- S_RF(feat = feat, tr = tr, yobs = yobs)
ml_rf <- M_RF(feat = feat, tr = tr, yobs = yobs)
xl_bt <- X_BART(feat = feat, tr = tr, yobs = yobs)
tl_bt <- T_BART(feat = feat, tr = tr, yobs = yobs)
sl_bt <- S_BART(feat = feat, tr = tr, yobs = yobs)
ml_bt <- M_BART(feat = feat, tr = tr, yobs = yobs)

cate_esti_xrf <- EstimateCate(xl_rf, feature_test)

# evaluate the performance.
cate_true <- simulated_experiment$tau_te
mean((cate_esti_xrf - cate_true) ^ 2)
## Not run: 
# create confidence intervals via bootstrapping.
xl_ci_rf <- CateCI(xl_rf, feature_test, B = 500)

## End(Not run)
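The examples above exercise the other metalearners only. A DR-learner version might look like the following sketch, which assumes causalToolbox is installed and that DR_RF objects work with EstimateCate as described under Value; it uses only the arguments shown in the Usage section and is skipped when the package is unavailable.

```r
# Sketch only: runs when causalToolbox is installed, otherwise does nothing.
if (requireNamespace("causalToolbox", quietly = TRUE)) {
  library(causalToolbox)

  # Same simulated data as in the examples above.
  simulated_experiment <- simulate_causal_experiment(
    ntrain = 1000,
    ntest = 1000,
    dim = 10
  )

  # Fit the DR-learner with default hyperparameters.
  dr_rf <- DR_RF(
    feat = simulated_experiment$feat_tr,
    tr = simulated_experiment$W_tr,
    yobs = simulated_experiment$Yobs_tr
  )

  # Estimate the CATE on the test features and compute the mean squared error.
  cate_esti_drf <- EstimateCate(dr_rf, simulated_experiment$feat_te)
  print(mean((cate_esti_drf - simulated_experiment$tau_te) ^ 2))
}
```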

forestry-labs/causalToolbox documentation built on Feb. 6, 2023, 11:27 p.m.