proxy_BCA: Baseline Conditional Average
In GenericML: Generic Machine Learning Inference

Description

Proxy estimation of the Baseline Conditional Average (BCA), defined by E[Y | D=0, Z]. The learner is trained on the auxiliary sample, using only its control-group observations (D = 0), but BCA predictions are made for all observations.
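To make the estimation scheme concrete, the base-R sketch below fits a model of Y on Z using only the control observations (D = 0) in the auxiliary sample and then predicts for every observation. It is a minimal illustration under stated assumptions, not GenericML's implementation: an ordinary linear model (`lm`) stands in for the machine learner, and the function name `bca_sketch` is hypothetical.

```r
# Hypothetical sketch of the BCA proxy; lm() stands in for the mlr3 learner
# that GenericML would use.
bca_sketch <- function(Z, D, Y, A_set) {
  # Training rows: observations in the auxiliary sample that are in the control group
  train <- A_set[D[A_set] == 0]
  dat_train <- data.frame(Y = Y[train], Z[train, , drop = FALSE])
  fit <- lm(Y ~ ., data = dat_train)
  # Predict E[Y | D = 0, Z] for all observations, not only the auxiliary ones
  as.numeric(predict(fit, newdata = data.frame(Z)))
}

## Simulated data, following the Examples section
set.seed(1)
n  <- 150
p  <- 5
D  <- rbinom(n, 1, 0.5)
Z  <- matrix(runif(n * p), n, p)
Y0 <- as.numeric(Z %*% rexp(p) + rnorm(n))
Y  <- ifelse(D == 1, 2 + Y0, Y0)
A_set <- sample(1:n, size = n / 2)

bca <- bca_sketch(Z, D, Y, A_set)
length(bca)  # one prediction per observation
```

Replacing `lm` with a more flexible learner changes only the fitting step; the structure of training on auxiliary controls and predicting for everyone stays the same.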

Usage

proxy_BCA(Z, D, Y, A_set, learner, min_variation = 1e-05)

Arguments

`Z`	A numeric design matrix that holds the covariates in its columns.
`D`	A binary vector of treatment assignment. Value one denotes assignment to the treatment group and value zero assignment to the control group.
`Y`	A numeric vector containing the response variable.
`A_set`	A numeric vector of the indices of the observations in the auxiliary sample.
`learner`	A string specifying the machine learner for the estimation. Either `'lasso'`, `'random_forest'`, `'tree'`, or a custom learner specified with `mlr3` syntax. In the latter case, do not state in the `mlr3` specification whether the learner is a regression or a classification learner. Example: `'mlr3::lrn("ranger", num.trees = 100)'` for a random forest learner with 100 trees. Note that this is a string and that it omits the `classif.` and `regr.` keywords. See https://mlr3learners.mlr-org.com for a list of `mlr3` learners.
`min_variation`	Specifies a threshold for the minimum variation of the predictions. If the variation of the BCA predictions falls below this threshold, random noise with distribution N(0, var(Y)/20) is added to them. Default is `1e-05`.
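The `min_variation` rule can be sketched in a few lines of base R. This sketch assumes that the variation of the predictions is measured by their sample variance and that var(Y)/20 in N(0, var(Y)/20) is a variance, so the noise has standard deviation sqrt(var(Y)/20). The helper name `add_noise_if_flat` is hypothetical.

```r
# Hypothetical helper: jitter the predictions only when they are (nearly) constant.
add_noise_if_flat <- function(predictions, Y, min_variation = 1e-05) {
  if (var(predictions) < min_variation) {
    # Assumed reading of N(0, var(Y)/20): second parameter is a variance
    noise <- rnorm(length(predictions), mean = 0, sd = sqrt(var(Y) / 20))
    predictions <- predictions + noise
  }
  predictions
}

set.seed(2)
Y <- rnorm(100)
flat <- rep(0.5, 100)                      # e.g. a learner that predicts a constant
jittered <- add_noise_if_flat(flat, Y)     # noise is added
varied <- seq(0, 1, length.out = 100)
unchanged <- add_noise_if_flat(varied, Y)  # variance above threshold: returned as-is
```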

Details

The specifications "lasso", "random_forest", and "tree" in learner correspond to the following mlr3 specifications (we omit the keywords classif. and regr.). "lasso" is a cross-validated Lasso estimator, which corresponds to 'mlr3::lrn("cv_glmnet", s = "lambda.min", alpha = 1)'. "random_forest" is a random forest with 500 trees, which corresponds to 'mlr3::lrn("ranger", num.trees = 500)'. "tree" is a tree learner, which corresponds to 'mlr3::lrn("rpart")'.
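The correspondence between the shortcut names and the mlr3 strings can be illustrated with a small string-rewriting helper. The function `expand_learner` is hypothetical and not part of GenericML; it assumes that, because Y is a numeric response, the `regr.` prefix is the one to add to the learner id. Evaluating the returned string (for example with `eval(parse(text = ...))`) would additionally require mlr3 and mlr3learners to be installed.

```r
# Hypothetical helper: map a shortcut name (or a custom mlr3 string) to an
# mlr3 string with the "regr." keyword inserted into the learner id.
expand_learner <- function(learner) {
  shortcuts <- c(
    lasso         = 'mlr3::lrn("cv_glmnet", s = "lambda.min", alpha = 1)',
    random_forest = 'mlr3::lrn("ranger", num.trees = 500)',
    tree          = 'mlr3::lrn("rpart")'
  )
  spec <- if (learner %in% names(shortcuts)) shortcuts[[learner]] else learner
  # Insert "regr." right after the quote that opens the learner id
  sub('lrn\\((["\'])', 'lrn(\\1regr.', spec)
}

expand_learner("tree")
expand_learner("mlr3::lrn('ranger', num.trees = 100)")
```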

Value

An object of class "proxy_BCA", consisting of the following components:

estimates: A numeric vector of BCA estimates, one for each observation.
mlr3_objects: "mlr3" objects used for estimation.

References

Chernozhukov V., Demirer M., Duflo E., Fernández-Val I. (2020). “Generic Machine Learning Inference on Heterogenous Treatment Effects in Randomized Experiments.” arXiv preprint arXiv:1712.04802. URL: https://arxiv.org/abs/1712.04802.

Lang M., Binder M., Richter J., Schratz P., Pfisterer F., Coors S., Au Q., Casalicchio G., Kotthoff L., Bischl B. (2019). “mlr3: A Modern Object-Oriented Machine Learning Framework in R.” Journal of Open Source Software, 4(44), 1903. doi: 10.21105/joss.01903.

Examples

if(require("ranger")){
## generate data
set.seed(1)
n  <- 150                                  # number of observations
p  <- 5                                    # number of covariates
D  <- rbinom(n, 1, 0.5)                    # random treatment assignment
Z  <- matrix(runif(n*p), n, p)             # design matrix
Y0 <- as.numeric(Z %*% rexp(p) + rnorm(n)) # potential outcome without treatment
Y1 <- 2 + Y0                               # potential outcome under treatment
Y  <- ifelse(D == 1, Y1, Y0)               # observed outcome
A_set <- sample(1:n, size = n/2)           # auxiliary set

## BCA predictions via random forest
proxy_BCA(Z, D, Y, A_set, learner = "mlr3::lrn('ranger', num.trees = 10)")
}

GenericML documentation built on June 18, 2022, 9:09 a.m.