GenericML_single: Single iteration of the GenericML algorithm
In GenericML: Generic Machine Learning Inference

GenericML_single

R Documentation

Single iteration of the GenericML algorithm

Description

Performs generic ML inference for a single learning technique and a given split of the data. This corresponds to a single iteration of Algorithm 1 in Chernozhukov et al. (2020).

Usage

GenericML_single(
  Z,
  D,
  Y,
  learner,
  propensity_scores,
  M_set,
  A_set = setdiff(1:length(Y), M_set),
  Z_CLAN = NULL,
  HT = FALSE,
  quantile_cutoffs = c(0.25, 0.5, 0.75),
  X1_BLP = setup_X1(),
  X1_GATES = setup_X1(),
  diff_GATES = setup_diff(),
  diff_CLAN = setup_diff(),
  vcov_BLP = setup_vcov(),
  vcov_GATES = setup_vcov(),
  equal_variances_CLAN = FALSE,
  significance_level = 0.05,
  min_variation = 1e-05
)
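
The default for `A_set` in the signature above is the complement of `M_set` within `1:length(Y)`. A minimal base-R sketch of that split, where the sample size `n` and the seed are chosen only for illustration:

```r
set.seed(1)
n <- 10                                   # hypothetical number of observations
Y <- runif(n)                             # hypothetical outcome vector

M_set <- sort(sample(1:n, size = n / 2))  # indices of the main sample
A_set <- setdiff(1:length(Y), M_set)      # default: all indices not in M_set

# The two index sets are disjoint and together cover every observation
stopifnot(length(intersect(M_set, A_set)) == 0)
stopifnot(setequal(union(M_set, A_set), 1:n))
```

Passing a custom `A_set` is only needed when the auxiliary sample should not be the full complement of the main sample.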

Arguments

`Z`	A numeric design matrix that holds the covariates in its columns.
`D`	A binary vector of treatment assignment. Value one denotes assignment to the treatment group and value zero assignment to the control group.
`Y`	A numeric vector containing the response variable.
`learner`	A character string specifying the machine learner used to estimate the baseline conditional average (BCA) and the conditional average treatment effect (CATE). Either `'lasso'`, `'random_forest'`, `'tree'`, or a custom learner specified with `mlr3` syntax. In the latter case, do not state in the `mlr3` specification whether the learner is a regression or a classification learner. Example: `'mlr3::lrn("ranger", num.trees = 100)'` for a random forest learner with 100 trees. Note that the specification is passed as a string and that the `classif.` and `regr.` keywords are omitted. See https://mlr3learners.mlr-org.com for a list of `mlr3` learners.
`propensity_scores`	A numeric vector of propensity score estimates.
`M_set`	A numerical vector of indices of observations in the main sample.
`A_set`	A numerical vector of indices of observations in the auxiliary sample. Default is complementary set to `M_set`.
`Z_CLAN`	A numeric matrix holding variables on which classification analysis (CLAN) shall be performed. CLAN will be performed on each column of the matrix. If `NULL` (default), then `Z_CLAN = Z`, i.e. CLAN is performed for all variables in `Z`.
`HT`	Logical. If `TRUE`, a Horvitz-Thompson (HT) transformation is applied in the BLP and GATES regressions. Default is `FALSE`.
`quantile_cutoffs`	The cutoff points of the quantiles that shall be used for GATES grouping. Default is `c(0.25, 0.5, 0.75)`, i.e. the three quartile cutoffs, which yield four groups.
`X1_BLP`	Specifies the design matrix X_1 in the BLP regression. Must be an object of class `"setup_X1"`. See the documentation of `setup_X1()` for details.
`X1_GATES`	Same as `X1_BLP`, just for the GATES regression.
`diff_GATES`	Specifies the generic targets of GATES. Must be an object of class `"setup_diff"`. See the documentation of `setup_diff()` for details.
`diff_CLAN`	Same as `diff_GATES`, just for the CLAN generic targets.
`vcov_BLP`	Specifies the covariance matrix estimator in the BLP regression. Must be an object of class `"setup_vcov"`. See the documentation of `setup_vcov()` for details.
`vcov_GATES`	Same as `vcov_BLP`, just for the GATES regression.
`equal_variances_CLAN`	Logical. If `TRUE`, then all within-group variances of the CLAN groups are assumed to be equal. Default is `FALSE`. This specification is required for heteroskedasticity-robust variance estimation on the difference of two CLAN generic targets (i.e. variance of the difference of two means). If `TRUE` (corresponds to homoskedasticity assumption), the pooled variance is used. If `FALSE` (heteroskedasticity), the variance of Welch's t-test is used.
`significance_level`	Significance level for VEIN. Default is 0.05.
`min_variation`	Specifies a threshold for the minimum variation of the BCA/CATE predictions. If the variation of a BCA/CATE prediction falls below this threshold, random noise with distribution N(0, var(Y)/20) is added to it. Default is `1e-05`.
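
Three of the arguments above describe small computations that can be sketched in base R. The following is an illustration under stated assumptions, not the package's internal code: the data are simulated, the interval convention used for grouping (`right = FALSE`) is an assumption, the two CLAN groups are hypothetical, and "variation" is assumed here to mean the sample variance of the predictions.

```r
set.seed(1)
n <- 100
Y <- runif(n)            # hypothetical outcome
proxy_cate <- runif(n)   # hypothetical proxy CATE predictions

## 1) quantile_cutoffs: three cutoffs split the predictions into four groups
quantile_cutoffs <- c(0.25, 0.5, 0.75)
breaks <- c(-Inf, quantile(proxy_cate, probs = quantile_cutoffs), Inf)
groups <- cut(proxy_cate, breaks = breaks, right = FALSE, labels = FALSE)
group_sizes <- table(groups)

## 2) equal_variances_CLAN: variance of a difference of two group means
clan_a <- rnorm(30, mean = 0, sd = 1)   # hypothetical CLAN group
clan_b <- rnorm(60, mean = 0, sd = 3)   # hypothetical CLAN group
n_a <- length(clan_a)
n_b <- length(clan_b)

# FALSE: variance as used in Welch's t-test (group variances may differ)
var_welch <- var(clan_a) / n_a + var(clan_b) / n_b

# TRUE: pooled variance (group variances assumed equal)
s2_pooled  <- ((n_a - 1) * var(clan_a) + (n_b - 1) * var(clan_b)) / (n_a + n_b - 2)
var_pooled <- s2_pooled * (1 / n_a + 1 / n_b)

## 3) min_variation: add N(0, var(Y)/20) noise to (nearly) constant predictions
## Assumption: variation is measured by the sample variance
min_variation <- 1e-05
flat_pred <- rep(0.3, n)                # hypothetical constant prediction
if (var(flat_pred) < min_variation) {
  # var(Y)/20 is a variance, so the standard deviation is its square root
  flat_pred <- flat_pred + rnorm(n, mean = 0, sd = sqrt(var(Y) / 20))
}
```

When the two groups have the same size, the pooled and the Welch variance of the difference coincide, which is why the sketch uses groups of unequal size.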

Details

The specifications `"lasso"`, `"random_forest"`, and `"tree"` in `learner` correspond to the following `mlr3` specifications (we omit the keywords `classif.` and `regr.`). `"lasso"` is a cross-validated Lasso estimator, which corresponds to `'mlr3::lrn("cv_glmnet", s = "lambda.min", alpha = 1)'`. `"random_forest"` is a random forest with 500 trees, which corresponds to `'mlr3::lrn("ranger", num.trees = 500)'`. `"tree"` is a tree learner, which corresponds to `'mlr3::lrn("rpart")'`.
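
The correspondence described above can be written down as a simple lookup. This is a sketch for illustration only: `builtin_learners` and `resolve_learner()` are hypothetical names that do not belong to the package, and the code only manipulates strings, so it runs without `mlr3` installed.

```r
# Shorthand names and the mlr3 specification strings they stand for
builtin_learners <- c(
  lasso         = 'mlr3::lrn("cv_glmnet", s = "lambda.min", alpha = 1)',
  random_forest = 'mlr3::lrn("ranger", num.trees = 500)',
  tree          = 'mlr3::lrn("rpart")'
)

# Hypothetical helper: expand a shorthand, pass custom strings through unchanged
resolve_learner <- function(learner) {
  stopifnot(is.character(learner), length(learner) == 1)
  if (learner %in% names(builtin_learners)) {
    builtin_learners[[learner]]
  } else {
    learner
  }
}

resolve_learner("tree")
resolve_learner('mlr3::lrn("ranger", num.trees = 100)')
```

A custom specification is returned as given, which mirrors the rule that custom learners are supplied as strings without the `classif.` or `regr.` keywords.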

Value

A list with the following components:

BLP: An object of class "BLP".
GATES: An object of class "GATES".
CLAN: An object of class "CLAN".
proxy_BCA: An object of class "proxy_BCA".
proxy_CATE: An object of class "proxy_CATE".
best: Estimates of the Λ parameters for finding the best learner. Returned by lambda_parameters().

References

Chernozhukov V., Demirer M., Duflo E., Fernández-Val I. (2020). “Generic Machine Learning Inference on Heterogenous Treatment Effects in Randomized Experiments.” arXiv preprint arXiv:1712.04802. URL: https://arxiv.org/abs/1712.04802.

Lang M., Binder M., Richter J., Schratz P., Pfisterer F., Coors S., Au Q., Casalicchio G., Kotthoff L., Bischl B. (2019). “mlr3: A Modern Object-Oriented Machine Learning Framework in R.” Journal of Open Source Software, 4(44), 1903. doi: 10.21105/joss.01903.

Examples

if(require("ranger")){
## generate data
set.seed(1)
n  <- 150                        # number of observations
p  <- 5                          # number of covariates
Z  <- matrix(runif(n*p), n, p)   # design matrix
D  <- rbinom(n, 1, 0.5)          # random treatment assignment
Y  <- runif(n)                   # outcome variable
propensity_scores <- rep(0.5, n) # propensity scores
M_set <- sample(1:n, size = n/2) # main set

## specify learner
learner <- "mlr3::lrn('ranger', num.trees = 10)"

## run single GenericML iteration
GenericML_single(Z, D, Y, learner, propensity_scores, M_set)
}
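
To work with the output rather than print it, the result can be stored and its components inspected. A minimal sketch, assuming the `GenericML` and `ranger` packages are installed; the object name `res` is chosen here for illustration, and the data are simulated in the same way as in the example above.

```r
res <- NULL  # stays NULL if the required packages are not installed

if (requireNamespace("GenericML", quietly = TRUE) &&
    requireNamespace("ranger", quietly = TRUE)) {

  ## generate data
  set.seed(1)
  n  <- 150
  p  <- 5
  Z  <- matrix(runif(n * p), n, p)
  D  <- rbinom(n, 1, 0.5)
  Y  <- runif(n)
  propensity_scores <- rep(0.5, n)
  M_set <- sample(1:n, size = n / 2)

  ## custom learner given as a string, without classif./regr. keywords
  learner <- "mlr3::lrn('ranger', num.trees = 10)"

  ## store the result instead of printing it
  res <- GenericML::GenericML_single(Z, D, Y, learner, propensity_scores, M_set)

  ## documented components: BLP, GATES, CLAN, proxy_BCA, proxy_CATE, best
  print(names(res))
}
```

Individual components can then be accessed with the usual list syntax, for example `res$BLP`.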

GenericML documentation built on June 18, 2022, 9:09 a.m.