GenericML | R Documentation |
Performs generic machine learning inference on heterogeneous treatment effects as in Chernozhukov, Demirer, Duflo and Fernández-Val (2020) with user-specified machine learning methods. Intended for randomized experiments.
GenericML(
  Z,
  D,
  Y,
  learners_GenericML,
  learner_propensity_score = "constant",
  num_splits = 100,
  Z_CLAN = NULL,
  HT = FALSE,
  quantile_cutoffs = c(0.25, 0.5, 0.75),
  X1_BLP = setup_X1(),
  X1_GATES = setup_X1(),
  diff_GATES = setup_diff(),
  diff_CLAN = setup_diff(),
  vcov_BLP = setup_vcov(),
  vcov_GATES = setup_vcov(),
  equal_variances_CLAN = FALSE,
  prop_aux = 0.5,
  stratify = setup_stratify(),
  significance_level = 0.05,
  min_variation = 1e-05,
  parallel = FALSE,
  num_cores = parallel::detectCores(),
  seed = NULL,
  store_learners = FALSE,
  store_splits = TRUE
)
Z |
A numeric design matrix that holds the covariates in its columns. |
D |
A binary vector of treatment assignment. Value one denotes assignment to the treatment group and value zero assignment to the control group. |
Y |
A numeric vector containing the response variable. |
learners_GenericML |
A character vector specifying the machine learners to be used for estimating the baseline conditional average (BCA) and conditional average treatment effect (CATE). Either |
learner_propensity_score |
The estimator of the propensity scores. Either a numeric vector (which is then taken as estimates of the propensity scores) or a string specifying the estimator. In the latter case, the string must either be equal to |
num_splits |
Number of sample splits. Default is 100. Must be larger than one. If you want to run |
Z_CLAN |
A numeric matrix holding variables on which classification analysis (CLAN) shall be performed. CLAN will be performed on each column of the matrix. If |
HT |
Logical. If |
quantile_cutoffs |
The cutoff points of the quantiles that shall be used for GATES grouping. Default is |
X1_BLP |
Specifies the design matrix X_1 in the regression. Must be an object of class |
X1_GATES |
Same as |
diff_GATES |
Specifies the generic targets of GATES. Must be an object of class |
diff_CLAN |
Same as |
vcov_BLP |
Specifies the covariance matrix estimator in the BLP regression. Must be an object of class |
vcov_GATES |
Same as |
equal_variances_CLAN |
Logical. If |
prop_aux |
Proportion of samples that shall be in the auxiliary set in case of random sample splitting. Default is 0.5. The number of samples in the auxiliary set will be equal to |
stratify |
A list that specifies whether or not stratified sample splitting shall be performed. It is recommended to use the returned object of |
significance_level |
Significance level for VEIN. Default is 0.05. |
min_variation |
Specifies a threshold for the minimum variation of the BCA/CATE predictions. If the variation of a BCA/CATE prediction falls below this threshold, random noise with distribution N(0, var(Y)/20) is added to it. Default is |
parallel |
Logical. If |
num_cores |
Number of cores to be used in parallelization (if applicable). Default is the number of cores of the user's machine. |
seed |
Random seed. Default is |
store_learners |
Logical. If |
store_splits |
Logical. If |
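The role of min_variation can be illustrated with a small base R sketch. It is not the package's internal code. It assumes that "variation" means the sample variance of the predictions and that var(Y)/20 is the variance (not the standard deviation) of the added noise. The objects Y and pred are made up for illustration.

```r
## Illustrative sketch only (assumptions: variation = sample variance,
## noise variance = var(Y) / 20); not taken from the package source.
set.seed(1)
Y             <- rnorm(100)            # hypothetical response
min_variation <- 1e-05                 # default threshold from the usage above
pred          <- rep(0.3, length(Y))   # hypothetical degenerate CATE prediction

if (var(pred) < min_variation) {
  # predictions are (nearly) constant, so perturb them with Gaussian noise
  pred <- pred + rnorm(length(pred), mean = 0, sd = sqrt(var(Y) / 20))
}

var(pred) > min_variation
```

A learner that returns near-constant predictions would otherwise give a proxy with no usable variation, which is presumably why noise is added in that case.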
The specifications "lasso", "random_forest", and "tree" in learners_GenericML and learner_propensity_score correspond to the following mlr3 specifications (we omit the keywords classif. and regr.). "lasso" is a cross-validated Lasso estimator, which corresponds to 'mlr3::lrn("cv_glmnet", s = "lambda.min", alpha = 1)'. "random_forest" is a random forest with 500 trees, which corresponds to 'mlr3::lrn("ranger", num.trees = 500)'. "tree" is a tree learner, which corresponds to 'mlr3::lrn("rpart")'.

Warning: GenericML() can be quite memory-intensive, in particular when the data set is large. To alleviate memory usage, consider setting store_learners = FALSE, choosing a low number of cores via num_cores (at the expense of longer computing time), setting prop_aux to a value smaller than the default of 0.5, or using GenericML_combine().
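As a hedged illustration of the memory-related settings above, the following sketch calls GenericML() with store_learners = FALSE, a reduced prop_aux, and no parallelization. The simulated data and the value prop_aux = 0.3 are arbitrary choices made for illustration. The custom learner string is the one used in the Examples section. The call runs only if the GenericML and ranger packages are installed.

```r
## Sketch only: simulated data and argument values are illustrative choices.
x <- NULL
if (requireNamespace("GenericML", quietly = TRUE) &&
    requireNamespace("ranger", quietly = TRUE)) {

  set.seed(1)
  n <- 150                                   # number of observations
  p <- 5                                     # number of covariates
  D <- rbinom(n, 1, 0.5)                     # random treatment assignment
  Z <- matrix(runif(n * p), n, p,
              dimnames = list(NULL, paste0("V", 1:p)))
  Y <- as.numeric(Z %*% rexp(p) + rnorm(n)) + 2 * D  # constant effect of 2

  x <- GenericML::GenericML(
    Z, D, Y,
    learners_GenericML = "mlr3::lrn('ranger', num.trees = 10)",
    num_splits     = 2,       # small, to keep computation time low
    prop_aux       = 0.3,     # smaller auxiliary set than the default 0.5
    store_learners = FALSE,   # do not keep intermediate results
    parallel       = FALSE    # single core
  )
}
```

Whether these settings save a meaningful amount of memory depends on the size of the data and on the learners used.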
An object of class "GenericML". On this object, we recommend using the accessor functions get_BLP(), get_GATES(), and get_CLAN() to extract the results of the analyses of BLP, GATES, and CLAN, respectively. An object of class "GenericML" contains the following components:
VEIN
A list containing two sub-lists called best_learners and all_learners, respectively. Each of these two sub-lists contains the inferential VEIN results on the generic targets of the BLP, GATES, and CLAN analyses. all_learners does this for all learners specified in the argument learners_GenericML, best_learners only for the corresponding best learners. Which learner is best for which analysis is assessed by the Λ criteria discussed in Sections 5.2 and 5.3 of the paper.
best
A list containing information on the evaluation of which learner is the best for which analysis. Contains four components. The first three contain the name of the best learner for BLP, GATES, and CLAN, respectively. The fourth component, overview, contains the two Λ criteria used to determine the best learners (discussed in Sections 5.2 and 5.3 of the paper).
propensity_scores
The propensity score estimates as well as the "mlr3" objects used to estimate them (if mlr3 was used for estimation).
GenericML_single
Only nonempty if store_learners = TRUE. Contains all intermediate results of each learner for each split. That is, for a given learner (first level of the list) and split (second level), objects of classes "BLP", "GATES", "CLAN", "proxy_BCA", "proxy_CATE" as well as the Λ criteria ("best") are listed, which were computed with the given learner and split.
splits
Only nonempty if store_splits = TRUE. Contains a character matrix of dimension length(Y) by num_splits. Contains the group membership (main or auxiliary) of each observation (rows) in each split (columns). "M" denotes the main set, "A" the auxiliary set.
generic_targets
A list of generic target estimates for each learner. More specifically, each component is a list of the generic target estimates pertaining to the BLP, GATES, and CLAN analyses. Each of those lists contains a three-dimensional array containing the generic targets of a single learner for all sample splits (except CLAN where there is one more layer of lists).
arguments
A list of arguments used in the function call.
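To make the layout of the splits component concrete, here is a base R sketch that builds a character matrix of the same shape by plain random splitting. It is not the package's splitting code. In particular, the auxiliary set size floor(prop_aux * n) is an assumption, and stratified splitting is ignored.

```r
## Illustrative sketch of a length(Y) x num_splits membership matrix.
## Assumption: the auxiliary set has floor(prop_aux * n) observations.
set.seed(1)
n          <- 10     # stands in for length(Y)
num_splits <- 3
prop_aux   <- 0.5
n_aux      <- floor(prop_aux * n)

splits <- replicate(num_splits, {
  membership <- rep("M", n)                # start with everyone in the main set
  membership[sample.int(n, n_aux)] <- "A"  # randomly assign the auxiliary set
  membership
})

dim(splits)                  # rows: observations, columns: splits
colSums(splits == "A")       # auxiliary set size in each split
```

Reading the matrix column by column recovers, for each split, which observations trained the learners ("A") and which were used for inference ("M").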
In an earlier development version, Lucas Kitzmueller alerted us to several minor bugs and proposed fixes. Many thanks to him!
Chernozhukov V., Demirer M., Duflo E., Fernández-Val I. (2020). “Generic Machine Learning Inference on Heterogeneous Treatment Effects in Randomized Experiments.” arXiv preprint arXiv:1712.04802. URL: https://arxiv.org/abs/1712.04802.
Lang M., Binder M., Richter J., Schratz P., Pfisterer F., Coors S., Au Q., Casalicchio G., Kotthoff L., Bischl B. (2019). “mlr3: A Modern Object-Oriented Machine Learning Framework in R.” Journal of Open Source Software, 4(44), 1903. doi: 10.21105/joss.01903.
plot.GenericML(), print.GenericML(), get_BLP(), get_GATES(), get_CLAN(), setup_X1(), setup_diff(), setup_vcov(), setup_stratify(), GenericML_single(), GenericML_combine()
if (require("glmnet") && require("ranger")) {

  ## generate data
  set.seed(1)
  n  <- 150                                   # number of observations
  p  <- 5                                     # number of covariates
  D  <- rbinom(n, 1, 0.5)                     # random treatment assignment
  Z  <- matrix(runif(n*p), n, p)              # design matrix
  Y0 <- as.numeric(Z %*% rexp(p) + rnorm(n))  # potential outcome without treatment
  Y1 <- 2 + Y0                                # potential outcome under treatment
  Y  <- ifelse(D == 1, Y1, Y0)                # observed outcome

  ## column names of Z
  colnames(Z) <- paste0("V", 1:p)

  ## specify learners
  learners <- c("lasso", "mlr3::lrn('ranger', num.trees = 10)")

  ## glmnet v4.1.3 isn't supported on Solaris, so skip Lasso in this case
  if(Sys.info()["sysname"] == "SunOS") learners <- learners[-1]

  ## specify quantile cutoffs (the 4 quartile groups here)
  quantile_cutoffs <- c(0.25, 0.5, 0.75)

  ## specify the differenced generic targets of GATES and CLAN
  # use G4-G1, G4-G2, G4-G3 as differenced generic targets in GATES
  diff_GATES <- setup_diff(subtract_from = "most",
                           subtracted = c(1,2,3))
  # use G1-G3, G1-G2 as differenced generic targets in CLAN
  diff_CLAN  <- setup_diff(subtract_from = "least",
                           subtracted = c(3,2))

  ## perform generic ML inference
  # small number of splits to keep computation time low
  x <- GenericML(Z, D, Y, learners,
                 num_splits = 2,
                 quantile_cutoffs = quantile_cutoffs,
                 diff_GATES = diff_GATES,
                 diff_CLAN = diff_CLAN,
                 parallel = FALSE)

  ## access BLP generic targets for best learner and make plot
  get_BLP(x, plot = TRUE)

  ## access GATES generic targets for best learner and make plot
  get_GATES(x, plot = TRUE)

  ## access CLAN generic targets for "V1" & best learner and make plot
  get_CLAN(x, variable = "V1", plot = TRUE)

}
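The quantile cutoffs c(0.25, 0.5, 0.75) used in the example induce four groups, G1 to G4. The base R sketch below shows one way such a grouping could be formed from a vector of CATE proxy predictions. The vector proxy_cate is made up for illustration, and the use of quantile() and cut() is an assumption about the mechanics, not the package's internal implementation.

```r
## Illustrative sketch: form K + 1 groups from K quantile cutoffs.
## proxy_cate is hypothetical; the package's internals may differ.
set.seed(1)
proxy_cate       <- rnorm(200)             # hypothetical CATE proxy predictions
quantile_cutoffs <- c(0.25, 0.5, 0.75)

# empirical quantiles serve as interior break points
breaks <- c(-Inf, quantile(proxy_cate, probs = quantile_cutoffs), Inf)

groups <- cut(proxy_cate,
              breaks = breaks,
              labels = paste0("G", seq_len(length(quantile_cutoffs) + 1)))

table(groups)   # group sizes; G1 has the smallest predictions, G4 the largest
```

Under this reading, a differenced target such as G4-G1 compares the group with the largest predicted effects to the group with the smallest.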