Estimate heterogeneous treatment effects (HTEs) using the X-Learner strategy.
data | a data frame containing the variables and values.
x | a list of character vectors specifying the variables to be included in the model (columns in the data). If unspecified, all columns in the data other than y and w are used.
y | a character vector specifying the response variable.
w | a character vector specifying the treatment status.
base_learner | a character vector specifying the base learner. One of "regression forest" or "OLS". Default is "regression forest".
cate_model | a character vector specifying the model used to estimate the CATE. One of "regression forest" or "OLS". Default is "regression forest".
propensity_model | a character string naming the model used to estimate propensity scores. One of "logistic", "lasso", or "causal forest".
plot | logical; if TRUE, plots a histogram of the estimated treatment effects.
... | additional arguments passed to the base learner.
Implements the X-learner algorithm proposed in Künzel et al. (2019) for estimating conditional average treatment effects (CATE).
In the first stage of the X-learner, the control response function is estimated using all units in the control group as
μ_0(x) = E [ Y(0) | X = x],
and the treatment response function is estimated using all units in the treatment group as
μ_1(x) = E [ Y(1) | X = x].
Both μ_0 and μ_1 are estimated using any base learner (supervised machine learning or regression algorithm). Here we implement the X-learner with linear regression or regression forest (see Athey, Tibshirani, and Wager (2016)) as the base learner.
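The first stage can be sketched as follows. This is an illustration in Python under stated assumptions, not the package's code: `fit_ols` is a hypothetical single-covariate least-squares helper standing in for the base learner, and the toy data are invented for the example.

```python
# Stage one of the X-learner: fit one outcome model per treatment arm.
# fit_ols is a hypothetical helper (one covariate, closed-form least squares),
# standing in for whichever base learner is chosen.

def fit_ols(xs, ys):
    """Return a function x -> intercept + slope * x fitted by least squares."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return lambda x: intercept + slope * x

# Invented toy data: one covariate x, treatment indicator w, outcome y.
x = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
w = [0, 1, 0, 1, 0, 1]
y = [1.0, 7.0, 5.0, 13.0, 9.0, 19.0]

# Split the sample by treatment status.
x_control = [xi for xi, wi in zip(x, w) if wi == 0]
y_control = [yi for yi, wi in zip(y, w) if wi == 0]
x_treated = [xi for xi, wi in zip(x, w) if wi == 1]
y_treated = [yi for yi, wi in zip(y, w) if wi == 1]

mu0_hat = fit_ols(x_control, y_control)  # fitted on control units only
mu1_hat = fit_ols(x_treated, y_treated)  # fitted on treated units only
```

Each model sees only its own arm of the data, which is what allows it to predict the counterfactual outcome for units in the other arm in the second stage.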
In the second stage, the treatment effect for each observation is then imputed by estimating the counterfactual outcome for each observation using the first-stage base learner models:
\tilde{D}^1_i := Y^1_i - \hat{μ}_0(X^1_i)
and
\tilde{D}^0_i := \hat{μ}_1(X^0_i) - Y^0_i
where \tilde{D}^1_i and \tilde{D}^0_i are the imputed treatment effects for units in the treatment group and the control group, respectively. The CATE is then estimated in two ways:
\hat{τ}_1(x) = E[\tilde{D}^1 | X = x]
and
\hat{τ}_0(x) = E[\tilde{D}^0 | X = x].
Currently, we include the option to estimate \hat{τ}_1 and \hat{τ}_0 with linear regression or regression forests.
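A minimal sketch of the second stage, again in Python and under stated assumptions: the first-stage fits `mu0_hat` and `mu1_hat` are hard-coded stand-ins, `fit_ols` is a hypothetical single-covariate helper, and the data are invented. The imputed effects follow Künzel et al. (2019): observed outcome minus predicted control outcome for treated units, and predicted treated outcome minus observed outcome for control units.

```python
# Stage two of the X-learner: impute unit-level effects, then regress them on x.

def fit_ols(xs, ys):
    """Hypothetical helper: least-squares line through (xs, ys)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return lambda x: (mean_y - slope * mean_x) + slope * x

# Assumed first-stage fits (stand-ins for the fitted base learners).
mu0_hat = lambda x: 1.0 + 2.0 * x  # predicted outcome without treatment
mu1_hat = lambda x: 4.0 + 3.0 * x  # predicted outcome with treatment

# Invented toy data, already split by treatment status.
x_treated, y_treated = [1.0, 3.0, 5.0], [7.0, 13.0, 19.0]
x_control, y_control = [0.0, 2.0, 4.0], [1.0, 5.0, 9.0]

# Treated units: observed outcome minus predicted control outcome.
d1 = [y - mu0_hat(x) for x, y in zip(x_treated, y_treated)]
# Control units: predicted treated outcome minus observed outcome.
d0 = [mu1_hat(x) - y for x, y in zip(x_control, y_control)]

# Two CATE estimates, one from each group's imputed effects.
tau1_hat = fit_ols(x_treated, d1)
tau0_hat = fit_ols(x_control, d0)
```

In this noiseless toy example both regressions recover the same effect function, 3 + x; with real data the two estimates generally differ, which is why the third stage combines them.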
In the third stage, the CATE is estimated as a weighted average of the two estimates from the second stage:
\hat{τ}(x) = g(x) \hat{τ}_0(x) + (1 - g(x)) \hat{τ}_1(x).
Here, we choose the propensity score as the weighting function g(x).
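The weighting step can be illustrated with a short Python sketch. Everything named here is an assumption made for the example: `tau0_hat`, `tau1_hat`, and `propensity` are hard-coded stand-ins rather than fitted models, and summarising the per-observation estimates by their mean to obtain an average treatment effect is one common choice, not something this page specifies.

```python
import math

# Stage three of the X-learner: combine the two CATE estimates using g(x).

# Hypothetical second-stage estimates (assumed, not fitted here).
tau0_hat = lambda x: 3.0 + 1.0 * x
tau1_hat = lambda x: 2.0 + 1.5 * x

# Hypothetical propensity model: g(x) = P(W = 1 | X = x), logistic in x.
propensity = lambda x: 1.0 / (1.0 + math.exp(-(x - 2.0)))

def x_learner_cate(x):
    """Weighted average g(x) * tau0_hat(x) + (1 - g(x)) * tau1_hat(x)."""
    g = propensity(x)
    return g * tau0_hat(x) + (1.0 - g) * tau1_hat(x)

x = [0.0, 2.0, 4.0]
cate = [x_learner_cate(v) for v in x]  # one estimate per observation
ate = sum(cate) / len(cate)            # assumed summary: mean of the CATEs
```

Where treated units are common (g(x) near 1), more weight goes to \hat{τ}_0, which is built from the outcome model fitted on the treated group; where they are rare, more weight goes to \hat{τ}_1.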
Returns a list of two elements. The first element is a vector of estimated conditional average treatment effects, one for each observation. The second element is the estimated average treatment effect.
Künzel, Sören R., Jasjeet S. Sekhon, Peter J. Bickel, and Bin Yu. 2019. “Metalearners for estimating heterogeneous treatment effects using machine learning.” Proceedings of the National Academy of Sciences of the United States of America 116(10): 4156–4165. https://doi.org/10.1073/pnas.1804597116
Athey, Susan, Julie Tibshirani, and Stefan Wager. 2016. “Generalized Random Forests.” Working paper; forthcoming in the Annals of Statistics. https://arxiv.org/abs/1610.01271