get_bound
is the main function for estimating the lower and upper bound curves
as a function of eps, the proportion of unmeasured confounding.
y |
nx1 outcome vector in [0, 1]. |
a |
nx1 treatment received vector. |
x |
nxp |
ymin |
infimum of the support of y. |
ymax |
supremum of the support of y. |
outfam |
family specifying the error distribution for outcome
regression, currently |
treatfam |
family specifying the error distribution for treatment
regression, currently |
sl.lib |
character vector specifying which libraries to use for the SL. |
model |
a string specifying the assumption placed on S when computing the bounds. Currently only "x" (S \perp (Y, A) | X) and "xa" (S \perp Y | A, X) are supported. Default is model = "x". |
eps |
vector of arbitrary length specifying the values of the proportion of confounding at which the lower and upper bound curves are evaluated. Default is 0 (no unmeasured confounding). |
delta |
vector of arbitrary length specifying the values of delta used to bound maximal confounding among S = 0 units. Default is delta = 1, which imposes no assumption if the outcome Y is bounded. |
nsplits |
number of splits for the cross-fitting. Default is 5. |
do_mult_boot |
boolean for whether uniform bands via the multiplier bootstrap should be computed. Default is do_mult_boot=TRUE. |
do_eps_zero |
boolean for whether an estimate of epsilon_zero should be computed. Default is do_eps_zero=TRUE. |
alpha |
significance level, so that confidence intervals have nominal coverage 1 - alpha. Default is 0.05. |
B |
number of Rademacher random variables sampled. Default is 10000. |
nuis_fns |
Optional. An nx4 matrix specifying the estimated regression functions evaluated at the observed x; columns should be named pi0, pi1, mu1, mu0. Default is NULL, in which case the regressions are estimated using the SuperLearner via cross-fitting. |
plugin |
boolean for whether the estimator of the bounds is of plug-in type, that is, whether it uses g(etahat) rather than the influence-function-based estimator tauhat when computing the term that multiplies the indicator. |
do_rearrange |
boolean for whether the rearrangement procedure of Chernozhukov et al. (2009) should be applied to the estimators of the bounds and the CIs. |
do_parallel |
boolean for whether parallel computing should be used. |
ncluster |
number of clusters used if parallel computing is used. |
show_progress |
boolean for whether progress bar in estimating regression functions should be shown. Default is FALSE. Currently, only available if do_parallel is FALSE. |
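The do_mult_boot, alpha and B arguments refer to uniform bands built with a multiplier bootstrap. The following is a minimal, generic sketch of that technique using Rademacher multipliers; it is not the package's internal code. The matrix phi is a simulated stand-in for influence function values on a grid of eps, and all object names here are hypothetical.

```r
set.seed(1)
n     <- 500
eps   <- seq(0, 1, by = 0.1)
alpha <- 0.05
B     <- 2000

# Hypothetical stand-in: an n x length(eps) matrix of (uncentered)
# influence function values, one column per value of eps.
phi <- sapply(eps, function(e) rnorm(n, mean = e, sd = 1 + e))

est   <- colMeans(phi)        # point estimate of the curve at each eps
sigma <- apply(phi, 2, sd)    # estimated standard deviation at each eps

# Center and scale each column so the statistics are comparable across eps.
phi_std <- sweep(sweep(phi, 2, est, "-"), 2, sigma, "/")

# B x n matrix of Rademacher multipliers (+1 or -1 with probability 1/2).
xi <- matrix(2 * rbinom(n * B, 1, 0.5) - 1, nrow = B, ncol = n)

# For each bootstrap draw, the supremum over eps of the absolute
# multiplier process.
sup_stat <- apply(abs(xi %*% phi_std) / sqrt(n), 1, max)

# Uniform critical value and the resulting band.
crit <- quantile(sup_stat, 1 - alpha, names = FALSE)
band <- cbind(lower = est - crit * sigma / sqrt(n),
              upper = est + crit * sigma / sqrt(n))
```

The critical value crit replaces the pointwise normal quantile, so the band is wider than pointwise intervals but is meant to cover the whole curve simultaneously.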
A list containing
|
a length(eps)x12xlength(delta) array, where, for each
eps and delta, it has the estimates of lower bound ( |
|
estimate of the variance of the upper bound curve as a function of eps. |
|
estimate of the variance of the lower bound curve as a function of eps. |
|
a length(delta)x5 |
|
a list of size |
.
|
a list of size |
.
|
a list of size |
|
a list of size |
|
an n x length(eps) x length(delta) array containing the influence functions for the lower bound evaluated at the observed X as a function of epsilon and delta. |
|
an n x length(eps) x length(delta) array containing the influence functions for the upper bound evaluated at the observed X as a function of epsilon and delta. |
|
a list containing estimates of regression functions
evaluated at test obs and train obs. See |
|
an nx1 matrix containing the influence function values for E{E(Y|A = 1, X) - E(Y|A = 0, X)}, computed using regression functions estimated on all observations except those in fold j and evaluated at the observations in fold j. |
|
a list of size |
|
a list of size |
|
a list of size |
|
a list of size |
|
a list of size |
|
a list of size |
|
a n x length(eps) x length(delta) array containing
values for |
|
a n x length(eps) x length(delta) array containing
values for |
|
a list of size |
|
a list of size |
|
a ndelta-dimensional vector containing
|
|
a ndelta-dimensional vector containing
|
|
an ndelta-dimensional vector containing the z-score used to construct the confidence interval for the partially identified ATE as in Imbens & Manski (2004). |
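For reference, the Imbens & Manski (2004) critical value can be sketched as follows. This is an illustration of the published formula under the stated inputs, not the package's own code; the function name im_zscore and its arguments are hypothetical. The critical value c solves pnorm(c + (ub - lb) / max(se_lb, se_ub)) - pnorm(-c) = 1 - alpha.

```r
# Hypothetical helper: Imbens & Manski (2004) critical value for a
# partially identified parameter with estimated bounds [lb, ub] and
# standard errors se_lb, se_ub of the two bound estimators.
im_zscore <- function(lb, ub, se_lb, se_ub, alpha = 0.05) {
  # Estimated width of the identified set in standard error units.
  width <- max(ub - lb, 0) / max(se_lb, se_ub)
  f <- function(cval) pnorm(cval + width) - pnorm(-cval) - (1 - alpha)
  # f is negative at 0 and positive at 10, so a root lies in between.
  uniroot(f, lower = 0, upper = 10, tol = 1e-10)$root
}

# Example with made-up numbers: the interval is
# [lb - c * se_lb, ub + c * se_ub].
lb <- -0.10; ub <- 0.15; se_lb <- 0.03; se_ub <- 0.04
cval <- im_zscore(lb, ub, se_lb, se_ub)
ci <- c(lb - cval * se_lb, ub + cval * se_ub)
```

When the estimated bounds coincide, the critical value equals the two-sided quantile qnorm(1 - alpha / 2); as the identified set widens relative to the standard errors, it approaches the one-sided quantile qnorm(1 - alpha).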
As shown in the paper, g(eta) for the lower bound is equal to g(eta) for the upper bound minus delta * (ymax - ymin). Therefore the IFs for E(g(eta)) follow the same relation. They are kept separate only for code clarity.
Imbens, G. W., & Manski, C. F. (2004). Confidence intervals for partially identified parameters. Econometrica, 72(6), 1845-1857.
Van der Laan, M. J., Polley, E. C., & Hubbard, A. E. (2007). Super learner. Statistical applications in genetics and molecular biology, 6(1).
Chernozhukov, V., Chetverikov, D., Demirer, M., Duflo, E., Hansen, C., & Newey, W. K. (2016). Double machine learning for treatment and causal parameters (No. CWP49/16). cemmap working paper.
Kennedy, E. H. (2019). Nonparametric causal effects based on incremental propensity score interventions. Journal of the American Statistical Association, 114(526), 645-656.
Chernozhukov, V., Fernandez-Val, I., & Galichon, A. (2009). Improving point and interval estimators of monotone functions by rearrangement. Biometrika, 96(3), 559-575.
n <- 1000
eps <- seq(0, 1, 0.001)
delta <- c(0.25, 0.5, 1)
a <- rbinom(n, 1, 0.5)
x <- as.data.frame(matrix(rnorm(2*n), ncol = 2, nrow = n))
ymin <- 0
ymax <- 1
y <- runif(n, ymin, ymax)
res <- get_bound(y = y, a = a, x = x, ymin = ymin, ymax = ymax,
                 outfam = gaussian(), treatfam = binomial(),
                 model = "x", eps = eps, delta = delta,
                 do_mult_boot = TRUE, do_eps_zero = TRUE, nsplits = 5,
                 alpha = 0.05, B = 1000, sl.lib = "SL.glm")
print(res$eps_zero)
print(head(res$bounds[, , 1]))
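The nuis_fns argument accepts precomputed regression estimates in place of the SuperLearner fits. A minimal sketch of building such a matrix with plain glm fits is shown below. It assumes that pi1 and pi0 denote P(A = 1 | X) and P(A = 0 | X), and that mu1 and mu0 denote E(Y | A = 1, X) and E(Y | A = 0, X); these interpretations are inferred from the column names and are not stated on this page. The fits are in-sample, not cross-fitted.

```r
set.seed(1)
n <- 1000
a <- rbinom(n, 1, 0.5)
x <- as.data.frame(matrix(rnorm(2 * n), ncol = 2, nrow = n))
y <- runif(n, 0, 1)

# Treatment regression (assumed meaning: pi1 = P(A = 1 | X)).
ps_fit <- glm(a ~ ., data = cbind(a = a, x), family = binomial())
pi1 <- predict(ps_fit, newdata = x, type = "response")
pi0 <- 1 - pi1

# Outcome regressions fitted separately within each treatment arm
# (assumed meaning: mu1 = E(Y | A = 1, X), mu0 = E(Y | A = 0, X)).
dat <- cbind(y = y, x)
mu1_fit <- glm(y ~ ., data = dat[a == 1, ], family = gaussian())
mu0_fit <- glm(y ~ ., data = dat[a == 0, ], family = gaussian())
mu1 <- predict(mu1_fit, newdata = x)
mu0 <- predict(mu0_fit, newdata = x)

# n x 4 matrix with the column names listed in the Arguments section.
nuis_fns <- cbind(pi0 = pi0, pi1 = pi1, mu1 = mu1, mu0 = mu0)
```

A matrix of this shape could then be supplied through the nuis_fns argument of get_bound.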