get_bound: Estimation of bounds on ATE as a function of the proportion...


View source: R/get_bound.R

Description

get_bound is the main function for estimating the lower and upper bound curves on the ATE as a function of eps, the proportion of unmeasured confounding.

Usage

get_bound(
  y,
  a,
  x,
  ymin,
  ymax,
  outfam,
  treatfam,
  sl.lib,
  model = "x",
  eps = 0,
  delta = 0,
  nsplits = 5,
  do_mult_boot = TRUE,
  do_eps_zero = TRUE,
  alpha = 0.05,
  B = 10000,
  nuis_fns = NULL,
  plugin = FALSE,
  do_rearrange = FALSE,
  do_parallel = FALSE,
  ncluster = NULL,
  show_progress = FALSE
)

Arguments

y

nx1 outcome vector in [0, 1].

a

nx1 treatment received vector.

x

nxp data.frame of covariates.

ymin

infimum of the support of y.

ymax

supremum of the support of y.

outfam

family specifying the error distribution for outcome regression, currently gaussian() or binomial() supported. Link should not be specified.

treatfam

family specifying the error distribution for treatment regression, currently binomial() supported. Link should not be specified.

sl.lib

character vector specifying which libraries to use for the SuperLearner.

model

a string specifying the assumption placed on S when computing the bounds. Currently only "x" (S \perp (Y, A) | X) and "xa" (S \perp Y | A, X) are supported. Default is model = "x".

eps

vector of arbitrary length specifying the values for the proportion of confounding where the lower and upper bounds curves are evaluated. Default is 0 (no unmeasured confounding).

delta

vector of arbitrary length specifying the values of delta used to bound maximal confounding among S = 0 units. Default is delta = 1, which imposes no assumption if the outcome Y is bounded.

nsplits

number of splits for the cross-fitting. Default is 5.

do_mult_boot

boolean for whether uniform bands via the multiplier bootstrap need to be computed. Default is do_mult_boot=TRUE.

do_eps_zero

boolean for whether an estimate of epsilon_zero should be computed. Default is do_eps_zero=TRUE.

alpha

significance level; confidence bands have nominal coverage 1 - alpha. Default is 0.05.

B

number of Rademacher random variables sampled for the multiplier bootstrap. Default is 10000.

nuis_fns

Optional. An n x 4 matrix specifying the estimated regression functions evaluated at the observed x. Columns should be named pi0, pi1, mu1, mu0. Default is NULL, so that the regressions are estimated using the SuperLearner via cross-fitting.

plugin

boolean for whether the estimator for the bounds is of plug-in type, that is, whether it uses g(etahat) instead of tauhat (the estimator of g(eta) based on influence functions) when computing the term that multiplies the indicator. Default is FALSE.

do_rearrange

boolean for whether the rearrangement procedure of Chernozhukov et al. (2009) should be applied to the estimators of the bounds and the CIs. Default is FALSE.

do_parallel

boolean for whether parallel computing should be used. Default is FALSE.

ncluster

number of clusters used if parallel computing is used.

show_progress

boolean for whether progress bar in estimating regression functions should be shown. Default is FALSE. Currently, only available if do_parallel is FALSE.
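
As an illustration of the layout expected by nuis_fns, the sketch below builds an n x 4 matrix with the required column names from simple parametric fits. The meaning of each column is an assumption not stated on this page: pi1 is taken to be P(A = 1 | X), pi0 = 1 - pi1, and mu1, mu0 are taken to be E(Y | A = 1, X) and E(Y | A = 0, X). The fits are in-sample glm/lm fits chosen for brevity, not the cross-fitted SuperLearner estimates the function computes by default.

```r
# Toy data mirroring the Examples section (column names V1, V2 come from as.data.frame).
set.seed(1)
n <- 200
x <- as.data.frame(matrix(rnorm(2 * n), ncol = 2))
a <- rbinom(n, 1, 0.5)
y <- runif(n, 0, 1)
dat <- data.frame(x, a = a, y = y)

# ASSUMPTION: pi1 = P(A = 1 | X) and pi0 = 1 - pi1.
ps_fit <- glm(a ~ V1 + V2, data = dat, family = binomial())
pi1 <- unname(predict(ps_fit, newdata = dat, type = "response"))

# ASSUMPTION: mu1 = E(Y | A = 1, X) and mu0 = E(Y | A = 0, X),
# each fitted on one treatment arm and evaluated at every observed x.
fit1 <- lm(y ~ V1 + V2, data = dat[dat$a == 1, ])
fit0 <- lm(y ~ V1 + V2, data = dat[dat$a == 0, ])
mu1 <- unname(predict(fit1, newdata = dat))
mu0 <- unname(predict(fit0, newdata = dat))

# n x 4 matrix with the column names listed for nuis_fns.
nuis <- cbind(pi0 = 1 - pi1, pi1 = pi1, mu1 = mu1, mu0 = mu0)
```

A matrix of this shape could then be supplied as nuis_fns = nuis in place of the default NULL.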

Value

A list containing

bounds

a length(eps) x 12 x length(delta) array containing, for each eps and delta, the estimates of the lower bound (lb), upper bound (ub), lower uniform band for lower bound (ci_lb_lo_unif), upper uniform band for upper bound (ci_ub_hi_unif), lower pointwise band for lower bound (ci_lb_lo_pt), upper pointwise band for lower bound (ci_lb_hi_pt), lower pointwise band for upper bound (ci_ub_lo_pt), upper pointwise band for upper bound (ci_ub_hi_pt), lower confidence band using Imbens & Manski (2004) procedure (ci_im04_lo), upper confidence band using Imbens & Manski (2004) procedure (ci_im04_hi).

var_ub

estimate of the variance of the upper bound curve as fn of eps.

var_lb

estimate of the variance of the lower bound curve as fn of eps.

eps_zero

a length(delta)x5 data.frame with values of delta, estimate of eps0, max(0, ci_lo), min(1, ci_hi), variance of estimate of eps0.

q_lb

a list of size nsplits, where element j is a (num eps) x (num delta) matrix containing estimates of eps-quantile of g(etab) for lower bound, where g(etab) is computed without using obs from fold j.

q_ub

a list of size nsplits, where element j is a (num eps) x (num delta) matrix containing estimates of eps-quantile of g(etab) for upper bound, where g(etab) is computed without using obs from fold j.

lambda_lb

a list of size nsplits, where element j contains a nxlength(eps)xlength(delta) array containing the indicator ghatmat <= q, where q is eps-quantile of ghatmat and ghatmat is g(eta) for lower bound, computed using regression functions estimated using all folds but j and evaluated using obs at fold j.

lambda_ub

a list of size nsplits, where element j contains a nxlength(eps)xlength(delta) array containing the indicator ghatmat > q, where q is (1-eps)-quantile of ghatmat and ghatmat is g(eta) for upper bound, computed using regression functions estimated using all folds but j and evaluated using obs at fold j.

ifvals_lb

a n x length(eps) x length(delta) array containing the influence functions for lower bound evaluated at the observed X as a function of epsilon and delta.

ifvals_ub

a n x length(eps) x length(delta) array containing the influence functions for upper bound evaluated at the observed X as a function of epsilon and delta.

nuis_fns

a list containing estimates of regression functions evaluated at test obs and train obs. See do_crossfit.

nuhat

an n x 1 matrix containing the influence function values for E{E(Y|A = 1, X) - E(Y|A = 0, X)}, computed using regression functions estimated from all obs except those in fold j and evaluated at obs in fold j.

glhat

a list of size nsplits, where element j is (num obs in split j) x length(delta) matrix containing estimates of g(eta) for lower bound computed using regressions fns estimated using all obs except those in fold j and evaluated at obs in fold j.

guhat

a list of size nsplits, where element j is (num obs in split j) x length(delta) matrix containing estimates of g(eta) for upper bound computed using regressions fns estimated using all obs except those in fold j and evaluated at obs in fold j.

glhat_train

a list of size nsplits, where element j is (n - num obs in split j) x length(delta) matrix containing estimates of g(eta) for lower bound computed using regressions fns estimated using all obs except those in fold j and evaluated at obs not in fold j.

guhat_train

a list of size nsplits, where element j is (n - num obs in split j) x length(delta) matrix containing estimates of g(eta) for upper bound computed using regressions fns estimated using all obs except those in fold j and evaluated at obs not in fold j.

tauhat_lb

a list of size nsplits, where element j is (num obs in split j) x 1 matrix containing the influence function values of the parameter E(g(eta)) with g(eta) for the lower bound computed using regression fns estimated from all obs except those in fold j and evaluated at obs in fold j.

tauhat_ub

a list of size nsplits, where element j is (num obs in split j) x 1 matrix containing the influence function values of the parameter E(g(eta)) with g(eta) for the upper bound computed using regression fns estimated from all obs except those in fold j and evaluated at obs in fold j.

phibar_lb

a n x length(eps) x length(delta) array containing values for ifvals_lb - lambda_lb * q_lb.

phibar_ub

a n x length(eps) x length(delta) array containing values for ifvals_ub - lambda_ub * q_ub.

phibar_lb_fold

a list of size nsplits, where element j is n x length(eps) x length(delta) array containing values for ifvals_lb - lambda_lb * q_lb computed using regression functions estimated from all obs except those in fold j and evaluated at obs in fold j.

phibar_ub_fold

a list of size nsplits, where element j is n x length(eps) x length(delta) array containing values for ifvals_ub - lambda_ub * q_ub computed using regression functions estimated from all obs except those in fold j and evaluated at obs in fold j.

mult_calpha_lb

a vector of length length(delta) containing calpha, the critical value used in place of the z-score to construct uniform bands for the lower bound of the form psi(eps) \pm calpha * sigma(eps).

mult_calpha_ub

a vector of length length(delta) containing calpha, the critical value used in place of the z-score to construct uniform bands for the upper bound of the form psi(eps) \pm calpha * sigma(eps).

im04_calpha

a vector of length length(delta) containing the z-score used to construct the confidence interval for the partially identified ATE as in Imbens & Manski (2004).
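
To make the role of B and the calpha values concrete, the sketch below computes a uniform-band critical value with a generic Rademacher multiplier bootstrap. It is a minimal illustration under stated assumptions, not necessarily the package's implementation: the matrix phi is a hypothetical stand-in for influence-function values on an eps grid (for a single delta), and sigma(eps) is taken to be the estimated standard error sd(phi) / sqrt(n).

```r
set.seed(1)
n <- 500         # number of observations
B <- 2000        # number of Rademacher draws (plays the role of the B argument)
alpha <- 0.05
eps <- seq(0, 1, length.out = 21)

# HYPOTHETICAL influence-function values: phi[i, k] for obs i at eps[k].
phi <- matrix(rnorm(n * length(eps)), nrow = n) + rep(eps, each = n)

psi_hat <- colMeans(phi)                # point estimate of the curve
se_hat <- apply(phi, 2, sd) / sqrt(n)   # ASSUMED form of sigma(eps)

# Standardize each column: centre at the estimate, scale by its sd.
z <- sweep(sweep(phi, 2, psi_hat, "-"), 2, apply(phi, 2, sd), "/")

# For each draw, multiply row i by a Rademacher sign xi[i] and record the
# supremum over eps of the absolute standardized sum.
sup_stat <- replicate(B, {
  xi <- sample(c(-1, 1), n, replace = TRUE)
  max(abs(colSums(xi * z))) / sqrt(n)
})

# Critical value: (1 - alpha) quantile of the bootstrapped supremum.
calpha <- unname(quantile(sup_stat, 1 - alpha))

# Uniform band of the form psi(eps) +/- calpha * sigma(eps).
band_lo <- psi_hat - calpha * se_hat
band_hi <- psi_hat + calpha * se_hat
```

Because the supremum is taken over the whole eps grid, calpha is typically larger than the pointwise normal quantile, which is why uniform bands are wider than pointwise ones.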

Details

As shown in the paper, g(eta) for the lower bound is equal to g(eta) for the upper bound minus delta * (ymax - ymin). Therefore the IFs for E(g(eta)) follow the same relation. They are kept separate just for code clarity.

References

Imbens, G. W., & Manski, C. F. (2004). Confidence intervals for partially identified parameters. Econometrica, 72(6), 1845-1857.

Van der Laan, M. J., Polley, E. C., & Hubbard, A. E. (2007). Super learner. Statistical applications in genetics and molecular biology, 6(1).

Chernozhukov, V., Chetverikov, D., Demirer, M., Duflo, E., Hansen, C., & Newey, W. K. (2016). Double machine learning for treatment and causal parameters (No. CWP49/16). cemmap working paper.

Kennedy, E. H. (2019). Nonparametric causal effects based on incremental propensity score interventions. Journal of the American Statistical Association, 114(526), 645-656.

Chernozhukov, V., Fernandez-Val, I., & Galichon, A. (2009). Improving point and interval estimators of monotone functions by rearrangement. Biometrika, 96(3), 559-575.

Examples

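library(sensitivitypuc)

# Simulate data: binary treatment, two covariates, outcome in [ymin, ymax]
n <- 1000
eps <- seq(0, 1, 0.001)
delta <- c(0.25, 0.5, 1)
a <- rbinom(n, 1, 0.5)
x <- as.data.frame(matrix(rnorm(2 * n), ncol = 2, nrow = n))
ymin <- 0
ymax <- 1
y <- runif(n, ymin, ymax)

# Estimate the bound curves over the eps grid for each value of delta
res <- get_bound(y = y, a = a, x = x, ymin = ymin, ymax = ymax,
                 outfam = gaussian(), treatfam = binomial(),
                 model = "x", eps = eps, delta = delta,
                 do_mult_boot = TRUE, do_eps_zero = TRUE, nsplits = 5,
                 alpha = 0.05, B = 1000, sl.lib = "SL.glm")

# Estimates of eps0 for each value of delta
print(res$eps_zero)
# First rows of the bounds array for the first value of delta
print(head(res$bounds[, , 1]))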

matteobonvini/sensitivitypuc documentation built on Dec. 9, 2020, 2:24 a.m.