get_bound: Estimation of bounds on ATE as a function of the proportion...
In matteobonvini/sensitivitypuc: Sensitivity Analysis via the Proportion of Unmeasured Confounding (PUC)


`get_bound` is the main function for estimating the lower and upper bound curves on the ATE as a function of eps, the proportion of unmeasured confounding.

get_bound(
  y,
  a,
  x,
  ymin,
  ymax,
  outfam,
  treatfam,
  sl.lib,
  model = "x",
  eps = 0,
  delta = 0,
  nsplits = 5,
  do_mult_boot = TRUE,
  do_eps_zero = TRUE,
  alpha = 0.05,
  B = 10000,
  nuis_fns = NULL,
  plugin = FALSE,
  do_rearrange = FALSE,
  do_parallel = FALSE,
  ncluster = NULL,
  show_progress = FALSE
)

`y`	nx1 outcome vector in [0, 1].
`a`	nx1 treatment received vector.
`x`	nxp `data.frame` of covariates.
`ymin`	infimum of the support of y.
`ymax`	supremum of the support of y.
`outfam`	family specifying the error distribution for outcome regression, currently `gaussian()` or `binomial()` supported. Link should not be specified.
`treatfam`	family specifying the error distribution for treatment regression, currently `binomial()` supported. Link should not be specified.
`sl.lib`	character vector specifying which libraries to use for the SL.
`model`	a string specifying the assumption placed on S when computing the bounds. Currently only "x" (S \perp (Y, A) | X) and "xa" (S \perp Y | A, X) are supported. Default is model = "x".
`eps`	vector of arbitrary length specifying the values for the proportion of confounding where the lower and upper bounds curves are evaluated. Default is 0 (no unmeasured confounding).
`delta`	vector of arbitrary length specifying the values of delta used to bound maximal confounding among S = 0 units. Setting delta = 1 imposes no assumption if the outcome Y is bounded; the usage signature sets the default to delta = 0.
`nsplits`	number of splits for the cross-fitting. Default is 5.
`do_mult_boot`	boolean for whether uniform bands via the multiplier bootstrap need to be computed. Default is do_mult_boot=TRUE.
`do_eps_zero`	boolean for whether the estimate of epsilon_zero should be computed. Default is do_eps_zero=TRUE.
`alpha`	significance level; confidence bands have nominal coverage 1 - alpha. Default is 0.05.
`B`	number of Rademacher random variables sampled for the multiplier bootstrap. Default is 10000.
`nuis_fns`	Optional. An n x 4 matrix specifying the estimated regression functions evaluated at the observed x; columns should be named pi0, pi1, mu1, mu0. Default is NULL, so that the regressions are estimated using the SuperLearner via cross-fitting.
`plugin`	boolean for whether the estimator for the bounds is of plug-in type, i.e. whether it uses g(etahat) instead of tauhat (its estimator based on influence functions) when computing the term that multiplies the indicator. Default is FALSE.
`do_rearrange`	boolean for whether the rearrangement procedure of Chernozhukov et al. (2009) should be applied to the estimators of the bounds and the CIs. Default is FALSE.
`do_parallel`	boolean for whether parallel computing should be used. Default is FALSE.
`ncluster`	number of clusters used if parallel computing is used.
`show_progress`	boolean for whether progress bar in estimating regression functions should be shown. Default is FALSE. Currently, only available if do_parallel is FALSE.
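
When regression estimates are supplied through `nuis_fns`, the matrix might be assembled along the following lines. This is a minimal sketch, not the package's own code. It assumes that `pi1` and `pi0` denote the estimated P(A = 1 | X) and P(A = 0 | X), and that `mu1` and `mu0` denote the estimated E(Y | A = 1, X) and E(Y | A = 0, X); the page does not spell these meanings out. Plain `glm` fits stand in for the SuperLearner, and no cross-fitting is done.

```r
set.seed(1)
n <- 200
x <- data.frame(x1 = rnorm(n), x2 = rnorm(n))
a <- rbinom(n, 1, plogis(0.5 * x$x1))
y <- plogis(x$x1 + a + rnorm(n))  # outcome bounded in (0, 1)

# Treatment regression (assumed meaning: pi1 = P(A = 1 | X)).
fit_a <- glm(a ~ ., data = cbind(a = a, x), family = binomial())
pi1 <- predict(fit_a, newdata = x, type = "response")

# Outcome regressions, fitted separately within each treatment arm
# (assumed meaning: mu1 = E(Y | A = 1, X), mu0 = E(Y | A = 0, X)).
dat <- cbind(y = y, x)
fit_y1 <- glm(y ~ ., data = dat[a == 1, ], family = gaussian())
fit_y0 <- glm(y ~ ., data = dat[a == 0, ], family = gaussian())
mu1 <- predict(fit_y1, newdata = x)
mu0 <- predict(fit_y0, newdata = x)

# n x 4 matrix with the column names listed for the nuis_fns argument.
nuis_fns <- cbind(pi0 = 1 - pi1, pi1 = pi1, mu1 = mu1, mu0 = mu0)
```

The resulting matrix has one row per observation and could then be passed as `nuis_fns = nuis_fns`, provided the assumed column meanings match what the package expects.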

A list containing

`bounds`	a length(eps) x 12 x length(delta) array, where, for each eps and delta, it has the estimates of lower bound (`lb`), upper bound (`ub`), lower uniform band for lower bound (`ci_lb_lo_unif`), upper uniform band for upper bound (`ci_ub_hi_unif`), lower pointwise band for lower bound (`ci_lb_lo_pt`), upper pointwise band for lower bound (`ci_lb_hi_pt`), lower pointwise band for upper bound (`ci_ub_lo_pt`), upper pointwise band for upper bound (`ci_ub_hi_pt`), lower confidence band using the Imbens & Manski (2004) procedure (`ci_im04_lo`), upper confidence band using the Imbens & Manski (2004) procedure (`ci_im04_hi`).
`var_ub`	estimate of the variance of the upper bound curve as a function of eps.
`var_lb`	estimate of the variance of the lower bound curve as a function of eps.
`eps_zero`	a length(delta)x5 `data.frame` with values of delta, estimate of eps0, max(0, ci_lo), min(1, ci_hi), variance of estimate of eps0.
`q_lb`	a list of size `nsplits`, where element j is a (num eps) x (num delta) matrix containing estimates of eps-quantile of g(etab) for lower bound, where g(etab) is computed without using obs from fold j.
`q_ub`	a list of size `nsplits`, where element j is a (num eps) x (num delta) matrix containing estimates of eps-quantile of g(etab) for upper bound, where g(etab) is computed without using obs from fold j.

`lambda_lb`	a list of size `nsplits`, where element j contains a nxlength(eps)xlength(delta) array containing the indicator ghatmat <= q, where q is eps-quantile of ghatmat and ghatmat is g(eta) for lower bound, computed using regression functions estimated using all folds but j and evaluated using obs at fold j.
`lambda_ub`	a list of size `nsplits`, where element j contains a nxlength(eps)xlength(delta) array containing the indicator ghatmat > q, where q is (1-eps)-quantile of ghatmat and ghatmat is g(eta) for upper bound, computed using regression functions estimated using all folds but j and evaluated using obs at fold j.
`ifvals_lb`	a n x length(eps) x length(delta) array containing the influence functions for lower bound evaluated at the observed X as a function of epsilon and delta.
`ifvals_ub`	a n x length(eps) x length(delta) array containing the influence functions for upper bound evaluated at the observed X as a function of epsilon and delta.
`nuis_fns`	a list containing estimates of regression functions evaluated at test obs and train obs. See `do_crossfit`.
`nuhat`	an n x 1 matrix containing the influence function values for E{E(Y | A = 1, X) - E(Y | A = 0, X)}, computed using regression fns estimated using all obs except those in fold j and evaluated at obs in fold j.
`glhat`	a list of size `nsplits`, where element j is (num obs in split j) x length(delta) matrix containing estimates of g(eta) for lower bound computed using regressions fns estimated using all obs except those in fold j and evaluated at obs in fold j.
`guhat`	a list of size `nsplits`, where element j is (num obs in split j) x length(delta) matrix containing estimates of g(eta) for upper bound computed using regressions fns estimated using all obs except those in fold j and evaluated at obs in fold j.
`glhat_train`	a list of size `nsplits`, where element j is (n - num obs in split j) x length(delta) matrix containing estimates of g(eta) for lower bound computed using regressions fns estimated using all obs except those in fold j and evaluated at obs not in fold j.
`guhat_train`	a list of size `nsplits`, where element j is (n - num obs in split j) x length(delta) matrix containing estimates of g(eta) for upper bound computed using regressions fns estimated using all obs except those in fold j and evaluated at obs not in fold j.
`tauhat_lb`	a list of size `nsplits`, where element j is (num obs in split j) x 1 matrix containing the influence function values of the parameter E(g(eta)) with g(eta) for the lower bound computed using regression fns estimated from all obs except those in fold j and evaluated at obs in fold j.
`tauhat_ub`	a list of size `nsplits`, where element j is (num obs in split j) x 1 matrix containing the influence function values of the parameter E(g(eta)) with g(eta) for the upper bound computed using regression fns estimated from all obs except those in fold j and evaluated at obs in fold j.
`phibar_lb`	a n x length(eps) x length(delta) array containing values for `ifvals_lb` - `lambda_lb` * `q_lb`.
`phibar_ub`	a n x length(eps) x length(delta) array containing values for `ifvals_ub` - `lambda_ub` * `q_ub`.
`phibar_lb_fold`	a list of size `nsplits`, where element j is n x length(eps) x length(delta) array containing values for `ifvals_lb` - `lambda_lb` * `q_lb` computed using regression functions estimated from all obs except those in fold j and evaluated at obs in fold j.
`phibar_ub_fold`	a list of size `nsplits`, where element j is n x length(eps) x length(delta) array containing values for `ifvals_ub` - `lambda_ub` * `q_ub` computed using regression functions estimated from all obs except those in fold j and evaluated at obs in fold j.
`mult_calpha_lb`	a ndelta-dimensional vector containing `calpha` equal to the z-score used to construct uniform bands for the lower bound of the form psi(eps) \pm `calpha` * sigma(eps).
`mult_calpha_ub`	a ndelta-dimensional vector containing `calpha` equal to the z-score used to construct uniform bands for the upper bound of the form psi(eps) \pm `calpha` * sigma(eps).
`im04_calpha`	a ndelta-dimensional vector containing the z-score used to construct the confidence interval for the partially identified ATE as in Imbens & Manski (2004).
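
The uniform bands of the form psi(eps) \pm `calpha` * sigma(eps) can be illustrated with a generic multiplier bootstrap. The sketch below is not the package's implementation: the influence-function values are simulated (and drawn independently across eps purely for illustration), the band is two-sided, and the names `ifvals`, `psi_hat`, `sigma_hat` and `sup_stat` are hypothetical.

```r
set.seed(2)
n <- 500
eps <- seq(0, 0.5, by = 0.05)
alpha <- 0.05
B <- 2000

# Hypothetical influence-function values: one row per unit, one column per eps.
ifvals <- sapply(eps, function(e) rnorm(n, mean = -0.1 - e, sd = 1 + e))

psi_hat <- colMeans(ifvals)        # estimated curve psi(eps)
sd_hat <- apply(ifvals, 2, sd)     # sd of the influence function at each eps
sigma_hat <- sd_hat / sqrt(n)      # standard error sigma(eps)

# Centered and standardized influence-function values.
z <- sweep(sweep(ifvals, 2, psi_hat, "-"), 2, sd_hat, "/")

# For each bootstrap draw, multiply unit i by a Rademacher sign and take
# the supremum over eps of the absolute normalized sum.
sup_stat <- replicate(B, {
  xi <- sample(c(-1, 1), n, replace = TRUE)
  max(abs(colSums(xi * z)) / sqrt(n))
})

# Critical value: the (1 - alpha)-quantile of the bootstrapped supremum.
calpha <- unname(quantile(sup_stat, 1 - alpha))

band_lo <- psi_hat - calpha * sigma_hat
band_hi <- psi_hat + calpha * sigma_hat
```

Because `calpha` is calibrated on the supremum over the whole eps grid, the resulting band is meant to cover the entire curve simultaneously rather than one eps at a time.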

As shown in the paper, g(eta) for the lower bound is equal to g(eta) for the upper bound minus delta * (ymax - ymin). Therefore the IFs for E(g(eta)) follow the same relation. They are kept separate just for code clarity.
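
The relation between the two versions of g(eta) can be checked numerically. The values below are made up for illustration (they are not output of `get_bound`), and `guhat`, `glhat` and `shift` are just local names in this sketch.

```r
ymin <- 0
ymax <- 1
delta <- c(0.25, 0.5, 1)

# Hypothetical g(eta) values for the upper bound: 5 units x length(delta).
set.seed(3)
guhat <- matrix(runif(5 * length(delta)), nrow = 5,
                dimnames = list(NULL, paste0("delta=", delta)))

# Lower-bound counterpart: subtract delta * (ymax - ymin) column by column.
shift <- delta * (ymax - ymin)
glhat <- sweep(guhat, 2, shift, "-")

# The sample means differ by the same constant for each delta.
colMeans(guhat) - colMeans(glhat)
```

Since the two quantities differ only by a known constant for each delta, either one determines the other.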

Imbens, G. W., & Manski, C. F. (2004). Confidence intervals for partially identified parameters. Econometrica, 72(6), 1845-1857.

Van der Laan, M. J., Polley, E. C., & Hubbard, A. E. (2007). Super learner. Statistical applications in genetics and molecular biology, 6(1).

Chernozhukov, V., Chetverikov, D., Demirer, M., Duflo, E., Hansen, C., & Newey, W. K. (2016). Double machine learning for treatment and causal parameters (No. CWP49/16). cemmap working paper.

Kennedy, E. H. (2019). Nonparametric causal effects based on incremental propensity score interventions. Journal of the American Statistical Association, 114(526), 645-656.

Chernozhukov, V., Fernandez-Val, I., & Galichon, A. (2009). Improving point and interval estimators of monotone functions by rearrangement. Biometrika, 96(3), 559-575.

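# Simulate a binary treatment, two covariates and an outcome bounded in [ymin, ymax].
n <- 1000
eps <- seq(0, 1, 0.001)    # grid of proportions of unmeasured confounding
delta <- c(0.25, 0.5, 1)   # values of delta at which the bounds are computed
a <- rbinom(n, 1, 0.5)
x <- as.data.frame(matrix(rnorm(2*n), ncol = 2, nrow = n))
ymin <- 0
ymax <- 1
y <- runif(n, ymin, ymax)
# Estimate the bound curves with 5-fold cross-fitting and uniform bands.
res <- get_bound(y = y, a = a, x = x, ymin = ymin, ymax = ymax,
                 outfam = gaussian(),  treatfam = binomial(),
                 model = "x", eps = eps, delta = delta,
                 do_mult_boot = TRUE, do_eps_zero = TRUE, nsplits = 5,
                 alpha = 0.05, B = 1000, sl.lib = "SL.glm")
# Estimates of eps0 for each delta, then the first rows of the bounds for delta[1].
print(res$eps_zero)
print(head(res$bounds[, , 1]))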

matteobonvini/sensitivitypuc documentation built on Dec. 9, 2020, 2:24 a.m.
