budgetIV_scalar: Efficient partial identification of a scalar causal effect...

budgetIV_scalar R Documentation

Efficient partial identification of a scalar causal effect parameter with invalid instruments

Description

Partial identification and coverage of a causal effect parameter using summary statistics and budget constraint assumptions. See Penn et al. (2025) for technical definitions.

Usage

budgetIV_scalar(
  beta_y,
  beta_phi,
  tau_vec = NULL,
  b_vec = NULL,
  delta_beta_y = NULL,
  bounds_only = TRUE
)

Arguments

beta_y

A d_{Z}-dimensional vector representing the (estimated) cross covariance \mathrm{Cov}(Y, Z).

beta_phi

A d_{Z}-dimensional vector representing the (estimated) cross covariance \mathrm{Cov}(\Phi (X), Z).

tau_vec

A K-dimensional vector of increasing, non-negative thresholds representing degrees of IV invalidity. The default value NULL corresponds to a single threshold at 0.

b_vec

A K-dimensional vector of increasing positive integers giving the budgets, i.e., the minimum number of IVs assumed to have invalidity within each threshold. The default value NULL corresponds to a single threshold at 0, with at least 50% of IVs assumed to be valid.

delta_beta_y

A d_{Z}-dimensional vector of positive half-widths for box-shaped confidence bounds on beta_y. With the default value NULL, finite-sample uncertainty is ignored.

bounds_only

A boolean TRUE or FALSE. TRUE will store overlapping intervals in the confidence set as a single interval, while FALSE will store different intervals for different values of budget_assignment (see the Value section and Penn et al. (2025) for further details). The default is TRUE.

If TRUE (default), the output consists only of disjoint bounds. Otherwise, if FALSE, the output consists of bounds for possibly touching intervals (but never overlapping), as well as the budget assignment corresponding to each bound.

Details

Instrumental variables are defined by three structural assumptions: (A1) they are associated with the treatment; (A2) they are unconfounded with the outcome; and (A3) they exclusively affect the outcome through the treatment. Assumption (A1) has a simple statistical test, whereas for many data generating processes (A2) and (A3) are unprovably false. The budgetIV and budgetIV_scalar algorithms allow for valid causal inference when some proportion, possibly a small minority, of candidate instruments satisfy both (A2) and (A3).

budgetIV & budgetIV_scalar assume a homogeneous treatment effect, which implies the separable structural equation Y = \theta \Phi(X) + g_y(Z, \epsilon_x). The difference between the algorithms is that budgetIV_scalar assumes \Phi(X) and \theta take scalar values, which is exploited for super-exponential computational speedup and allows for causal inference with thousands of candidate instruments Z. Both methods assume ground truth knowledge of the functional form of \Phi (X), e.g., a linear, logistic, Cox hazard, principal component based or other model. The parameter \theta captures the unknown treatment effect. Violation of (A2) and/or (A3) will bias classical IV approaches through the statistical dependence between Z and g_y(Z, \epsilon_x), summarized by the covariance parameter \gamma := \mathrm{Cov} (g_y(Z, \epsilon_x), Z).
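The link between the summary statistics and \gamma can be made explicit by taking the covariance of both sides of the structural equation with Z. The block below uses only the definitions above; the shorthand \beta_y := \mathrm{Cov}(Y, Z) and \beta_\Phi := \mathrm{Cov}(\Phi(X), Z) is introduced here to mirror the arguments beta_y and beta_phi.

```latex
\begin{align*}
\mathrm{Cov}(Y, Z)
  &= \theta \, \mathrm{Cov}(\Phi(X), Z) + \mathrm{Cov}(g_y(Z, \epsilon_x), Z) \\
\Longrightarrow \quad
\gamma &= \beta_y - \theta \, \beta_\Phi , \\
\gamma_j = 0 \;&\Longleftrightarrow\; \theta = \beta_{y,j} / \beta_{\Phi,j}
  \qquad (\beta_{\Phi,j} \neq 0).
\end{align*}
```

In words, a candidate with \gamma_j = 0 is exactly one whose ratio of covariances recovers \theta, and any nonzero \gamma_j shifts that ratio away from \theta.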

budgetIV & budgetIV_scalar constrain \gamma through a series of non-negative thresholds 0 \leq \tau_1 < \tau_2 < \ldots < \tau_K and corresponding integer budgets 0 < b_1 < b_2 < \ldots < b_K \leq d_Z. It is assumed for each i \in \{ 1, \ldots, K\} that at least b_i components of \gamma are no greater in magnitude than \tau_i. For instance, taking d_Z = 100, K = 1, b_1 = 5 and \tau_1 = 0 means assuming at least 5 of the 100 candidates are valid instrumental variables (in the sense that their ratio estimates \theta_j := \mathrm{Cov}(Y, Z_j)/\mathrm{Cov}(\Phi(X), Z_j) are unbiased).
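As a rough illustration, the budget constraints can be checked for a known \gamma in a few lines of base R. This sketch reads each budget b_i as the minimum number of candidates with |\gamma_j| \leq \tau_i, which is the reading that matches the d_Z = 100, b_1 = 5 example. The helper satisfies_budgets and the toy vector are hypothetical and not part of budgetIVr.

```r
# Hypothetical helper (not part of budgetIVr): does gamma meet every budget?
# Assumed reading: at least b_vec[i] components satisfy |gamma_j| <= tau_vec[i].
satisfies_budgets <- function(gamma, tau_vec, b_vec) {
  stopifnot(length(tau_vec) == length(b_vec))
  # Count the components of gamma lying within each threshold
  n_within <- vapply(tau_vec, function(tau) sum(abs(gamma) <= tau), integer(1))
  all(n_within >= b_vec)
}

# Toy invalidity vector: two valid candidates, one mildly invalid, two badly invalid
gamma <- c(0, 0, 0.05, 0.3, 1.2)

ok_loose  <- satisfies_budgets(gamma, tau_vec = c(0, 0.1), b_vec = c(2, 3))
ok_strict <- satisfies_budgets(gamma, tau_vec = c(0, 0.1), b_vec = c(3, 4))
print(c(ok_loose, ok_strict))
```

In practice \gamma is unobserved, so the check is only useful as a statement of what the user is assuming when choosing tau_vec and b_vec.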

With delta_beta_y = NULL, budgetIV & budgetIV_scalar return the identified set of causal effects that agree with both the budget constraints described above and the values of \mathrm{Cov}(Y, Z) and \mathrm{Cov}(\Phi(X), Z), assumed to be known exactly. Unlike classical partial identification methods (see Manski (1990) for a canonical example), the non-convex mixed-integer budget constraints yield a possibly disconnected identified set. Each connected subset has a different interpretation as to which of the candidate instruments Z are valid up to each threshold. budgetIV_scalar returns these interpretations alongside the corresponding bounds on \theta.
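A brute-force membership test shows how a disconnected identified set can arise. This is only a sketch: it is not the algorithm used by budgetIV_scalar, the helper theta_is_feasible and the toy numbers are hypothetical, and it assumes that \gamma = \mathrm{Cov}(Y, Z) - \theta \mathrm{Cov}(\Phi(X), Z) (from the structural equation) and that each budget b_i is the minimum number of candidates with |\gamma_j| \leq \tau_i.

```r
# Hypothetical helper (not the budgetIVr algorithm): is theta in the identified set?
theta_is_feasible <- function(theta, beta_y, beta_phi, tau_vec, b_vec) {
  # Invalidity implied by this value of theta
  gamma <- beta_y - theta * beta_phi
  # Number of candidates within each threshold, compared against the budgets
  n_within <- vapply(tau_vec, function(tau) sum(abs(gamma) <= tau), integer(1))
  all(n_within >= b_vec)
}

# Toy summary statistics: two candidates point to theta near 1, two to theta near 3
beta_phi <- c(1, 1, 1, 1)
beta_y   <- c(1.0, 1.1, 3.0, 3.1)

theta_grid <- c(1, 2, 3)
feasible <- vapply(
  theta_grid, theta_is_feasible, logical(1),
  beta_y = beta_y, beta_phi = beta_phi, tau_vec = 0.2, b_vec = 2
)
print(data.frame(theta = theta_grid, feasible = feasible))
```

With these toy inputs, theta = 1 and theta = 3 are feasible while theta = 2 is not, so the feasible values form two separate pieces, each supported by a different pair of candidates.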

When delta_beta_y is not NULL, it is used as a box constraint quantifying uncertainty in beta_y. In the examples, delta_beta_y is calculated through a Bonferroni correction and gives an (asymptotically) valid confidence set over beta_y. Under the so-called "no measurement error" (NOME) assumption (see Bowden et al. (2016)), which is commonly applied in Mendelian randomisation, the estimate of beta_y is assumed to be the dominant source of finite-sample uncertainty, with uncertainty in beta_phi entirely negligible. With an (asymptotically) valid confidence set over beta_y and under the NOME assumption, budgetIV_scalar returns an (asymptotically) valid confidence set for \theta.
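The effect of the box constraint on a single candidate can be written out as follows. This is a sketch derived from the description above rather than a statement of the package internals; \hat\beta_{y,j}, \delta_j and \beta_{\Phi,j} denote the j-th entries of beta_y, delta_beta_y and the second covariance argument, and \gamma_j = \beta_{y,j} - \theta \beta_{\Phi,j} is assumed from the structural equation.

```latex
\begin{align*}
&\exists\, \beta_{y,j} \in [\hat\beta_{y,j} - \delta_j,\; \hat\beta_{y,j} + \delta_j]
  \ \text{ such that } \
  |\beta_{y,j} - \theta \beta_{\Phi,j}| \leq \tau_i \\
&\quad\Longleftrightarrow\quad
  |\hat\beta_{y,j} - \theta \beta_{\Phi,j}| \leq \tau_i + \delta_j \\
&\quad\Longleftrightarrow\quad
  \theta \in \left[
    \frac{\hat\beta_{y,j} - \tau_i - \delta_j}{\beta_{\Phi,j}},\;
    \frac{\hat\beta_{y,j} + \tau_i + \delta_j}{\beta_{\Phi,j}}
  \right]
  \qquad (\beta_{\Phi,j} > 0).
\end{align*}
```

For \beta_{\Phi,j} < 0 the two endpoints swap. Under this reading, finite-sample uncertainty simply widens each candidate's interval by \delta_j / |\beta_{\Phi,j}| on either side.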

Value

A data.table with each row corresponding to bounds on the scalar causal effect parameter \theta for a particular budget assignment U (see Penn et al. (2025)). The returned table has the following columns: a logical is_point indicating whether the upper and lower bounds are equal; numerical lower_bound and upper_bound giving the lower and upper bounds; and a list budget_assignment giving the value of U for each candidate instrument. budget_assignment is only returned if the user sets bounds_only = FALSE.

A list of two entries: intervals, which is a two-column matrix with rows corresponding to disjoint bounds containing plausible values of \theta; and points, which is a one-column matrix consisting of lone plausible values of \theta—relevant when using \tau_1 = 0.

References

Jordan Penn, Lee Gunderson, Gecia Bravo-Hermsdorff, Ricardo Silva, and David Watson. (2024). BudgetIV: Optimal Partial Identification of Causal Effects with Mostly Invalid Instruments. arXiv preprint, 2411.06913.

Jack Bowden, Fabiola Del Greco M, Cosetta Minelli, George Davey Smith, Nuala A Sheehan, and John R Thompson. (2016). Assessing the suitability of summary data for two-sample Mendelian randomization analyses using MR-Egger regression: the role of the I^2 statistic. Int. J. Epidemiol. 45.6, pp. 1961–1974.

Charles F Manski. (1990). Nonparametric bounds on treatment effects. Am. Econ. Rev. 80.2, pp. 319–323.

Examples

 
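library(budgetIVr)  # provides budgetIV_scalar and the example dataset

data(Do_et_al_summary_statistics)

# Keep variants strongly associated with HDL as candidate instruments
candidatesHDL <- Do_et_al_summary_statistics[Do_et_al_summary_statistics$pHDL <= 1e-8, ]

candidate_labels <- candidatesHDL$rsID
d_Z <- length(candidate_labels)

# Variant associations with the exposure (HDL) and the outcome (CAD)
beta_x <- candidatesHDL$betaHDL
beta_y <- candidatesHDL$betaCAD

# Recover standard errors from the two-sided p-values
SE_beta_y <- abs(beta_y) / qnorm(1 - candidatesHDL$pCAD / 2)

# Bonferroni-corrected half-widths for a joint (1 - alpha) box over beta_y
alpha <- 0.05
delta_beta_y <- qnorm(1 - alpha / (2 * d_Z)) * SE_beta_y

# Single threshold at 0 with a budget of 30
feasible_region <- budgetIV_scalar(
  beta_y = beta_y,
  beta_phi = beta_x,
  tau_vec = c(0),
  b_vec = c(30),
  delta_beta_y = delta_beta_y,
  bounds_only = FALSE
)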


budgetIVr documentation built on April 16, 2025, 5:11 p.m.