budgetIV | R Documentation |
Computes the set of possible values of a causal parameter consistent with observational data and given budget constraints. See Penn et al. (2025) for technical definitions.
budgetIV(
beta_y,
beta_phi,
phi_basis = NULL,
tau_vec = NULL,
b_vec = NULL,
ATE_search_domain = NULL,
X_baseline = NULL,
delta_beta_y = NULL
)
beta_y |
Either |
beta_phi |
A |
phi_basis |
A |
tau_vec |
A |
b_vec |
A |
ATE_search_domain |
A |
X_baseline |
Either a data.frame or list representing a baseline
treatment |
delta_beta_y |
A |
Instrumental variables are defined by three structural assumptions: (A1) they are associated with the treatment; (A2) they are unconfounded with the outcome; and (A3) they affect the outcome exclusively through the treatment. Of these, only (A1) can be tested without further assumptions. The budgetIV function allows for valid causal inference when some proportion (possibly a small minority) of candidate instruments satisfy both (A2) and (A3). Tuneable thresholds chosen by the user also allow for bounds on the degree of invalidity of each instrument (i.e., bounds on the proportion of \mathrm{Cov}(Y, Z) not explained by the causal effect of X on Y). Full technical details are included in Penn et al. (2025).
budgetIV assumes that treatment effects are homogeneous, which implies a structural equation of the form Y = \theta \cdot \Phi(X) + g_y(Z, \epsilon_x), where \theta and \Phi(X) are a d_{\Phi}-dimensional vector and vector-valued function respectively. A valid basis expansion \Phi(X) is assumed (e.g., linear, logistic, polynomial, RBF, neural embedding, PCA or UMAP). It is also assumed that d_{\Phi} \leq d_{Z}, which allows us to treat the basis functions as a complete linear model (see Theil (1953), or Sanderson et al. (2019) for a modern MR-focused discussion). The parameters \theta capture the unknown treatment effect.
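As an illustration of how \theta and \Phi combine, the sketch below evaluates a basis expansion written as an R expression (the same form as the phi_basis argument) and computes the effect of moving the treatment from a baseline x_0 to x, namely \theta \cdot (\Phi(x) - \Phi(x_0)), which follows from the structural equation when g_y is held fixed. The helper eval_phi and the value of theta are hypothetical and chosen only for illustration; this is not necessarily how budgetIV evaluates phi_basis internally.

```r
# Basis expansion Phi(x) = (x, x^2), written as an R expression vector
phi_basis <- expression(x, x^2)

# Hypothetical helper: evaluate each basis function at a named list of values
eval_phi <- function(basis, values) {
  vapply(basis, function(e) eval(e, values), numeric(1))
}

theta <- c(1, -0.5)       # hypothetical treatment-effect parameters
x_baseline <- list(x = 0) # baseline treatment
x_new <- list(x = 1)      # treatment value of interest

# Effect of moving from the baseline to x_new: theta . (Phi(x) - Phi(x_0))
ate <- sum(theta * (eval_phi(phi_basis, x_new) - eval_phi(phi_basis, x_baseline)))
ate  # 0.5
```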
Violation of (A2) and/or (A3) will bias classical IV approaches through the statistical dependence between Z and g_y(Z, \epsilon_x), summarized by the covariance parameter \gamma := \mathrm{Cov}(g_y(Z, \epsilon_x), Z).
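A small simulation can make this bias concrete. The sketch below assumes a linear basis \Phi(X) = X, four independent candidate instruments, and a hypothetical g_y in which the fourth instrument affects the outcome directly (violating (A3)) while \epsilon_x confounds treatment and outcome. All coefficients are invented for illustration. The single-instrument ratio \mathrm{Cov}(Y, Z_j)/\mathrm{Cov}(X, Z_j) recovers \theta only where the corresponding component of \gamma is zero.

```r
set.seed(1)
n <- 1e5
d_Z <- 4
theta <- 2  # hypothetical true effect

Z <- matrix(rnorm(n * d_Z), nrow = n, ncol = d_Z)
eps_x <- rnorm(n)

# Treatment depends on every candidate instrument, so (A1) holds for all
X <- as.vector(Z %*% rep(1, d_Z)) + eps_x

# Hypothetical g_y: Z_4 has a direct effect on Y, and eps_x confounds X and Y
g_y <- 1.5 * Z[, 4] + eps_x
Y <- theta * X + g_y

beta_y <- as.vector(cov(Y, Z))     # Cov(Y, Z)
beta_phi <- as.vector(cov(X, Z))   # Cov(Phi(X), Z) with Phi(X) = X
gamma_hat <- as.vector(cov(g_y, Z))

# Ratio estimates: close to theta for Z_1..Z_3, biased upwards for Z_4
ratio <- beta_y / beta_phi
round(rbind(gamma_hat, ratio), 2)
```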
budgetIV constrains \gamma through a series of non-negative thresholds 0 \leq \tau_1 < \tau_2 < \ldots < \tau_K and corresponding integer budgets 0 < b_1 < b_2 < \ldots < b_K \leq d_Z. It is assumed for each i \in \{1, \ldots, K\} that at least b_i components of \gamma are no greater in magnitude than \tau_i. For instance, taking d_Z = 100, K = 1, b_1 = 5 and \tau_1 = 0 means assuming at least 5 of the 100 candidates are valid instrumental variables (in the sense that their ratio estimates \theta_j := \mathrm{Cov}(Y, Z_j)/\mathrm{Cov}(\Phi(X), Z_j) are unbiased).
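The budget constraints can be sketched as a simple check on a candidate \gamma. The helper below is hypothetical and not part of the package. It assumes the reading that matches the worked example above, namely that for each i at least b_i components of \gamma have magnitude no greater than \tau_i.

```r
# Hypothetical helper (not a budgetIV function): does gamma satisfy the
# budget constraints, read as "at least b_vec[i] components of gamma
# have magnitude <= tau_vec[i]" for every i?
satisfies_budget <- function(gamma, tau_vec, b_vec) {
  stopifnot(length(tau_vec) == length(b_vec))
  n_within <- vapply(tau_vec, function(tau) sum(abs(gamma) <= tau), integer(1))
  all(n_within >= b_vec)
}

gamma <- c(0, 0, 0, 0.5, 2, 3)  # illustrative values

# Three exactly valid instruments and four within the looser threshold of 1
satisfies_budget(gamma, tau_vec = c(0, 1), b_vec = c(3, 4))  # TRUE

# Asking for four exactly valid instruments is too demanding for this gamma
satisfies_budget(gamma, tau_vec = c(0, 1), b_vec = c(4, 5))  # FALSE
```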
With delta_beta_y = NULL, budgetIV returns the identified set of causal effects that agree with both the budget constraints described above and the values of \mathrm{Cov}(Y, Z) and \mathrm{Cov}(\Phi(X), Z), assumed to be known exactly. Unlike classical partial identification methods (see Manski (1990) for a canonical example), the non-convex mixed-integer budget constraints yield a possibly disconnected solution set. Each connected subset has a different interpretation as to which of the candidate instruments Z are valid up to each threshold.
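To see why the solution set can be disconnected, consider a scalar \theta with one threshold. The structural equation gives \gamma = \mathrm{Cov}(Y, Z) - \theta \, \mathrm{Cov}(\Phi(X), Z), so a value of \theta is feasible when enough components of this difference fall within the threshold. The grid search below is a simplified sketch under those assumptions, not the algorithm used by budgetIV. The function name feasible_theta and the summary statistics are hypothetical, and "enough" is read as at least b components.

```r
# Hypothetical grid search (not the package's algorithm): keep every theta
# for which at least b components of beta_y - theta * beta_phi lie within tau
feasible_theta <- function(beta_y, beta_phi, tau, b, theta_grid) {
  n_within <- vapply(theta_grid, function(theta) {
    sum(abs(beta_y - theta * beta_phi) <= tau)
  }, integer(1))
  theta_grid[n_within >= b]
}

# Illustrative summary statistics: two clusters of ratio estimates near 1 and 3
beta_phi <- rep(1, 6)
beta_y <- c(1.0, 1.1, 0.9, 3.0, 3.1, 2.9)

theta_grid <- seq(0, 4, by = 0.05)
theta_hat <- feasible_theta(beta_y, beta_phi, tau = 0.16, b = 3,
                            theta_grid = theta_grid)

# Two separate intervals, one for each choice of which instruments are valid
range(theta_hat[theta_hat < 2])
range(theta_hat[theta_hat > 2])
```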
delta_beta_y represents box constraints that quantify uncertainty in beta_y. In the examples, delta_beta_y is calculated through a Bonferroni correction and gives an (asymptotically) valid confidence set over beta_y. Under the so-called "no measurement error" assumption (see Bowden et al. (2016)), which is commonly applied in Mendelian randomization, it is assumed that the estimate of beta_y is the dominant source of finite-sample uncertainty, with uncertainty in beta_phi considered negligible. When delta_beta_y gives an (asymptotically) valid confidence set for beta_y, and under the "no measurement error" assumption, budgetIV returns an (asymptotically) valid confidence set for \theta when using just a single exposure.
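One way such box constraints could be obtained is sketched below, assuming approximately normal estimates of each component of beta_y with known standard errors. The standard errors and the significance level are hypothetical, and the packaged example data may have been generated differently. The Bonferroni correction splits the error rate alpha evenly across the d_Z components, so the resulting box covers all components simultaneously with probability at least 1 - alpha (asymptotically).

```r
# Hypothetical standard errors for each component of beta_y
se_beta_y <- c(0.02, 0.03, 0.025, 0.04)

alpha <- 0.05
d_Z <- length(se_beta_y)

# Two-sided critical value with the error rate split across d_Z components
z_crit <- qnorm(1 - alpha / (2 * d_Z))

# Half-widths of the box: beta_y[j] is taken to lie within +/- delta_beta_y[j]
delta_beta_y <- z_crit * se_beta_y
delta_beta_y
```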
A data.table with each row corresponding to a set of bounds on the ATE at a given point in ATE_search_domain. Columns include: a non-unique identifier curve_index with a one-to-one mapping with U; lower_ATE_bound and upper_ATE_bound for the corresponding bounds on the ATE; a list U for the corresponding budget assignment; and a column for each unique variable in ATE_search_domain to indicate the treatment value at which the bounds are being calculated.
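The sketch below shows one way a table of this shape might be summarised. It uses a hand-made mock data.frame with the documented column names rather than real budgetIV output, omits the list column U for brevity, and uses invented values. Rows are grouped by curve_index so that each group corresponds to one budget assignment.

```r
# Mock table with the documented column names (values invented, U omitted)
mock_solution <- data.frame(
  curve_index = c(1, 1, 2, 2),
  x = c(0.5, 1.0, 0.5, 1.0),
  lower_ATE_bound = c(0.4, 0.9, 1.4, 2.8),
  upper_ATE_bound = c(0.6, 1.1, 1.6, 3.2)
)

# Group rows by curve, i.e. by budget assignment
by_curve <- split(mock_solution, mock_solution$curve_index)

# Widest interval on each curve across the treatment values searched
max_width <- vapply(by_curve, function(d) {
  max(d$upper_ATE_bound - d$lower_ATE_bound)
}, numeric(1))
max_width
```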
Jordan Penn, Lee Gunderson, Gecia Bravo-Hermsdorff, Ricardo Silva, and David Watson. (2025). BudgetIV: Optimal Partial Identification of Causal Effects with Mostly Invalid Instruments. AISTATS 2025.
Jack Bowden, Fabiola Del Greco M, Cosetta Minelli, George Davey Smith, Nuala A Sheehan, and John R Thompson. (2016). Assessing the suitability of summary data for two-sample Mendelian randomization analyses using MR-Egger regression: the role of the I^2 statistic. Int. J. Epidemiol. 46.6, pp. 1985–1998.
Charles F Manski. (1990). Nonparametric bounds on treatment effects. Am. Econ. Rev. 80.2, pp. 319–323.
Henri Theil. (1953). Repeated least-squares applied to complete equation systems. Centraal Planbureau Memorandum.
Eleanor Sanderson, George Davey Smith, Frank Windmeijer and Jack Bowden. (2019). An examination of multivariable Mendelian randomization in the single-sample and two-sample summary data settings. Int. J. Epidemiol. 48.3, pp. 713–727.
library(budgetIV)

# Load the simulated example dataset
data(simulated_data_budgetIV)

beta_y <- simulated_data_budgetIV$beta_y

# Stack the summary statistics for each basis function as rows of beta_phi
beta_phi_1 <- simulated_data_budgetIV$beta_phi_1
beta_phi_2 <- simulated_data_budgetIV$beta_phi_2
beta_phi <- matrix(c(beta_phi_1, beta_phi_2), nrow = 2, byrow = TRUE)

delta_beta_y <- simulated_data_budgetIV$delta_beta_y

# A single threshold at zero with a budget of 3
tau_vec <- c(0)
b_vec <- c(3)

# Treatment values at which to bound the ATE
x_vals <- seq(from = 0, to = 1, length.out = 500)
ATE_search_domain <- expand.grid("x" = x_vals)

# Basis functions Phi(x) = (x, x^2) and the baseline treatment x = 0
phi_basis <- expression(x, x^2)
X_baseline <- list("x" = c(0))

solution_set <- budgetIV(beta_y = beta_y,
                         beta_phi = beta_phi,
                         phi_basis = phi_basis,
                         tau_vec = tau_vec,
                         b_vec = b_vec,
                         ATE_search_domain = ATE_search_domain,
                         X_baseline = X_baseline,
                         delta_beta_y = delta_beta_y)