View source: R/budgetIV_scalar.R
budgetIV_scalar | R Documentation |
Partial identification and coverage of a causal effect parameter using summary statistics and budget constraint assumptions. See Penn et al. (2025) for technical definitions.
budgetIV_scalar(
  beta_y,
  beta_phi,
  tau_vec = NULL,
  b_vec = NULL,
  delta_beta_y = NULL,
  bounds_only = TRUE
)
beta_y |
A |
beta_phi |
A |
tau_vec |
A |
b_vec |
A |
delta_beta_y |
A |
bounds_only |
A boolean. If TRUE (default), the output consists only of disjoint bounds. If FALSE, the output consists of bounds for intervals that may touch but never overlap, together with the budget assignment corresponding to each bound. |
Instrumental variables are defined by three structural assumptions: (A1) they are associated with the treatment;
(A2) they are unconfounded with the outcome; and (A3) they affect the outcome exclusively through the treatment.
Assumption (A1) has a simple statistical test, whereas for many data generating processes violations of (A2) and (A3)
cannot be detected from data.
The budgetIV and budgetIV_scalar algorithms allow for valid causal inference when some proportion,
possibly a small minority, of candidate instruments satisfy both (A2) and (A3).
budgetIV and budgetIV_scalar assume a homogeneous treatment effect, which implies the separable structural
equation Y = \theta \Phi(X) + g_y(Z, \epsilon_x).
The difference between the algorithms is that budgetIV_scalar assumes \Phi(X) and \theta take
scalar values, which is exploited for super-exponential computational speedup and allows for causal inference
with thousands of candidate instruments Z.
Both methods assume ground truth knowledge of the functional form of \Phi(X), e.g., a linear,
logistic, Cox hazard, principal component based or other model.
The parameter \theta captures the unknown treatment effect.
Violation of (A2) and/or (A3) will bias classical IV approaches through the statistical dependence
between Z and g_y(Z, \epsilon_x), summarized by the covariance parameter
\gamma := \mathrm{Cov}(g_y(Z, \epsilon_x), Z).
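To see why \gamma drives the bias, take the covariance of both sides of the structural equation with a single candidate Z_j. The derivation below is a sketch for the scalar case; it assumes \mathrm{Cov}(\Phi(X), Z_j) \neq 0 (assumption (A1)), writes \gamma_j for the j-th component of \gamma, and uses the ratio \theta_j targeted by a single-instrument estimate.

```latex
\begin{align*}
\mathrm{Cov}(Y, Z_j)
  &= \theta \, \mathrm{Cov}(\Phi(X), Z_j) + \mathrm{Cov}(g_y(Z, \epsilon_x), Z_j) \\
  &= \theta \, \mathrm{Cov}(\Phi(X), Z_j) + \gamma_j , \\
\theta_j := \frac{\mathrm{Cov}(Y, Z_j)}{\mathrm{Cov}(\Phi(X), Z_j)}
  &= \theta + \frac{\gamma_j}{\mathrm{Cov}(\Phi(X), Z_j)} .
\end{align*}
```

The ratio for candidate j therefore equals \theta exactly when \gamma_j = 0, and is otherwise offset by \gamma_j / \mathrm{Cov}(\Phi(X), Z_j).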
budgetIV and budgetIV_scalar constrain \gamma through a series of non-negative thresholds
0 \leq \tau_1 < \tau_2 < \ldots < \tau_K and corresponding integer budgets 0 < b_1 < b_2 < \ldots < b_K \leq d_Z.
It is assumed for each i \in \{ 1, \ldots, K\} that at least b_i components of \gamma are no greater in
magnitude than \tau_i.
For instance, taking d_Z = 100, K = 1, b_1 = 5 and \tau_1 = 0 means
assuming at least 5 of the 100 candidates are valid instrumental variables (in the sense that their ratio
estimates \theta_j := \mathrm{Cov}(Y, Z_j)/\mathrm{Cov}(\Phi(X), Z_j) are unbiased).
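The budget constraints can be checked directly for any candidate value of \theta. The sketch below is illustrative only: `satisfies_budget` is a hypothetical helper, not part of the package, and it reads each budget as "at least b_i candidates have |\gamma_j| no greater than \tau_i", which is the reading consistent with the worked example above. It takes beta_y and beta_phi to stand for \mathrm{Cov}(Y, Z) and \mathrm{Cov}(\Phi(X), Z), and the toy numbers are made up.

```r
# Hypothetical helper (not part of the budgetIV package): checks whether a
# candidate theta is compatible with the budget constraints.
satisfies_budget <- function(theta, beta_y, beta_phi, tau_vec, b_vec, tol = 1e-12) {
  # Implied invalidity of each candidate:
  # gamma_j = Cov(Y, Z_j) - theta * Cov(Phi(X), Z_j)
  gamma <- beta_y - theta * beta_phi
  # For each threshold, count the candidates within it (up to a small tolerance)
  n_within <- vapply(tau_vec, function(tau) sum(abs(gamma) <= tau + tol), integer(1))
  # Every budget must be met
  all(n_within >= b_vec)
}

# Toy data: true theta = 0.5; the first three candidates are valid (gamma_j = 0)
beta_phi <- c(1, 2, 4, 1, 2)
beta_y   <- 0.5 * beta_phi + c(0, 0, 0, 0.75, -1)

# Ratio estimates: the three valid candidates agree on 0.5
theta_j <- beta_y / beta_phi

satisfies_budget(0.5,  beta_y, beta_phi, tau_vec = 0, b_vec = 3)  # TRUE
satisfies_budget(1.25, beta_y, beta_phi, tau_vec = 0, b_vec = 3)  # FALSE
```

With \tau_1 = 0, a value of \theta is feasible only where at least b_1 ratio estimates coincide, which is why the feasible set can reduce to isolated points.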
With delta_beta_y = NULL, budgetIV and budgetIV_scalar return the identified set
of causal effects that agree with both the budget constraints described above and the values of
\mathrm{Cov}(Y, Z) and \mathrm{Cov}(\Phi(X), Z), assumed to be known exactly.
Unlike classical partial identification methods (see Manski (1990) for a canonical example), the non-convex mixed-integer
budget constraints yield a possibly disconnected identified set.
Each connected subset has a different interpretation as to which of the candidate instruments Z
are valid up to each threshold.
budgetIV_scalar returns these interpretations alongside the corresponding bounds on \theta.
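A small sketch can show how a disconnected set arises. It is not the package's algorithm: `feasible_intervals` is a hypothetical helper that handles only a single threshold (K = 1), assumes every entry of beta_phi is non-zero and b >= 1, and reads the budget as "at least b candidates within \tau". Each candidate is within the threshold on an interval of \theta values, and the feasible set is wherever at least b of those intervals overlap.

```r
# Hypothetical helper (K = 1 only): values of theta for which at least b
# candidates satisfy |beta_y[j] - theta * beta_phi[j]| <= tau.
feasible_intervals <- function(beta_y, beta_phi, tau, b) {
  # Candidate j is within the threshold for theta between these two endpoints
  end_1 <- (beta_y - tau) / beta_phi
  end_2 <- (beta_y + tau) / beta_phi
  lower <- pmin(end_1, end_2)
  upper <- pmax(end_1, end_2)

  # Sweep over sorted endpoints, tracking how many intervals cover theta.
  # At ties, openings (+1) are processed before closings (-1).
  x     <- c(lower, upper)
  step  <- c(rep(1L, length(lower)), rep(-1L, length(upper)))
  ord   <- order(x, -step)
  x     <- x[ord]
  cover <- cumsum(step[ord])

  # A feasible interval starts when coverage reaches b and ends when it drops below b
  ok      <- cover >= b
  prev_ok <- c(FALSE, head(ok, -1))
  cbind(lower = x[ok & !prev_ok], upper = x[!ok & prev_ok])
}

# Toy data: two clusters of candidates that agree on different values of theta
beta_phi <- c(1, 1, 1, 1)
beta_y   <- c(0, 0.25, 2, 2.25)

res <- feasible_intervals(beta_y, beta_phi, tau = 0.25, b = 2)
print(res)  # two disjoint intervals: [0, 0.25] and [2, 2.25]
```

Each of the two intervals corresponds to a different pair of candidates being treated as within the threshold, which mirrors the per-interval interpretation described above.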
When delta_beta_y is not NULL, it is used as box constraints to quantify uncertainty in beta_y.
In the examples, delta_beta_y is calculated through a Bonferroni correction and gives an (asymptotically)
valid confidence set over beta_y.
Under the so-called "no measurement error" (NOME) assumption (see Bowden et al. (2016)), which is commonly applied in Mendelian randomisation, it is
assumed that the estimate of beta_y is the dominant source of finite-sample uncertainty, with uncertainty in beta_phi
entirely negligible.
With an (asymptotically) valid confidence set over beta_y, specified through delta_beta_y,
and under the NOME assumption, budgetIV_scalar returns an (asymptotically) valid confidence set for \theta.
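One way to picture the box constraints: if the true beta_y may lie anywhere within delta_beta_y of its estimate, a candidate can meet threshold \tau for a given \theta whenever its estimated residual is at most \tau plus its half-width. The sketch below illustrates that reasoning for a single threshold. `in_confidence_set` is a hypothetical helper, not the package's implementation; it treats beta_phi as exact (NOME) and reads the budget as "at least b candidates within \tau". All numbers are made up.

```r
# Hypothetical helper (K = 1 only): theta is kept if at least b candidates can
# have |gamma_j| <= tau for some beta_y inside the box beta_y_hat[j] +/- delta[j].
in_confidence_set <- function(theta, beta_y_hat, beta_phi, delta, tau, b) {
  residual <- abs(beta_y_hat - theta * beta_phi)
  sum(residual <= tau + delta) >= b
}

# Toy summary statistics (illustrative values only)
beta_phi   <- c(1, 2, 4)
beta_y_hat <- c(0.52, 0.97, 2.03)
se_beta_y  <- c(0.02, 0.02, 0.02)

# Bonferroni-corrected half-widths: a joint 95% box over all d_Z components
alpha <- 0.05
d_Z   <- length(beta_y_hat)
delta <- qnorm(1 - alpha / (2 * d_Z)) * se_beta_y

in_confidence_set(0.5, beta_y_hat, beta_phi, delta, tau = 0, b = 3)  # TRUE
in_confidence_set(0.6, beta_y_hat, beta_phi, delta, tau = 0, b = 3)  # FALSE
```

Here \theta = 0.5 is retained even though no two ratio estimates agree exactly, because each residual falls inside its Bonferroni half-width.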
A data.table with each row corresponding to bounds on the scalar causal effect parameter \theta
for a particular budget assignment U (see Penn et al. (2025)).
The returned table has the following columns: a logical is_point indicating whether the upper and lower bounds
are equal; numerical lower_bound and upper_bound giving the lower and upper bounds; and a list budget_assignment
giving the value of U for each candidate instrument.
budget_assignment is only returned if the user sets bounds_only = FALSE.
A list of two entries: intervals, which is a two-column matrix with rows corresponding to disjoint bounds containing
plausible values of \theta; and points, which is a one-column matrix consisting of lone plausible values of \theta,
relevant when using \tau_1 = 0.
Jordan Penn, Lee Gunderson, Gecia Bravo-Hermsdorff, Ricardo Silva, and David Watson. (2024). BudgetIV: Optimal Partial Identification of Causal Effects with Mostly Invalid Instruments. arXiv preprint, 2411.06913.
Jack Bowden, Fabiola Del Greco M, Cosetta Minelli, George Davey Smith, Nuala A Sheehan, and John R Thompson. (2016). Assessing the suitability of summary data for two-sample Mendelian randomization analyses using MR-Egger regression: the role of the I^2 statistic. Int. J. Epidemiol. 45.6, pp. 1961–1974.
Charles F Manski. (1990). Nonparametric bounds on treatment effects. Am. Econ. Rev. 80.2, pp. 319–323.
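# Load the package that provides budgetIV_scalar and the example data
library(budgetIV)

data(Do_et_al_summary_statistics)

# Keep candidate variants with a small HDL p-value
candidatesHDL <- Do_et_al_summary_statistics[Do_et_al_summary_statistics$pHDL <= 1e-8, ]

candidate_labels <- candidatesHDL$rsID
d_Z <- length(candidate_labels)

beta_x <- candidatesHDL$betaHDL
beta_y <- candidatesHDL$betaCAD

# Recover standard errors of beta_y from the two-sided p-values
SE_beta_y <- abs(beta_y) / qnorm(1 - candidatesHDL$pCAD / 2)

# Bonferroni-corrected half-widths for the box constraints on beta_y
alpha <- 0.05
delta_beta_y <- qnorm(1 - alpha / (2 * d_Z)) * SE_beta_y

# Single threshold tau = 0 with budget b = 30
feasible_region <- budgetIV_scalar(
  beta_y = beta_y,
  beta_phi = beta_x,
  tau_vec = c(0),
  b_vec = c(30),
  delta_beta_y = delta_beta_y,
  bounds_only = FALSE
)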