# R/budgetIV_scalar.R

#' Efficient partial identification of a scalar causal effect parameter with invalid instruments
#' 
#' Partial identification and coverage of a causal effect parameter using summary statistics and budget constraint assumptions.
#' See Penn et al. (2025) for technical definitions.
#' 
#' @param beta_y A \eqn{d_{Z}}-dimensional vector representing the (estimated) 
#' cross covariance \eqn{\mathrm{Cov}(Y, Z)}.
#' @param beta_phi A \eqn{d_{Z}}-dimensional vector representing the (estimated) 
#' cross covariance \eqn{\mathrm{Cov}(\Phi (X), Z)}.
#' @param tau_vec A \eqn{K}-dimensional vector of increasing, positive 
#' thresholds representing degrees of IV invalidity. 
#' The default value \code{NULL} can be used for a single threshold at \eqn{0}.
#' @param b_vec A \eqn{K}-dimensional vector of increasing positive integers 
#' representing the minimum number of IVs assumed to fall within each threshold. 
#' The default value \code{NULL} can be used for a single threshold at \eqn{0}, with at least \eqn{50\%} of IVs assumed to be valid.
#' @param delta_beta_y A \eqn{d_{Z}}-dimensional vector of positive half-widths for box-shaped 
#' confidence bounds on \code{beta_y}. 
#' The default value \code{NULL} can be used to not include finite sample uncertainty.
#' @param bounds_only A logical \code{TRUE} or \code{FALSE}. If \code{TRUE} (the default), overlapping or touching intervals 
#' in the confidence set are merged, so the output consists only of disjoint bounds. If \code{FALSE}, a separate interval 
#' (possibly touching its neighbours, but never overlapping them) is returned for each value of \code{budget_assignment}, 
#' alongside that assignment (see the return value and Penn et al. (2025) for further details).
#' 
#' 
#' @details 
#' Instrumental variables are defined by three structural assumptions: (A1) they are associated with the treatment; 
#' (A2) they are unconfounded with the outcome; and (A3) they affect the outcome exclusively through the treatment. 
#' Assumption (A1) has a simple statistical test, whereas for many data generating processes (A2) and (A3) 
#' cannot be tested from data alone. 
#' The \code{budgetIV} and \code{budgetIV_scalar} algorithms allow for valid causal inference when some proportion, 
#' possibly a small minority, of candidate instruments satisfy both (A2) and (A3).
#' 
#' \code{budgetIV} & \code{budgetIV_scalar} assume a homogeneous treatment effect, which implies the separable structural 
#' equation \eqn{Y = \theta \Phi(X) + g_y(Z, \epsilon_x)}. 
#' The difference between the algorithms is that \code{budgetIV_scalar} assumes \eqn{\Phi(X)} and \eqn{\theta} take
#' scalar values, which is exploited for super-exponential computational speedup and allows for causal inference
#' with thousands of candidate instruments \eqn{Z}.
#' Both methods assume ground truth knowledge of the functional form of \eqn{\Phi (X)}, e.g., a linear, 
#' logistic, Cox hazard, principal component based or other model. 
#' The parameter \eqn{\theta} captures the unknown treatment effect.
#' Violation of (A2) and/or (A3) will bias classical IV approaches through the statistical dependence
#' between \eqn{Z} and \eqn{g_y(Z, \epsilon_x)}, summarized by the covariance parameter 
#' \eqn{\gamma := \mathrm{Cov} (g_y(Z, \epsilon_x), Z)}.
#' 
#' \code{budgetIV} & \code{budgetIV_scalar} constrain \eqn{\gamma} through a series of non-negative thresholds 
#' \eqn{0 \leq \tau_1 < \tau_2 < \ldots < \tau_K} and corresponding integer budgets \eqn{0 < b_1 < b_2 < \ldots < b_K \leq d_Z}. 
#' It is assumed for each \eqn{i \in \{ 1, \ldots, K\}} that at least \eqn{b_i} components of \eqn{\gamma} are no greater in 
#' magnitude than \eqn{\tau_i}.
#' For instance, taking \eqn{d_Z = 100}, \eqn{K = 1}, \eqn{b_1 = 5} and \eqn{\tau_1 = 0} means 
#' assuming \eqn{5} of the \eqn{100} candidates are valid instrumental variables (in the sense that their ratio 
#' estimates \eqn{\theta_j := \mathrm{Cov}(Y, Z_j)/\mathrm{Cov}(\Phi(X), Z_j)} are unbiased).
#' 
#' With \code{delta_beta_y = NULL}, \code{budgetIV} & \code{budgetIV_scalar} return the identified set
#' of causal effects that agree with both the budget constraints described above and the values of
#' \eqn{\mathrm{Cov}(Y, Z)} and \eqn{\mathrm{Cov}(\Phi(X), Z)}, assumed to be measured exactly. 
#' Unlike classical partial identification methods (see Manski (1990) for a canonical example), the non-convex mixed-integer
#' budget constraints yield a possibly disconnected identified set. 
#' Each connected subset has a different interpretation as to which of the candidate instruments \eqn{Z} 
#' are valid up to each threshold.
#' \code{budgetIV_scalar} returns these interpretations alongside the corresponding bounds on \eqn{\theta}. 
#' 
#' When \code{delta_beta_y} is not \code{NULL}, it is used as box-constraints to quantify uncertainty in \code{beta_y}. 
#' In the examples, \code{delta_beta_y} is calculated through a Bonferroni correction and gives an (asymptotically) 
#' valid confidence set over \code{beta_y}. 
#' Under the so-called "no measurement error" (NOME) assumption (see Bowden et al. (2016)), commonly applied in Mendelian randomisation, the
#' estimate of \code{beta_y} is treated as the dominant source of finite-sample uncertainty, with uncertainty in \code{beta_phi}
#' regarded as negligible. 
#' With an (asymptotically) valid confidence set over \code{beta_y} and under the NOME assumption, \code{budgetIV_scalar} 
#' returns an (asymptotically) valid confidence set for \eqn{\theta}.  
#' 
#' @return A \code{data.table} with each row corresponding to bounds on the scalar causal effect parameter \eqn{\theta} for a particular budget assignment \eqn{U} (see Penn et al. (2025)). 
#' The table has the following columns: a logical \code{is_point} indicating whether the upper and lower bounds coincide; numeric \code{lower_bound}
#' and \code{upper_bound} giving the bounds themselves; and a list column \code{budget_assignment} giving the value of \eqn{U} for each candidate instrument. 
#' \code{budget_assignment} is only returned when \code{bounds_only == FALSE}.
#' 
#' @references  
#' Jordan Penn, Lee Gunderson, Gecia Bravo-Hermsdorff,
#' Ricardo Silva, and David Watson. (2025). BudgetIV: Optimal Partial Identification of Causal Effects with Mostly Invalid Instruments. \emph{arXiv}
#' preprint, 2411.06913.
#' 
#' Jack Bowden, Fabiola Del Greco M, Cosetta Minelli, George Davey Smith, Nuala A Sheehan, and John R Thompson. (2016). Assessing the suitability of summary data for 
#' two-sample Mendelian randomization analyses using MR-Egger regression: the role of the I^2 statistic. \emph{Int. J. Epidemiol.} 46.6, pp. 1985--1998.
#' 
#' Charles F Manski. (1990). Nonparametric bounds on treatment effects. \emph{Am. Econ. Rev.} 80.2, pp. 219--323.
#' 
#' @examples  
#' data(Do_et_al_summary_statistics)
#' 
#' candidatesHDL <- Do_et_al_summary_statistics[Do_et_al_summary_statistics$pHDL <= 1e-8, ]
#' 
#' candidate_labels <- candidatesHDL$rsID
#' d_Z <- length(candidate_labels)
#' 
#' beta_x <- candidatesHDL$betaHDL
#' 
#' beta_y <- candidatesHDL$betaCAD
#' 
#' SE_beta_y <- abs(beta_y) / qnorm(1-candidatesHDL$pCAD/2)
#' 
#' alpha <- 0.05
#' delta_beta_y <- qnorm(1 - alpha/(2*d_Z))*SE_beta_y
#' 
#' feasible_region <- budgetIV_scalar(
#'                                    beta_y = beta_y,
#'                                    beta_phi = beta_x,
#'                                    tau_vec = c(0),
#'                                    b_vec = c(30),
#'                                    delta_beta_y = delta_beta_y,
#'                                    bounds_only = FALSE
#'                                    )
#' 
#' @export 
#' @import data.table
#' 
#' 

budgetIV_scalar <- function(
    beta_y,
    beta_phi,
    tau_vec=NULL,
    b_vec=NULL,
    delta_beta_y=NULL,
    bounds_only=TRUE
) {
  
  if(is.matrix(beta_y) && is.numeric(beta_y)){ 
    if(nrow(beta_y) == 1){
      
      beta_y <- as.vector(beta_y)
    }
    else{
      stop("Argument 'beta_y' must be a vector or a matrix with one row.")
    }
  }
  else if(!is.numeric(beta_y)){
    stop("Argument 'beta_y' must have numeric entries and be input as a matrix or vector.")
  }
  
  if(is.matrix(beta_phi) && is.numeric(beta_phi)){ 
    if(nrow(beta_phi) == 1){
      beta_phi <- as.vector(beta_phi)
    }
    else{
      stop("Argument 'beta_phi' must be a vector or a matrix with one row. 
           Please use budgetIV for non-scalar 'phi_basis'.")
    }
  }
  else if(!is.numeric(beta_phi)){
    stop("Argument 'beta_phi' must have numeric entries and be input as a matrix or vector.")
  }
  
  if (is.null(delta_beta_y)){
    warning("Argument 'delta_beta_y' not specified. 
            No confidence bounds for argument 'beta_y' given: treating 'beta_y' as an oracle summary statistic.")
    delta_beta_y <- numeric(length(beta_y))
  }
  
  if(is.matrix(delta_beta_y) && is.numeric(delta_beta_y)){ 
    if(nrow(delta_beta_y) == 1){
      delta_beta_y <- as.vector(delta_beta_y)
    }
    else{
      stop("Argument 'delta_beta_y' must be a vector or a matrix with one row.")
    }
  }
  else if(!is.numeric(delta_beta_y)){
    stop("Argument 'delta_beta_y' must have numeric entries and be input as a matrix or vector.")
  }
  
  d_Z <- length(beta_y)
  
  # Error messages
  if(!is.vector(beta_y)){
    stop("Argument 'beta_y' must be a vector or single-row matrix.")
  }
  else if(!is.numeric(beta_y)){
    stop("Argument 'beta_y' must have numeric entries.")
  }
  else if(!is.numeric(beta_phi)){
    stop("Argument 'beta_phi' must have numeric entries.")
  }
  else if (length(beta_y) != length(beta_phi)) {
    stop("Arguments 'beta_y' and 'beta_phi' must have the same length/number of columns.")
  }
  
  # Dealing with NULL tau_vec and/or b_vec 
  if(is.null(tau_vec)){
    if(is.null(b_vec)){
      tau_vec <- c(0)
      b_vec <- c(d_Z %/% 2)
    }
    else if(length(b_vec) == 1){
      tau_vec <- c(0)
    }
    else{
      stop("Argument 'tau_vec' is NULL while argument 'b_vec' has length greater than 1. When specifying multiple budgets, please specify as many thresholds.")
    }
  }
  else if(is.null(b_vec)){
    if(length(tau_vec) == 1){
      b_vec <- c(d_Z %/% 2)
    }
    else{
      stop("Argument 'b_vec' is NULL while argument 'tau_vec' has length greater than 1. When specifying multiple thresholds, please specify as many budgets.")
    }
  }
  
  if(!is.vector(tau_vec)){
    stop("Argument 'tau_vec' must be a vector. Use tau_vec = c(threshold_value) for a single budget constraint (e.g., tau_vec = c(0) for an L_0-norm constraint).")
  }
  else if(!is.numeric(tau_vec)){
    stop("Argument 'tau_vec' must have numeric entries.")
  }
  else if(!all(tau_vec >= 0)){
    stop("Argument 'tau_vec' must have non-negative entries.")
  }
  else if (is.unsorted(tau_vec)) {
    stop("Argument 'tau_vec' must have entries in increasing order.")
  }
  else if (any(duplicated(tau_vec))){ 
    stop("Argument 'tau_vec' must be strictly increasing, i.e., with no repeated entries.")
  }
  else if(!is.vector(b_vec)){
    stop("Argument 'b_vec' must be a vector. Use b_vec = c(budget_value) for a single budget constraint.")
  }
  else if(!is.numeric(b_vec)){
    stop("Argument 'b_vec' must have numeric entries.")
  }
  else if(!all(b_vec == as.integer(b_vec))){
    stop("Argument 'b_vec' must have integer entries.")
  }
  else if(!all(b_vec > 0)){
    stop("Argument 'b_vec' must have entries strictly greater than zero.")
  }
  if (is.unsorted(b_vec)) {
    stop("Argument 'b_vec' must have entries in increasing order.")
  }
  else if (any(duplicated(b_vec))){ 
    stop("Argument 'b_vec' must be strictly increasing, i.e., with no repeated entries.")
  }
  else if(any(delta_beta_y < 0)){
    stop("Argument 'delta_beta_y' must have non-negative entries.")
  }
  else if (length(beta_y) != length(beta_phi)) {
    stop("Arguments 'beta_y' and 'beta_phi' must be vectors of the same length for scalar Phi(X). Please call 'budgetIV' for treatment of vector Phi(X).")
  }
  else if (length(delta_beta_y) != length(beta_y)){
    stop("Argument 'delta_beta_y', if given, must be of the same length as beta_y.")
  }
  else if (!is.logical(bounds_only) || length(bounds_only) != 1 || is.na(bounds_only)){
    stop("Argument 'bounds_only' must be a single TRUE or FALSE value only.")
  }
  
  d_Z <- length(beta_y)
  
  tau_intervals_lower <- matrix(nrow = (length(tau_vec)), ncol = d_Z)
  
  tau_intervals_upper <- matrix(nrow = (length(tau_vec)), ncol = d_Z)
  
  for (k in 1:length(tau_vec)){
    for (j in 1:d_Z){
      
      tau_intervals_lower[k, j] <- beta_y[j] / beta_phi[j] - (tau_vec[k]+delta_beta_y[j]) / beta_phi[j]
      
      tau_intervals_upper[k, j] <- beta_y[j] / beta_phi[j] + (tau_vec[k]+delta_beta_y[j]) / beta_phi[j]
      
    }}
  
  possible_bounds <- sort(c(c(tau_intervals_lower), c(tau_intervals_upper)))
  
  in_feasible <- FALSE
  
  if (bounds_only == TRUE){
    
    causal_effect_bounds <- data.table(
      "is_point"=logical(),
      "lower_bound"=numeric(),
      "upper_bound"=numeric()
    )
    
    for (p in 1:(length(possible_bounds)-1)){
      
      curr_point <- possible_bounds[p]
      
      curr_point_interval <- (possible_bounds[p] + possible_bounds[p+1])/2
      
      curr_interval_feasible <- validPoint_scalar(beta_y, beta_phi, d_Z, curr_point_interval, b_vec, tau_vec, delta_beta_y)
      
      curr_point_feasible <- validPoint_scalar(beta_y, beta_phi, d_Z, curr_point, b_vec, tau_vec, delta_beta_y)
      
      if(curr_interval_feasible && !in_feasible){
        
        last_feasible_opening <- curr_point
        in_feasible <- TRUE
        
      }
      
      else if(!curr_interval_feasible && in_feasible){
        
        in_feasible <- FALSE
        
        new_interval <- data.table(
          "is_point"=FALSE,
          "lower_bound"=last_feasible_opening,
          "upper_bound"=curr_point
        )
        
        causal_effect_bounds <- rbind(causal_effect_bounds, new_interval)
        
      }
      
      else if(!in_feasible && curr_point_feasible){
        
        new_point <- data.table(
          "is_point"=TRUE,
          "lower_bound"=curr_point,
          "upper_bound"=curr_point
        )
        
        causal_effect_bounds <- rbind(causal_effect_bounds, new_point)
        
      }
      
    }
    
    curr_point <- possible_bounds[length(possible_bounds)]
    
    if(in_feasible){ 
      
      new_interval <- data.table(
        "is_point"=FALSE,
        "lower_bound"=last_feasible_opening,
        "upper_bound"=curr_point
      )
      
      causal_effect_bounds <- rbind(causal_effect_bounds, new_interval)
      
    }
    
    else if(validPoint_scalar(beta_y, beta_phi, d_Z, curr_point, b_vec, tau_vec, delta_beta_y)){
      
      new_point <- data.table(
        "is_point"=TRUE,
        "lower_bound"=curr_point,
        "upper_bound"=curr_point
      )
      
      causal_effect_bounds <- rbind(causal_effect_bounds, new_point)
      
    }
    
    return(causal_effect_bounds)
    
  }
  
  else if (bounds_only==FALSE){
    
    causal_effect_bounds <- data.table(
      "is_point"=logical(),
      "lower_bound"=numeric(),
      "upper_bound"=numeric(),
      "budget_assignment" = list()
    )
    
    last_interval_feasible <- FALSE
    
    for (p in 1:(length(possible_bounds)-1)){
      
      curr_point <- possible_bounds[p]
      
      curr_point_interval <- (curr_point + possible_bounds[p+1])/2
      
      curr_interval_feasible <- validPoint_scalar(beta_y, beta_phi, d_Z, curr_point_interval, b_vec, tau_vec, delta_beta_y)
      
      curr_point_feasible <- validPoint_scalar(beta_y, beta_phi, d_Z, curr_point, b_vec, tau_vec, delta_beta_y)
      
      if(curr_interval_feasible){
        
        curr_interval_budgets <- eval_budgets(beta_y, beta_phi, d_Z, curr_point_interval, tau_vec, delta_beta_y)
        
        last_interval_feasible <- TRUE
        
        new_interval <- data.table(
          "is_point" = FALSE,
          "lower_bound" = curr_point,
          "upper_bound" = possible_bounds[p+1],
          "budget_assignment" = list(curr_interval_budgets)
        )
        
        causal_effect_bounds <- rbind(causal_effect_bounds, new_interval)
        
      }
      
      else if(curr_point_feasible && !last_interval_feasible){
        
        curr_point_budgets <- eval_budgets(beta_y, beta_phi, d_Z, curr_point, tau_vec, delta_beta_y)
        
        new_point <- data.table(
          "is_point" = TRUE,
          "lower_bound" = curr_point,
          "upper_bound" = curr_point,
          "budget_assignment" = list(curr_point_budgets)
        )
        
        
        causal_effect_bounds <- rbind(causal_effect_bounds, new_point)
        
      }
      
      else { 
        
        last_interval_feasible <- FALSE 
      }
      
    }
    
    curr_point <- possible_bounds[length(possible_bounds)]
    
    curr_point_feasible <- validPoint_scalar(beta_y, beta_phi, d_Z, curr_point, b_vec, tau_vec, delta_beta_y)
    
    if(curr_point_feasible && !last_interval_feasible){
      
      curr_point_budgets <- eval_budgets(beta_y, beta_phi, d_Z, curr_point, tau_vec, delta_beta_y)
      
      new_point <- data.table(
        "is_point" = TRUE,
        "lower_bound" = curr_point,
        "upper_bound" = curr_point,
        "budget_assignment" = list(curr_point_budgets)
      )
      
      causal_effect_bounds <- rbind(causal_effect_bounds, new_point)
      
    }
    
    return(causal_effect_bounds)
    
    
  }
  
}
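# The Bonferroni half-widths used in the @examples above can be sketched on
# hypothetical summary statistics (toy numbers, not from the Do et al. data):

```r
# Toy sketch: recover standard errors from two-sided p-values via
# z = |beta| / SE, then build Bonferroni-corrected half-widths giving a
# joint (asymptotic) 95% box constraint over beta_y.
beta_y <- c(0.10, -0.05, 0.20)   # hypothetical outcome associations
p_vals <- c(1e-4, 0.03, 1e-6)    # their two-sided p-values
d_Z <- length(beta_y)

SE_beta_y <- abs(beta_y) / qnorm(1 - p_vals / 2)

alpha <- 0.05
delta_beta_y <- qnorm(1 - alpha / (2 * d_Z)) * SE_beta_y
```

# Each half-width exceeds its marginal standard error because the Bonferroni
# multiplier qnorm(1 - alpha / (2 * d_Z)) is larger than 1.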

validPoint_scalar <- function(beta_y, beta_phi, d_Z, theta, b_vec, tau_vec, delta_beta_y){
  
  b_to_fill <- b_vec
  
  for(i in 1:d_Z){
    
    beta_theta_i <- beta_phi[i] * theta
    
    tol <- .Machine$double.eps * max(abs(beta_y[i]), abs(beta_theta_i))
    
    gamma_i <- beta_y[i] - beta_phi[i] * theta
    
    for(tau_index in 1:length(tau_vec)){
      
      if(abs(gamma_i) <= tau_vec[tau_index]+delta_beta_y[i] + tol){ b_to_fill[tau_index] <- b_to_fill[tau_index] - 1 }
      
    }
  }
  
  return(all(b_to_fill <= 0))
  
}
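# A toy illustration (hypothetical numbers, with a hypothetical budget b = 3) of
# the feasibility rule above: theta is feasible iff, for every threshold, enough
# instruments have |gamma_i| = |beta_y[i] - theta * beta_phi[i]| within tau + delta.

```r
beta_phi <- c(1, 1, 1, 1, 1)
beta_y   <- c(2, 2, 2, 5, -1)  # instruments 4 and 5 are invalid at theta = 2
delta    <- rep(0, 5)          # oracle summary statistics (no sampling error)
tau      <- 0                  # single threshold: exact validity

theta    <- 2
gamma    <- beta_y - theta * beta_phi        # per-instrument bias terms
n_within <- sum(abs(gamma) <= tau + delta)   # instruments consistent with theta

# With budget b = 3, theta = 2 is feasible because n_within >= 3.
feasible <- n_within >= 3
```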



# Return the budget assignment at a single point, corresponding to a single possible 'theta'. 
# For delta_beta_y not NULL, the output is the smallest budget assignment (in terms of the
# component-wise partial order) consistent with the constraints.

eval_budgets <- function(beta_y, beta_phi, d_Z, theta, tau_vec, delta_beta_y){
  
  gammas <- beta_y - theta * beta_phi
  
  budget_assignments <- rep(0, d_Z)
  
  for (i in 1:d_Z){
    
    tol <- .Machine$double.eps * max(abs(beta_y[i]), abs(beta_phi[i] * theta))
    
    for(tau_index in 1:length(tau_vec)){
        
      if (abs(gammas[i]) <= tau_vec[tau_index] + delta_beta_y[i] + tol){
        # Record the smallest threshold index satisfied and stop searching.
        budget_assignments[i] <- tau_index
        break
      }
        
    }
    
    if(budget_assignments[i] == 0){ budget_assignments[i] <- length(tau_vec)+1 }
      
  }
  
  return(budget_assignments)
    
}
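# A toy sketch (hypothetical numbers) of the assignment rule above with K = 2
# thresholds: each instrument is given the smallest threshold index it satisfies,
# or K + 1 if it satisfies none.

```r
beta_phi <- c(1, 1, 1)
beta_y   <- c(2.0, 2.3, 4.0)
theta    <- 2
tau_vec  <- c(0, 0.5)

gamma <- beta_y - theta * beta_phi   # 0.0, 0.3, 2.0
assignment <- sapply(abs(gamma), function(g) {
  k <- which(g <= tau_vec)[1]        # smallest satisfied threshold index
  if (is.na(k)) length(tau_vec) + 1 else k
})
# assignment is c(1, 2, 3): instrument 1 is exactly valid, instrument 2 is
# valid up to tau = 0.5, and instrument 3 exceeds every threshold.
```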
