In hungf8342/MIWilson: Implementing the MI-Wilson Confidence Interval

```r
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)
```

In "Wilson Confidence Intervals for Binomial Proportions With Multiple Imputation for Missing Data" (A. Lott & J. Reiter, 2018), the authors run simulation studies comparing the coverage of MI-Wilson and MI-Wald confidence intervals, along with a few slight variations of each. Such studies repeatedly compute intervals from imputations the researcher generates, which motivates the phat versions of the mi_wilson and mi_wald functions: they take the vector of per-imputation sample proportions directly. While we do not implement the full simulations here, we lay out a foundation and demonstrate one use of the mi_wald_phat and mi_wilson_phat functions.
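For reference, both MI intervals extend a complete-data interval for a single sample: the Wald interval $\hat p \pm z\sqrt{\hat p(1-\hat p)/n}$ and the Wilson score interval $\frac{\hat p + z^2/(2n)}{1+z^2/n} \pm \frac{z}{1+z^2/n}\sqrt{\frac{\hat p(1-\hat p)}{n} + \frac{z^2}{4n^2}}$. The sketch below computes both; `wald_ci` and `wilson_ci` are hypothetical helpers written only for this illustration and are not functions exported by MIWilson.

```r
# Complete-data Wald interval for a binomial proportion (hypothetical helper).
wald_ci <- function(phat, n, level = 0.95) {
  z <- qnorm(1 - (1 - level) / 2)
  half <- z * sqrt(phat * (1 - phat) / n)
  c(lower = phat - half, upper = phat + half)
}

# Complete-data Wilson score interval (hypothetical helper).
wilson_ci <- function(phat, n, level = 0.95) {
  z <- qnorm(1 - (1 - level) / 2)
  denom <- 1 + z^2 / n
  center <- (phat + z^2 / (2 * n)) / denom
  half <- (z / denom) * sqrt(phat * (1 - phat) / n + z^2 / (4 * n^2))
  c(lower = center - half, upper = center + half)
}

# An extreme estimate: 0 successes out of 20 trials.
wald_ci(phat = 0, n = 20)
wilson_ci(phat = 0, n = 20)
```

At $\hat p = 0$ the Wald interval collapses to the single point 0, while the Wilson interval runs from (numerically) 0 to about 0.161. This boundary behaviour is the usual reason for preferring Wilson-type intervals for proportions.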

We first load the MI-Wilson library as follows:

```r
library(MIWilson)
```

We then create a simple master dataset of binary values and induce MCAR (missing completely at random) missingness. Users who want the phat versions of the main functions will supply their own imputed datasets, which do not need to be produced by Bayesian methods or by the code below.

In this vignette, the incomplete master dataset is created by the create_missing_data function below. From it, the create_imps function generates multiple imputations using Bayesian principles (see the paper for details).

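```r
# Create a binary dataset of size n with success probability p, then set
# floor(MIA_perc * n) randomly chosen entries to NA (MCAR missingness).
# Note: m is not used here; it is kept so the signature matches the call below.
create_missing_data <- function(n, p, m, MIA_perc) {

  complete = incomplete = rbinom(n, 1, p)

  # number of missing values, then blank out that many random positions
  blanks = floor(MIA_perc * n)
  idcs = seq_along(complete)
  incomplete[sample(idcs, blanks)] = NA

  return(incomplete)

}
```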
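Because the masking step chooses which indices to blank with `sample()` regardless of their values, the missingness is MCAR, and the proportion of ones among the observed values remains an unbiased estimate of $p$. The sketch below checks both properties by simulation; `mask_mcar` is a hypothetical stand-alone copy of the masking step, and the seed and replicate count are arbitrary choices.

```r
set.seed(1)

# Hypothetical copy of the MCAR masking step in create_missing_data().
mask_mcar <- function(x, MIA_perc) {
  blanks <- floor(MIA_perc * length(x))
  x[sample(seq_along(x), blanks)] <- NA
  x
}

# Exactly floor(MIA_perc * n) entries are masked.
incomplete <- mask_mcar(rbinom(100, 1, 0.7), MIA_perc = 0.3)
sum(is.na(incomplete))

# Averaged over many replicates, the observed-data proportion is close to p.
obs_props <- replicate(2000, {
  x <- mask_mcar(rbinom(100, 1, 0.7), MIA_perc = 0.3)
  mean(x, na.rm = TRUE)
})
mean(obs_props)
```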


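```r
# Create m imputations of the incomplete data. For each imputation, draw
# p_star from the Beta(count_one + 1, count_zero + 1) posterior (uniform prior)
# and fill every missing entry with a Bernoulli(p_star) draw.
create_imps <- function(n, m, incomplete) {

  # counts of observed ones and zeros (still correct if only one value occurs)
  count_one = sum(incomplete == 1, na.rm = TRUE)
  count_zero = sum(incomplete == 0, na.rm = TRUE)
  incomp_idx = which(is.na(incomplete))

  imputations = matrix(nrow = n, ncol = m)
  for (i in 1:m) {
    p_star = rbeta(1, count_one + 1, count_zero + 1)

    curr_imp = incomplete
    curr_imp[incomp_idx] = rbinom(length(incomp_idx), 1, p_star)

    imputations[, i] = curr_imp
  }

  return(imputations)

}
```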
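The draw of `p_star` relies on a standard conjugacy result: under a uniform Beta(1, 1) prior, the posterior of $p$ given $y$ observed ones and $k - y$ observed zeros is Beta($y+1$, $k-y+1$), with mean $(y+1)/(k+2)$. Drawing a fresh `p_star` for each imputation carries the uncertainty about $p$ into the between-imputation variability that MI intervals rely on. The sketch below checks the posterior mean by simulation for hypothetical counts of 49 ones and 21 zeros.

```r
set.seed(42)

# Hypothetical observed counts, for illustration only.
count_one <- 49
count_zero <- 21

# Many draws from the same posterior that create_imps() samples p_star from.
draws <- rbeta(1e5, count_one + 1, count_zero + 1)

# Simulated posterior mean versus the exact value (y + 1) / (k + 2) = 50 / 72.
c(simulated = mean(draws),
  exact = (count_one + 1) / (count_one + count_zero + 2))
```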

To demonstrate, we create a master dataset with a true binomial proportion of $p=0.7$, induce MCAR missingness in 30% of the observations, produce $m=10$ imputations, and use them to compute MI-Wilson and MI-Wald confidence intervals for $p$.

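```r
set.seed(2018)  # for reproducibility

n = 100         # sample size
p = 0.7         # true binomial proportion
m = 10          # number of imputations
MIA_perc = 0.3  # fraction of values missing (MCAR)

incomplete = create_missing_data(n, p, m, MIA_perc)
imputations = create_imps(n, m, incomplete)

# per-imputation estimates of p: the proportion of ones in each column
phats = colMeans(imputations)
mi_wald_phat(phats = phats, n = nrow(imputations))
mi_wilson_phat(phats = phats, n = nrow(imputations))
```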
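As a possible starting point for the coverage studies mentioned at the top, the sketch below combines per-imputation estimates with the textbook Rubin's rules (within-imputation variance $\bar U$, between-imputation variance $B$, total variance $T = \bar U + (1 + 1/m)B$, and a $t$ reference distribution with $(m-1)\left(1 + \bar U/((1+1/m)B)\right)^2$ degrees of freedom), then estimates the coverage of the resulting MI-Wald interval by Monte Carlo. `simulate_imputations` and `rubin_wald` are hypothetical helpers; `mi_wald_phat` may differ in details such as how the within-imputation variance is computed, so this is an approximation, not a substitute for the package function.

```r
set.seed(2018)

# Hypothetical compact version of create_missing_data() + create_imps():
# returns an n x m matrix of imputed binary datasets.
simulate_imputations <- function(n, p, m, MIA_perc) {
  x <- rbinom(n, 1, p)
  x[sample(n, floor(MIA_perc * n))] <- NA
  miss <- which(is.na(x))
  ones <- sum(x == 1, na.rm = TRUE)
  zeros <- sum(x == 0, na.rm = TRUE)
  sapply(seq_len(m), function(i) {
    p_star <- rbeta(1, ones + 1, zeros + 1)
    x_imp <- x
    x_imp[miss] <- rbinom(length(miss), 1, p_star)
    x_imp
  })
}

# Hypothetical MI-Wald interval from per-imputation proportions (Rubin's rules).
rubin_wald <- function(phats, n, level = 0.95) {
  m <- length(phats)
  qbar <- mean(phats)                      # combined point estimate
  ubar <- mean(phats * (1 - phats) / n)    # within-imputation variance
  b <- var(phats)                          # between-imputation variance
  total <- ubar + (1 + 1 / m) * b
  # Rubin's degrees of freedom; fall back to a normal quantile when b == 0
  df <- if (b > 0) (m - 1) * (1 + ubar / ((1 + 1 / m) * b))^2 else Inf
  half <- qt(1 - (1 - level) / 2, df) * sqrt(total)
  c(lower = qbar - half, upper = qbar + half)
}

# Monte Carlo coverage of the MI-Wald interval under the vignette's settings.
n <- 100; p <- 0.7; m <- 10; MIA_perc <- 0.3
covered <- replicate(500, {
  imps <- simulate_imputations(n, p, m, MIA_perc)
  ci <- rubin_wald(colMeans(imps), n)
  ci[["lower"]] <= p && p <= ci[["upper"]]
})
mean(covered)  # estimated coverage; compare with the nominal 0.95
```

Replacing `rubin_wald` with `mi_wald_phat` or `mi_wilson_phat` (and extracting the bounds from whatever those functions return) would turn this loop into a package-based comparison of the two intervals.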