emery: Accuracy Statistic Estimation for Imperfect Gold Standards

knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)

library(emery)
set.seed(65123)

The emery package estimates accuracy statistics for multiple measurement methods in the absence of a gold standard. It supports sets of methods that are binary, ordinal, or continuous.

The generate_multimethod_data() function can be used to simulate the results from paired measurements of a set of objects.

ex_bin_data <- 
  generate_multimethod_data(
    type = "binary",
    n_method = 3,
    n_obs = 200,
    se = c(0.85, 0.90, 0.95),
    sp = c(0.95, 0.90, 0.85),
    method_names = c("alpha", "beta", "gamma")
  )
ex_bin_data$generated_data[98:103, ]

The resulting list contains the simulated data as well as the parameters used to generate it. If method, observation, or level (ordinal only) names are not provided, default names will be applied.
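
To make the role of se and sp concrete, here is a minimal base R sketch of how paired binary results could be simulated. It is not emery's implementation: simulate_binary_methods() and its prev (prevalence) argument are hypothetical names introduced only for illustration, and the sketch assumes the methods are independent given the true status of each object.

```r
# Hypothetical helper (not part of emery): simulate binary results for
# several methods applied to the same set of objects.
simulate_binary_methods <- function(n_obs, prev, se, sp) {
  stopifnot(length(se) == length(sp))
  # Latent true status of each object: 1 = positive, 0 = negative
  truth <- rbinom(n_obs, size = 1, prob = prev)
  # One column per method: a positive object is flagged with probability
  # se[j], a negative object with probability 1 - sp[j]
  results <- sapply(seq_along(se), function(j) {
    p_flag <- ifelse(truth == 1, se[j], 1 - sp[j])
    rbinom(n_obs, size = 1, prob = p_flag)
  })
  colnames(results) <- paste0("method_", seq_along(se))
  list(data = results, truth = truth)
}

set.seed(1)
sim <- simulate_binary_methods(
  n_obs = 200,
  prev = 0.5,
  se = c(0.85, 0.90, 0.95),
  sp = c(0.95, 0.90, 0.85)
)
head(sim$data)
```

Each row of sim$data is one object and each column is one method, which matches the layout described for the generated data.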

Estimating the accuracy statistics of each method is as simple as calling the estimate_ML() function on the data set. The function expects the data to be a matrix of results with each row representing an observation and each column representing a method. Starting values for the EM algorithm can be provided through the init argument, but these are not required.

ex_bin <- 
  estimate_ML(
    type = "binary",
    data = ex_bin_data$generated_data,
    init = list(prev_1 = 0.8, se_1 = c(0.7, 0.8, 0.75), sp_1 = c(0.85, 0.95, 0.75))
  )
ex_bin
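
For readers curious about what an EM algorithm does with starting values like these, the following is a rough base R sketch for a two-class latent class model. It is an assumption-laden illustration, not the code inside estimate_ML(): em_binary_sketch() is a hypothetical function, and it assumes the methods are conditionally independent given the true status.

```r
# Hypothetical sketch (not emery's implementation): EM for a two-class
# latent class model with conditionally independent binary methods.
em_binary_sketch <- function(data, prev, se, sp, max_iter = 500, tol = 1e-8) {
  data <- as.matrix(data)
  for (iter in seq_len(max_iter)) {
    # E-step: posterior probability that each observation is positive
    lik_pos <- apply(data, 1, function(y) prod(se^y * (1 - se)^(1 - y)))
    lik_neg <- apply(data, 1, function(y) prod((1 - sp)^y * sp^(1 - y)))
    q <- prev * lik_pos / (prev * lik_pos + (1 - prev) * lik_neg)

    # M-step: posterior-weighted proportions give the updated parameters
    new_prev <- mean(q)
    new_se <- colSums(q * data) / sum(q)
    new_sp <- colSums((1 - q) * (1 - data)) / sum(1 - q)

    # Stop once no parameter moves by more than tol
    change <- max(abs(c(new_prev - prev, new_se - se, new_sp - sp)))
    prev <- new_prev
    se <- new_se
    sp <- new_sp
    if (change < tol) break
  }
  list(prev = prev, se = se, sp = sp, iterations = iter)
}

# Toy data simulated inline so the sketch runs on its own
set.seed(2)
truth <- rbinom(500, 1, 0.5)
true_se <- c(0.85, 0.90, 0.95)
true_sp <- c(0.95, 0.90, 0.85)
toy <- sapply(1:3, function(j) {
  rbinom(500, 1, ifelse(truth == 1, true_se[j], 1 - true_sp[j]))
})

fit <- em_binary_sketch(toy, prev = 0.5, se = rep(0.8, 3), sp = rep(0.8, 3))
fit
```

The starting values only need to point the algorithm at the intended labeling of "positive" and "negative"; here every method starts with a sensitivity above its false positive rate.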

The result of this function is an S4 object of the class MultiMethodMLEstimate. Basic plots illustrating the estimation process can be created by calling the standard plot() function on the object.

plot(ex_bin)

If the true population parameters are known, as is the case with simulated data, these can be provided to the plot function to enhance the information provided.

plot(ex_bin, params = ex_bin_data$params)

The process for working with ordinal or continuous data is similar to above, though the inputs tend to be more complex.

To simulate ordinal data, we must supply the probability mass functions (pmfs) over each method's levels for the "positive" and "negative" observations. It is assumed that "positive" observations correspond to higher levels.

An example pmf for detecting "positive" observations for 3 methods with 5 levels may look something like this.

pmf_pos_ex <- 
  matrix(
    c(
      c(0.05, 0.10, 0.15, 0.30, 0.40),
      c(0.00, 0.05, 0.20, 0.25, 0.50),
      c(0.10, 0.15, 0.20, 0.25, 0.30)
    ),
    nrow = 3, 
    byrow = TRUE
  )

pmf_pos_ex

We'll assume the pmf for negative observations is just the reverse of this for simplicity here.

pmf_neg_ex <- pmf_pos_ex[, 5:1]
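
A quick sanity check on pmf matrices like these can catch typos before simulating. The sketch below uses only base R and rebuilds the example matrix under new names (pmf_pos_check, pmf_neg_check) so that it runs on its own. It confirms that each row sums to 1 and that positives have a higher expected level than negatives, in line with the assumption stated earlier.

```r
# Same values as the example pmf, one row per method, one column per level
pmf_pos_check <- matrix(
  c(0.05, 0.10, 0.15, 0.30, 0.40,
    0.00, 0.05, 0.20, 0.25, 0.50,
    0.10, 0.15, 0.20, 0.25, 0.30),
  nrow = 3,
  byrow = TRUE
)
# Reversing the column order mirrors each pmf
pmf_neg_check <- pmf_pos_check[, 5:1]

# Each row is one method's pmf, so every row should sum to 1
row_sums_ok <- all(abs(rowSums(pmf_pos_check) - 1) < 1e-8) &&
  all(abs(rowSums(pmf_neg_check) - 1) < 1e-8)
row_sums_ok

# Expected level (scored 1 to 5) per method for positives and negatives
mean_pos <- as.vector(pmf_pos_check %*% 1:5)
mean_neg <- as.vector(pmf_neg_check %*% 1:5)
mean_pos > mean_neg
```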

ex_ord_data <- 
  generate_multimethod_data(
    type = "ordinal",
    n_method = 3,
    n_obs = 200,
    pmf_pos = pmf_pos_ex,
    pmf_neg = pmf_neg_ex,
    method_names = c("alice", "bob", "carrie"),
    level_names = c("strongly dislike", "dislike", "neutral", "like", "strongly like")
  )
ex_ord_data$generated_data[98:103, ]

ex_ord <- 
  estimate_ML(
    type = "ordinal",
    data = ex_ord_data$generated_data,
    level_names = ex_ord_data$params$level_names
  )
ex_ord

plot(ex_ord, params = ex_ord_data$params)

Unlike binary and ordinal data, which require results from 3 or more methods to produce estimates, continuous data can yield estimates from as few as 2 methods.

ex_con_data <- 
  generate_multimethod_data(
    type = "continuous",
    n_method = 3,
    n_obs = 200,
    method_names = c("phi", "kappa", "sigma")
  )
ex_con_data$generated_data[98:103, ]

Estimating the accuracy parameters is the same as above.

ex_con <- 
  estimate_ML(
    type = "continuous",
    data = ex_con_data$generated_data
  )
ex_con

plot(ex_con, params = ex_con_data$params)

Confidence intervals for all accuracy statistics can be estimated by bootstrap. The boot_ML() function is a handy tool for generating bootstrapped estimates.

ex_boot_bin <- boot_ML(
  type = "binary",
  data = ex_bin_data$generated_data,
  n_boot = 20
)

# print the estimates of sensitivity from the complete data set
ex_boot_bin$v_0@results$se_est

# print the first 3 bootstrap estimates of sensitivity
ex_boot_bin$v_star[[1]]$se_est
ex_boot_bin$v_star[[2]]$se_est
ex_boot_bin$v_star[[3]]$se_est
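
One common way to turn bootstrap estimates into confidence intervals is the percentile method. The sketch below is a hedged illustration in base R: mock_v_star is a made-up stand-in that only mimics the list-of-replicates shape of ex_boot_bin$v_star shown above (each element carrying an se_est entry), and its values are random numbers rather than real estimates. In practice, many more than 20 replicates are usually needed for stable interval endpoints.

```r
# Mock stand-in for ex_boot_bin$v_star: one list element per bootstrap
# replicate, each holding a vector of sensitivity estimates (fake values).
set.seed(3)
mock_v_star <- lapply(1:200, function(b) {
  list(se_est = pmin(c(0.85, 0.90, 0.95) + rnorm(3, sd = 0.02), 1))
})

# Stack the replicates: one row per replicate, one column per method
se_boot <- t(vapply(mock_v_star, function(b) as.numeric(b$se_est), numeric(3)))
colnames(se_boot) <- c("alpha", "beta", "gamma")

# Percentile interval: the 2.5% and 97.5% quantiles of each column
se_ci <- apply(se_boot, 2, quantile, probs = c(0.025, 0.975))
se_ci
```

The same pattern would apply to any other statistic stored in the replicates, assuming it is held as a numeric vector with one entry per method.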