binGroup2: Identification and Estimation using Group Testing

binGroup2 R Documentation

Description

Methods for the group testing identification and estimation problems.

Details

Methods for identification of positive items in group testing designs: Operating characteristics (e.g., expected number of tests) are calculated for commonly used hierarchical and array-based algorithms. Optimal testing configurations for an algorithm can be found as well. Please see Hitt et al. (2019) for specific details.

Methods for estimation and inference for proportions in group testing designs: For estimating one proportion or the difference of proportions, confidence interval methods are included that account for different pool sizes. Functions for hypothesis testing of proportions, calculation of power, and calculation of the expected width of confidence intervals are also included. Furthermore, regression methods and simulation of group testing data are implemented for simple pooling (Dorfman testing with or without retests), halving, and array testing designs.

The binGroup2 package is based upon the binGroup package that was originally designed for the group testing estimation problem. Over time, additional functions for estimation and for the group testing identification problem were included. Due to the diverse styles resulting from these additions, we have created binGroup2 as a way to unify functions in a coherent structure and incorporate additional functions for identification. The binGroup2 package provides all the main functionality from the binGroup package, and can be used in place of the binGroup package. The name “binGroup” originates from the assumption in basic estimation for group testing that the number of positive groups has a binomial distribution. While more advanced estimation methods no longer make this assumption, we continue with the binGroup name for consistency.
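
As a minimal illustration of the binomial assumption behind the name, assume n groups of equal size m, a perfect assay, and independent individuals with common prevalence p. Each group then tests positive with probability 1 - (1 - p)^m, so the number of positive groups is binomial and the maximum likelihood estimate follows by inverting that relationship. The sketch uses base R only; the object names and values are illustrative and are not part of binGroup2.

```r
# Illustrative base-R sketch (not binGroup2 code):
# perfect assay, equal group sizes.
n <- 24   # number of groups
m <- 7    # individuals per group
x <- 3    # number of positive groups

# MLE of the individual-level prevalence, obtained by inverting
# theta = 1 - (1 - p)^m at theta.hat = x / n
theta.hat <- x / n
p.hat <- 1 - (1 - theta.hat)^(1 / m)
p.hat

# Binomial probability of observing x positive groups when p = p.hat
dbinom(x, size = n, prob = 1 - (1 - p.hat)^m)
```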

Bilder (2019a,b) provide introductions to group testing. These papers and additional details about group testing are available at http://chrisbilder.com/grouptesting/.

This research was supported by the National Institutes of Health under grant R01 AI121351.

Identification

The binGroup2 package focuses on the group testing identification problem using hierarchical and array-based group testing algorithms.

For a number of group testing algorithms described in Hitt et al. (2019), the OTC1 function calculates the operating characteristics and finds the optimal testing configuration over a range of possible initial group sizes and/or testing configurations (sets of subsequent group sizes). The OTC2 function does the same for a multiplex assay that tests for two diseases.

The operatingCharacteristics1 (opChar1) and operatingCharacteristics2 (opChar2) functions calculate operating characteristics for a specified testing configuration with assays that test for one and two diseases, respectively.

These functions allow the sensitivity and specificity to differ across stages of testing. This means that the accuracy of the diagnostic test can differ for stages in a hierarchical testing algorithm or between row/column testing and individual testing in an array testing algorithm.
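
To make "operating characteristics" and "optimal testing configuration" concrete, the following base-R sketch computes the expected number of tests per individual for non-informative two-stage hierarchical (Dorfman) testing and searches a range of initial group sizes. It assumes independent individuals with common prevalence p and test results that are independent given the true statuses. It is a hand-written simplification, not the code behind OTC1 or opChar1, and the function name dorfman.ET is hypothetical. Only the stage-1 sensitivity and specificity enter the expected number of tests, because in this design retesting is triggered by the group result alone; the stage-2 values matter for accuracy measures.

```r
# Hypothetical helper (not part of binGroup2): expected number of tests
# per individual for two-stage (Dorfman) testing with group size I.
dorfman.ET <- function(I, p, Se1, Sp1) {
  p.neg.group <- (1 - p)^I   # probability the group is truly negative
  p.test.pos <- Se1 * (1 - p.neg.group) + (1 - Sp1) * p.neg.group
  (1 + I * p.test.pos) / I   # one group test plus expected retests
}

p <- 0.05
Se <- c(0.95, 0.99)   # stage 1 (group) and stage 2 (individual) sensitivity
Sp <- c(0.95, 0.98)   # stage 1 and stage 2 specificity
group.sz <- 3:20

ET <- dorfman.ET(group.sz, p = p, Se1 = Se[1], Sp1 = Sp[1])
opt <- group.sz[which.min(ET)]
c(optimal.size = opt, tests.per.individual = min(ET))

# Probability that a truly positive individual is identified as positive,
# under the conditional-independence assumption stated above
Se[1] * Se[2]
```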

Estimation

The binGroup2 package also provides functions for estimation and inference for proportions in group testing designs.

The propCI function calculates the point estimate and confidence intervals for a single proportion from group testing data. The propDiffCI function does the same for the difference of proportions. A number of confidence interval methods are available for groups of equal or different sizes.
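
For groups of equal size and a perfect assay, an exact interval for the individual-level proportion can be sketched by computing a Clopper-Pearson interval for the probability that a group tests positive and transforming its endpoints. The base-R lines below illustrate that idea only. They are not the propCI implementation, and they do not cover unequal group sizes or the other interval methods.

```r
# Illustrative sketch (base R, not binGroup2 code)
x <- 3    # positive groups
n <- 24   # number of groups
m <- 7    # group size

# Clopper-Pearson interval for the probability that a group tests positive
theta.ci <- as.numeric(binom.test(x, n, conf.level = 0.95)$conf.int)

# The map theta -> 1 - (1 - theta)^(1/m) is increasing, so the endpoints
# transform directly to an interval for the individual-level proportion
p.ci <- 1 - (1 - theta.ci)^(1 / m)
p.ci
```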

The gtWidth function calculates the expected width of confidence intervals in group testing. The gtTest function calculates p-values for hypothesis tests of single proportions. The gtPower function calculates power to reject a hypothesis.

The designPower function iterates either the number of groups or group size in a one-parameter group testing design until a pre-specified power level is achieved. The designEst function finds the optimal group size corresponding to the minimal mean-squared error of the point estimator.
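
The idea of choosing a group size by mean-squared error can be illustrated with exact enumeration over the binomial distribution of the number of positive groups. The sketch below assumes n groups of equal size m, a perfect assay, a guessed true prevalence p, and the maximum likelihood estimator. The helper name mle.mse and the chosen values are hypothetical, and designEst may use a different estimator or search rule.

```r
# Hypothetical helper (not part of binGroup2): exact MSE of the MLE
# 1 - (1 - X/n)^(1/m) when X ~ Binomial(n, 1 - (1 - p)^m).
mle.mse <- function(m, n, p) {
  x <- 0:n
  p.hat <- 1 - (1 - x / n)^(1 / m)
  sum(dbinom(x, size = n, prob = 1 - (1 - p)^m) * (p.hat - p)^2)
}

n <- 20          # number of groups (assumed)
p <- 0.01        # assumed true prevalence
sizes <- 1:100   # candidate group sizes

mse <- sapply(sizes, mle.mse, n = n, p = p)
sizes[which.min(mse)]   # group size with the smallest MSE in this range
```

With m = 1 the estimator reduces to the sample proportion, so its MSE equals p(1 - p)/n, which gives a quick sanity check on the helper.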

The gtReg function implements regression methods and the gtSim function simulates group testing data for simple pooling, halving, and array testing designs.
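
The data-generating mechanism for simple pooling without retests can be sketched in base R. This is an illustration under stated assumptions: a logistic model with one covariate, a fixed group size, and testing errors that depend only on the true group status. All object names and parameter values are made up for the example, and the actual gtSim implementation and its output columns may differ.

```r
# Illustrative base-R sketch (not the gtSim implementation)
set.seed(123)
n.ind <- 1000
group.size <- 5
beta <- c(-4, 0.5)   # assumed intercept and slope on the logit scale
sens <- 0.95         # assumed assay sensitivity
spec <- 0.95         # assumed assay specificity

x <- rnorm(n.ind)                                      # one covariate
y <- rbinom(n.ind, 1, plogis(beta[1] + beta[2] * x))   # true individual statuses
groupn <- rep(seq_len(n.ind / group.size), each = group.size)

# A group is truly positive if it contains at least one positive individual
true.group <- tapply(y, groupn, max)

# Observed group responses, subject to testing error
gres <- rbinom(length(true.group), 1,
               ifelse(true.group == 1, sens, 1 - spec))
table(gres)
```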

Author(s)

Maintainer: Brianna Hitt <brianna.hitt@afacademy.af.edu> (ORCID)

Authors:

  • Christopher Bilder (ORCID)

  • Frank Schaarschmidt (ORCID)

  • Brad Biggerstaff (ORCID)

  • Christopher McMahan (ORCID)

  • Joshua Tebbs (ORCID)

Other contributors:

  • Boan Zhang [contributor]

  • Michael Black [contributor]

  • Peijie Hou [contributor]

  • Peng Chen [contributor]

  • Minh Nguyen [contributor]

References

\insertRef{Altman1994a}{binGroup2}

\insertRef{Altman1994b}{binGroup2}

\insertRef{Biggerstaff2008}{binGroup2}

\insertRef{Bilder2010a}{binGroup2}

\insertRef{Bilder2019}{binGroup2}

\insertRef{Bilder2019est}{binGroup2}

\insertRef{Bilder2019id}{binGroup2}

\insertRef{bilder2020tests}{binGroup2}

\insertRef{Black2012}{binGroup2}

\insertRef{Black2015}{binGroup2}

\insertRef{Graff1972}{binGroup2}

\insertRef{Hepworth1996}{binGroup2}

\insertRef{Hepworth2017}{binGroup2}

\insertRef{Hitt2019}{binGroup2}

\insertRef{Hou2019}{binGroup2}

\insertRef{Malinovsky2016}{binGroup2}

\insertRef{McMahan2012a}{binGroup2}

\insertRef{McMahan2012b}{binGroup2}

\insertRef{Schaarschmidt2007}{binGroup2}

\insertRef{Swallow1985}{binGroup2}

\insertRef{Tebbs2004}{binGroup2}

\insertRef{Vansteelandt2000}{binGroup2}

\insertRef{Verstraeten1998}{binGroup2}

\insertRef{Xie2001}{binGroup2}

Examples


# 1) Identification using hierarchical and array-based
#   group testing algorithms with an assay that tests
#   for one disease.

# 1.1) Find the optimal testing configuration over a
#   range of initial group sizes, using informative
#   three-stage hierarchical testing, where
#   p denotes the overall prevalence of disease (mean
#    parameter of a beta distribution);
#   Se denotes the sensitivity of the diagnostic test;
#   Sp denotes the specificity of the diagnostic test;
#   group.sz denotes the range of initial pool sizes
#   for consideration;
#   obj.fn specifies the objective functions for which
#   to find results; and
#   alpha is the heterogeneity level.
set.seed(1002)
results1 <- OTC1(algorithm = "ID3", p = 0.025, Se = 0.95,
                 Sp = 0.95, group.sz = 3:20,
                 obj.fn = "ET", alpha = 2)
summary(results1)

# 1.2) Find the optimal testing configuration using
#   non-informative array testing without master pooling.
# The sensitivity and specificity differ for row/column
#   testing and individual testing.
results2 <- OTC1(algorithm = "A2", p = 0.05,
                 Se = c(0.95, 0.99), Sp = c(0.95, 0.98),
                 group.sz = 3:15, obj.fn = "ET")
summary(results2)

# 1.3) Calculate the operating characteristics using
#   informative two-stage hierarchical (Dorfman) testing,
#   implemented via the pool-specific optimal Dorfman
#   (PSOD) method described in McMahan et al. (2012a).
# Hierarchical testing configurations are specified by
#   a matrix in the hier.config argument. The rows of
#   the matrix correspond to the stages of the
#   hierarchical testing algorithm, the columns
#   correspond to the individuals to be tested, and the
#   cell values correspond to the group number of each
#   individual at each stage.
config.mat <- matrix(data = c(rep(1, 5), rep(2, 4), 3, 1:10),
                     nrow = 2, ncol = 10, byrow = TRUE)
set.seed(8791)
results3 <- opChar1(algorithm = "ID2", p = 0.02,
                    Se = 0.95, Sp = 0.99,
                    hier.config = config.mat, alpha = 0.5)
summary(results3)

# 1.4) Calculate the operating characteristics using
#   non-informative four-stage hierarchical testing.
config.mat <- matrix(data = c(rep(1, 15), rep(c(1, 2, 3), each = 5),
                              rep(1, 3), rep(2, 2),
                              rep(3, 3), rep(4, 2),
                              rep(5, 4), 6, 1:15),
                     nrow = 4, ncol = 15, byrow = TRUE)
results4 <- opChar1(algorithm = "D4", p = 0.008,
                    Se = 0.96, Sp = 0.98,
                    hier.config = config.mat,
                    a = c(1, 4, 6, 9, 11, 15))
summary(results4)


# 2) Identification using hierarchical and array-based
#   group testing algorithms with a multiplex assay that
#   tests for two diseases.

# 2.1) Find the optimal testing configuration using
#   non-informative two-stage hierarchical testing, given
#   p.vec, a vector of overall joint probabilities of disease;
#   Se, a vector of sensitivity values for each disease; and
#   Sp, a vector of specificity values for each disease.
# Se and Sp can also be specified as a matrix, where one
#   value is specified for each disease at each stage of
#   testing.
results5 <- OTC2(algorithm = "D2",
                 p.vec = c(0.90, 0.04, 0.04, 0.02),
                 Se = c(0.99, 0.99), Sp = c(0.99, 0.99),
                 group.sz = 3:20)
summary(results5)

# 2.2) Calculate the operating characteristics for
#   informative five-stage hierarchical testing, given
#   alpha.vec, a vector of shape parameters for the
#   Dirichlet distribution;
#   Se, a matrix of sensitivity values; and
#   Sp, a matrix of specificity values.
Se <- matrix(data = rep(0.95, 10), nrow = 2, ncol = 5, byrow = TRUE)
Sp <- matrix(data = rep(0.99, 10), nrow = 2, ncol = 5, byrow = TRUE)
config.mat <- matrix(data = c(rep(1, 24), rep(1, 18),
                              rep(2, 6), rep(1, 9),
                              rep(2, 9), rep(3, 4), 4, 5,
                              rep(1, 6), rep(2, 3),
                              rep(3, 5), rep(4, 4),
                              rep(5, 3), 6, rep(NA, 2),
                              1:21, rep(NA, 3)),
                     nrow = 5, ncol = 24, byrow = TRUE)
results6 <- opChar2(algorithm = "ID5",
                    alpha = c(18.25, 0.75, 0.75, 0.25),
                    Se = Se, Sp = Sp,
                    hier.config = config.mat)
summary(results6)

# 3) Estimation of the overall disease prevalence and
#   calculation of confidence intervals.

# 3.1) Suppose 3 out of 24 groups test positive.
#   Each group has a size of 7.
propCI(x = 3, m = 7, n = 24, ci.method = "CP")
propCI(x = 3, m = 7, n = 24, ci.method = "Blaker")
propCI(x = 3, m = 7, n = 24, ci.method = "score")
propCI(x = 3, m = 7, n = 24, ci.method = "soc")

# 3.2) Consider the following situation with groups
#   of different sizes:
#   0 out of 5 groups of size 1 (individual testing)
#   test positive,
#   0 out of 5 groups of size 5 test positive,
#   1 out of 5 groups of size 10 tests positive, and
#   2 out of 5 groups of size 50 test positive.
propCI(x = c(0, 0, 1, 2), m = c(1, 5, 10, 50),
       n = c(5, 5, 5, 5), pt.method = "Gart",
       ci.method = "skew-score")

# 4) Estimate a group testing regression model.

# 4.1) Fit a group testing regression model with
#   simple pooling using the "hivsurv" dataset.
data(hivsurv)
fit1 <- gtReg(type = "sp",
              formula = groupres ~ AGE + EDUC.,
              data = hivsurv, groupn = gnum,
              sens = 0.9, spec = 0.9, method = "Xie")
summary(fit1)

# 4.2) Simulate data for the halving protocol, and
#   fit a group testing regression model.
set.seed(46)
gt.data <- gtSim(type = "halving", par = c(-6, 0.1),
                 gshape = 17, gscale = 1.4,
                 size1 = 1000, size2 = 5,
                 sens = 0.95, spec = 0.95)
fit2 <- gtReg(type = "halving", formula = gres ~ x,
              data = gt.data, groupn = groupn,
              subg = subgroup, retest = retest,
              sens = 0.95, spec = 0.95,
              start = c(-6, 0.1), trace = TRUE)
summary(fit2)



binGroup2 documentation built on Nov. 14, 2023, 9:06 a.m.