mle.prop.eff: Efficiency of the Proportion Estimator Calculated from Group...
In groupTesting: Simulating and Modeling Group (Pooled) Testing Data

View source: R/prop_eff_main.R

mle.prop.eff

R Documentation

Efficiency of the Proportion Estimator Calculated from Group Testing Data

Description

This function provides relative efficiency results for the maximum likelihood estimator of disease prevalence (proportion), p. The estimator, \hat{p}, is calculated using test responses from commonly used group testing protocols. The relative efficiency values measure how effective group testing is for estimating the prevalence when compared to individual testing. The function also calculates expected testing and estimation costs. Please refer to Warasi and Das (2024), where the statistical methods were introduced.

Usage

mle.prop.eff(
  p,
  Se,
  Sp,
  initial.psz,
  protocol = c("MPT", "H2", "H3", "H4", "A2", "A2M"),
  criterion = c("RTE", "REE", "RCE"),
  N = 800,
  ngit = 3000,
  maxit = 200,
  tol = 0.001,
  nrep = 3000,
  seed = NULL,
  ncore = 1
)

Arguments

`p`	Proportion of individuals truly positive for a disease (i.e., disease prevalence).
`Se`	Assay sensitivity, a scalar value.
`Sp`	Assay specificity, a scalar value.
`initial.psz`	A vector of initial pool sizes.
`protocol`	One of the six group testing protocols, where 'MPT' is the default.
`criterion`	One of the three optimization criteria, where 'RTE' is the default.
`N`	Number of individuals to be tested using group testing (i.e., sample size).
`ngit`	Number of Gibbs iterates used in the EM algorithm.
`maxit`	Maximum number of iterates in the EM algorithm.
`tol`	Convergence tolerance for the EM algorithm.
`nrep`	Number of repetitions used in the proposed computation algorithm.
`seed`	A single seed value, an integer.
`ncore`	Number of CPU cores to be used in computing, where ncore => 1.

Details

The function mle.prop.eff computes three measures of efficiency: relative testing efficiency (RTE), relative estimation efficiency (REE), and relative cost efficiency (RCE). These measures can be calculated for six common group testing protocols: master pool testing (MPT), hierarchical testing with two, three, and four stages (H2, H3, and H4), and array testing without and with master pool testing (A2 and A2M). For more information on these protocols, refer to Kim et al. (2007). We use the term 'relative efficiency' because these measures compare group testing (numerator) with the usual one-at-a-time, i.e., individual testing (denominator).

In the paper, we defined 'RTE' and discussed how it can be calculated for both common and more complex group testing protocols. For the five multistage protocols (H2, H3, H4, A2, and A2M), our function provides RTE values based on the analytic expressions of Kim et al. (2007). These expressions have been coded by Hitt et al. (2023) in their R package 'binGroup2'. We developed R code to restructure the output obtained from binGroup2, so it is consistent with our proposed method.

Based on the expressions in our article, we analytically calculate REE for MPT and H2 and also compute RCE for MPT. For other scenarios, we determine REE and RCE based on our proposed computation algorithm.

The expected costs, E[T], E[(\hat{p} - p)^2], and E[T(\hat{p} - p)^2], are calculated for a given N, the number of individuals to be tested. The 'MPT' and 'H2' protocols require that N is completely divisible by the initial pool size k. If this is not the case, the integer that is closest to N and divisible by k will be used. It is worth noting that N is also used in the computation algorithm, where a large-sample assumption is made. We found that N = 800 may be sufficient in most scenarios for the validity of this assumption, although we used N = 1200 in the article; for more information, refer to Warasi and Das (2024).

Arguments 'ngit', 'maxit', and 'tol' are used in the EM algorithm. We found that 'ngit = 3000' Gibbs iterates are generally sufficient, but using a bigger 'ngit' may be more reliable. The default choices for 'maxit' and 'tol' are also reasonable.

Argument 'nrep' is necessary for the validity of the law of large numbers in the computation algorithm. While a larger choice of 'nrep' is generally preferable, our paper suggests that 5000 repetitions may be sufficient in most scenarios. Note that 'nrep = 3000' or so may also provide reasonable approximations.

Execution of 'RCE' for multistage protocols (H2, H3, H4, A2, and A2M) is quite slow because it uses Gibbs samplers in the computation algorithm. For the same reason, execution of 'REE' for H3, H4, A2, and A2M is also slow. To overcome this limitation, we included the ncore argument, which enables users to perform the computing tasks using parallel CPUs through the parallel package. The program works with ncore = 1 or with any larger choice of ncore.

Specifying a 'seed' ensures reproducibility of the results produced by our function, whether one uses a single core or multiple cores. Note that the seed value is used only when the computation algorithm is implemented.

Value

A list with three matrix objects labeled as "efficiency", "expected_cost", "iterations":

"efficiency": A matrix that reports the relative efficiency results: RTE, REE, and RCE.
"expected_cost": A matrix reporting the expected costs: E[T], E[(\hat{p} - p)^2], and E[T(\hat{p} - p)^2], corresponding to the criteria RTE, REE, and RCE, respectively.
"iterations": A matrix that provides the cumulative running average of the estimates (sample means) of E[(\hat{p} - p)^2] and E[T(\hat{p} - p)^2] when the computation algorithm is used. These running estimates can be useful for verification of the 'law of large numbers', as demonstrated in the Web Appendix of the article Warasi and Das (2024). In each row of the matrix, the first 1, 2, or 3 columns are for pool size(s) and the remaining 'nrep' columns provide the running averages; see the examples.

These results are provided for all possible pooling configurations within the specified range of initial pool sizes. The stage-specific pool sizes are also included in the results.

References

Kim HY, Hudgens M, Dreyfuss J, Westreich D, and Pilcher C. (2007). Comparison of Group Testing Algorithms for Case Identification in the Presence of Testing Error. Biometrics, 63:1152-1163.
Zhang W, Liu A, Li Q, Albert P. (2020). Incorporating Retesting Outcomes for Estimation of Disease Prevalence. Statistics in Medicine, 39:687-697.
Warasi and Das (2024). Optimizing Disease Surveillance Through Pooled Testing with Application to Infectious Diseases. Journal of Agricultural, Biological and Environmental Statistics. In press.

Examples


library(groupTesting)

## Gonorrhea data information:
p0 <- 0.041       # True prevalence  
Se <- 0.913       # Assay sensitivity
Sp <- 0.993       # Assay specificity
psz <- 2:6        # A range of initial pool sizes
 
 
## Example 1: Two-stage hierarchical testing (H2)  

## REE (using closed-form expressions)
res <- mle.prop.eff(p=p0, Se=Se, Sp=Sp, initial.psz=psz, protocol="H2", criterion="REE")

## Output

# > res
# $efficiency
#      PSZ.S1 PSZ.S2    REE
# [1,]      2      1 0.8920
# [2,]      3      1 0.8932
# [3,]      4      1 0.8975
# [4,]      5      1 0.9034
# [5,]      6      1 0.9104

# $expected_cost
# PSZ.S1 PSZ.S2 E[(phat-p)^2]
# [1,]      2      1  5.731668e-05
# [2,]      3      1  5.732655e-05
# [3,]      4      1  5.767260e-05
# [4,]      5      1  5.805423e-05
# [5,]      6      1  5.864758e-05

# $iterations
# NULL


## RTE (using closed-form expressions)
mle.prop.eff(p=p0, Se=Se, Sp=Sp, initial.psz=psz, protocol="H2", criterion="RTE")

## RCE (using the computation algorithm)
# res <- mle.prop.eff(p=p0, Se=Se, Sp=Sp, initial.psz=psz, protocol="H2", seed=123, 
#   criterion="RCE", N=800, ngit=3000, maxit=200, tol=0.001, nrep=3000, ncore=4)

# Note: For 'H2' protocol, get cumulative running averages as:
# res2 <- res$iterations[ ,-(1:2)] 
# Now, plot each row; for example: plot(res2[1, ], type='l')
#
# Execution of 'RCE' for H2 is slow, as discussed in the 'details' section.
# The computing challenge can be overcome using multiple CPU cores as shown above. 


## Example 2: Three-stage hierarchical testing (H3)

## REE (using the computation algorithm)
# res <- mle.prop.eff(p=p0, Se=Se, Sp=Sp, initial.psz=psz, protocol="H3", seed=123, 
#   criterion="REE", N=800, ngit=3000, maxit=200, tol=0.001, nrep=3000, ncore=4)
#
# Note: For 'H3' protocol, get cumulative running averages as:
# res2 <- res$iterations[ ,-(1:3)]  # Now, plot each row: plot(res2[1, ], type='l')

## RTE (using closed-form expressions)
# res <- mle.prop.eff(p=p0, Se=Se, Sp=Sp, initial.psz=psz, protocol="H3",  
#   criterion="RTE", N=800, ngit=3000, maxit=200, tol=0.001, nrep=3000, ncore=4)

## RCE (using the computation algorithm)
# res <- mle.prop.eff(p=p0, Se=Se, Sp=Sp, initial.psz=psz, protocol="H3", seed=123, 
#   criterion="RCE", N=800, ngit=3000, maxit=200, tol=0.001, nrep=3000, ncore=4)

groupTesting documentation built on Sept. 12, 2024, 7:32 a.m.