PooledInfRate: Estimation for Pooled or Group Testing

ipooledBin

R Documentation

Estimates for a proportion from pooled/group testing using an imperfect test

Description

Calculates point estimates for a single proportion based on pooled or group testing data with equal or different pool sizes using an imperfect test.

Usage

ipooledBin(x, ...)

## S3 method for class 'formula'
ipooledBin(x, data, pt.method = c("firth","mle"),
sens = 1, spec = 1,
scale = 1, tol=.Machine$double.eps^0.5, max.iter=10000, p.start=NULL, ...)

## Default S3 method:
ipooledBin(x,m,n=rep(1,length(x)), group, pt.method = c("firth","mle"),
sens=rep(1,length(x)), spec=rep(1,length(x)), scale=1, tol=.Machine$double.eps^0.5,
max.iter=10000, p.start=NULL, ...)

## S3 method for class 'ipooledBin'
print(x, ...)
## S3 method for class 'ipooledBin'
as.data.frame(x)
## S3 method for class 'ipooledBin'
x[i, j, drop = if (missing(i)) TRUE else length(cols) ==  1]

Arguments

`x`	If an object of class formula (or one that can be coerced to that class): a symbolic representation of the variables used to identify the number of positive pools, corresponding pool size, and number of pools of corresponding pool size, along with an optional grouping variable. See the 'Details' section below. Otherwise a vector specifying the observed number of positive pools among the number of pools tested (`n`). Missing data are omitted. For methods, objects of class `ipooledBin`.
`data`	an optional data frame, list or environment (or object coercible by `as.data.frame` to a data frame) containing the variables specified in the formula. Records with missing data are omitted.
`m`	a vector of pool sizes, must have the same length as `x`. Missing data are omitted.
`n`	a vector of the corresponding number of pools of sizes `m`. Missing data are omitted.
`group`	a vector of group identifiers corresponding to the number of positive pools `x` and pool size `m`. Missing data are omitted.
`sens`	a vector of test sensitivities, either a single value applied to each pool, or a vector the length of x to reflect the test sensitivity for the corresponding pool. By default set to 1.
`spec`	a vector of test specificities, either a single value applied to each pool, or a vector the length of x to reflect the test specificity for the corresponding pool. By default set to 1.
`pt.method`	a character string specifying the point estimate to compute, with the following options: `"firth"`, the bias-preventative estimate (listed first in the usage, so the default), and `"mle"`, the MLE.
`scale`	a single numeric, coefficient used to scale the point estimates and interval bounds in the print and summary methods (`print.pooledBin`, `summary.pooledBin`)
`max.iter`	maximum number of iterations for the Newton-Raphson algorithm before halting execution
`p.start`	starting value for the iterations for point estimation using Firth's correction. If NULL (the default), a starting value is computed.
`tol`	accuracy required for iterations in internal functions
`i,j`	elements to extract or replace. For [ and [[, these are numeric or character or, for [ only, empty or logical. Numeric values are coerced to integer as if by as.integer.
`drop`	logical. If TRUE the result is coerced to the lowest possible dimension. The default is to drop if only one column is left, but not to drop if only one row is left.
`...`	future arguments

Details

The model for pooled binary data with a (possibly) imperfect test with sensitivity a_i and specificity b_i is as follows (notation follows Hepworth and Biggerstaff (2021)). Assume there are independent binary (0/1, negative/positive) observations Y_j, j = 1, 2, \dots, N, each positive with probability p, and that they are pooled/grouped into batches/pools of possibly varying sizes, m_i, with n_i pools of size m_i. The observed values X_i are the results of some test, assay, or other ascertainment of positivity of the pools. For i = 1, 2, \dots, d, the number of positive pools X_i among the n_i pools of size m_i is distributed X_i ~ Binomial(n_i, a_i - r_i(1-p)^{m_i}), with r_i = a_i + b_i - 1, independently. The number of individuals (having the unobserved Y_j values) is N = \sum_{i=1}^d m_i n_i.
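The binomial success probability in the model above follows from conditioning on whether a pool contains at least one positive individual. A short derivation using the symbols already defined, where \theta_i is shorthand introduced here (not in the original notation) for the probability that a pool of size m_i tests positive:

```latex
\begin{aligned}
\theta_i &= \Pr(\text{pool of size } m_i \text{ tests positive}) \\
         &= a_i \bigl[ 1 - (1-p)^{m_i} \bigr] + (1 - b_i)\,(1-p)^{m_i} \\
         &= a_i - (a_i + b_i - 1)\,(1-p)^{m_i} \\
         &= a_i - r_i\,(1-p)^{m_i}, \qquad r_i = a_i + b_i - 1 .
\end{aligned}
```

With a perfect test (a_i = b_i = 1) this reduces to 1 - (1-p)^{m_i}. When r_i > 0, \theta_i increases from 1 - b_i at p = 0 to a_i at p = 1.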

formula specification: The specification of the formula interface matches pooledBin. The basic structure of the formula interface echoes the model formula structure used in standard R model functions like lm and glm: 'number of positive pools' ~ 'pool size'. For the commonly used binary (0/1) variable X = 'number of positive pools' for pools of sizes M = 'pool size', the formula is X ~ M.

As a generalization, X = 'number of positive pools' may be a number > 1 representing the number of positive pools among N = 'number of pools of size M' (so 0 \le X \le N). The formula representation then requires identification of both the pool size variable (M) and the number of pools variable (N). This is done using functional notation in the formula, using m() and n() to identify the variables for 'pool size' and 'number of pools', respectively, so that the basic formula is extended to X ~ m(M) + n(N). Because a pool size variable is always required, wrapping it in m() is optional, which avoids having to type m() in every call; examples are given below. Note that if the 'number of pools' variable is needed, use of n() to identify this variable is required.

The final extension for the formula is to indicate a grouping variable, so that estimates are produced for each group separately. This is indicated in the formula using a 'conditioning' indicator '|' separating the part of the formula above from the grouping variable, say Group. The resulting formula specification is X ~ m(M) + n(N) | Group, and multiple grouping variables may be specified using a *, as X ~ m(M) + n(N) | Group1 * Group2. Since the m() indication is optional, the following like-identified forms (a–e) are equivalent formula specifications:

a)	`X`	`~`	`m(M)`
a)	`X`	`~`	`M`


b)	`X`	`~`	`m(M) + n(N)`
b)	`X`	`~`	`n(N) + m(M)`
b)	`X`	`~`	`M + n(N)`
b)	`X`	`~`	`n(N) + M`


c)	`X`	`~`	`m(M) | Group`
c)	`X`	`~`	`M | Group`


d)	`X`	`~`	`m(M) + n(N) | Group`
d)	`X`	`~`	`M + n(N) | Group`
d)	`X`	`~`	`n(N) + m(M) | Group`
d)	`X`	`~`	`n(N) + M | Group`

e)	`X`	`~`	`m(M) | Group1 * Group2`
e)	`X`	`~`	`M | Group1 * Group2`

Calls using formula structures other than those detailed above, e.g., with more than two variables on the RHS, will result in an error or incorrect results. Similarly, numerical input (such as c(1,0,0,0) ~ c(50,25,10,5)) in the formula will result in an error; such input works directly with a comma instead of the ~ symbol, as pooledBin(c(1,0,0,0), c(50,25,10,5)), via an internal call to pooledBin.default. Finally, the symbols 'm' and 'n' used for m() and n() come from the mathematical development in the key references.
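As an illustration of the grouped formula forms listed above, the following sketch fits two of the (d) forms to a small made-up data set. The data frame `grp.dat` and its column names are hypothetical, and the calls assume the PooledInfRate package is installed; if it is not, only the data frame is built. The two calls are expected to give the same estimates, since the m() wrapper is optional.

```r
# Hypothetical data: two sites, pool sizes 5 and 10, 4 pools of each size
grp.dat <- data.frame(
  Site     = rep(c("A", "B"), each = 2),
  PoolSize = rep(c(5, 10), times = 2),
  NumPools = rep(4, 4),
  NumPos   = c(0, 1, 1, 2)
)

# Run the package calls only if PooledInfRate is available
if (requireNamespace("PooledInfRate", quietly = TRUE)) {
  library(PooledInfRate)
  # form (d) with the pool size wrapped in m()
  print(ipooledBin(NumPos ~ m(PoolSize) + n(NumPools) | Site,
                   data = grp.dat, sens = 0.95, spec = 0.99))
  # form (d) without m(); n() is still required
  print(ipooledBin(NumPos ~ PoolSize + n(NumPools) | Site,
                   data = grp.dat, sens = 0.95, spec = 0.99))
}
```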

Point estimation: the bias-preventative ("firth") estimate is recommended, with details described in Hepworth G, Biggerstaff BJ (2017). Use of the MLE ("mle") is not recommended, but its computation is provided for users who need it.
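To make the likelihood behind the "mle" option concrete, here is a minimal base-R sketch that maximizes the binomial log-likelihood of the model described earlier in this section. It is not the package's implementation (which, per the `max.iter` argument, uses Newton-Raphson), it does not implement the Firth correction, and the function names `pooled_loglik` and `pooled_mle_sketch` are hypothetical.

```r
# Log-likelihood of p under the imperfect-test pooled model:
#   X_i ~ Binomial(n_i, a_i - r_i * (1 - p)^m_i),  r_i = a_i + b_i - 1
pooled_loglik <- function(p, x, m, n, sens = 1, spec = 1) {
  r <- sens + spec - 1
  theta <- sens - r * (1 - p)^m   # probability that a pool of size m tests positive
  sum(dbinom(x, size = n, prob = theta, log = TRUE))
}

# Illustrative MLE via one-dimensional search on (0, 1); assumes a single interior maximum
pooled_mle_sketch <- function(x, m, n, sens = 1, spec = 1) {
  optimize(pooled_loglik, interval = c(0, 1),
           x = x, m = m, n = n, sens = sens, spec = spec,
           maximum = TRUE, tol = .Machine$double.eps^0.5)$maximum
}

# Same data as in the Examples section
x1 <- c(0, 0, 1, 2)
m1 <- c(1, 5, 10, 50)
n1 <- c(5, 5, 5, 5)
p_hat <- pooled_mle_sketch(x1, m1, n1, sens = 0.95, spec = 0.99)
p_hat
```

The result can be compared with ipooledBin(x=x1, m=m1, n=n1, sens=0.95, spec=0.99, pt.method="mle"); small numerical differences are to be expected because the optimizers differ.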

Note that not all sensitivity (sens) and specificity (spec) values are compatible with all data (x, m, n) configurations. If convergence issues arise, consider revising sens or spec; see Hepworth and Biggerstaff (2021).

No confidence intervals are computed or reported for imperfect tests, as clear recommendations await further research.

The subsetting or extractor functions [.pooledBin and [.pIR mimic the same [ behavior as with data frames.

Value

An object of class 'ipooledBin' or, if more than one group, of class 'ipooledBinList'. These have a list structure with elements

`p`	the estimated proportion(s)

with attributes class, group.names (List), group.var (List), x, m, n, sens, spec, scale, pt.method, and call.

Author(s)

Brad Biggerstaff

References

Hepworth G, Biggerstaff BJ. Bias correction in estimating proportions by pooled testing. Journal of Agricultural, Biological, and Environmental Statistics, 22(4):602-614, 2017.

Hepworth G, Biggerstaff BJ. Bias correction in estimating proportions by imperfect pooled testing. Journal of Agricultural, Biological, and Environmental Statistics, 26(1):90-104, 2021.

Examples


# Consider an imaginary example, where pools of size
# 1, 5, 10 and 50 are tested, 5 pools of each size.
# Among the 5 pools with sizes 1 and 5, no pool is positive,
# while among the 5 pools of sizes 10 and 50, 1 and 2 positive
# pools are identified, respectively.

x1 <- c(0,0,1,2)
m1 <- c(1,5,10,50)
n1 <- c(5,5,5,5)

ipooledBin(x=x1, m=m1, n=n1, sens=0.95, spec=0.99)
ipooledBin(x=x1, m=m1, n=n1, sens=0.95, spec=0.99, scale = 1000)

summary(ipooledBin(x=x1, m=m1, n=n1, sens=0.95, spec=0.99, scale=1000))

### to use the formula interface, store the data in a data frame
ex.dat <- data.frame(NumPos = x1, PoolSize = m1, NumPools = n1)

ipooledBin(NumPos ~ PoolSize + n(NumPools), data = ex.dat, sens=0.95, spec=0.99)

# without the NumPools variable, just as an example
ipooledBin(NumPos ~ m(PoolSize), data = subset(ex.dat,NumPos<2), sens=0.95, spec=0.99)
summary(ipooledBin(NumPos ~ PoolSize, data = subset(ex.dat,NumPos<2), sens=0.95, spec=0.99))

bjbiggerstaff/PooledInfRate documentation built on Jan. 19, 2024, 6:54 p.m.