pooledBin | R Documentation |
Calculates point estimates and confidence intervals for a single proportion based on pooled or group testing data with equal or different pool sizes.
pooledBin(x, ...)
pIR(x, ...)
## S3 method for class 'formula'
pooledBin(x, data,
pt.method = c("firth", "gart", "bc-mle", "mle", "mir"),
ci.method = c("skew-score", "bc-skew-score", "score", "lrt", "wald", "mir"),
scale = 1, alpha = 0.05, tol = .Machine$double.eps^0.5, ...)
## Default S3 method:
pooledBin(x, m, n = rep(1, length(x)), group,
pt.method = c("firth", "gart", "bc-mle", "mle", "mir"),
ci.method = c("skew-score", "bc-skew-score", "score", "lrt", "wald", "mir"),
scale = 1, alpha = 0.05, tol = .Machine$double.eps^0.5, ...)
## S3 method for class 'formula'
pIR(x, data,
pt.method = c("firth", "gart", "bc-mle", "mle", "mir"),
ci.method = c("skew-score", "bc-skew-score", "score", "lrt", "wald", "mir"),
scale = 1, alpha = 0.05, tol = .Machine$double.eps^0.5, ...)
## Default S3 method:
pIR(x, m, n = rep(1, length(x)), group,
pt.method = c("firth", "gart", "bc-mle", "mle", "mir"),
ci.method = c("skew-score", "bc-skew-score", "score", "lrt", "wald", "mir"),
scale = 1, alpha = 0.05, tol = .Machine$double.eps^0.5, ...)
## S3 method for class 'pooledBin'
print(x, ...)
## S3 method for class 'pooledBin'
plot(x, pch = 16, refline = TRUE, printR2 = TRUE, layout = NULL, ...)
## S3 method for class 'pooledBin'
as.data.frame(x)
## S3 method for class 'pooledBin'
x[i, j, drop = if (missing(i)) TRUE else length(cols) == 1]
## S3 method for class 'pIR'
print(x, ...)
## S3 method for class 'pIR'
plot(x, pch = 16, refline = TRUE, printR2 = TRUE, layout = NULL, ...)
## S3 method for class 'pIR'
as.data.frame(x)
## S3 method for class 'pIR'
x[i, j, drop = if (missing(i)) TRUE else length(cols) == 1]
x |
If an object of class formula (or one that can be coerced to that class): a symbolic representation of the variables used to identify the number of positive pools, corresponding pool size, and number of pools of corresponding pool size, along with an optional grouping variable. See the 'Details' section below. Otherwise a vector, specifying the observed number of positive pools, among the number of pools tested ( |
data |
an optional data frame, list or environment (or object coercible by |
m |
a vector of pool sizes, must have the same length as |
n |
a vector of the corresponding number of pools of sizes |
group |
a vector of group identifiers corresponding to the number of positive pools |
pt.method |
a character string, specifying the point estimate to compute, with the following options: |
ci.method |
a character string, specifying the confidence interval to compute, with options: |
scale |
a single numeric, coefficient to scale the point estimates and interval bounds in the print and summary method ( |
alpha |
a single numeric, specifying the type-I-error level; confidence level is 100(1-alpha)% |
tol |
accuracy required for iterations in internal functions |
... |
future arguments |
pch |
plotting character |
refline |
logical value indicating whether to include the reference line for graphic interpretation |
printR2 |
include printing of the |
layout |
2-element, |
i,j |
elements to extract or replace. For [ and [[, these are numeric or character or, for [ only, empty or logical. Numeric values are coerced to integer as if by as.integer. |
drop |
logical. If TRUE the result is coerced to the lowest possible dimension. The default is to drop if only one column is left, but not to drop if only one row is left. |
The model for pooled binary data is: Assume there are independent binary (0/1, negative/positive) observations Y_j, j = 1, 2, \dots, N, positive with probability p, and that they are pooled/grouped into batches/pools of possibly varying sizes, m_i, with n_i pools of size m_i. The observed values X_i are binary results of some test, assay, or other ascertainment of positivity of the pool. For i = 1, 2, \dots, d, the number of positive pools X_i of size m_i is distributed X_i ~ Binomial(n_i, 1-(1-p)^{m_i}), with n_i pools of size m_i, independently. The number of individuals (having the unobserved Y_j values) is N = \sum_{i=1}^d m_i n_i.
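The model can be made concrete with a small base-R simulation. This is only a sketch: the prevalence `p_true`, the design vectors, and the helper `pool_pos_prob()` are illustrative names chosen here, not part of the package.

```r
# Probability that a pool of size m tests positive when each individual
# is independently positive with probability p (perfect test assumed).
pool_pos_prob <- function(p, m) 1 - (1 - p)^m

set.seed(1)                      # reproducible illustration
p_true <- 0.02                   # hypothetical individual-level prevalence
m <- c(1, 5, 10, 50)             # pool sizes m_i
n <- c(5, 5, 5, 5)               # number of pools n_i of each size

# X_i ~ Binomial(n_i, 1 - (1 - p)^{m_i}), independently across i
x <- rbinom(length(m), size = n, prob = pool_pos_prob(p_true, m))

# Total number of individuals, N = sum_i m_i * n_i
N <- sum(m * n)
```

Larger pools are positive more often for the same p, which is why the pool sizes must enter the estimation.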
formula specification: The basic structure of the formula interface echoes the model formula structure used in standard R model functions like lm and glm: 'number of positive pools' ~ 'pool size'. For commonly used binary (0/1) variable X = 'number of positive pools' for pools of sizes M = 'pool size', the formula is X ~ M. As a generalization, X = 'number of positive pools' may be a number > 1 representing the number of positive pools of N = 'number of pools of size M' (so 0 \le X \le N). The formula representation now requires identification of both the pool size variable (M) and the number of pools variable (N). This is done using functional notation in the formula, using m() and n() to identify the variables for 'pool size' and 'number of pools', respectively, so that the basic formula is extended to X ~ m(M) + n(N). Because the pool size variable is always required for these functions to make sense, marking it with m() is optional, which avoids the annoyance of having to type m() in each call; examples are given below. Note that if the 'number of pools' variable is needed, use of n() to identify this variable is required. The final extension for the formula is to indicate a grouping variable, so that estimates are produced for each group separately. This is indicated in the formula using a 'conditioning' indicator '|' separating the part of the formula above from the grouping variable(s), Group. The resulting basic formula specification is X ~ m(M) + n(N) | Group, and multiple grouping variables may be specified (as of package version 1.2) and indicated by separating them with a *, as X ~ m(M) + n(N) | Group1 * Group2. Since the m() indication is optional, the following, like-identified forms (a–e) are equivalent formula specifications:
a) | X | ~ | m(M) |
a) | X | ~ | M |
b) | X | ~ | m(M) + n(N) |
b) | X | ~ | n(N) + m(M) |
b) | X | ~ | M + n(N) |
b) | X | ~ | n(N) + M |
c) | X | ~ | m(M) | Group |
c) | X | ~ | M | Group |
d) | X | ~ | m(M) + n(N) | Group |
d) | X | ~ | M + n(N) | Group |
d) | X | ~ | n(N) + m(M) | Group |
d) | X | ~ | n(N) + M | Group |
e) | X | ~ | m(M) | Group1 * Group2 |
e) | X | ~ | M | Group1*Group2 |
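The relationship between the binary form X ~ M and the counted form X ~ m(M) + n(N) can be illustrated by expanding aggregated counts to one row per pool. The data frames `agg` and `per_pool` are illustrative names, and the package name PooledInfRate in the guarded calls is an assumption (this page does not name the package); the calls are skipped if no package of that name is installed.

```r
# Aggregated layout: X positive pools among N pools of size M
agg <- data.frame(X = c(0, 0, 1, 2), M = c(1, 5, 10, 50), N = c(5, 5, 5, 5))

# Expand to one row per pool with a 0/1 outcome, the layout used by X ~ M
per_pool <- data.frame(
  X = unlist(Map(function(x, n) rep(c(1, 0), times = c(x, n - x)),
                 agg$X, agg$N)),
  M = rep(agg$M, times = agg$N)
)

# Both layouts carry the same counts of positive pools per pool size
stopifnot(nrow(per_pool) == sum(agg$N),
          all(tapply(per_pool$X, per_pool$M, sum) == agg$X))

# Assumption: pooledBin() is provided by a package named PooledInfRate.
if (requireNamespace("PooledInfRate", quietly = TRUE)) {
  library(PooledInfRate)
  print(pooledBin(X ~ m(M) + n(N), data = agg))  # counted form
  print(pooledBin(X ~ M, data = per_pool))       # binary form, one row per pool
}
```

Because both layouts describe the same likelihood under the model, the two calls would be expected to report the same estimate.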
Calls with a formula that departs from the structures listed above, e.g., with more than two variables on the right-hand side, will result in an error or incorrect results. Similarly, numerical input (such as c(1,0,0,0) ~ c(50,25,10,5)) in the formula will result in an error; such input works directly with a comma instead of the ~ symbol, as pooledBin(c(1,0,0,0), c(50,25,10,5)), via an internal call to pooledBin.default. Finally, the symbols 'm' and 'n' for m() and n() come from the mathematical development in the key references.
Point estimation: bias preventative ("firth") and bias corrected ("gart") estimators are recommended, with details described in Hepworth G, Biggerstaff BJ (2017). Use of the MLE ("mle") and MIR ("mir") estimators is not recommended, but their computation is provided for historical reasons.
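To show what the two historical estimators compute, here is a minimal base-R sketch under the binomial model described in the Details. It is not the package's internal code: the function `loglik()` and the use of `optimize()` are choices made for this illustration, and the bias-reducing "firth" and "gart" estimators are not reproduced here.

```r
x <- c(0, 0, 1, 2)     # positive pools
m <- c(1, 5, 10, 50)   # pool sizes
n <- c(5, 5, 5, 5)     # number of pools of each size

# MIR: positive pools divided by total individuals; ignores the pooling
mir <- sum(x) / sum(m * n)

# Log-likelihood (up to a constant) under
# X_i ~ Binomial(n_i, 1 - (1 - p)^{m_i})
loglik <- function(p) {
  sum(x * log(1 - (1 - p)^m) + (n - x) * m * log(1 - p))
}

# MLE by one-dimensional maximization over (0, 1)
mle <- optimize(loglik, interval = c(1e-8, 1 - 1e-8), maximum = TRUE,
                tol = .Machine$double.eps^0.5)$maximum

c(mir = mir, mle = mle)
```

The MIR treats each positive pool as exactly one positive individual, so with large pools it tends to fall below the MLE.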
Confidence intervals: All confidence intervals are based on asymptotic methods. Score-based intervals are recommended; the skewness-corrected score interval generally shows the best performance and so is the default (see Hepworth (2005) or Biggerstaff (2008)). The likelihood ratio test-based confidence interval is also included, as are the Wald interval and an interval based on the MIR. Note that the methods "mir" and "wald" cannot be recommended, because they return too narrow intervals in relevant situations, "mir" because it ignores the pooling, and "wald" because it relies on simple large sample methods. For computational details and simulation results of the methods, see Hepworth (2005) and Biggerstaff (2008). Note that for the score interval, when all pools are negative using a perfect test, each individual is negative, so that we set (beginning with package version 1.5) the lower confidence limit to 0 and the upper confidence limit to the standard, unpooled Wilson score limit; using the expected information in this case for the score interval computation would include pooling and so would not acknowledge the detailed, individual-level test result knowledge available in this specific case.
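The all-negative special case can be worked through in base R. This sketch assumes that "standard, unpooled Wilson score limit" means the two-sided Wilson interval for 0 positives among the N = sum(m * n) individuals; the variable names are illustrative and the code is not taken from the package.

```r
m <- c(1, 5, 10, 50)   # pool sizes
n <- c(5, 5, 5, 5)     # number of pools of each size
alpha <- 0.05

N <- sum(m * n)              # individuals, all known negative in this case
z <- qnorm(1 - alpha / 2)    # standard normal quantile

# Wilson score interval for 0 successes in N trials:
# the lower limit is 0 and the upper limit simplifies to z^2 / (N + z^2)
lcl <- 0
ucl <- z^2 / (N + z^2)

# Cross-check against the Wilson interval from prop.test()
# (no continuity correction)
wilson <- prop.test(0, N, conf.level = 1 - alpha, correct = FALSE)$conf.int
```

The closed form follows from substituting an observed proportion of 0 into the Wilson formula, which leaves only the z^2 terms.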
The subsetting or extractor functions [.pooledBin and [.pIR mimic the [ behavior of data frames.
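For reference, the data frame behavior being mimicked looks like this in base R. The data frame `df` and its numbers are made up purely to show the indexing rules; they are not output from pooledBin.

```r
# Hypothetical values, for illustrating `[` on a data frame only
df <- data.frame(Group = c("A", "B"),
                 P     = c(0.010, 0.025),
                 Lower = c(0.002, 0.009),
                 Upper = c(0.031, 0.054))

one_row  <- df[1, ]                    # one row left: still a data frame
one_col  <- df[, "P"]                  # one column left: dropped to a vector
kept_col <- df[, "P", drop = FALSE]    # drop = FALSE keeps the data frame
filtered <- df[df$P > 0.02, c("Group", "P")]  # logical rows, character columns
```

This matches the default for drop in the usage above: drop when only one column is left, but not when only one row is left.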
An object of class 'pooledBin' ('pIR'). These have a list structure with elements
p |
the estimated proportion |
lcl |
the lower confidence limit |
ucl |
the upper confidence limit |
pt.method |
the method used for point estimation |
ci.method |
the method used for interval estimation |
alpha |
the type-I-error level |
x |
the numbers of positive pools |
m |
the size of the pools |
n |
the numbers of pools with corresponding pool sizes m |
scale |
Scaling coefficient for the output |
and attributes group.names, group.var, and class.
Brad Biggerstaff
Walter SD, Hildreth SW, Beaty BJ. Estimation of infection rates in population of organisms using pools of variable size. American Journal of Epidemiology, 112(1):124-128, 1980.
Hepworth G: Estimation of proportions by group testing. PhD Dissertation. Melbourne, Australia: The University of Melbourne; 1999.
Hepworth G. Confidence intervals for proportions estimated by group testing with groups of unequal size, Journal of Agricultural Biological and Environmental Statistics, 10(4):478-497, 2005.
Biggerstaff BJ. Confidence interval for the difference of proportions estimated from pooled samples. Journal of Agricultural Biological and Environmental Statistics, 13(4):478-496, 2008.
Hepworth G, Biggerstaff BJ. Bias correction in estimating proportions by pooled testing. Journal of Agricultural Biological and Environmental Statistics, 22(4):602-614, 2017.
# Consider an imaginary example, where pools of size
# 1, 5, 10 and 50 are tested, 5 pools of each size.
# Among the 5 pools with sizes 1 and 5, no pool is positive,
# while among the 5 pools of sizes 10 and 50, 1 and 2 positive
# pools are identified, respectively.
x1 <- c(0,0,1,2)
m1 <- c(1,5,10,50)
n1 <- c(5,5,5,5)
pooledBin(x=x1, m=m1, n=n1)
pooledBin(x=x1, m=m1, n=n1, scale=1000)
pooledBin(x=x1, m=m1, n=n1)
summary(pooledBin(x=x1, m=m1, n=n1), scale=1000)
### to use the formula interface, store the data in a data frame
ex.dat <- data.frame(NumPos = x1, PoolSize = m1, NumPools = n1)
pooledBin(NumPos ~ PoolSize + n(NumPools), data = ex.dat)
# without the NumPools variable, just as an example
pooledBin(NumPos ~ m(PoolSize), data = subset(ex.dat,NumPos<2))
pooledBin(NumPos ~ PoolSize, data = subset(ex.dat,NumPos<2))
#######################################################################
# For another population, tested with the same design, one might find:
# 1 positive result among the pools pooling 5 elements,
# no positive result among the pools pooling 10 elements,
# 4 positive results among the pools pooling 50 elements,
#######################################################################
x2 <- c(0,1,0,4)
m2 <- c(1,5,10,50)
n2 <- c(5,5,5,5)
pooledBin(x=x2, m=m2, n=n2)
# Some other methods for the confidence bounds:
pooledBin(x=x2, m=m2, n=n2, ci.method="lrt")
### These two populations may be analyzed in one call if they are included in a single data frame
ex2.dat <- data.frame(NumPos = c(x1,x2),
PoolSize = c(m1,m2),
NumPools = c(n1,n2),
Population = rep(1:2, each=4))
pooledBin(NumPos ~ PoolSize + n(NumPools) | Population, data = ex2.dat)
####################################################
# Reproducing some of the estimates from Table 1 in
# Hepworth & Biggerstaff (2017):
####################################################
pooledBin(x=c(1,2), m=c(20,5), n=c(8,8), pt.method="firth", ci.method="lrt")
pooledBin(x=c(7,8), m=c(20,5), n=c(8,8), pt.method="firth", ci.method="lrt")