VI: Compute the Vector Index from pooled/group testing data

vectorIndex    R Documentation

Compute the Vector Index from pooled/group testing data

Description

Calculates the Vector Index from pooled or group testing data containing pools of various sizes, accommodating both pool-based and exogenous sources for population density computations.

Usage

vectorIndex(x, ...)

VI(x, ...)


## S3 method for class 'formula'
vectorIndex(x, data,
 n.use.traps = TRUE, n.use.na = FALSE,
 pt.method = c("firth", "gart", "bc-mle", "mle", "mir"),
 ci.method = c("skew-score", "bc-skew-score", "score", "lrt", "wald", "mir"),
 scale = 1, alpha = 0.05, tol = .Machine$double.eps^0.5, ...)

## Default S3 method:
vectorIndex(x, m, n = rep(1, length(x)), vector, trap.time = rep(1, length(x)), group,
 n.use.traps = TRUE, n.use.na = FALSE,
 pt.method = c("firth", "gart", "bc-mle", "mle", "mir"),
 ci.method = c("skew-score", "bc-skew-score", "score", "lrt", "wald", "mir"),
 scale = 1, alpha = 0.05, tol = .Machine$double.eps^0.5, ...)

## S3 method for class 'formula'
VI(x, data,
 n.use.traps = TRUE, n.use.na = FALSE,
 pt.method = c("firth", "gart", "bc-mle", "mle", "mir"),
 ci.method = c("skew-score", "bc-skew-score", "score", "lrt", "wald", "mir"),
 scale = 1, alpha = 0.05, tol = .Machine$double.eps^0.5, ...)

## Default S3 method:
VI(x, m, n = rep(1, length(x)), vector, trap.time = rep(1,length(x)), group,
 n.use.traps = TRUE, n.use.na = FALSE,
 pt.method = c("firth", "gart", "bc-mle", "mle", "mir"),
 ci.method = c("skew-score", "bc-skew-score", "score", "lrt", "wald", "mir"),
 scale = 1, alpha = 0.05, tol = .Machine$double.eps^0.5, ...)

## S3 method for class 'vectorIndex'
print(x, ...)

## S3 method for class 'vectorIndex'
as.data.frame(x)

## S3 method for class 'VI'
print(x, ...)

## S3 method for class 'VI'
as.data.frame(x)

## S3 method for class 'vectorIndex'
x[i, j, drop = if (missing(i)) TRUE else length(cols) ==  1]

## S3 method for class 'VI'
x[i, j, drop = if (missing(i)) TRUE else length(cols) ==  1]

Arguments

x

If an object of class formula (or one that can be coerced to that class): a symbolic representation of the variables used to identify the number of positive pools, the corresponding pool size, and the number of pools of that size, along with an optional grouping variable and a required variable identifying the vector subgroups. A variable (trap.time) may be included to record collection effort when the collection time for traps differs (e.g., mosquito trapping effort measured in 'trap-nights'); the default is 1 for each record. See the 'Details' section below. Otherwise a vector specifying the observed number of positive pools among the number of pools tested (n). For methods, objects of class vectorIndex/VI.

data

an optional data frame, list or environment (or object coercible by as.data.frame to a data frame) containing the variables specified in the formula.

n.use.traps

logical value to indicate whether to use only the pool (m) and sample (n) sizes for records with missing numbers of positives (x), so including those with positivity results (x), when computing the measure of population size. Default is TRUE.

n.use.na

logical value to indicate whether the pool (m) and sample (n) sizes for records with missing numbers of positives (x) should be included (TRUE) or not (FALSE) when computing the measure of population size. Default is FALSE.

m

a vector of pool sizes, must have the same length as x

n

a vector of the corresponding number of pools of sizes m

vector

a vector indicating the "vector" to which the pool or count belongs

trap.time

a numeric vector recording the time-based collection effort (in a standard time unit, e.g., "trap nights"); the default is 1 for each record

group

a vector of group identifiers corresponding to the number of positive pools x and pool size m

pt.method

a character string specifying the point estimate to compute, with the following options: "firth", the bias-corrected maximum likelihood estimate (MLE) using Firth's correction (the default); "gart" and "bc-mle", the bias-corrected MLE; "mle", the MLE; and "mir", the minimum infection rate (MIR).

ci.method

a character string specifying the confidence interval to compute, with the following options: "skew-score", the skewness-corrected score interval (the default); "score", the score interval; "bc-skew-score", the bias- and skewness-corrected score interval; "lrt", the likelihood ratio test interval; "wald", the Wald interval; and "mir", the Wald binomial interval based on the MIR.

scale

a single numeric, the coefficient used to scale the point estimates and interval bounds in the print and summary methods (print.pooledBin, summary.pooledBin)

alpha

a single numeric, specifying the type-I-error level; confidence level is 100(1-alpha)%

tol

accuracy required for iterations in internal functions

i,j

elements to extract or replace. For [ and [[, these are numeric or character or, for [ only, empty or logical. Numeric values are coerced to integer as if by as.integer.

drop

logical. If TRUE the result is coerced to the lowest possible dimension. The default is to drop if only one column is left, but not to drop if only one row is left.

...

future arguments

Details

The vector index (VI) is a measure used to quantify the infectious burden in a population, such as the West Nile virus burden in populations of Culex vector mosquito species. The language here is adopted from that for mosquito collections and testing for viral pathogens. The VI is computed as the sum over vector species (i) of the average number of individuals collected per collection effort ('trap night') \overline{N}_i times an estimate of the infection prevalence or 'rate' \widehat{p}_i, generally computed using pooled samples. Not all individuals in a collection are necessarily tested, so the data set recording the numbers of individuals collected may differ from, or be larger than, that used to estimate p. To allow a single data set to be used in computing the VI, the data configuration and the function parameters (n.use.na and n.use.traps) allow different data subsets to be used for the components \overline{N}_i and \widehat{p}_i.
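As a rough illustration of the arithmetic described above, the following base R sketch computes the sum by hand. It assumes that \overline{N}_i is the total number of individuals collected divided by the total number of trap nights for species i; the function's internal computation, including its treatment of records with missing x, may differ. The prevalence values in p_hat are made up for illustration, and all object names are hypothetical.

```r
# Hand computation of a Vector Index; illustrative values only.
# ASSUMPTION: N_bar is total individuals / total trap nights per species.
vi_dat <- data.frame(
  species     = c("A", "B"),
  individuals = c(330, 330),     # total individuals collected per species
  trap_nights = c(6, 8),         # total collection effort per species
  p_hat       = c(0.010, 0.020)  # assumed (made-up) infection prevalence estimates
)

# Average number of individuals per trap night, by species
vi_dat$N_bar <- vi_dat$individuals / vi_dat$trap_nights

# Species-level contributions and their sum, the Vector Index
vi_by_species <- vi_dat$N_bar * vi_dat$p_hat
vi_manual <- sum(vi_by_species)
vi_manual
```

In practice the prevalence estimates would come from one of the pooled-testing estimators selected with pt.method rather than being supplied by hand.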

formula specification: the formula is an extension of that used in the pooledBin/pIR functions; see the 'Details' section in their documentation. The extension identifies the vector variable by including it after a 'forward slash' character. Letting V represent the vector variable, such a specification is then generically X ~ m(M) + n(N) / V when there is no grouping variable, and X ~ m(M) + n(N) | G / V when there is a grouping variable G. Multiple grouping variables are allowed, as with pooledBin/pIR.

Note that the vector variable V must come after the complete formula specification for the underlying pooledBin/pIR call, which includes, as needed, the grouping variable G.

A further extension accommodates differences in the collection effort underlying the pools, typically expressed as time of collection (e.g., the number of 'trap nights' a trap is set for mosquito trapping, where all individuals in a pool come from traps set for the same amount of time). Let T represent this time variable. It is included in the formula after a colon (:) separator following the 'vector' variable, extending the expressions given above: X ~ m(M) + n(N) / V:T and X ~ m(M) + n(N) | G / V:T in the absence or presence of a grouping variable G, respectively. If T is not specified, all collection efforts are set to the default unit 1.
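The formula variants described above can be written side by side as follows. This is only a sketch: the data frame and its column names are made up, the grouping variable G appears only in the formula objects and not in the data, and the call to vectorIndex is guarded so that it runs only if the PooledInfRate package is installed.

```r
# Illustrative data; all column names here are hypothetical
fx.dat <- data.frame(
  X = c(0, 0, 1, 2, 0, 1, 0, 4),        # positive pools
  M = rep(c(1, 5, 10, 50), times = 2),  # pool sizes
  N = rep(5, 8),                        # pools per size
  V = rep(1:2, each = 4),               # vector (species) identifier
  Nights = c(2, 1, 1, 2, 2, 3, 2, 1)    # collection effort in trap nights
)

# The four generic forms as formula objects (nothing is evaluated here)
f.vec      <- X ~ m(M) + n(N) / V             # vector variable only
f.vec.time <- X ~ m(M) + n(N) / V:Nights      # vector variable and effort
f.grp      <- X ~ m(M) + n(N) | G / V         # grouping variable, then vector
f.grp.time <- X ~ m(M) + n(N) | G / V:Nights  # grouping, vector, and effort

# Fit one of the forms only when the package is available
if (requireNamespace("PooledInfRate", quietly = TRUE)) {
  library(PooledInfRate)
  print(vectorIndex(X ~ m(M) + n(N) / V:Nights, data = fx.dat))
}
```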

The point estimates refer to the underlying probability (infection rate) parameters used in the computation of the Vector Index; they may be viewed using a call to summary. For point estimation of the probabilities (infection rates), the same options available for pooledBin/pIR are available here: the bias-preventative ("firth") and bias-corrected ("gart") estimators are recommended, with details described in Hepworth G, Biggerstaff BJ (2017). Use of the MLE ("mle") and MIR ("mir") estimators is not recommended, but their computation is provided for historical reasons.
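For orientation, the two estimators that are not recommended can be reproduced by hand in base R. The sketch below is not the package's implementation: it computes the MIR as the number of positive pools per individual tested, and it obtains the MLE by numerically maximizing the standard pooled-testing likelihood with optimize. The data and object names are illustrative.

```r
x <- c(0, 0, 1, 2)    # positive pools at each pool size
m <- c(1, 5, 10, 50)  # pool sizes
n <- c(5, 5, 5, 5)    # pools tested at each size

# MIR: positive pools divided by the total number of individuals tested
mir <- sum(x) / sum(m * n)

# Log-likelihood for pooled testing: a pool of size m tests positive
# with probability 1 - (1 - p)^m and negative with probability (1 - p)^m
loglik <- function(p) {
  sum(x * log(1 - (1 - p)^m) + (n - x) * m * log(1 - p))
}

# Numerical MLE over a bounded interval (the bounds are an arbitrary choice)
mle <- optimize(loglik, interval = c(1e-8, 0.5), maximum = TRUE,
                tol = 1e-10)$maximum

c(mir = mir, mle = mle)
```

The MIR counts each positive pool as exactly one infected individual, which is why it falls below the MLE for these data.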

Note that confidence intervals are not reported at this time for the Vector Index (even though there is an option for the method to use), as a direct population measure for inference is unclear; theoretical work on confidence intervals in this context therefore remains undeveloped. Use of a confidence interval to accompany the Vector Index to aid characterization of uncertainty rather than for formal population inference may be justified, and inclusion of such in future releases of the PooledInfRate package is under consideration.

Value

An object of class 'vectorIndex' ('VI') or, if more than one group, of class 'vectorIndexList' ('VIList'). These have a list structure with elements

p

the estimated proportion

lcl

the lower confidence limit

ucl

the upper confidence limit

pt.method

the method used for point estimation

ci.method

the method used for interval estimation

alpha

the type-I-error level

x

the numbers of positive pools

m

the size of the pools

n

the numbers of pools with corresponding pool sizes m

scale

the scaling coefficient for the output

along with attributes class, vector, vectors.var, traptime.var, p, n, n.use.na, n.use.traps, pt.method, ci.method, and call.

Author(s)

Brad Biggerstaff

References

Walter SD, Hildreth SW, Beaty BJ: Estimation of infection rates in populations of organisms using pools of variable size. Am J Epidemiol 1980, 112(1):124-128.

Hepworth G: Estimation of proportions by group testing. PhD Dissertation. Melbourne, Australia: The University of Melbourne; 1999.

Biggerstaff BJ: Confidence interval for the difference of proportions estimated from pooled samples. Journal of Agricultural, Biological, and Environmental Statistics 2008, 13(4):478-496.

Hepworth G, Biggerstaff BJ: Bias correction in estimating proportions by pooled testing. JABES 2017, to appear.

Examples


#######################################################################
# Consider an imaginary example for a single vector species, where pools of size
# 1, 5, 10 and 50 are tested, 5 pools of each size.
# Among the 5 pools each of size 1 and 5, no pool is positive,
# while among the 5 pools each of size 10 and 50, 1 and 2 positive
# pools are identified, respectively.
#######################################################################
# For another vector species, tested with the same design, one might find:
# no positive result among the pools of size 1,
# 1 positive result among the pools of size 5,
# no positive result among the pools of size 10,
# 4 positive results among the pools of size 50.
#######################################################################

x1 <- c(0,0,1,2)
m1 <- c(1,5,10,50)
n1 <- c(5,5,5,5)

x2 <- c(0,1,0,4)
m2 <- c(1,5,10,50)
n2 <- c(5,5,5,5)

ex.dat <- data.frame(NumPos = c(x1,x2),
                     PoolSize = c(m1,m2),
                     NumPools = c(n1,n2),
                     Species = rep(1:2, each=4),
                     TrapNights = c(2,1,1,2,2,3,2,1))

# the Vector Index is thus computed using

vectorIndex(NumPos ~ PoolSize + n(NumPools) / Species:TrapNights, data = ex.dat)

ex2.dat <- rbind(ex.dat,ex.dat)
ex2.dat$Group <- rep(LETTERS[1:2],each=8)
ex2.dat$NumPos[16] <- 0 # just to make them different

vectorIndex(NumPos ~ PoolSize + n(NumPools) | Group / Species:TrapNights, data = ex2.dat)
summary(vectorIndex(NumPos ~ PoolSize + n(NumPools) | Group / Species:TrapNights, data = ex2.dat))



bjbiggerstaff/PooledInfRate documentation built on Jan. 19, 2024, 6:54 p.m.