PoolPrev: Estimation of prevalence based on presence/absence tests on...


PoolPrev    R Documentation

Estimation of prevalence based on presence/absence tests on pooled samples

Description

Estimation of prevalence based on presence/absence tests on pooled samples

Usage

PoolPrev(
  data,
  result,
  poolSize,
  ...,
  bayesian = TRUE,
  prior = NULL,
  robust = TRUE,
  level = 0.95,
  all.negative.pools = "zero",
  reproduce.poolscreen = FALSE,
  verbose = FALSE,
  cores = NULL,
  iter = 2000,
  warmup = iter/2,
  chains = 4,
  control = list(adapt_delta = 0.98)
)

Arguments

data

A data.frame with one row for each pooled sample and columns for the size of the pool (i.e. the number of specimens / isolates / insects pooled to make that particular pool) and the result of the test on the pool. It may also contain columns with additional information (e.g. the location where the pool was taken), which can optionally be used to stratify the data into smaller groups and calculate prevalence by group (e.g. prevalence for each location).

result

The name of the column with the result of the test on each pooled sample. The result must be stored with 1 indicating a positive test result and 0 indicating a negative test result.

poolSize

The name of the column with the number of specimens/isolates/insects in each pool.

...

Optional name(s) of columns with variables to stratify the data by. If omitted, the complete dataset is used to estimate a single prevalence. If included, prevalence is estimated separately for each group defined by these columns.
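As a rough illustration of the layout that data, result, poolSize and ... describe, the sketch below builds a small data frame by hand. The column names Result, NumInPool and Region and all values are invented for this example, and the PoolPrev call is only attempted if the PoolTestR package is installed.

```r
# Hypothetical toy data: one row per pool (values invented for illustration)
pools <- data.frame(
  Result    = c(0, 1, 0, 0, 1, 0, 0, 1),      # 1 = positive pool, 0 = negative pool
  NumInPool = c(5, 5, 10, 10, 5, 5, 10, 10),  # number of specimens in each pool
  Region    = rep(c("A", "B"), each = 4)      # optional stratifying column
)

# Basic checks against the documented format
stopifnot(all(pools$Result %in% c(0, 1)), all(pools$NumInPool >= 1))

if (requireNamespace("PoolTestR", quietly = TRUE)) {
  # Frequentist estimates only, one row per Region
  print(PoolTestR::PoolPrev(pools, Result, NumInPool, Region, bayesian = FALSE))
}
```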

bayesian

Logical indicating whether Bayesian estimates should be calculated. If TRUE (the default), both frequentist and Bayesian estimates of prevalence are calculated; otherwise only frequentist estimates (MLE and likelihood ratio confidence intervals) are calculated.
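To give a feel for what the frequentist estimates involve, here is a minimal base R sketch of a maximum likelihood estimate and a likelihood ratio interval for pooled data. It is not PoolPrev's implementation: it assumes a perfect test and independent specimens, so that a pool of size s is negative with probability (1 - p)^s, and the toy data are invented.

```r
# Hypothetical toy data: 1 = positive pool, 0 = negative pool
result   <- c(0, 0, 1, 0, 1, 0, 0, 0, 1, 0)
poolSize <- c(5, 5, 5, 5, 5, 10, 10, 10, 10, 10)

# Log-likelihood of prevalence p (assumes a perfect test, independent specimens)
loglik <- function(p) {
  sum(ifelse(result == 1,
             log(1 - (1 - p)^poolSize),   # pool is positive
             poolSize * log(1 - p)))      # pool is negative
}

# Maximum likelihood estimate by one-dimensional optimisation
eps <- 1e-8
fit <- optimize(loglik, interval = c(eps, 1 - eps), maximum = TRUE)
mle <- fit$maximum

# Likelihood ratio interval: all p whose log-likelihood lies within
# qchisq(level, 1) / 2 of the maximum
level   <- 0.95
cutoff  <- fit$objective - qchisq(level, df = 1) / 2
ci_low  <- uniroot(function(p) loglik(p) - cutoff, c(eps, mle))$root
ci_high <- uniroot(function(p) loglik(p) - cutoff, c(mle, 1 - eps))$root

round(c(PrevMLE = mle, CILow = ci_low, CIHigh = ci_high), 4)
```

The interval bounds are found by root-finding on either side of the MLE, which only works when the MLE lies strictly between 0 and 1, as it does for these data.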

prior

Prior for prevalence, ignored if bayesian == FALSE. If NULL (the default), the prior for the prevalence is the uninformative Jeffreys prior. The only alternative prior is a possibly zero-inflated beta distribution. Zero inflation allows for some prior (and posterior) probability that the marker of interest is totally absent from the population. The parameters are specified as a list with three non-negative numeric entries named alpha, beta, and absent. For instance, a uniform prior with no probability of true absence can be specified as prior = list(alpha = 1, beta = 1, absent = 0).
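The prior list can be built ahead of the call, as in the sketch below. The value absent = 0.1 is an arbitrary choice for illustration, not a recommendation, and the PoolPrev call is only attempted if the PoolTestR package (which provides SimpleExampleData) is installed.

```r
# Uniform prior on prevalence with no prior probability of true absence
uniform_prior <- list(alpha = 1, beta = 1, absent = 0)

# Same beta component, plus an illustrative 10% prior probability that the
# marker is absent from the population (value chosen arbitrarily)
zero_inflated_prior <- list(alpha = 1, beta = 1, absent = 0.1)

if (requireNamespace("PoolTestR", quietly = TRUE)) {
  # Bayesian fit, so this is slower than the frequentist-only calls
  print(PoolTestR::PoolPrev(PoolTestR::SimpleExampleData, Result, NumInPool,
                            prior = zero_inflated_prior))
}
```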

robust

Logical. If TRUE (default), the point estimate of prevalence is the posterior median. If FALSE, the posterior mean is used instead. Applies to Bayesian estimates only and is therefore ignored if bayesian == FALSE.

level

Defines the confidence level to be used for the confidence and credible intervals. Defaults to 0.95 (i.e. 95% intervals).

all.negative.pools

The kind of point estimate and interval to use when all pools are negative. If 'zero' (default), 0 is used as the point estimate and as the lower bound of the interval, and the level posterior quantile is used as the upper bound of the interval. If 'consistent', the result is calculated in the same way as when at least one pool is positive. Applies to Bayesian estimates only and is therefore ignored if bayesian == FALSE.
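The two options can be compared in a simplified setting where the posterior has a closed form. The base R sketch below assumes a uniform prior (alpha = 1, beta = 1, absent = 0) and a perfect test, so that N all-negative specimens give a Beta(1, N + 1) posterior. It also assumes that the 'consistent' option corresponds to a posterior median with an equal-tailed interval; that is an assumption made for illustration, not something taken from PoolPrev's code.

```r
# Hypothetical pool sizes; every pool tested negative
poolSize <- c(5, 5, 10, 10, 10)
N        <- sum(poolSize)
level    <- 0.95

# Uniform prior x likelihood (1 - p)^N gives a Beta(1, N + 1) posterior
a <- 1
b <- N + 1

# 'zero' style: estimate and lower bound are 0, upper bound is the
# `level` posterior quantile
zero_style <- c(estimate = 0,
                lower    = 0,
                upper    = qbeta(level, a, b))

# 'consistent' style (assumed here to be median + equal-tailed interval)
consistent_style <- c(estimate = qbeta(0.5, a, b),
                      lower    = qbeta((1 - level) / 2, a, b),
                      upper    = qbeta(1 - (1 - level) / 2, a, b))

rbind(zero = zero_style, consistent = consistent_style)
```

In this setting the 'zero' upper bound equals 1 - (1 - level)^(1 / (N + 1)) and is lower than the 'consistent' upper bound, because all of the excluded posterior probability sits in the upper tail.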

reproduce.poolscreen

Logical. Defaults to FALSE. If TRUE, the likelihood ratio confidence intervals are computed in a way that makes them somewhat wider and closer to those returned by Poolscreen. We recommend using the default (FALSE). However, setting this to TRUE can help when comparing results from PoolPrev and Poolscreen.

verbose

Logical indicating whether to print progress to screen. Defaults to FALSE (no printing to screen). Ignored if bayesian == FALSE.

cores

The number of CPU cores to be used. By default one core is used. Ignored if bayesian == FALSE.

iter, warmup, chains

MCMC options for passing onto the sampling routine. See stan for details. Ignored if bayesian == FALSE.

control

A named list of parameters to control the sampler's behaviour. Uses the default values defined in stan, except for adapt_delta, which is set to the more conservative value of 0.98. See stan for details. Ignored if bayesian == FALSE.

Value

An object of class PoolPrevOutput, which inherits from class tbl. The output includes the following columns:

  • PrevMLE – the maximum likelihood estimate of prevalence

  • CILow and CIHigh – lower and upper bounds of the confidence interval, computed using the likelihood ratio method

  • PrevBayes – the Bayesian point estimate of prevalence: the posterior median if robust = TRUE (the default), otherwise the posterior mean. Omitted if bayesian == FALSE.

  • CrILow and CrIHigh – lower and upper bounds for credible intervals. Omitted if bayesian == FALSE.

  • ProbAbsent – the posterior probability that prevalence is exactly 0 (i.e. the disease marker is absent). NA if using the default Jeffreys prior or if prior$absent == 0. Omitted if bayesian == FALSE.

  • NumberOfPools – number of pools

  • NumberPositive – the number of positive pools

If grouping variables are provided in ... there will be an additional column for each grouping variable. When there are no grouping variables (supplied in ...) then the output has only one row with the prevalence estimates for the whole dataset. When grouping variables are supplied, then there is a separate row for each group.

The custom print method summarises the output data frame by representing the prevalence and its confidence/credible interval as a single column in the form "Prev (CLow - CHigh)", where Prev is the prevalence, CLow is the lower bound of the confidence/credible interval and CHigh is the upper bound. In the print method, prevalence is represented as a percentage (i.e. per 100 units).
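The "Prev (CLow - CHigh)" layout can be mimicked in base R as follows. This is only a sketch of the format described above: the helper name format_prev, the number of decimal places and the input values are all made up, and the package's print method may round differently.

```r
# Hypothetical helper: format proportions as "Prev (CLow - CHigh)" in percent
format_prev <- function(prev, low, high, digits = 2) {
  fmt <- paste0("%.", digits, "f (%.", digits, "f - %.", digits, "f)")
  sprintf(fmt, 100 * prev, 100 * low, 100 * high)
}

# Invented estimates on the proportion scale
format_prev(prev = 0.0123, low = 0.0051, high = 0.0248)
```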

See Also

HierPoolPrev, getPrevalence

Examples

#Try out on a synthetic dataset consisting of pools (sizes 1, 5, or 10) taken
#from 4 different regions and 3 different years. Within each region specimens
#are collected at 4 different villages, and within each village specimens are
#collected at 8 different sites.

# Start by calculating frequentist estimates only (much faster)

#Prevalence across the whole (synthetic) dataset
PoolPrev(SimpleExampleData, Result, NumInPool, bayesian = FALSE)
#Prevalence in each Region
PoolPrev(SimpleExampleData, Result, NumInPool, Region, bayesian = FALSE)
#Prevalence for each year
PoolPrev(SimpleExampleData, Result, NumInPool, Year, bayesian = FALSE)
#Prevalence for each combination of region and year
PoolPrev(SimpleExampleData, Result, NumInPool, Region, Year, bayesian = FALSE)


#Prevalence across the whole (synthetic) dataset, also including Bayesian estimates (slower)
PoolPrev(SimpleExampleData, Result, NumInPool)



PoolTestR documentation built on April 3, 2025, 9:28 p.m.