HierPoolPrev: Estimation of prevalence based on presence/absence tests on...

View source: R/HierPoolPrev.R

HierPoolPrev    R Documentation

Estimation of prevalence based on presence/absence tests on pooled samples in a hierarchical sampling frame. Uses an intercept-only random effects model to model prevalence at the population level. See PoolReg and PoolRegBayes for full mixed-effect modelling.

Description

Estimation of prevalence based on presence/absence tests on pooled samples in a hierarchical sampling frame. Uses an intercept-only random effects model to model prevalence at the population level. See PoolReg and PoolRegBayes for full mixed-effect modelling.

Usage

HierPoolPrev(
  data,
  result,
  poolSize,
  hierarchy,
  ...,
  prior = NULL,
  robust = TRUE,
  level = 0.95,
  verbose = FALSE,
  cores = NULL,
  iter = 2000,
  warmup = iter/2,
  chains = 4,
  control = list(adapt_delta = 0.9),
  all.negative.pools = "zero"
)

Arguments

data

A data.frame with one row for each pooled sample and columns for the size of the pool (i.e. the number of specimens / isolates / insects pooled to make that particular pool) and the result of the test of the pool. It may also contain additional columns with further information (e.g. the location where the pool was taken), which can optionally be used for splitting the data into smaller groups and calculating prevalence by group (e.g. calculating prevalence for each location).

result

The name of the column with the result of each test on each pooled sample. The result must be stored with 1 indicating a positive test result and 0 indicating a negative test result.

poolSize

The name of the column with number of specimens/isolates/insects in each pool

hierarchy

The name(s) of the column(s) indicating group membership. In a nested sampling design with multiple levels of grouping, the lower-level groups must have names/numbers that differentiate them from all other groups at the same level. E.g. if sampling was performed at 200 sites across 10 villages (20 sites per village), then there should be 200 unique names for the sites. If, for instance, the sites are instead numbered 1 to 20 within each village, the village identifier (e.g. A, B, C...) should be combined with the site number to create unique identifiers for each site (e.g. A-1, A-2... for sites in village A and B-1, B-2... for the sites in village B etc.)
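The identifier construction described above can be sketched in base R. The data frame and the column names Village, SiteNumber and Site below are hypothetical and chosen only for illustration; this is one possible way to build the identifiers, not a step performed by the package.

```r
# Hypothetical sampling frame: 2 villages, sites numbered 1 to 3 within each.
pools <- data.frame(
  Village = rep(c("A", "B"), each = 3),
  SiteNumber = rep(1:3, times = 2)
)

# Site numbers repeat across villages, so on their own they do not
# distinguish the six sites.
length(unique(pools$SiteNumber))  # 3

# Combine the village identifier with the site number to get one unique
# identifier per site.
pools$Site <- paste(pools$Village, pools$SiteNumber, sep = "-")
pools$Site
length(unique(pools$Site))        # 6
```

The combined Site column, together with Village, could then be supplied as the hierarchy, e.g. c("Village", "Site").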

...

Optional name(s) of columns with variables to stratify the data by. If omitted, the complete dataset is used to estimate a single prevalence. If included, prevalence is estimated separately for each group defined by these columns.

prior

List of parameters for the priors on the population intercept and the standard deviations of the group-effect terms. See details.

robust

Logical. If TRUE (default), the point estimate of prevalence is the posterior median. If FALSE, the posterior mean is used instead.

level

The confidence level to be used for the confidence and credible intervals. Defaults to 0.95 (i.e. 95% intervals)

verbose

Logical indicating whether to print progress to screen. Defaults to FALSE (no printing to screen).

cores

The number of CPU cores to be used. By default one core is used

iter, warmup, chains

MCMC options for passing onto the sampling routine. See stan for details.

control

A named list of parameters to control the sampler's behaviour. Defaults to the values defined in stan, except for adapt_delta, which is set to the more conservative value of 0.9. See stan for details.

all.negative.pools

The kind of point estimate and interval to use when all pools are negative (Bayesian estimates only). If 'zero' (default), uses 0 as the point estimate and as the lower bound of the interval, and the 'level' posterior quantile as the upper bound of the interval. If 'consistent', the result is computed in the same way as when at least one pool is positive.

Details

When using the default value of the prior argument (NULL), the model uses the following prior:

  list(intercept = list(nu = 3, mu = 0, sigma = 4.0), group_sd = list(nu = 3, mu = 0, sigma = 2.5), individual_sd = FALSE)

This models the prior of the linear-scale intercept as t-distributed with the parameters in 'intercept', and the prior of the standard deviation of the group-level effects as a truncated (non-negative) t-distribution with the parameters in 'group_sd'. 'individual_sd = FALSE' means that, for models with multiple grouping levels, this prior is placed on the root-sum-square of the group-effect standard deviations. The default implies a prior on population prevalence that is approximately distributed as beta(0.5,0.5).

To set custom priors, use the same nested list format. Any omitted parameters are replaced with the default values, and any additional parameters are silently ignored. For example, to make the prior equal to the default for the intercept-only random-effect model in PoolRegBayes, use list(individual_sd = TRUE). This puts a prior on the standard deviation of each group-level effect separately, but does not otherwise change the prior distributions used.
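As a minimal sketch of the nested list format, the snippet below writes out the default prior from the text and the custom prior list(individual_sd = TRUE). The use of utils::modifyList to show the resulting effective prior is only an illustration of "omitted parameters are replaced with the defaults"; it is an assumption and not necessarily how the package merges the lists internally. The names default_prior, custom_prior and effective_prior are hypothetical.

```r
# Default prior, copied from the Details text.
default_prior <- list(
  intercept = list(nu = 3, mu = 0, sigma = 4.0),
  group_sd = list(nu = 3, mu = 0, sigma = 2.5),
  individual_sd = FALSE
)

# Custom prior from the Details text: only the changed entry is supplied.
custom_prior <- list(individual_sd = TRUE)

# Illustration only (assumed merge behaviour): fill omitted entries with
# the defaults to see the prior that would effectively be used.
effective_prior <- utils::modifyList(default_prior, custom_prior)
str(effective_prior)

# Fitting runs MCMC and can be slow, so the call is only made in an
# interactive session with PoolTestR installed.
if (interactive() && requireNamespace("PoolTestR", quietly = TRUE)) {
  library(PoolTestR)
  HierPoolPrev(SimpleExampleData, Result, NumInPool,
               c("Village", "Site"), Year,
               prior = custom_prior)
}
```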

Value

An object of class HierPoolPrevOutput, which inherits from class tbl. The output includes the following columns:

  • PrevBayes – the (Bayesian) point estimate of prevalence: the posterior median if robust = TRUE (default), otherwise the posterior mean

  • CrILow and CrIHigh – lower and upper bounds for credible intervals

  • NumberOfPools – number of pools

  • NumberPositive – the number of positive pools

  • ICC – the estimated intra-cluster correlation coefficient

  • ICC_CrILow and ICC_CrIHigh – lower and upper bounds for credible intervals of the estimated ICC

The three ICC columns (ICC, ICC_CrILow and ICC_CrIHigh) are matrix columns. These contain one column for each variable included in the hierarchy. E.g., if the input hierarchy is c("Village", "Site"), each of the three ICC matrix columns will contain one column with results for Village and one column with results for Site.
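Matrix columns can be unfamiliar, so here is a small base-R sketch of how such a column behaves. The object out is a mock stand-in built by hand (a plain data.frame rather than a tbl), every number in it is an arbitrary placeholder rather than a model result, and the column names "Village" and "Site" inside the matrix are an assumption based on the description above.

```r
# Mock stand-in for output with hierarchy c("Village", "Site"), stratified
# by Year. All values are arbitrary placeholders, not model results.
out <- data.frame(Year = c(1, 2, 3))
out$ICC <- matrix(
  c(0.10, 0.11, 0.12,   # placeholder values for Village
    0.20, 0.21, 0.22),  # placeholder values for Site
  ncol = 2,
  dimnames = list(NULL, c("Village", "Site"))  # assumed column names
)

# The data frame has two columns (Year and ICC); ICC is itself a matrix
# with one row per stratum and one column per hierarchy level.
dim(out)
dim(out$ICC)

# Extract the values for one hierarchy level across all rows.
village_icc <- out$ICC[, "Village"]
village_icc
```

The same indexing pattern would apply to ICC_CrILow and ICC_CrIHigh.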

If grouping variables are provided in ... there will be an additional column for each grouping variable. When there are no grouping variables (supplied in ...) then the output has only one row with the prevalence estimates for the whole dataset. When grouping variables are supplied, then there is a separate row for each group.

The custom print method summarises the output data frame by representing output variables with credible intervals (i.e., PrevBayes, ICC) as a single column in the form "X (CrILow - CrIHigh)", where X is the point estimate, CrILow is the lower bound of the credible interval and CrIHigh is the upper bound. In the print method, prevalence PrevBayes is represented as a percentage (i.e., per 100 units).
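The "X (CrILow - CrIHigh)" layout can be imitated with sprintf. This is a sketch only and not the package's print method: the helper format_with_cri is hypothetical, the number of decimal places is an arbitrary choice, scaling the interval bounds to percentages along with the estimate is an assumption, and the input values are placeholders rather than results.

```r
# Placeholder values on the proportion scale; not results from any model.
prev <- 0.05
cri_low <- 0.02
cri_high <- 0.10

# Hypothetical helper: combine an estimate and its interval into one string,
# scaled to a percentage.
format_with_cri <- function(x, low, high, digits = 2) {
  fmt <- paste0("%.", digits, "f (%.", digits, "f - %.", digits, "f)")
  sprintf(fmt, 100 * x, 100 * low, 100 * high)
}

format_with_cri(prev, cri_low, cri_high)
# "5.00 (2.00 - 10.00)"
```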

See Also

PoolPrev, getPrevalence

Examples

# Calculate prevalence for a synthetic dataset consisting of pools (sizes 1, 5,
# or 10) taken from 3 different years. Specimens are collected at 16 different
# villages, and within each village specimens are collected at 8 different
# sites.


  library(PoolTestR)

  # Prevalence for each year,
  # ignoring the hierarchical sampling frame:
  PoolPrev(SimpleExampleData, Result, NumInPool, Year)
  # accounting for the hierarchical sampling frame within each region:
  HierPoolPrev(SimpleExampleData, Result, NumInPool, c("Village","Site"), Year)




PoolTestR documentation built on April 3, 2025, 9:28 p.m.