getPrevalence: Predicting Prevalence from a Mixed or Fixed Effect Logistic...
In PoolTestR: Prevalence and Regression for Pool-Tested (Group-Tested) Data

getPrevalence

R Documentation

Predicting Prevalence from a Mixed or Fixed Effect Logistic Regression with Presence/Absence Tests on Pooled Samples

Description

This function works somewhat like a predict or fitted generic function, returning the model-predicted prevalence for a given set of data. However, because the quantity of interest (prevalence) is on neither the response nor the link scale, we do not use either of these generic functions. Further, when the model accounts for the hierarchical structure of the sampling frame (e.g. Region/Village/Site), it is common to want the predicted values at each level of sampling (e.g. prevalence at each region, village or site), so these are calculated automatically. Finally, to calculate population-level prevalence from a mixed model, random/group effects need to be marginalised out to avoid biased estimates; this is also performed automatically.
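One way to see why prevalence is not simply the fitted response is the following minimal sketch. It assumes a perfect test, so that a pool is positive exactly when it contains at least one positive specimen; under that assumption a pool of size s is positive with probability 1 - (1 - p)^s, and the prevalence p has to be recovered by inverting that relationship. The helper names are hypothetical and are not part of PoolTestR.

```r
# Illustrative sketch only: assumes a perfect test, so a pool is positive
# exactly when it contains at least one positive specimen.
# pool_positive_prob() and prevalence_from_pool() are hypothetical helpers,
# not PoolTestR functions.
pool_positive_prob <- function(p, s) 1 - (1 - p)^s
prevalence_from_pool <- function(q, s) 1 - (1 - q)^(1 / s)

p <- 0.02                      # individual-level prevalence
s <- c(1, 5, 10)               # pool sizes
q <- pool_positive_prob(p, s)  # probability that a pool of each size is positive
q

# Inverting the relationship recovers the same prevalence for every pool size
recovered <- prevalence_from_pool(q, s)
recovered
```

The probability of a positive pool grows with pool size even though the underlying prevalence is unchanged, which is why pool-level fitted values cannot be read directly as prevalence.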

Usage

getPrevalence(model, ...)

## S3 method for class 'glm'
getPrevalence(model, newdata = NULL, level = 0.95, ...)

## S3 method for class 'glmerMod'
getPrevalence(
  model,
  newdata = NULL,
  re.form = NULL,
  all.negative.pools = "zero",
  ...
)

## S3 method for class 'brmsfit'
getPrevalence(
  model,
  newdata = NULL,
  re.form = NULL,
  robust = TRUE,
  level = 0.95,
  all.negative.pools = "zero",
  ...
)

Arguments

`model`	An object returned by `PoolReg()` or `PoolRegBayes()`
`...`	Arguments passed to methods for each class
`newdata`	The data for which prevalence needs to be estimated/predicted. If not provided, defaults to using the data used to train the model (i.e. returns the fitted values of the prevalence)
`level`	Defines the confidence level to be used for the confidence and credible intervals. Defaults to 0.95 (i.e. 95% intervals).
`re.form`	A description of which random effects to include in the prediction. If omitted, an attempt is made to infer from model and data structure.
`all.negative.pools`	The kind of point estimate and interval to use when all pools are negative. Typically ignored unless newdata is NULL. If `'zero'` (default), uses 0 as the point estimate and as the lower bound of the interval, and the `level` posterior quantile as the upper bound of the interval. If `'consistent'`, the result is the same as for the case where at least one pool is positive.
`robust`	Logical. Only used when the model class is `brmsfit`. If `TRUE` (default), the point estimate of prevalence is the posterior median. If `FALSE`, the posterior mean is used instead.

Details

If re.form is omitted (probably the most common use case), getPrevalence checks whether the model formula extracted from the model object contains any random effect terms. If not, it just returns the estimates based on population effects. If there are random effects, it checks whether the random effect variables form a nested hierarchical structure in the data provided. If so, in addition to the estimates based on population effects only, it produces estimates at each level of the nested hierarchical structure, in order of increasing granularity. For manual control, you can set re.form to NA for population effects only, to a one-sided formula specifying the form of the random effects to include in estimates, or to a list of such objects. Any random effects omitted will be marginalised out. For automatically detected nested hierarchical structures, this means that higher-level estimates marginalise over lower-level random effects; in particular, population-level estimates marginalise over all random effects.
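As a rough illustration of what marginalising out a random effect means, the sketch below averages the inverse-logit over a single normally distributed random intercept by numerical integration. The parameter values are hypothetical, and this is not a description of how getPrevalence computes its estimates internally.

```r
# Illustrative values only (hypothetical, not taken from any fitted model)
beta0 <- -3     # population-level intercept on the logit scale
sigma <- 1.5    # standard deviation of a normally distributed random intercept

# Prevalence for a group whose random effect is exactly zero
conditional <- plogis(beta0)

# Prevalence averaged over the random-effect distribution
marginal <- integrate(
  function(u) plogis(beta0 + u) * dnorm(u, mean = 0, sd = sigma),
  lower = -Inf, upper = Inf
)$value

c(conditional = conditional, marginal = marginal)
```

With these values the marginal prevalence is larger than the conditional one, so back-transforming the intercept alone would understate the population-level prevalence.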

Value

A list with at least one field, PopulationEffects, and an additional field for every random/group effect variable. The field PopulationEffects contains a data.frame with the prevalence estimated based only on the fixed/population effects. When the intercept is the only fixed/population effect, this is just the population mean (possibly adjusted for random/group effects). When there are group effect terms, getPrevalence attempts to order these with respect to 'granularity' and extract the prevalence estimates for these random effects. For example, if the random/group effects are included to account for a hierarchical sampling frame with levels 'Village' and 'Site', with a formula like Result ~ Cov1 + Cov2 + (1|Village) + (1|Site), then getPrevalence will return a list of three data frames: estimates for every combination of covariates; estimates for every combination of covariates and village; and estimates for every combination of covariates, village, and site.
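The idea of ordering grouping variables by granularity can be sketched in base R as follows. This is only an illustration under simple assumptions, not the check PoolTestR performs; the data frame and the helper is_nested_in() are hypothetical.

```r
# Hypothetical sampling frame: 2 villages, each containing 2 sites
frame <- data.frame(
  Village = c("A", "A", "A", "B", "B", "B"),
  Site    = c("A1", "A1", "A2", "B1", "B2", "B2")
)

# is_nested_in() is a hypothetical helper, not a PoolTestR function:
# `inner` is nested in `outer` if every level of `inner` occurs within
# exactly one level of `outer`.
is_nested_in <- function(data, inner, outer) {
  pairs <- unique(data[, c(inner, outer)])
  !anyDuplicated(pairs[[inner]])
}

is_nested_in(frame, "Site", "Village")   # TRUE: each site lies in one village
is_nested_in(frame, "Village", "Site")   # FALSE: each village spans two sites

# Order grouping variables from coarsest to finest by number of distinct levels
vars <- c("Site", "Village")
n_levels <- sapply(vars, function(v) length(unique(frame[[v]])))
ordered_vars <- vars[order(n_levels)]
ordered_vars                             # "Village" "Site"
```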

Examples

# Perform logistic-type regression modelling for a synthetic dataset consisting
# of pools (sizes 1, 5, or 10) taken from 4 different regions and 3 different
# years. Within each region specimens are collected at 4 different villages,
# and within each village specimens are collected at 8 different sites.


### Models in a frequentist framework
#ignoring hierarchical sampling frame within each region
Mod <- PoolReg(Result ~ Region + Year,
               data = SimpleExampleData,
               poolSize = NumInPool)
summary(Mod)

#accounting for hierarchical sampling frame within each region
HierMod <- PoolReg(Result ~ Region + Year + (1|Village) + (1|Site),
                   data = SimpleExampleData,
                   poolSize = NumInPool)
summary(HierMod)


### Models in a Bayesian framework with default (non-informative) priors
#ignoring hierarchical sampling frame within each region

BayesMod <- PoolRegBayes(Result ~ Region + Year,
                         data = SimpleExampleData,
                         poolSize = NumInPool)
summary(BayesMod)

#we could also account for the hierarchical sampling frame within each region
#(note that this is more complex and slower)

# BayesHierMod <- PoolRegBayes(Result ~ Region + Year + (1|Village) + (1|Site),
#                              data = SimpleExampleData,
#                              poolSize = NumInPool)


### Calculate adjusted estimates of prevalence
# We use the same function for every model, but the outputs differ slightly

#For models without hierarchical sampling structure there is an estimate of
#prevalence for every combination of population (fixed) effects: e.g. Region and
#Year
getPrevalence(Mod) #Frequentist model

getPrevalence(BayesMod) #Bayesian model


#For models with hierarchical sampling structure, there is a prevalence
#estimate for each combination of region and year and then at each level of the
#hierarchical sampling frame (i.e. for each village in each region and each site
#in each village)
getPrevalence(HierMod)

# You can also use getPrevalence to predict prevalence for other values of the
# covariates (e.g. predict prevalence in year 4 based on linear trend on the
# logit scale)

#Making a data frame containing the data to make predictions on
DataFuture <- unique(data.frame(Region = SimpleExampleData$Region,
                                Village = SimpleExampleData$Village,
                                Site = SimpleExampleData$Site,
                                Year = 4))

getPrevalence(Mod, newdata = DataFuture)
getPrevalence(HierMod, newdata = DataFuture)

PoolTestR documentation built on April 3, 2025, 9:28 p.m.