para_prevalence_CI: Parasite prevalence estimation and confidence intervals
In parasiteR: A Theorical-Practical Approach to Parasitological Data Analysis

Description

Estimates parasite prevalence and corresponding confidence intervals from parasite abundance data, optionally stratified by grouping variables. Two types of confidence intervals are provided: exact binomial intervals and Blaker intervals, allowing robust inference across a wide range of sample sizes and prevalence values.

Usage

para_prevalence_CI(dataset, sp_cols, group_vars = NULL, decimal_places = 2,
 conf_level = 0.95, output_type = "proportion", combine_ci = FALSE, verbose = FALSE)

Arguments

`dataset`	Data frame with parasitological data.
`sp_cols`	Character vector with the names of the columns that hold parasite abundance, one column per taxon, for which the parasitological descriptors are calculated.
`group_vars`	Vector with the names of categorical variables used to define groups (e.g., "Sex", "Site"). Default is `NULL`.
`decimal_places`	Number of decimal places to round the values. Default is `2`.
`conf_level`	Confidence level for interval estimation (e.g., `0.95` for 95% confidence intervals). Default is `0.95`.
`output_type`	Format of the result: either `"proportion"` or `"percentage"`. Default is `"proportion"`.
`combine_ci`	Logical. If `TRUE`, the interval is expressed as a single column (min - max). If `FALSE`, the interval is split into separate lower and upper limit columns. Default is `FALSE`.
`verbose`	Logical. If `TRUE`, progress messages are printed. Default is `FALSE`.
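
The effect of `decimal_places`, `output_type` and `combine_ci` can be illustrated with a small sketch. The helper `format_estimate()` below is hypothetical and is not part of parasiteR; the column names it returns and the exact separator used in the combined column are assumptions made for illustration only.

```r
# Hypothetical helper (not part of parasiteR): formats one estimate and its
# interval according to the three presentation arguments.
format_estimate <- function(est, lower, upper,
                            decimal_places = 2,
                            output_type = c("proportion", "percentage"),
                            combine_ci = FALSE) {
  output_type <- match.arg(output_type)
  # Percentages are the proportions multiplied by 100
  scale <- if (output_type == "percentage") 100 else 1
  vals <- round(c(est, lower, upper) * scale, decimal_places)
  if (combine_ci) {
    # One text column holding both limits
    data.frame(prevalence = vals[1], CI = paste(vals[2], "-", vals[3]))
  } else {
    # Separate numeric columns for the lower and upper limits
    data.frame(prevalence = vals[1], Lower = vals[2], Upper = vals[3])
  }
}

out <- format_estimate(0.15, 0.0321, 0.3789,
                       output_type = "percentage", combine_ci = TRUE)
out
```

A combined column is convenient for reporting tables, while separate numeric limits are easier to plot or filter.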

Details

Prevalence is defined as the proportion of hosts infected with a given parasite taxon:

P = \frac{nH_{inf}}{nH}

where:

nH is the number of hosts analyzed (non-missing observations)
nH_{inf} is the number of infected hosts (abundance > 0)
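
As a minimal sketch of this definition in base R, the abundance vector below is made-up example data for a single taxon, with `NA` standing for a host that was not examined for it:

```r
# Hypothetical abundance counts for one parasite taxon across eight hosts
abundance <- c(0, 3, 0, 12, 1, NA, 0, 7)

nH     <- sum(!is.na(abundance))            # hosts analyzed (non-missing)
nH_inf <- sum(abundance > 0, na.rm = TRUE)  # infected hosts (abundance > 0)
P      <- nH_inf / nH                       # prevalence as a proportion

c(nH = nH, nH_inf = nH_inf, prevalence = round(P, 2))
```

Here seven hosts are analyzed and four are infected, so P = 4/7.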

The function reshapes the dataset into long format and computes prevalence for each parasite taxon and grouping combination (if specified). Two types of confidence intervals are calculated:

Exact (Clopper–Pearson) interval: An exact binomial confidence interval. It is conservative, meaning its coverage is at least the nominal level, and it is valid for all sample sizes, which makes it suitable for small samples and extreme prevalence values.
Blaker interval: Also an exact interval, but less conservative than Clopper–Pearson. It is typically shorter while still keeping coverage at or above the nominal level.
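
The two interval types can be sketched in base R as follows. This is an illustration under stated assumptions, not the code used inside parasiteR: the counts `x` and `n` are made up, `acceptability()` and `blaker_ci()` are hypothetical helper names, and the Blaker limits are found with a simple inward step search based on Blaker's acceptability function. The step size limits the numerical precision of the Blaker limits.

```r
x <- 3; n <- 20; conf_level <- 0.95   # 3 infected hosts out of 20 (made up)
alpha <- 1 - conf_level

# Clopper-Pearson limits from beta quantiles
cp <- c(
  lower = if (x == 0) 0 else qbeta(alpha / 2, x, n - x + 1),
  upper = if (x == n) 1 else qbeta(1 - alpha / 2, x + 1, n - x)
)
# binom.test() reports the same interval
cp_check <- as.numeric(binom.test(x, n, conf.level = conf_level)$conf.int)

# Blaker acceptability of p when x infected hosts are observed out of n
acceptability <- function(x, n, p) {
  p1 <- 1 - pbinom(x - 1, n, p)   # P(X >= x)
  p2 <- pbinom(x, n, p)           # P(X <= x)
  a1 <- p1 + pbinom(qbinom(p1, n, p) - 1, n, p)
  a2 <- p2 + 1 - pbinom(qbinom(1 - p2, n, p), n, p)
  min(a1, a2)
}

# Start at the Clopper-Pearson limits and step inward while p is rejected
blaker_ci <- function(x, n, conf_level = 0.95, step = 1e-4) {
  alpha <- 1 - conf_level
  lower <- 0
  upper <- 1
  if (x != 0) {
    lower <- qbeta(alpha / 2, x, n - x + 1)
    while (acceptability(x, n, lower + step) < alpha) lower <- lower + step
  }
  if (x != n) {
    upper <- qbeta(1 - alpha / 2, x + 1, n - x)
    while (acceptability(x, n, upper - step) < alpha) upper <- upper - step
  }
  c(lower = lower, upper = upper)
}

bl <- blaker_ci(x, n, conf_level)
rbind(clopper_pearson = cp, blaker = bl)
```

Because the search starts at the Clopper-Pearson limits and only moves inward, the Blaker interval produced by this sketch is never wider than the Clopper-Pearson one.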

Statistical considerations:

Prevalence is a binomial proportion and can be estimated even for small sample sizes.
However, when sample size is very small (e.g., nH=1), the estimate lacks precision and confidence intervals become uninformative.
When no infected hosts are observed (nH_{inf}=0), prevalence is 0, and confidence intervals reflect uncertainty around zero.
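
These points can be checked with base R's `binom.test()`, which returns the Clopper–Pearson interval. The counts below are made up for illustration, and `ci()` is a hypothetical wrapper:

```r
# Clopper-Pearson 95% limits for x infected hosts out of n analyzed
ci <- function(x, n) as.numeric(binom.test(x, n, conf.level = 0.95)$conf.int)

one_host_infected <- ci(1, 1)     # nH = 1 and that host is infected
none_of_ten       <- ci(0, 10)    # nH_inf = 0 in a small sample
none_of_hundred   <- ci(0, 100)   # nH_inf = 0 in a larger sample

rbind(one_host_infected, none_of_ten, none_of_hundred)
```

With a single host the interval runs from 0.025 to 1, so it excludes almost nothing. With no infected hosts the lower limit is 0 and the upper limit shrinks as more hosts are analyzed, from roughly 0.31 for 10 hosts to roughly 0.036 for 100.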

The interpretation of results, particularly under small sample sizes, remains the responsibility of the user.
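
The reshaping and per-group computation described in this section might look roughly like the base R sketch below. The data frame `hosts`, its column names and the intermediate object names are all made up for illustration, and the actual implementation in parasiteR may differ.

```r
# Hypothetical wide-format data: one row per host, one column per taxon
hosts <- data.frame(
  Site = c("A", "A", "A", "B", "B", "B"),
  Sp1  = c(0, 2, 5, 0, 0, NA),
  Sp2  = c(1, 0, 0, 3, 4, 2)
)
sp_cols <- c("Sp1", "Sp2")

# Long format: one row per host-taxon pair
long <- data.frame(
  Site      = rep(hosts$Site, times = length(sp_cols)),
  taxon     = rep(sp_cols, each = nrow(hosts)),
  abundance = unlist(hosts[sp_cols], use.names = FALSE)
)
long <- long[!is.na(long$abundance), ]   # keep non-missing observations only

# Count analyzed and infected hosts per Site x taxon combination
n_all <- aggregate(abundance ~ Site + taxon, data = long, FUN = length)
n_inf <- aggregate(abundance ~ Site + taxon, data = long,
                   FUN = function(a) sum(a > 0))

res <- merge(n_all, n_inf, by = c("Site", "taxon"),
             suffixes = c("_n", "_inf"))
names(res) <- c("Site", "taxon", "nH", "nH_inf")
res$prevalence <- res$nH_inf / res$nH
res
```

In this example the missing Sp1 value at site B reduces nH to 2 for that combination, and since neither remaining host is infected the prevalence there is 0.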

Value

A data frame containing prevalence estimates and confidence intervals for each parasite taxon, either globally or by group. The following variables are returned:

nH: Number of hosts analyzed
nH_inf: Number of infested hosts
prevalence: Estimated prevalence
Lower_exact: Lower bound of the exact (Clopper–Pearson) interval
Upper_exact: Upper bound of the exact (Clopper–Pearson) interval
Lower_blaker: Lower bound of the Blaker interval
Upper_blaker: Upper bound of the Blaker interval
Observation: Categorical description of the data context:
- "Not analyzed": No valid observations are available for the given combination (all values are missing or the combination is absent in the dataset); therefore, no estimates can be computed.
- "One host analyzed": Only a single host was analyzed for the given combination; thus, no population-level inference is possible and statistical summary measures are not estimated.
- "No hosts infested": Hosts are present for the given combination, but none are infested (abundance = 0 for all observations); consequently, no statistical summary measures of abundance or intensity can be estimated.
- "One host infested": Only a single infested host is recorded for the given combination; therefore, no sample-based estimation of intensity or related summary measures is possible.
- "Multiple hosts infested": More than one infested host is recorded for the given combination, allowing the estimation of summary measures.
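
One way to read these categories is as a sequence of checks on nH and nH_inf. The helper below is hypothetical and not part of parasiteR; in particular, the order in which the checks are applied (for example, which label a single uninfected host receives) is an assumption.

```r
# Hypothetical helper: maps host counts to the Observation labels.
# The precedence of the checks is assumed, not taken from the package.
classify_observation <- function(nH, nH_inf) {
  if (nH == 0) {
    "Not analyzed"
  } else if (nH == 1) {
    "One host analyzed"
  } else if (nH_inf == 0) {
    "No hosts infested"
  } else if (nH_inf == 1) {
    "One host infested"
  } else {
    "Multiple hosts infested"
  }
}

# Apply the helper to a few illustrative count pairs
counts <- data.frame(nH = c(0, 1, 12, 12, 12), nH_inf = c(0, 1, 0, 1, 5))
counts$Observation <- mapply(classify_observation, counts$nH, counts$nH_inf)
counts
```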

Author(s)

Juan Manuel Cabrera, Exequiel Furlan and Elisa Helman

References

Bush, A.O., Lafferty, K.D., Lotz, J.M., Shostak, A.W. (1997). Parasitology meets ecology on its own terms: Margolis revisited. Journal of Parasitology, 83(4), 575–583.

Reiczigel, J., Marozzi, M., Fabian, I., Rózsa, L. (2019). Biostatistics for parasitologists – a primer to quantitative parasitology. Trends in Parasitology, 35(4), 277–281.

Examples

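library(parasiteR)

# Prevalence of taxon Sp1 by site, as proportions with 95% intervals;
# combine_ci = TRUE reports each interval in a single column
prevalence_CI <- para_prevalence_CI(para_data$dataset,
                                    sp_cols = c("Sp1"),
                                    group_vars = c("Site"),
                                    decimal_places = 2,
                                    conf_level = 0.95,
                                    output_type = "proportion",
                                    combine_ci = TRUE,
                                    verbose = TRUE)

# Print the resulting data frame
prevalence_CI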

parasiteR documentation built on May 13, 2026, 9:08 a.m.