para_prevalence_CI: Parasite prevalence estimation and confidence intervals


para_prevalence_CI    R Documentation

Parasite prevalence estimation and confidence intervals

Description

Estimates parasite prevalence and corresponding confidence intervals from parasite abundance data, optionally stratified by grouping variables. Two types of confidence intervals are provided: exact binomial intervals and Blaker intervals, allowing robust inference across a wide range of sample sizes and prevalence values.

Usage

para_prevalence_CI(dataset, sp_cols, group_vars = NULL, decimal_places = 2,
 conf_level = 0.95, output_type = "proportion", combine_ci = FALSE, verbose = FALSE)

Arguments

dataset

Data frame with parasitological data.

sp_cols

Character vector with the names of the columns that contain parasite abundances (one column per taxon) for which prevalence is calculated.

group_vars

Vector with the names of categorical variables used to define groups (e.g., "Sex", "Site"). Default is NULL.

decimal_places

Number of decimal places to which the returned values are rounded. Default is 2.

conf_level

Confidence level for interval estimation (e.g., 0.95 for 95% confidence intervals). Default is 0.95.

output_type

Format of the result: either "proportion" or "percentage". Default is "proportion".

combine_ci

Logical. If TRUE, the interval is expressed as a single column (min - max). If FALSE, the interval is split into separate lower and upper limit columns. Default is FALSE.

verbose

Logical. If TRUE, progress messages are printed. Default is FALSE.

Details

Prevalence is defined as the proportion of hosts infected with a given parasite taxon:

P = \frac{nH_{inf}}{nH}

where:

  • nH is the number of hosts analyzed (non-missing observations)

  • nH_{inf} is the number of infected hosts (abundance > 0)
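As a small illustration of the definition above, the two counts and the resulting prevalence can be computed by hand in base R. The abundance values are made up for this sketch and do not come from the package data.

```r
# Hypothetical abundance counts for one parasite taxon; NA = host not examined
abundance <- c(0, 3, 0, NA, 12, 1, 0, 0)

nH     <- sum(!is.na(abundance))            # hosts analyzed (non-missing): 7
nH_inf <- sum(abundance > 0, na.rm = TRUE)  # infected hosts (abundance > 0): 3
P      <- nH_inf / nH                       # prevalence as a proportion

round(P, 2)  # 0.43
```

Note that the host with a missing abundance is excluded from nH, so it affects neither the numerator nor the denominator.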

The function reshapes the dataset into long format and computes prevalence for each parasite taxon and grouping combination (if specified). Two types of confidence intervals are calculated:

  • Exact (Clopper–Pearson) interval: An exact binomial confidence interval. It is conservative (its actual coverage is at least the nominal level) but valid for all sample sizes, which makes it a safe choice for small samples and for prevalence values close to 0 or 1.

  • Blaker interval: Also an exact interval, but less conservative than Clopper–Pearson. It is never wider than the Clopper–Pearson interval and still keeps coverage at or above the nominal level.
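The following base R sketch shows one way the two intervals can be obtained for a single count. It is not taken from the package source: the function names clopper_pearson, blaker_accept and blaker_ci are hypothetical, and the Blaker limits are found with a simple grid search that starts at the Clopper–Pearson limits and moves inward using Blaker's acceptability function, so they are accurate only to the step width tol.

```r
# Clopper-Pearson limits from beta quantiles (the formula binom.test() also uses)
clopper_pearson <- function(x, n, conf_level = 0.95) {
  alpha <- 1 - conf_level
  lower <- if (x == 0) 0 else qbeta(alpha / 2, x, n - x + 1)
  upper <- if (x == n) 1 else qbeta(1 - alpha / 2, x + 1, n - x)
  c(lower = lower, upper = upper)
}

# Blaker acceptability of a candidate prevalence p, given x infected out of n
blaker_accept <- function(x, n, p) {
  p_ge <- 1 - pbinom(x - 1, n, p)  # P(X >= x)
  p_le <- pbinom(x, n, p)          # P(X <= x)
  a1 <- p_ge + pbinom(qbinom(p_ge, n, p) - 1, n, p)
  a2 <- p_le + 1 - pbinom(qbinom(1 - p_le, n, p), n, p)
  min(a1, a2)
}

# Start from the Clopper-Pearson limits and step inward (grid width tol)
# while the next candidate value is still rejected at level alpha
blaker_ci <- function(x, n, conf_level = 0.95, tol = 1e-4) {
  alpha <- 1 - conf_level
  cp <- clopper_pearson(x, n, conf_level)
  lower <- cp[["lower"]]
  upper <- cp[["upper"]]
  if (x != 0) {
    while (blaker_accept(x, n, lower + tol) < alpha) lower <- lower + tol
  }
  if (x != n) {
    while (blaker_accept(x, n, upper - tol) < alpha) upper <- upper - tol
  }
  c(lower = lower, upper = upper)
}

# Hypothetical sample: 3 infected hosts out of 20 examined
cp_ci <- clopper_pearson(3, 20)
bl_ci <- blaker_ci(3, 20)
round(rbind(exact = cp_ci, blaker = bl_ci), 3)
```

Because the search starts at the Clopper–Pearson limits and only moves inward, the Blaker limits from this sketch always lie inside the exact interval, which matches the "less conservative" behaviour described above.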

Statistical considerations:

  • Prevalence is a binomial proportion and can be estimated even for small sample sizes.

  • However, when sample size is very small (e.g., nH=1), the estimate lacks precision and confidence intervals become uninformative.

  • When no infected hosts are observed (nH_{inf}=0), the estimated prevalence is 0 and the lower confidence limit is 0; the upper limit shows how large the true prevalence could still be given the number of hosts analyzed.

The interpretation of results, particularly under small sample sizes, remains the responsibility of the user.
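To make the small-sample considerations concrete, the sketch below uses base R's binom.test(), which returns the Clopper–Pearson interval, on a few edge cases. The helper name ci is hypothetical and the counts are illustrative only.

```r
# Exact (Clopper-Pearson) 95% limits from base R, rounded for display
ci <- function(x, n) round(as.numeric(binom.test(x, n)$conf.int), 3)

ci(1, 1)    # one host, infected:        0.025 1.000
ci(0, 1)    # one host, not infected:    0.000 0.975
ci(0, 10)   # ten hosts, none infected:  0.000 0.308
ci(0, 100)  # 100 hosts, none infected:  0.000 0.036
```

With a single host the interval covers almost the whole range from 0 to 1, and with zero infected hosts the upper limit shrinks only as the number of hosts analyzed grows.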

Value

A data frame containing prevalence estimates and confidence intervals for each parasite taxon, either globally or by group. The following variables are returned:

  • nH: Number of hosts analyzed

  • nH_inf: Number of infected hosts (abundance > 0)

  • prevalence: Estimated prevalence

  • Lower_exact: Lower bound of the exact (Clopper–Pearson) interval

  • Upper_exact: Upper bound of the exact (Clopper–Pearson) interval

  • Lower_blaker: Lower bound of the Blaker interval

  • Upper_blaker: Upper bound of the Blaker interval

  • Observation: Categorical description of the data context:

    • "Not analyzed": No valid observations are available for the given combination (all values are missing or the combination is absent in the dataset); therefore, no estimates can be computed.

    • "One host analyzed": Only a single host analyzed is available for the given combination; thus, no population-level inference is possible and statistical summary measures are not estimated.

    • "No hosts infested": Hosts are present for the given combination, but none are infested (abundance = 0 for all observations); consequently, no statistical summary measures of abundance or intensity can be estimated.

    • "One host infested": Only a single infested host is recorded for the given combination; therefore, no sample-based estimation of intensity or related summary measures is possible.

    • "Multiple hosts infested": More than one infested host is recorded for the given combination, allowing the estimation of summary measures.
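The count columns and the Observation label can be pictured with the following base R sketch, which reshapes a small wide table to long format and summarises each taxon and group combination. The data, the helper summarise_group and the order in which the categories are checked are assumptions made for illustration; the package's internal code may differ, and combinations that are absent from the data are simply dropped here rather than reported as "Not analyzed".

```r
# Hypothetical wide data: one row per host, one abundance column per taxon
hosts <- data.frame(
  Site = c("A", "A", "A", "B", "B", "B"),
  Sp1  = c(0, 4, 2, 0, 0, NA),
  Sp2  = c(1, 0, 0, 0, 0, 0)
)
sp_cols <- c("Sp1", "Sp2")

# Wide -> long: one row per host and taxon
long <- data.frame(
  Site      = rep(hosts$Site, times = length(sp_cols)),
  taxon     = rep(sp_cols, each = nrow(hosts)),
  abundance = unlist(hosts[sp_cols], use.names = FALSE)
)

# Counts, prevalence and Observation label for one taxon x group subset
summarise_group <- function(d) {
  nH     <- sum(!is.na(d$abundance))
  nH_inf <- sum(d$abundance > 0, na.rm = TRUE)
  if (nH == 0) {
    obs <- "Not analyzed"
  } else if (nH == 1) {
    obs <- "One host analyzed"
  } else if (nH_inf == 0) {
    obs <- "No hosts infested"
  } else if (nH_inf == 1) {
    obs <- "One host infested"
  } else {
    obs <- "Multiple hosts infested"
  }
  data.frame(
    taxon = d$taxon[1], Site = d$Site[1],
    nH = nH, nH_inf = nH_inf,
    prevalence = if (nH > 0) round(nH_inf / nH, 2) else NA_real_,
    Observation = obs
  )
}

groups <- split(long, list(long$taxon, long$Site), drop = TRUE)
res <- do.call(rbind, lapply(groups, summarise_group))
rownames(res) <- NULL
res
```

In this toy example Sp1 at site B has one missing abundance, so nH is 2 rather than 3 and the combination is labelled "No hosts infested".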

Author(s)

Juan Manuel Cabrera, Exequiel Furlan and Elisa Helman

References

Bush, A.O., Lafferty, K.D., Lotz, J.M., Shostak, A.W. (1997). Parasitology meets ecology on its own terms: Margolis revisited. Journal of Parasitology, 83(4), 575–583.

Reiczigel, J., Marozzi, M., Fabian, I., Rózsa, L. (2019). Biostatistics for parasitologists – a primer to quantitative parasitology. Trends in Parasitology, 35(4), 277–281.

Examples

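# Load the package (name taken from the page footer); para_data is assumed
# to be an example data object available after loading it
library(parasiteR)

# Prevalence of taxon "Sp1" by site, as proportions with 95% intervals;
# combine_ci = TRUE reports each interval in a single column
prevalence_CI <- para_prevalence_CI(para_data$dataset,
                                   sp_cols = c("Sp1"),
                                   group_vars = c("Site"),
                                   decimal_places = 2,
                                   conf_level = 0.95,
                                   output_type = "proportion",
                                   combine_ci = TRUE,
                                   verbose = TRUE)

# Print the resulting data frame
prevalence_CI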



parasiteR documentation built on May 13, 2026, 9:08 a.m.