para_abundance_CI: Mean or median abundance estimation and confidence intervals

View source: R/para_abundance_CI.R

para_abundance_CI    R Documentation

Mean or median abundance estimation and confidence intervals

Description

This function calculates point estimates and confidence intervals (CIs) for parasite abundance, using either the mean or the median as the measure of central tendency. Confidence intervals are estimated with a non-parametric bootstrap, that is, by resampling the observed data with replacement (the number of resamples is set by perm). Specifically, the function implements bias-corrected and accelerated (BCa) bootstrap intervals, which adjust for both bias and skewness in the bootstrap distribution. This approach does not assume a specific underlying distribution, which makes it well suited to overdispersed and zero-inflated parasitological data.

Usage

para_abundance_CI(dataset, c_median = TRUE,
 sp_cols, group_vars = NULL,  perm = 2000, decimal_places = 2,
 combine_ci = FALSE,  conf_level = 0.95, verbose = FALSE)

Arguments

dataset

Data frame with parasitological data.

c_median

Logical. If TRUE, the results include the median as the measure of central tendency; if FALSE, they include the mean. Default = TRUE.

sp_cols

Vector with the names of the columns that contain the parasite (taxon) abundances for which the parasitological descriptors are calculated.

group_vars

Vector with the names of categorical variables used to define groups (e.g., "Sex", "Site"). Default = NULL.

perm

Number of bootstrap resamples (drawn with replacement) used for confidence interval estimation. Default = 2000.

decimal_places

Number of decimal places used when reporting the results. Default = 2.

combine_ci

Logical. If TRUE, the interval is reported in a single column (lower limit - upper limit). If FALSE, it is split into separate lower and upper limit columns. Default = FALSE.

conf_level

Confidence level for the interval estimation (e.g., 0.95 for a 95% CI). Default = 0.95.

verbose

Logical. If TRUE, progress messages are printed. Default = FALSE.

Details

Parasite abundance is defined as the number of individuals of a given parasite taxon per host. For each taxon, abundance metrics are calculated based on the observed counts across hosts. The function reshapes the dataset into long format and computes abundance statistics for each parasite taxon and grouping combination (if specified). The following are estimated:

  • A is the total parasite abundance

  • nH is the number of hosts analyzed

  • nH_inf is the number of infected hosts

Depending on the argument c_median, the function calculates:

  • Mean abundance MeanA: average number of parasites per host

  • Median abundance MedA: median number of parasites per host
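The quantities listed above can be sketched in base R. This is a minimal illustration, not the package's internal code: the toy data frame hosts, the long-format columns Taxon and Count, and the helper summarise_counts are hypothetical names chosen for the example.

```r
# Hypothetical toy data: one row per host, one column per parasite taxon
hosts <- data.frame(
  Site = c("A", "A", "A", "B", "B", "B"),
  Sp1  = c(0, 3, 1, 0, 0, 8),
  Sp2  = c(2, 0, 0, 5, 1, 0)
)
taxa <- c("Sp1", "Sp2")

# Reshape to long format: one row per host-by-taxon combination
long <- data.frame(
  Site  = rep(hosts$Site, times = length(taxa)),
  Taxon = rep(taxa, each = nrow(hosts)),
  Count = unlist(hosts[taxa], use.names = FALSE)
)

# Descriptors for one vector of per-host counts (hypothetical helper)
summarise_counts <- function(x) {
  c(A      = sum(x),       # total parasite abundance
    nH     = length(x),    # number of hosts analyzed
    nH_inf = sum(x > 0),   # number of infected hosts
    MeanA  = mean(x),      # mean abundance
    MedA   = median(x))    # median abundance
}

# One row per Site-by-Taxon combination; do.call() flattens the matrix
# column that aggregate() returns when FUN yields several values
abund <- do.call(
  data.frame,
  aggregate(Count ~ Site + Taxon, data = long, FUN = summarise_counts)
)
names(abund) <- sub("^Count\\.", "", names(abund))
abund
```

Grouping by Site here plays the role of group_vars; dropping Site from the formula would give one row per taxon for the whole dataset.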

Confidence intervals are estimated using a non-parametric bootstrap approach. Specifically, bias-corrected and accelerated (BCa) bootstrap intervals are computed by resampling the observed abundance values with replacement perm times. This method adjusts for both bias and skewness in the bootstrap distribution.

Statistical considerations: parasite abundance data are typically overdispersed and zero-inflated, making parametric assumptions inappropriate in many cases. Bootstrap methods allow confidence intervals to be estimated without assuming normality. Mean abundance is sensitive to extreme values, whereas median abundance is a more robust measure under highly skewed distributions. When sample size is small, bootstrap confidence intervals may be unstable or wide, and results should be interpreted with caution. The interpretation of results remains the responsibility of the user.
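To make the BCa adjustment concrete, the sketch below follows the textbook BCa recipe in base R: a bias correction z0 taken from the bootstrap distribution and an acceleration a taken from jackknife (leave-one-out) estimates. It is an illustration under stated assumptions, not the package's internal code: the function name bca_ci, its arguments, and the toy counts are hypothetical.

```r
# Hypothetical helper: BCa bootstrap CI for a statistic of a numeric vector
bca_ci <- function(x, stat = mean, R = 2000, conf_level = 0.95) {
  n <- length(x)
  theta_hat <- stat(x)

  # Bootstrap replicates: resample hosts with replacement R times
  theta_b <- replicate(R, stat(sample(x, n, replace = TRUE)))

  # Bias correction: where the observed estimate sits in the bootstrap distribution
  z0 <- qnorm(mean(theta_b < theta_hat))

  # Acceleration from jackknife (leave-one-out) estimates
  theta_jack <- vapply(seq_len(n), function(i) stat(x[-i]), numeric(1))
  d <- mean(theta_jack) - theta_jack
  a <- sum(d^3) / (6 * sum(d^2)^1.5)

  # Adjusted percentile levels, then read them off the bootstrap distribution
  z <- qnorm(c((1 - conf_level) / 2, 1 - (1 - conf_level) / 2))
  p <- pnorm(z0 + (z0 + z) / (1 - a * (z0 + z)))
  unname(quantile(theta_b, probs = p))
}

set.seed(1)
counts <- c(0, 0, 0, 1, 2, 0, 5, 12, 0, 3, 0, 1, 0, 0, 7)  # toy zero-inflated counts
ci <- bca_ci(counts, stat = mean, R = 2000)
round(ci, 2)  # lower and upper limits
```

The sketch deliberately omits edge cases. With stat = median on count data, for instance, the jackknife estimates can all coincide, which leaves the acceleration undefined (0/0), and z0 becomes infinite if every bootstrap replicate falls on one side of the observed estimate.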

Value

A data frame containing abundance estimates and confidence intervals for each parasite taxon, either globally or by group. The following variables are returned:

  • nH: Number of hosts analyzed

  • nH_inf: Number of infected hosts

  • A: Total parasite abundance

  • MeanA: Mean parasite abundance (reported when c_median = FALSE)

  • MedA: Median parasite abundance (reported when c_median = TRUE)

  • Lower_CI: Lower bound of the bootstrap confidence interval

  • Upper_CI: Upper bound of the bootstrap confidence interval

  • CI: If combine_ci = TRUE, the confidence interval is stored in a single column (Lower_CI - Upper_CI)

  • Observation: Categorical description of the data context:

    • "Not analyzed": No valid observations are available for the given combination (all values are missing or the combination is absent in the dataset); therefore, no estimates can be computed.

    • "One host analyzed": Only a single host was analyzed for the given combination; thus, no population-level inference is possible and statistical summary measures are not estimated.

    • "No hosts infested": Hosts are present for the given combination, but none are infested (abundance = 0 for all observations); consequently, no statistical summary measures of abundance or intensity can be estimated.

    • "One host infested": Only a single infested host is recorded for the given combination; therefore, no sample-based estimation of intensity or related summary measures is possible.

    • "Multiple hosts infested": More than one infested host is recorded for the given combination, allowing the estimation of summary measures.
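The Observation categories amount to a small decision rule on the number of hosts analyzed and the number of infested hosts. The sketch below is one possible reading of that rule in base R, not the package's internal code: the helper name classify_observation is hypothetical, and the order of the checks (for example, that a single analyzed host takes precedence over its infestation status) is an assumption.

```r
# Hypothetical helper: label one taxon-by-group vector of per-host counts
classify_observation <- function(x) {
  x <- x[!is.na(x)]              # keep valid observations only
  n_hosts <- length(x)           # hosts analyzed
  n_infested <- sum(x > 0)       # hosts with at least one parasite

  # Assumed precedence: checks run from least to most data available
  if (n_hosts == 0) {
    "Not analyzed"
  } else if (n_hosts == 1) {
    "One host analyzed"
  } else if (n_infested == 0) {
    "No hosts infested"
  } else if (n_infested == 1) {
    "One host infested"
  } else {
    "Multiple hosts infested"
  }
}

# One toy vector per category
toy <- list(c(NA, NA), 4, c(0, 0, 0), c(0, 6, 0), c(2, 0, 9))
obs_labels <- sapply(toy, classify_observation)
obs_labels
```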

Author(s)

Juan Manuel Cabrera, Exequiel Furlan and Elisa Helman

References

Bush, A.O., Lafferty, K.D., Lotz, J.M., Shostak, A.W. (1997). Parasitology meets ecology on its own terms: Margolis revisited. Journal of Parasitology, 83(4), 575–583.

Reiczigel, J., Marozzi, M., Fabian, I., Rózsa, L. (2019). Biostatistics for parasitologists – a primer to quantitative parasitology. Trends in Parasitology, 35(4), 277–281.

Examples

library(parasiteR)

# Calculate the CI for the median abundance
med_abun_CI <- para_abundance_CI(para_data$dataset,
                                 c_median = TRUE,
                                 sp_cols = c("Sp1"),
                                 group_vars = c("Site"),
                                 decimal_places = 2,
                                 conf_level = 0.95,
                                 combine_ci = TRUE,
                                 verbose = TRUE)
med_abun_CI

# Calculate the CI for the mean abundance
mean_abun_CI <- para_abundance_CI(para_data$dataset,
                                  c_median = FALSE,
                                  sp_cols = c("Sp1"),
                                  group_vars = c("Site"),
                                  decimal_places = 2,
                                  conf_level = 0.95,
                                  combine_ci = TRUE,
                                  verbose = TRUE)
mean_abun_CI


parasiteR documentation built on May 13, 2026, 9:08 a.m.