para_descriptors: Parasitological descriptors and summary statistics

View source: R/para_descriptors.R

para_descriptors R Documentation


Description

Computes standard parasitological descriptors and classical summary statistics from parasite abundance data, optionally stratified by grouping variables.

Usage

para_descriptors(dataset, sp_cols = NULL, group_vars = NULL,
                 decimal_places = 2, verbose = FALSE)

Arguments

dataset

Data frame with parasite abundance data.

sp_cols

Vector with the names or indices of the species columns.

group_vars

Vector with the names of the categorical variables to consider (e.g., 'Sex', 'Site').

decimal_places

Number of decimal places to which the values are rounded.

verbose

A logical value indicating whether progress messages should be printed.

Details

The para_descriptors function provides a practical and efficient way to estimate the main parasitological descriptors commonly used in ecological and parasitological studies. Calculations can be performed globally or at different hierarchical levels defined by grouping variables.
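The difference between a global and a stratified calculation can be sketched in base R. This is only an illustration of the idea, not the package's implementation; the `hosts` data frame, its `Site` and `Sp1` columns, and the values in it are made up for the example.

```r
# Hypothetical toy data: one row per host, one column per parasite species
hosts <- data.frame(
  Site = c("A", "A", "A", "B", "B"),
  Sp1  = c(0, 4, 2, 0, 0)
)

# Global: proportion of hosts carrying at least one parasite, over all hosts
global_infected <- mean(hosts$Sp1 > 0)

# Stratified: the same calculation repeated within each level of Site
by_site <- tapply(hosts$Sp1 > 0, hosts$Site, mean)

global_infected
by_site
```

With more than one grouping variable, the same calculation would be repeated for each combination of levels.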

The function computes descriptors based on parasite abundance per sampling unit (e.g., host, site, or pooled hosts), following standard definitions:

  • Prevalence (P): Proportion of infected hosts.

  • Abundance (A): Total number of parasites recorded.

  • Intensity (I): Number of parasites per infected host.
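As a minimal sketch of the definitions above, the three descriptors can be derived from a vector of parasite counts in base R. The counts are hypothetical, and the code does not claim to reproduce how para_descriptors computes or scales its output (for instance, whether P is reported as a proportion or a percentage is not assumed here).

```r
# Hypothetical parasite counts for one species, one value per host
counts <- c(0, 3, 0, 12, 1, 0, 5, 0)

n_hosts    <- sum(!is.na(counts))            # nH: hosts analyzed
n_infected <- sum(counts > 0, na.rm = TRUE)  # nH_inf: infected hosts

prevalence  <- n_infected / n_hosts          # P, expressed here as a proportion
abundance   <- sum(counts, na.rm = TRUE)     # A, total parasites recorded
intensities <- counts[!is.na(counts) & counts > 0]  # I, counts in infected hosts only

mean_abundance <- abundance / n_hosts        # denominator includes uninfected hosts
mean_intensity <- mean(intensities)          # denominator excludes uninfected hosts
```

The only difference between mean abundance and mean intensity is the denominator: all analyzed hosts for the former, infected hosts only for the latter.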

Statistical validity and sample size considerations: The estimation of summary statistics is subject to fundamental statistical constraints related to sample size and variability.

  • Host population (nH): Summary statistics for abundance (e.g., mean, median, standard deviation) are only meaningful when more than one host is analyzed (nH > 1). When only a single host is available (nH = 1), no population of hosts exists and therefore no variability can be estimated. In such cases, the observed value is reported, but it should not be interpreted as a population-level descriptor.

  • Infected host population (nH_inf): Similarly, intensity-based descriptors are only meaningful when more than one infected host is available (nH_inf > 1). When only one infected host is present, the parasite count is a single observation rather than a distribution, and summary statistics of intensity cannot be formally estimated.

These constraints reflect a fundamental principle: statistical descriptors require variability, and variability requires more than one observational unit. When this condition is not met, results should be interpreted cautiously, and no generalization beyond the observed case is justified.

Handling of special cases: The function automatically adjusts calculations depending on data availability:

  • When no data are available → results are reported as NA.

  • When hosts are analyzed but none are infected → prevalence is 0 and intensity measures are not computed.

  • When only one host or one infected host is available → corresponding summary statistics are not computed, and interpretation should be limited to the observed value.

The selection and interpretation of descriptors remain the responsibility of the user, particularly when working with small sample sizes.
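The guard on the number of infected hosts can be illustrated with a small base R helper. `summarise_intensity` is a hypothetical name introduced for this sketch and is not part of the package; returning NA for skipped statistics is also an assumption, since the text above only states that they are not computed.

```r
# Hypothetical helper, not part of parasiteR: summarise intensity only
# when more than one infected host is available.
summarise_intensity <- function(counts, digits = 2) {
  counts   <- counts[!is.na(counts)]
  infected <- counts[counts > 0]
  if (length(infected) < 2) {
    # Zero or one infected host: there is no distribution to summarise
    return(c(MeanI = NA_real_, MeanI_sd = NA_real_, MedI = NA_real_))
  }
  round(c(MeanI    = mean(infected),
          MeanI_sd = sd(infected),
          MedI     = median(infected)),
        digits)
}

summarise_intensity(c(0, 0, 7))      # one infected host: nothing is estimated
summarise_intensity(c(0, 3, 12, 1))  # three infected hosts: estimates are returned
```

Base R's `sd()` already returns NA for a single value, but an explicit guard also prevents a mean or median of one observation from being read as a population-level estimate.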

Value

A data frame containing the calculated parasitological descriptors for each parasite taxon, either globally or by group (if grouping variables are specified). The following variables are returned:

  • nH: Number of hosts analyzed

  • nH_inf: Number of infected hosts

  • A: Total parasite abundance

  • min: Minimum parasite count

  • max: Maximum parasite count

  • P: Parasitic prevalence

  • MeanA: Mean parasitic abundance

  • MeanA_sd: Standard deviation of parasite abundance

  • A_iqr: Interquartile range of parasite abundance

  • MedA: Median parasite abundance

  • MedA_sd: Median absolute deviation of parasite abundance

  • MeanI: Mean parasite intensity

  • MeanI_sd: Standard deviation of parasite intensity

  • I_iqr: Interquartile range of parasite intensity

  • MedI: Median parasite intensity

  • MedI_sd: Median absolute deviation of parasite intensity

  • Observation: Qualitative descriptor indicating data availability and sample structure for each hierarchical combination:

    • "Not analyzed": No valid observations are available for the given combination (all values are missing or the combination is absent in the dataset); therefore, no estimates can be computed.

    • "One host analyzed": Only a single analyzed host is available for the given combination; thus, no population-level inference is possible and statistical summary measures are not estimated.

    • "No hosts infested": Hosts are present for the given combination, but none are infested (abundance = 0 for all observations); consequently, no statistical summary measures of abundance or intensity can be estimated.

    • "One host infested": Only a single infested host is recorded for the given combination; therefore, no sample-based estimation of intensity or related summary measures is possible.

    • "Multiple hosts infested": More than one infested host is recorded for the given combination, allowing the estimation of summary measures.
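The Observation categories can be read as a decision sequence over the number of hosts and the number of infested hosts. The sketch below is one possible reading, not the package's code: `classify_observation` is a hypothetical helper, and the order in which the conditions are checked (for example, labelling a single infested host as "One host analyzed") is an assumption.

```r
# Hypothetical helper, not part of parasiteR: assign an Observation label
# to the parasite counts of one species within one grouping combination.
classify_observation <- function(counts) {
  counts     <- counts[!is.na(counts)]   # keep valid observations only
  n_hosts    <- length(counts)
  n_infested <- sum(counts > 0)

  if (n_hosts == 0)    return("Not analyzed")
  if (n_hosts == 1)    return("One host analyzed")
  if (n_infested == 0) return("No hosts infested")
  if (n_infested == 1) return("One host infested")
  "Multiple hosts infested"
}

# One toy count vector per category
examples <- list(c(NA, NA), c(4), c(0, 0, 0), c(0, 5, 0), c(2, 5, 0))
vapply(examples, classify_observation, character(1))
```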

Author(s)

Juan Manuel Cabrera, Exequiel Furlan and Elisa Helman

References

Bush, A.O., Lafferty, K.D., Lotz, J.M., Shostak, A.W. (1997). Parasitology meets ecology on its own terms: Margolis revisited. Journal of Parasitology, 83(4), 575–583.

Reiczigel, J., Marozzi, M., Fabian, I., Rózsa, L. (2019). Biostatistics for parasitologists – a primer to quantitative parasitology. Trends in Parasitology, 35(4), 277–281.

Examples


# Descriptors for four parasite species, stratified by Site and Sp_host
gral_descriptor <- para_descriptors(para_data$dataset,
                                    sp_cols = c("Sp1", "Sp2", "Sp3", "Sp4"),
                                    group_vars = c("Site", "Sp_host"),
                                    decimal_places = 2,
                                    verbose = FALSE)

gral_descriptor


parasiteR documentation built on May 13, 2026, 9:08 a.m.