View source: R/para_descriptors.R
| para_descriptors | R Documentation |
Computes standard parasitological descriptors and classical summary statistics from parasite abundance data, optionally stratified by grouping variables.
para_descriptors(dataset, sp_cols = NULL, group_vars = NULL,
decimal_places = 2, verbose = FALSE)
| dataset | Data frame with parasitic abundance data. |
| sp_cols | Vector with the names or indices of the species columns. |
| group_vars | Vector with the names of the categorical variables to consider (e.g., 'Sex', 'Site'). |
| decimal_places | Number of decimal places to round the values. |
| verbose | A logical value indicating if progress messages should be given. |
The para_descriptors function provides a practical and efficient way to estimate the main parasitological descriptors commonly used in ecological and parasitological studies. Calculations can be performed globally or at different hierarchical levels defined by grouping variables.
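The difference between a global and a grouped calculation can be sketched in base R. The data frame `hosts` and its columns are hypothetical, and the code below only illustrates the idea for prevalence; it is not the package's internal implementation.

```r
# Hypothetical toy data: one row per host, one count column per parasite taxon
hosts <- data.frame(
  Site = c("A", "A", "A", "B", "B"),
  Sp1  = c(0, 2, 5, 0, 0)
)

# Global prevalence of Sp1: share of all hosts with at least one parasite
global_p <- mean(hosts$Sp1 > 0)

# Prevalence of Sp1 within each level of the grouping variable
by_site_p <- tapply(hosts$Sp1 > 0, hosts$Site, mean)
```

Here `global_p` summarizes all five hosts at once, while `by_site_p` holds one value per site.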
The function computes descriptors based on parasite abundance per sampling unit (e.g., host, site, or pooled hosts), following standard definitions:
Prevalence (P): Proportion of infected hosts.
Abundance (A): Total number of parasites recorded.
Intensity (I): Number of parasites per infected host.
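As a minimal sketch of these definitions in base R, assume a hypothetical vector `counts` holding the parasite count of one taxon for each host. The variable names are illustrative only. Prevalence is shown as a proportion, and whether the function itself reports a proportion or a percentage is not stated here.

```r
# Hypothetical parasite counts for one taxon, one value per host
counts <- c(0, 3, 0, 7, 1, 0, 12, 0)

n_hosts    <- length(counts)            # nH: hosts analyzed
n_infected <- sum(counts > 0)           # nH_inf: infected hosts

prevalence     <- n_infected / n_hosts  # P, as a proportion
abundance      <- sum(counts)           # A: total parasites recorded
mean_abundance <- mean(counts)          # parasites per host analyzed
intensity      <- counts[counts > 0]    # I: counts in infected hosts only
mean_intensity <- mean(intensity)       # parasites per infected host
```

Abundance-based statistics use every host, including uninfected ones, whereas intensity-based statistics use only the hosts with a count above zero.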
Statistical validity and sample size considerations: The estimation of summary statistics is subject to fundamental statistical constraints related to sample size and variability.
Host population (nH): Summary statistics for abundance (e.g., mean, median, standard deviation) are only meaningful when more than one host is analyzed (nH > 1). When only a single host is available (nH = 1), no population of hosts exists and therefore no variability can be estimated. In such cases, the observed value is reported, but it should not be interpreted as a population-level descriptor.
Infected host population (nH_inf): Similarly, intensity-based descriptors are only meaningful when more than one infected host is available (nH_inf > 1). When only one infected host is present, the parasite count corresponds to a single observation rather than a distribution and summary statistics of intensity cannot be formally estimated.
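The short base R sketch below shows why a single observation needs explicit handling. The value `single` is hypothetical. With one value, `sd()` returns `NA`, but `mad()` and `IQR()` return 0, which could be misread as an absence of variability rather than an absence of information.

```r
# One hypothetical infected host with 7 parasites
single <- 7

sd_single  <- sd(single)   # NA: the standard deviation is undefined for n = 1
mad_single <- mad(single)  # 0: a computed value, but not an estimate of spread
iqr_single <- IQR(single)  # 0: same caveat as for the MAD
```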
These constraints reflect a fundamental principle: statistical descriptors require variability, and variability requires more than one observational unit. When this condition is not met, results should be interpreted cautiously, and no generalization beyond the observed case is justified.
Handling of special cases: The function automatically adjusts calculations depending on data availability:
When no data are available → results are reported as NA.
When hosts are analyzed but none are infected → prevalence is 0 and intensity measures are not computed.
When only one host or one infected host is available → corresponding summary statistics are not computed, and interpretation should be limited to the observed value.
The selection and interpretation of descriptors remain the responsibility of the user, particularly when working with small sample sizes.
A data frame containing the calculated parasitological descriptors for each parasite taxon, either globally or by group (if grouping variables are specified). The following variables are returned:
nH: Number of hosts analyzed
nH_inf: Number of infected hosts
A: Total parasite abundance
min: Minimum parasite count
max: Maximum parasite count
P: Parasitic prevalence
MeanA: Mean parasitic abundance
MeanA_sd: Standard deviation of parasite abundance
A_iqr: Interquartile range of parasite abundance
MedA: Median parasite abundance
MedA_sd: Median absolute deviation of parasite abundance
MeanI: Mean parasite intensity
MeanI_sd: Standard deviation of parasite intensity
I_iqr: Interquartile range of parasite intensity
MedI: Median parasite intensity
MedI_sd: Median absolute deviation of parasite intensity
Observation: Qualitative descriptor indicating data availability and sample structure for each hierarchical combination:
"Not analyzed": No valid observations are available for the given combination (all values are missing or the combination is absent in the dataset); therefore, no estimates can be computed.
"One host analyzed": Only a single host is available for the given combination; thus, no population-level inference is possible and statistical summary measures are not estimated.
"No hosts infested": Hosts are present for the given combination, but none are infested (abundance = 0 for all observations); consequently, no statistical summary measures of abundance or intensity can be estimated.
"One host infested": Only a single infested host is recorded for the given combination; therefore, no sample-based estimation of intensity or related summary measures is possible.
"Multiple hosts infested": More than one infested host is recorded for the given combination, allowing the estimation of summary measures.
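The labels above can be reproduced with a small base R helper. The function `observation_label()` is hypothetical and written only for illustration. The package's own implementation, and the order in which it checks the cases, may differ.

```r
# Hypothetical helper mirroring the Observation labels listed above
observation_label <- function(counts) {
  counts     <- counts[!is.na(counts)]  # drop missing observations
  n_hosts    <- length(counts)          # hosts with a valid count
  n_infested <- sum(counts > 0)         # hosts with at least one parasite

  if (n_hosts == 0) {
    "Not analyzed"
  } else if (n_hosts == 1) {
    "One host analyzed"
  } else if (n_infested == 0) {
    "No hosts infested"
  } else if (n_infested == 1) {
    "One host infested"
  } else {
    "Multiple hosts infested"
  }
}

observation_label(c(NA, NA))   # no valid observations
observation_label(c(0, 0, 4))  # several hosts, a single one infested
```

Checking the number of hosts before the number of infested hosts is an assumption of this sketch: a single uninfected host is labelled "One host analyzed" rather than "No hosts infested".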
Juan Manuel Cabrera, Exequiel Furlan and Elisa Helman
Bush, A.O., Lafferty, K.D., Lotz, J.M., Shostak, A.W. (1997). Parasitology meets ecology on its own terms: Margolis revisited. Journal of Parasitology, 83(4), 575–583.
Reiczigel, J., Marozzi, M., Fabian, I., Rózsa, L. (2019). Biostatistics for parasitologists – a primer to quantitative parasitology. Trends in Parasitology, 35(4), 277–281.
# Descriptors for four parasite taxa, grouped by Site and Sp_host
gral_descriptor <- para_descriptors(para_data$dataset,
                                    sp_cols = c("Sp1", "Sp2", "Sp3", "Sp4"),
                                    group_vars = c("Site", "Sp_host"),
                                    decimal_places = 2,
                                    verbose = FALSE)
gral_descriptor