#' Parasitological descriptors and summary statistics
#'
#' Computes standard parasitological descriptors and classical summary statistics from parasite abundance data, optionally stratified by grouping variables.
#'
#' The para_descriptors function provides a practical and efficient way to estimate the main parasitological descriptors commonly used in ecological and parasitological studies. Calculations can be performed globally or at different hierarchical levels defined by grouping variables.
#'
#' The function computes descriptors based on parasite abundance per sampling unit (e.g., host, site, or pooled hosts), following standard definitions:
#'\itemize{
#' \item Prevalence (P): Proportion of infected hosts.
#' \item Abundance (A): Total number of parasites recorded.
#' \item Intensity (I): Number of parasites per infected host.
#' }
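#'
#' As a rough illustration of these definitions, the sketch below computes them by hand in base R for one taxon. The counts and variable names are hypothetical, and this is not the function's internal code:
#' \preformatted{
#' # Hypothetical parasite counts for five hosts (one taxon)
#' counts <- c(0, 3, 0, 5, 2)
#' n_hosts <- length(counts)                    # hosts analyzed: 5
#' n_infected <- sum(counts > 0)                # infected hosts: 3
#' prevalence <- n_infected / n_hosts           # 3 / 5 = 0.6
#' abundance <- sum(counts)                     # total parasites: 10
#' mean_abundance <- mean(counts)               # 10 / 5 = 2
#' mean_intensity <- mean(counts[counts > 0])   # 10 / 3, about 3.33
#' }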
#'
#' Statistical validity and sample size considerations: The estimation of summary statistics is subject to fundamental statistical constraints related to sample size and variability.
#' \itemize{
#' \item \strong{Host population (nH):} Summary statistics for abundance (e.g., mean, median, standard deviation) are only meaningful when more than one host is analyzed (nH > 1). When only a single host is available (nH = 1), no population of hosts exists and therefore no variability can be estimated. In such cases, the observed value is reported, but it should not be interpreted as a population-level descriptor.
#' \item \strong{Infected host population (nH_inf):} Similarly, intensity-based descriptors are only meaningful when more than one infected host is available (nH_inf > 1). When only one infected host is present, the parasite count corresponds to a single observation rather than a distribution and summary statistics of intensity cannot be formally estimated.
#' }
#'
#' These constraints reflect a fundamental principle: statistical descriptors require variability, and variability requires more than one observational unit. When this condition is not met, results should be interpreted cautiously, and no generalization beyond the observed case is justified.
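#'
#' The base R sketch below, using a single hypothetical count, is meant only to motivate these constraints: with one observation, measures of spread are either missing or degenerate.
#' \preformatted{
#' single_count <- 4            # hypothetical: one host carrying 4 parasites
#' mean(single_count)           # 4, simply the observed value
#' stats::sd(single_count)      # NA, undefined for a single observation
#' stats::IQR(single_count)     # 0, which does not reflect real variability
#' stats::mad(single_count)     # 0, likewise uninformative
#' }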
#'
#' Handling of special cases: The function automatically adjusts calculations depending on data availability:
#' \itemize{
#' \item When no data are available → results are reported as \code{NA}.
#' \item When hosts are analyzed but none are infected → prevalence is 0 and intensity measures are not computed.
#' \item When only one host or one infected host is available → corresponding summary statistics are not computed, and interpretation should be limited to the observed value.
#' }
#'
#' The selection and interpretation of descriptors remain the responsibility of the user, particularly when working with small sample sizes.
#'
#' @usage
#' para_descriptors(dataset, sp_cols = NULL, group_vars = NULL,
#' decimal_places = 2, verbose = FALSE)
#'
#' @param dataset Data frame with parasitic abundance data.
#' @param sp_cols Vector with the names or indices of the species columns.
#' @param group_vars Vector with the names of the categorical variables to consider (e.g., 'Sex', 'Site').
#' @param decimal_places Number of decimal places to round the values.
#' @param verbose A logical value indicating if progress messages should be given.
#'
#' @return A data frame containing the calculated parasitological descriptors for each parasite taxon, either globally or by group (if grouping variables are specified). The following variables are returned:
#' \itemize{
#' \item \code{nH}: Number of hosts analyzed
#' \item \code{nH_inf}: Number of infected hosts
#' \item \code{A}: Total parasite abundance
#' \item \code{min}: Minimum parasite count
#' \item \code{max}: Maximum parasite count
#' \item \code{P}: Parasitic prevalence
#' \item \code{MeanA}: Mean parasitic abundance
#' \item \code{MeanA_sd}: Standard deviation of parasite abundance
#' \item \code{A_iqr}: Interquartile range of parasite abundance
#' \item \code{MedA}: Median parasite abundance
#' \item \code{MedA_sd}: Median absolute deviation of parasite abundance
#' \item \code{MeanI}: Mean parasite intensity
#' \item \code{MeanI_sd}: Standard deviation of parasite intensity
#' \item \code{I_iqr}: Interquartile range of parasite intensity
#' \item \code{MedI}: Median parasite intensity
#' \item \code{MedI_sd}: Median absolute deviation of parasite intensity
#' \item \code{Observation}: Qualitative descriptor indicating data availability and sample structure for each hierarchical combination:
#' \itemize{
#' \item \code{"Not analyzed"}: No valid observations are available for the given combination (all values are missing or the combination is absent in the dataset); therefore, no estimates can be computed.
#' \item \code{"One host analyzed"}: Only a single analyzed host is available for the given combination; thus, no population-level inference is possible and statistical summary measures are not estimated.
#' \item \code{"No hosts infested"}: Hosts are present for the given combination, but none are infested (abundance = 0 for all observations); consequently, prevalence is 0, abundance summaries are trivially zero, and no statistical summary measures of intensity can be estimated.
#' \item \code{"One host infested"}: Only a single infested host is recorded for the given combination; therefore, no sample-based estimation of intensity or related summary measures is possible.
#' \item \code{"Multiple hosts infested"}: More than one infested host is recorded for the given combination, allowing the estimation of summary measures.
#' }
#' }
#'
#' @references
#' Bush, A.O., Lafferty, K.D., Lotz, J.M., Shostak, A.W. (1997). Parasitology meets ecology on its own terms:
#' Margolis revisited. \emph{Journal of Parasitology}, 83(4), 575–583.
#'
#' Reiczigel, J., Marozzi, M., Fabian, I., Rózsa, L. (2019). Biostatistics for parasitologists – a primer to
#' quantitative parasitology. \emph{Trends in Parasitology}, 35(4), 277–281.
#'
#' @author Juan Manuel Cabrera, Exequiel Furlan and Elisa Helman
#'
#' @examples
#'
#'gral_descriptor <- para_descriptors(para_data$dataset,
#' sp_cols = c("Sp1", "Sp2", "Sp3", "Sp4"),
#' group_vars = c("Site","Sp_host"),
#' decimal_places = 2,
#' verbose = FALSE)
#'
#'gral_descriptor
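#'
#' # A further sketch on a small hypothetical data frame ('toy', with made-up
#' # columns 'Site' and 'Sp1'), chosen to trigger the special cases described
#' # above: site "B" has a single host and site "C" has no infected hosts.
#' toy <- data.frame(Site = c("A", "A", "A", "B", "C", "C"),
#'                   Sp1 = c(0, 4, 2, 7, 0, 0))
#' toy_descriptor <- para_descriptors(toy, sp_cols = "Sp1", group_vars = "Site")
#'
#' # Inspect which descriptors are NA and how each site is labelled
#' toy_descriptor[, c("Site", "nH", "nH_inf", "P", "MeanI", "Observation")]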
#'
#' @export
para_descriptors <- function(dataset, sp_cols = NULL, group_vars = NULL, decimal_places = 2, verbose = FALSE)
{
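# Dummy bindings for column names used in non-standard evaluation below;
# they keep R CMD check from flagging undefined global variables.
Abund <- NA
nH_inf <- NA
nH <- NA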
if (verbose) message("Checking function arguments...")
# Argument validation
if (is.null(sp_cols) || length(sp_cols) == 0) {
stop("The species columns must be specified (sp_cols).")
}
if (!all(sp_cols %in% colnames(dataset))) {
stop("Some of the specified species columns do not exist in the dataset.")
}
if (!is.null(group_vars) && !all(group_vars %in% colnames(dataset))) {
stop("Some of the specified categorical variables do not exist in the dataset.")
}
datos_long <- dataset %>%
tidyr::pivot_longer(cols = tidyr::all_of(sp_cols), names_to = "Species", values_to = "Abund")
group_by_vars <- c("Species", group_vars)
# Remove combinations of sp_cols and group_vars whose abundance is NA
datos_long_f <- datos_long %>% dplyr::filter(!is.na(Abund))
if(verbose) message("Checking for NA rows...")
diff<-nrow(datos_long) - nrow(datos_long_f)
if(diff!=0){
message(diff, " rows were removed")
suppressMessages(rows_removed<-dplyr::anti_join(datos_long, datos_long_f))
message("List of unique combinations removed from dataset")
print(as.data.frame(rows_removed %>% dplyr::distinct()))
datos_long<-datos_long_f
}
if (verbose) message("Calculating parasitological indices...")
t_para_index <- datos_long %>%
dplyr::group_by(dplyr::across(dplyr::all_of(group_by_vars))) %>%
dplyr::summarise(
nH = sum(!is.na(Abund), na.rm = TRUE),
nH_inf = sum(Abund > 0, na.rm = TRUE),
A = sum(Abund, na.rm = TRUE),
min = dplyr::if_else(nH > 0, round(min(Abund, na.rm = TRUE), decimal_places),NA_real_),
max = dplyr::if_else(nH > 0, round(max(Abund, na.rm = TRUE), decimal_places),NA_real_),
P = dplyr::if_else(nH > 1, round(nH_inf / nH, decimal_places), NA_real_),
MeanA = dplyr::if_else(nH > 1, round(mean(Abund, na.rm = TRUE), decimal_places), NA_real_),
MeanA_sd = dplyr::if_else(nH > 1, round(stats::sd(Abund, na.rm = TRUE), decimal_places), NA_real_),
A_iqr = dplyr::if_else(nH > 1, round(stats::IQR(Abund, na.rm = TRUE), decimal_places), NA_real_),
MedA = dplyr::if_else(nH > 1, round(stats::median(Abund, na.rm = TRUE), decimal_places), NA_real_),
MedA_sd = dplyr::if_else(nH > 1, round(stats::mad(Abund, na.rm = TRUE), decimal_places), NA_real_),
MeanI = dplyr::if_else(nH_inf >= 2, round(mean(Abund[Abund > 0], na.rm = TRUE), decimal_places), NA_real_),
MeanI_sd = dplyr::if_else(nH_inf >= 2, round(stats::sd(Abund[Abund > 0], na.rm = TRUE), decimal_places), NA_real_),
I_iqr = dplyr::if_else(nH_inf >= 2, round(stats::IQR(Abund[Abund > 0], na.rm = TRUE), decimal_places), NA_real_),
MedI = dplyr::if_else(nH_inf >= 2, round(stats::median(Abund[Abund > 0], na.rm = TRUE), decimal_places), NA_real_),
MedI_sd = dplyr::if_else(nH_inf >= 2, round(stats::mad(Abund[Abund > 0], na.rm = TRUE), decimal_places), NA_real_),
Observation = dplyr::case_when(
nH == 0 ~ "Not analyzed",
nH == 1 ~ "One host analyzed",
nH_inf == 0 ~ "No hosts infested",
nH_inf == 1 ~ "One host infested",
TRUE ~ "Multiple hosts infested"
),
.groups = "drop"
)
if (verbose) message("Calculation completed.")
return(t_para_index)
}