R/format-funs.R

Defines functions zero_counts convert_counts compute_ci is_na_precision is_na_counts format_data

Documented in format_data

#' Format count series
#'
#' @description
#' This function provides an easy way to get count series ready to be analyzed 
#' by the package `popbayes`. It must be used prior to all other functions.
#' 
#' This function formats the count series (passed through the argument 
#' `data`) by selecting and renaming columns, checking columns format and 
#' content, and removing missing data (if `na_rm = TRUE`). It converts the 
#' original data frame into a list of count series that will be analyzed later
#' by the function [fit_trend()] to estimate population trends.
#' 
#' To be usable for the estimation of population trends, counts must be 
#' accompanied by information on precision. The population trend model requires 
#' a 95% confident interval (CI).
#' If estimates are total counts or guesstimates, this function will construct 
#' boundaries of the 95% CI by applying the rules set out in 
#' \url{https://frbcesab.github.io/popbayes/articles/popbayes.html}.
#' If counts were estimated by a sampling method the user needs to specify a 
#' measure of precision. Precision is preferably provided in the form of a 95% 
#' CI by means of two fields: `lower_ci` and `upper_ci`. It may also be given 
#' in the form of a standard deviation (`sd`), a variance (`var`), or a 
#' coefficient of variation (`cv`). If the fields `lower_ci` and `upper_ci` are 
#' both absent (or `NA`), fields `sd`, `var`, and `cv` are examined in this 
#' order. When one is found valid (no missing value), a 95% CI is derived 
#' assuming a normal distribution. 
#' The field `stat_method` must be present in `data` to indicate
#' if counts are **total counts** (`'T'`), **sampling** (`'S'`), or 
#' **guesstimate** (`'X'`).
#' 
#' If a series mixes aerial and ground counts, a field `field_method` must 
#' also be present and must contain either `'A'` (aerial counts), or `'G'` 
#' (ground counts). As all counts must eventually refer to the same field 
#' method for a correct estimation of trend, a conversion will be performed to  
#' homogenize counts. This conversion is based on a **preferred field method**
#' and a **conversion factor** both specific to a species/category. 
#' The preferred field method specifies the conversion direction. The 
#' conversion factor is the multiplicative factor that must be applied to an 
#' aerial count to get an equivalent ground count (note that if the preferred 
#' field method is `'A'`, ground counts will be divided by the conversion 
#' factor to get the equivalent aerial count).
#' 
#' The argument `rmax` represents the maximum change in log population size 
#' between two dates (i.e. the relative rate of increase). It will be used 
#' by [fit_trend()] but must be provided in this function.
#' 
#' These three parameters, named `pref_field_method`, `conversion_A2G`, and 
#' `rmax` can be present in `data` or in a second `data.frame` 
#' (passed through the argument `info`).
#' Alternatively, the package `popbayes` provides their values for some 
#' African large mammals.
#' 
#' **Note:** If the field `field_method` is absent in `data`, counts are 
#' assumed to be obtained with one field method.
#'
#'
#'
#' @param data a `data.frame` with at least five columns: `location`, 
#'   `species`, `date`, `count`, and `stat_method`.
#'   
#'   The `stat_method` field indicates the method used to estimate counts. It 
#'   can contain: `T` (total counts), `X` (guesstimate), and/or `S` (sampling). 
#' 
#'   If individual counts were estimated by **sampling**, additional column(s) 
#'   providing a measure of precision is also required (e.g. `lower_ci` and 
#'   `upper_ci`, or `sd`, `cv`, `var`). Precision metrics can be different 
#'   between counts. For instance, some sampling counts can have a `sd` value 
#'   and others `lower_ci` and `upper_ci`. In that case three columns are 
#'   required (`lower_ci`, `upper_ci`, and `sd`). See above section 
#'   **Description** for further information on the computation of the 95% 
#'   confident interval of estimates. 
#' 
#'   If the individuals were counted by different methods, an additional field 
#'   `field_method` is also required. It can contain: `G` (ground counts) 
#'   and/or `A` (aerial counts). See above section **Description** for further 
#'   information on the counts conversion.
#'   
#'   Others fields can be present either in `data` or `info` (see below).
#' 
#' @param info (optional) a `data.frame` with species in rows and the following
#'   columns: `species` (species name), `pref_field_method`, 
#'   `conversion_A2G`, and `rmax`. See above section **Description** for 
#'   further information on these fields.
#'   Default is `NULL` (i.e. these information must be present in `data` 
#'   if not available in `popbayes`).
#' 
#' @param location a `character` string. The column name in `data` of the
#'   site. This field is used to distinguish count series from different sites
#'   (if required) and to create an unique series name.
#'   Default is `'location'`.
#'   
#' @param species a `character` string. The column name in `data` (and 
#'   in `info` if provided) of the species. This field is used to distinguish 
#'   count series for different species (if required) and to create an unique 
#'   series name.
#'   Default is `'species'`.
#'   
#' @param date a `character` string. The column name in `data` of the date.
#'   This column `date` must be in a numerical form with possibly a decimal 
#'   part.
#'   Default is `'date'`.
#'   
#' @param count a `character` string. The column name in `data` of the
#'   number of individuals. This column must be numerical.
#'   Default is `'count'`.
#'  
#' @param stat_method a `character` string. The column name in `data` of 
#'   the method used to estimate individuals counts. It can contain `'T'` 
#'   (total counts), `'X'` (guesstimate), and/or `'S'` (sampling). If some 
#'   counts are coded as `'S'`, precision column(s) must also be provided (see 
#'   below).
#'   Default is `'stat_method'`. 
#' 
#' @param lower_ci (optional) a `character` string. The column name in `data`
#'   of the lower boundary of the 95% CI of the estimate (i.e. `count`). If 
#'   provided, the upper boundary of the 95% CI (argument `upper_ci`) must be 
#'   also provided. This argument is only required if some counts have been 
#'   estimated by a sampling method. But user may prefer use other precision 
#'   measures, e.g. standard deviation (argument `sd`), variance (argument 
#'   `var`), or coefficient of variation (argument `cv`). 
#'   Default is `'lower_ci'`.
#'   
#' @param upper_ci (optional) a `character` string. The column name in `data`
#'   of the upper boundary of the 95% CI of the estimate (i.e. `count`). If 
#'   provided, the lower boundary of the 95% CI (argument `lower_ci`) must be 
#'   also provided.
#'   Default is `'upper_ci'`.
#'   
#' @param sd (optional) a `character` string. The column name in `data` of 
#'   the standard deviation of the estimate.
#'   Default is `NULL`.
#'   
#' @param var (optional) a `character` string. The column name in `data` of 
#'   the variance of the estimate.
#'   Default is `NULL`.
#'    
#' @param cv (optional) a `character` string. The column name in `data` of 
#'   the coefficient of variation of the estimate.
#'   Default is `NULL`.
#'   
#' @param field_method (optional) a `character` string. The column name in 
#'   `data` of the field method used to count individuals. Counts can be ground 
#'   counts (coded as `'G'`) or aerial counts (coded as `'A'`). This argument 
#'   is optional if individuals have been counted by the same method. See above 
#'   section **Description** for further information on the count conversion.
#'   Default is `NULL`.
#'   
#' @param pref_field_method (optional) a `character` string. The column name
#'   in `data` of the preferred field method of the species. This argument is
#'   only required is `field_method` is not `NULL` (i.e. individuals have been 
#'   counted by different methods). Alternatively, this value can be passed in
#'   `info` (or internally retrieved if the species is listed in the package). 
#'   See above section **Description** for further information on the count
#'   conversion.
#'   Default is `NULL`.
#' 
#' @param conversion_A2G (optional) a `character` string. The column name
#'   in `data` of the count conversion factor of the species. This argument is
#'   only required if `field_method` is not `NULL` (i.e. individuals have been 
#'   counted by different methods). Alternatively this value can be passed in
#'   `info` (or internally retrieved if the species is listed in the package).
#'   See above section **Description** for further information on the count
#'   conversion.
#'   Default is `NULL`.
#'   
#' @param rmax (optional) a `character` string. The column name in `data` of 
#'   the species demographic potential (i.e. the relative rate of increase of 
#'   the population). This is the change in log population size between two 
#'   dates and will be used later by [fit_trend()].
#'   Default is `NULL`.
#'   
#' @param path a `character` string. The directory to save formatted data. 
#'   This directory must exist and can be an absolute or a relative path.
#'   Default is the current working directory.
#' 
#' @param na_rm a `logical.` If `TRUE`, counts with `NA` values will be 
#'   removed.
#'   Default is `FALSE` (returns an error to inform user if `NA` are detected).
#'
#'
#'
#' @return An n-elements `list` (where `n` is the number of count series). The
#'   name of each element of this list is a combination of location and 
#'   species. Each element of the list is a `list` with the following content:
#'   \itemize{
#'   \item \code{location} a `character` string. The name of the series site.
#'   \item \code{species} a `character` string. The name of the series species.
#'   \item \code{date} a `numerical` vector. The sequence of dates of the 
#'     series.
#'   \item \code{n_dates} an `integer.` The number of unique dates.
#'   \item \code{stat_methods} a `character` vector. The different stat methods
#'     of the series.
#'   \item \code{field_methods} (optional) a `character` vector. The different
#'     field methods of the series.
#'   \item \code{pref_field_method} (optional) a `character` string. The 
#'     preferred field method of the species (`'A'` or `'G'`).
#'   \item \code{conversion_A2G} (optional) a `numeric`. The conversion factor
#'     of the species used to convert counts to its preferred field method.
#'   \item \code{rmax} a `numeric`. The maximum population growth rate of the
#'     species.
#'   \item \code{data_original} a `data.frame`. Original data of the series 
#'     with renamed columns. Some rows may have been deleted 
#'     (if `na_rm = TRUE`).
#'   \item \code{data_converted} a `data.frame`. Data containing computed 
#'     boundaries of the 95% CI (`lower_ci_conv` and `upper_ci_conv`). If 
#'     counts have been obtained by different field methods, contains also 
#'     converted counts (`count_conv`) based on the preferred field method and
#'     conversion factor of the species. This `data.frame` will be used by the
#'     function [fit_trend()] to fit population models.
#'   }
#'   
#'   **Note:** Some original series can be discarded if one of these two 
#'   conditions is met: 1) the series contains only zero counts, and 2) the 
#'   series contains only a few dates (< 4 dates).
#' 
#' @export
#'
#' @examples
#' ## Load Garamba raw dataset ----
#' file_path <- system.file("extdata", "garamba_survey.csv", 
#'                          package = "popbayes")
#'                          
#' garamba <- read.csv(file = file_path)
#' 
#' ## Create temporary folder ----
#' temp_path <- tempdir()
#' 
#' ## Format dataset ----
#' garamba_formatted <- popbayes::format_data(
#'   data              = garamba, 
#'   path              = temp_path,
#'   field_method      = "field_method",
#'   pref_field_method = "pref_field_method",
#'   conversion_A2G    = "conversion_A2G",
#'   rmax              = "rmax")
#' 
#' ## Number of count series ----
#' length(garamba_formatted)
#' 
#' ## Retrieve count series names ----
#' popbayes::list_series(path = temp_path)
#' 
#' ## Print content of the first count series ----
#' names(garamba_formatted[[1]])
#' 
#' ## Print original data ----
#' garamba_formatted[[1]]$"data_original"
#' 
#' ## Print converted data ----
#' garamba_formatted[[1]]$"data_converted"

format_data <- function(data, info = NULL, date = "date", count = "count", 
                        location = "location", species = "species", 
                        stat_method = "stat_method", lower_ci = "lower_ci", 
                        upper_ci = "upper_ci", sd = NULL, var = NULL, 
                        cv = NULL, field_method = NULL, 
                        pref_field_method = NULL,
                        conversion_A2G = NULL, rmax = NULL, 
                        path = ".", na_rm = FALSE) {
  
  
  # CHECKS ----
  
  ## Check argument 'path' ----
  
  if (!is.character(path) || length(path) != 1) {
    stop("Argument 'path' must be a character string.")
  }
  
  if (!dir.exists(path)) {
    stop("The directory '", path, "' does not exist.")
  }
  
  
  ## Check argument 'data' ----
  
  if (missing(data)) {
    stop("Argument 'data' is required.")
  }
  
  if (!is.data.frame(data)) {
    stop("Argument 'data' must be a data.frame.")
  }
  
  
  ## Coerce to data.frame (tibble) ----
  
  data <- as.data.frame(data)
  
  
  ## Check argument 'location' ----
  
  if (!is.character(location) || length(location) != 1) {
    stop("Argument 'location' must be a column name (character string).")
  }
  
  if (!(location %in% colnames(data))) {
    stop("The column '", location, "' (argument location) is absent from ", 
         "'data'. Please check the spelling.")
  }
  
  data[ , location] <- as.character(data[ , location])
  
  if (any(is.na(data[ , location]))) {
    stop("The column '", location, "' cannot contain NA.")
  }
  
  
  ## Check argument 'species' ----
  
  if (!is.character(species) || length(species) != 1) {
    stop("Argument 'species' must be a column name (character string).")
  }
  
  if (!(species %in% colnames(data))) {
    stop("The column '", species, "' (argument species) is absent from ", 
         "'data'. Please check the spelling.")
  }
  
  data[ , species] <- as.character(data[ , species])
  
  if (any(is.na(data[ , species]))) {
    stop("The column '", species, "' cannot contain NA.")
  }
  
  
  ## Check argument 'date' ----
  
  if (!is.character(date) || length(date) != 1) {
    stop("Argument 'date' must be a column name (character string).")
  }
  
  if (!(date %in% colnames(data))) {
    stop("The column '", date, "' (argument date) is absent from 'data'. ", 
         "Please check the spelling.")
  }
  
  if (!is.numeric(data[ , date])) {
    stop("The column '", date, "' must be numeric.")
  }
  
  if (any(is.na(data[ , date]))) {
    stop("The column '", date, "' cannot contain NA.")
  }
  
  
  ## Check argument 'count' ----
  
  if (!is.character(count) || length(count) != 1) {
    stop("Argument 'count' must be a column name (character string).")
  }
  
  if (!(count %in% colnames(data))) {
    stop("The column '", count, "' (argument count) is absent from 'data'. ", 
         "Please check the spelling.")
  }
  
  if (!is.numeric(data[ , count])) {
    stop("The column '", count, "' must be numeric.")
  }
  
  if (length(which(data[ , count] < 0))) {
    stop("The column '", count, "' must be positive (or zero).")
  }
  
  
  ## Check argument 'na_rm' ----
  
  if (!is.logical(na_rm) || length(na_rm) != 1) {
    stop("Argument 'na_rm' must be TRUE or FALSE.")
  }
  
  
  ## Check argument 'stat_method' ----
  
  if (!is.character(stat_method) || length(stat_method) != 1) {
    stop("Argument 'stat_method' must be a column name (character string).")
  }
  
  if (!(stat_method %in% colnames(data))) {
    stop("The column '", stat_method, "' (argument stat_method) is absent ", 
         "from 'data'. Please check the spelling.")
  }
  
  valid_stat_methods <- c("T", "X", "S")
  
  valid_stat_methods_msg <- paste0(valid_stat_methods, collapse = "' or '")
  valid_stat_methods_msg <- paste0("'", valid_stat_methods_msg, "'")
  
  data[ , stat_method] <- as.character(data[ , stat_method])
  
  if (any(is.na(data[ , stat_method]))) {
    stop("The column '", stat_method, "' cannot contain NA.")
  }
  
  if (any(!(data[ , stat_method] %in% valid_stat_methods))) {
    stop("Invalid value(s) for 'stat_method' in 'data'. ",
         "Allowed values are: ", valid_stat_methods_msg, ".")
  }  
  
  
  ## Check argument 'lower_ci' ----
  
  if (!is.null(lower_ci)) {
    
    if (!is.character(lower_ci) || length(lower_ci) != 1) {
      stop("Argument 'lower_ci' must be a column name (character string).")
    }
    
    if (!(lower_ci %in% colnames(data))) {
      stop("The column '", lower_ci, "' (argument lower_ci) is absent ", 
           "from 'data'. Please check the spelling.")
    }
    
    if (!is.numeric(data[ , lower_ci])) {
      stop("The column '", lower_ci, "' must be numeric.")
    }
    
    if (length(which(data[ , lower_ci] < 0))) {
      stop("The column '", lower_ci, "' must be positive (or zero).")
    }
  }
  
  
  ## Check argument 'upper_ci' ----
  
  if (!is.null(upper_ci)) {
    
    if (!is.character(upper_ci) || length(upper_ci) != 1) {
      stop("Argument 'upper_ci' must be a column name (character string).")
    }
    
    if (!(upper_ci %in% colnames(data))) {
      stop("The column '", upper_ci, "' (argument upper_ci) is absent ", 
           "from 'data'. Please check the spelling.")
    }
    
    if (!is.numeric(data[ , upper_ci])) {
      stop("The column '", upper_ci, "' must be numeric.")
    }
    
    if (length(which(data[ , upper_ci] < 0))) {
      stop("The column '", upper_ci, "' must be positive (or zero).")
    }
  }
  
  
  ## Check argument 'sd' ----
  
  if (!is.null(sd)) {
    
    if (!is.character(sd) || length(sd) != 1) {
      stop("Argument 'sd' must be a column name (character string).")
    }
    
    if (!(sd %in% colnames(data))) {
      stop("The column '", sd, "' (argument sd) is absent from 'data'. ",
           "Please check the spelling.")
    }
    
    if (!is.numeric(data[ , sd])) {
      stop("The column '", sd, "' must be numeric.")
    }
    
    if (length(which(data[ , sd] <= 0))) {
      stop("The column '", sd, "' must be strictly positive.")
    }
  }
  
  
  ## Check argument 'var' ----
  
  if (!is.null(var)) {
    
    if (!is.character(var) || length(var) != 1) {
      stop("Argument 'var' must be a column name (character string).")
    }
    
    if (!(var %in% colnames(data))) {
      stop("The column '", var, "' (argument var) is absent from 'data'. ",
           "Please check the spelling.")
    }
    
    if (!is.numeric(data[ , var])) {
      stop("The column '", var, "' must be numeric.")
    }
    
    if (length(which(data[ , var] <= 0))) {
      stop("The column '", var, "' must be strictly positive.")
    }
  } 
  
  
  ## Check argument 'cv' ----
  
  if (!is.null(cv)) {
    
    if (!is.character(cv) || length(cv) != 1) {
      stop("Argument 'cv' must be a column name (character string).")
    }
    
    if (!(cv %in% colnames(data))) {
      stop("The column '", cv, "' (argument cv) is absent from 'data'. ",
           "Please check the spelling.")
    }
    
    if (!is.numeric(data[ , cv])) {
      stop("The column '", cv, "' must be numeric.")
    }
    
    if (length(which(data[ , cv] <= 0))) {
      stop("The column '", cv, "' must be strictly positive.")
    }
  } 
  
  
  ## Check argument 'field_method' ----
  
  if (!is.null(field_method)) {
    
    if (!is.character(field_method) || length(field_method) != 1) {
      stop("Argument 'field_method' must be a column name (character of ",
           "length 1).")
    }
    
    if (!(field_method %in% colnames(data))) {
      stop("The column '", field_method, "' (argument field_method) is ", 
           "absent from 'data'. Please check the spelling.")
    }
    
    valid_field_methods <- c("G", "A")
    
    valid_field_methods_msg <- paste0(valid_field_methods, collapse = "' or '")
    valid_field_methods_msg <- paste0("'", valid_field_methods_msg, "'")
    
    data[ , field_method] <- as.character(data[ , field_method])
    
    data_not_x <- data[data[ , stat_method] != "X", ]
    
    if (nrow(data_not_x)) {
      
      if (any(is.na(data_not_x[ , field_method]))) {
        stop("The column '", field_method, "' cannot contain NA ", 
             "(except for count with stat_method = 'X').")
      } 
      
      if (any(!(data_not_x[ , field_method] %in% valid_field_methods))) {
        stop("Invalid value(s) for 'field_method' in 'data'. ",
             "Allowed values are: ", valid_field_methods_msg, ".")
      }
    }
  }
  
  
  ## Check argument 'pref_field_method' ----
  
  if (!is.null(pref_field_method)) {
    
    if (!is.character(pref_field_method) || length(pref_field_method) != 1) {
      stop("Argument 'pref_field_method' must be a column name (character of ",
           "length 1).")
    }
    
    if (!(pref_field_method %in% colnames(data))) {
      stop("The column '", pref_field_method, "' (argument pref_field_method) ",
           "is absent from 'data'. Please check the spelling.")
    }
    
    valid_field_methods <- c("G", "A")
    
    valid_field_methods_msg <- paste0(valid_field_methods, collapse = "' or '")
    valid_field_methods_msg <- paste0("'", valid_field_methods_msg, "'")
    
    data[ , pref_field_method] <- as.character(data[ , pref_field_method])
    
    data_not_na <- data[!is.na(data[ , pref_field_method]), ]
    
    if (nrow(data_not_na)) {
      
      if (any(!(data_not_na[ , pref_field_method] %in% valid_field_methods))) {
        stop("Invalid value(s) for 'field_method' in 'data'. ",
             "Allowed values are: ", valid_field_methods_msg, ".")
      } 
    }
  }
  
  
  ## Check argument 'conversion_A2G' ----
  
  if (!is.null(conversion_A2G)) {
    
    if (!is.character(conversion_A2G) || length(conversion_A2G) != 1) {
      stop("Argument 'conversion_A2G' must be a column name (character of ",
           "length 1).")
    }
    
    if (!(conversion_A2G %in% colnames(data))) {
      stop("The column '", conversion_A2G, "' (argument conversion_A2G) ", 
           "is absent from 'data'. Please check the spelling.")
    }
    
    if (!is.numeric(data[ , conversion_A2G])) {
      stop("The column '", conversion_A2G, "' must be a numeric.")
    }
    
    if (length(which(data[ , conversion_A2G] <= 0))) {
      stop("The column '", conversion_A2G, "' must be strictly positive.")
    }
  }
  
  
  ## Check argument 'rmax' ----
  
  if (!is.null(rmax)) {
    
    if (!is.character(rmax) || length(rmax) != 1) {
      stop("Argument 'rmax' must be a column name (character of ",
           "length 1).")
    }
    
    if (!(rmax %in% colnames(data))) {
      stop("The column '", rmax, "' (argument rmax) ", 
           "is absent from 'data'. Please check the spelling.")
    }
    
    if (!is.numeric(data[ , rmax])) {
      stop("The column '", rmax, "' must be a numeric.")
    }
    
    if (length(which(data[ , rmax] <= 0))) {
      stop("The column '", rmax, "' must be strictly positive.")
    }
  }
  
  
  ## Check argument 'info' ----
  
  if (!is.null(info)) {
    
    if (!is.data.frame(info)) {
      stop("Argument 'info' must be a data frame.")
    }
    
    valid_info_colnames <- c("species", "pref_field_method", "conversion_A2G",
                             "rmax")
    
    valid_info_colnames_msg <- paste0(valid_info_colnames, collapse = "' and '")
    valid_info_colnames_msg <- paste0("'", valid_info_colnames_msg, "'")
    
    
    if (any(!(valid_info_colnames %in% colnames(info)))) {
      stop("Invalid columns in 'info'. ",
           "Required variables are: ", valid_info_colnames_msg, ".")
    }
    
    if (any(is.na(info[ , valid_info_colnames]))) {
      stop("The dataset 'info' cannot contain NA.")
    }
    
    if (length(which(duplicated(info$"species"))) > 0) {
      stop("The 'info' data.frame cannot contain duplicated species.")
    }
  }
  
  
  ## Check precision information ----
  
  if ("S" %in% data[ , stat_method]) {
    
    if (is.null(lower_ci) && is.null(upper_ci) && is.null(sd) && is.null(cv) &&
        is.null(var)) {
      
      stop("No valid measure of precision is available for sampling counts. ", 
           "Add 'lower_ci' and 'upper_ci' and/or 'sd', 'var', 'cv' ", 
           "information.")
    }
    
    if (!is.null(lower_ci) && is.null(upper_ci)) {
      stop("You must provide both lower and upper CI.")
    }
    
    if (is.null(lower_ci) && !is.null(upper_ci)) {
      stop("You must provide both lower and upper CI.")
    }
  }
  
 
  # NA COUNTS ----
  
  ## Handle NA in counts ----
  
  data <- is_na_counts(data, count, na_rm)
  
  if (nrow(data) == 0) {
    stop("All counts are NA. Please check your data.")
  }
  
  
  # FORMAT DATA ----
  
  ## Detect precision(s) columns ----
  
  precision_cols <- NULL
  
  if ("S" %in% data[ , stat_method]) {
    
    if (!is.null(lower_ci)) {
      
      precision_cols <- c(precision_cols, lower_ci)
      names(precision_cols)[length(precision_cols)] <- "lower_ci"
    }
    
    if (!is.null(upper_ci)) {
      
      precision_cols <- c(precision_cols, upper_ci)
      names(precision_cols)[length(precision_cols)] <- "upper_ci"
    }
    
    if (!is.null(sd)) {
      
      precision_cols <- c(precision_cols, sd)
      names(precision_cols)[length(precision_cols)] <- "sd"
    }
    
    if (!is.null(var)) {
      
      precision_cols <- c(precision_cols, var)
      names(precision_cols)[length(precision_cols)] <- "var"
    }
    
    if (!is.null(cv)) {
      
      precision_cols <- c(precision_cols, cv)
      names(precision_cols)[length(precision_cols)] <- "cv"
    }
  }
  
  
  ## Rename columns ----
  
  data_renamed <- data[ , c(location, species, date, stat_method)]
  colnames(data_renamed) <- c("location", "species", "date", "stat_method")
  
  if (!is.null(field_method)) {
    data_renamed <- data.frame(data_renamed, 
                               "field_method" = data[ , field_method])
  }
  
  if (!is.null(pref_field_method)) {
    data_renamed <- data.frame(data_renamed, 
                               "pref_field_method" = data[ , pref_field_method])
  }
  
  if (!is.null(conversion_A2G)) {
    data_renamed <- data.frame(data_renamed, 
                               "conversion_A2G" = data[ , conversion_A2G])
  }
  
  if (!is.null(rmax)) {
    data_renamed <- data.frame(data_renamed, 
                               "rmax" = data[ , rmax])
  }
  
  data_renamed <- data.frame(data_renamed, 
                             "count_orig" = data[ , count])
  
  if (!is.null(precision_cols)) {
    
    precision_data <- data[ , precision_cols]
    precision_cols <- paste0(names(precision_cols), "_orig")
    colnames(precision_data) <- precision_cols
    
    data_renamed <- data.frame(data_renamed, precision_data)
  }
  
  
  # CREATE SERIES ----
  
  ## Detect series ----
  
  series_infos <- get_series(data_renamed, quiet = TRUE)
  
  
  ## Split data by series ----
  
  count_series <- list()
  
  for (i in seq_len(nrow(series_infos))) {
    
    id <- series_infos[i, "id"]
    
    sel_rows <- which(data_renamed$"location" == series_infos[i, "location"] &
                        data_renamed$"species"  == series_infos[i, "species"])
    
    series <- data_renamed[sel_rows, ]
    rownames(series) <- NULL
    
    count_series[[id]] <- series
  }
  
  
  # SPECIES INFO ----
  
  for (i in seq_along(count_series)) {
    
    if (!is.null(field_method)) {
      
      
      ## Retrieve 'pref_field_method' ----
      
      if (!is.null(pref_field_method)) {
        
        pref_field_method_data <- unique(count_series[[i]]$"pref_field_method")
        pref_field_method_data <- pref_field_method_data[
          !is.na(pref_field_method_data)]
        
        if (length(pref_field_method_data) > 1) {
          stop("Multiple values for 'pref_field_method' detected in ", 
               names(count_series)[i], ".")
        }

        if (length(pref_field_method_data) == 0) {
          
          if (!is.null(info)) {
            
            if (unique(count_series[[i]]$"species") %in% info$"species") {
              
              pos <- info$"species" %in% count_series[[i]]$"species"[1]
              pref_field_method_data <- info[pos, "pref_field_method"]
              
            } else {
              
              if (unique(count_series[[i]]$"species") %in% 
                  species_info$"species") {
                
                pos <- species_info$"species" %in% 
                  count_series[[i]]$"species"[1]
                pref_field_method_data <- species_info[pos, "pref_field_method"]
                
              } else {
                
                stop("Unable to retrieve 'pref_field_method' for the series ", 
                     names(count_series)[i])
              }
            }
            
          } else {
            
            if (unique(count_series[[i]]$"species") %in% 
                species_info$"species") {
              
              pos <- species_info$"species" %in% count_series[[i]]$"species"[1]
              pref_field_method_data <- species_info[pos, "pref_field_method"]
              
            } else {
              
              stop("Unable to retrieve 'pref_field_method' for the series ", 
                   names(count_series)[i])
            }
          }
        }
        
      } else {
        
        if (!is.null(info)) {
          
          if (unique(count_series[[i]]$"species") %in% info$"species") {
            
            pos <- info$"species" %in% count_series[[i]]$"species"[1]
            pref_field_method_data <- info[pos, "pref_field_method"]
            
          } else {
            
            if (unique(count_series[[i]]$"species") %in% 
                species_info$"species") {
              
              pos <- species_info$"species" %in% count_series[[i]]$"species"[1]
              pref_field_method_data <- species_info[pos, "pref_field_method"]
              
            } else {
              
              stop("Unable to retrieve 'pref_field_method' for the series ", 
                   names(count_series)[i])
            }
          }
          
        } else {
          
          if (unique(count_series[[i]]$"species") %in% 
              species_info$"species") {
            
            pos <- species_info$"species" %in% count_series[[i]]$"species"[1]
            pref_field_method_data <- species_info[pos, "pref_field_method"]
            
          } else {
            
            stop("Unable to retrieve 'pref_field_method' for the series ", 
                 names(count_series)[i])
          }
        }
      }
      
      count_series[[i]]$"pref_field_method" <- pref_field_method_data
      
      
      ## Retrieve 'conversion_A2G' ----
      
      if (!is.null(conversion_A2G)) {
      
        conversion_A2G_data <- unique(count_series[[i]]$"conversion_A2G")
        conversion_A2G_data <- conversion_A2G_data[
          !is.na(conversion_A2G_data)]
        
        if (length(conversion_A2G_data) > 1) {
          stop("Multiple values for 'conversion_A2G' detected in ", 
               names(count_series)[i], ".")
        }

        if (length(conversion_A2G_data) == 0) {
          
          if (!is.null(info)) {
            
            if (unique(count_series[[i]]$"species") %in% info$"species") {
              
              pos <- info$"species" %in% count_series[[i]]$"species"[1]
              conversion_A2G_data <- info[pos, "conversion_A2G"]
              
            } else {
              
              if (unique(count_series[[i]]$"species") %in% 
                  species_info$"species") {
                
                pos <- species_info$"species" %in% 
                  count_series[[i]]$"species"[1]
                conversion_A2G_data <- species_info[pos, "conversion_A2G"]
                
              } else {
                
                stop("Unable to retrieve 'conversion_A2G' for the series ", 
                     names(count_series)[i])
              }
            }
            
          } else {
            
            if (unique(count_series[[i]]$"species") %in% 
                species_info$"species") {
              
              pos <- species_info$"species" %in% count_series[[i]]$"species"[1]
              conversion_A2G_data <- species_info[pos, "conversion_A2G"]
              
            } else {
              
              stop("Unable to retrieve 'conversion_A2G' for the series ", 
                   names(count_series)[i])
            }
          }
        }
        
      } else {
        
        if (!is.null(info)) {
          
          if (unique(count_series[[i]]$"species") %in% info$"species") {
            
            pos <- info$"species" %in% count_series[[i]]$"species"[1]
            conversion_A2G_data <- info[pos, "conversion_A2G"]
            
          } else {
            
            if (unique(count_series[[i]]$"species") %in% 
                species_info$"species") {
              
              pos <- species_info$"species" %in% count_series[[i]]$"species"[1]
              conversion_A2G_data <- species_info[pos, "conversion_A2G"]
              
            } else {
              
              stop("Unable to retrieve 'conversion_A2G' for the series ", 
                   names(count_series)[i])
            }
          }
          
        } else {
          
          if (unique(count_series[[i]]$"species") %in% 
              species_info$"species") {
            
            pos <- species_info$"species" %in% count_series[[i]]$"species"[1]
            conversion_A2G_data <- species_info[pos, "conversion_A2G"]
            
          } else {
            
            stop("Unable to retrieve 'conversion_A2G' for the series ", 
                 names(count_series)[i])
          }
        }
      }
      
      count_series[[i]]$"conversion_A2G"    <- conversion_A2G_data
    }
    
    
    ## Retrieve 'rmax' ----
    
    if (!is.null(rmax)) {
      
      rmax_data <- unique(count_series[[i]]$"rmax")
      rmax_data <- rmax_data[!is.na(rmax_data)]
      
      if (length(rmax_data) > 1) {
        stop("Multiple values for 'rmax' detected in ", 
             names(count_series)[i], ".")
      }
      
      
      if (length(rmax_data) == 0) {
        
        if (!is.null(info)) {
          
          if (unique(count_series[[i]]$"species") %in% info$"species") {
            
            pos <- info$"species" %in% count_series[[i]]$"species"[1]
            rmax_data <- info[pos, "rmax"]
            
          } else {
            
            if (unique(count_series[[i]]$"species") %in% 
                species_info$"species") {
              
              pos <- species_info$"species" %in% count_series[[i]]$"species"[1]
              rmax_data <- species_info[pos, "rmax"]
              
            } else {
              
              stop("Unable to retrieve 'rmax' for the series ", 
                   names(count_series)[i])
            }
          }
          
        } else {
          
          if (unique(count_series[[i]]$"species") %in% 
              species_info$"species") {
            
            pos <- species_info$"species" %in% count_series[[i]]$"species"[1]
            rmax_data <- species_info[pos, "rmax"]
            
          } else {
            
            stop("Unable to retrieve 'rmax' for the series ", 
                 names(count_series)[i])
          }
        }
      }
      
    } else {
      
      if (!is.null(info)) {
        
        if (unique(count_series[[i]]$"species") %in% info$"species") {
          
          pos <- info$"species" %in% count_series[[i]]$"species"[1]
          rmax_data <- info[pos, "rmax"]
          
        } else {
          
          if (unique(count_series[[i]]$"species") %in% 
              species_info$"species") {
            
            pos <- species_info$"species" %in% count_series[[i]]$"species"[1]
            rmax_data <- species_info[pos, "rmax"]
            
          } else {
            
            stop("Unable to retrieve 'rmax' for the series ", 
                 names(count_series)[i])
          }
        }
        
      } else {
        
        if (unique(count_series[[i]]$"species") %in% 
            species_info$"species") {
          
          pos <- species_info$"species" %in% count_series[[i]]$"species"[1]
          rmax_data <- species_info[pos, "rmax"]
          
        } else {
          
          stop("Unable to retrieve 'rmax' for the series ", 
               names(count_series)[i])
        }
      }
    }
    
    count_series[[i]]$"rmax" <- rmax_data
  }
  
  
  # CHECK PRECISION MEASURES ----
  
  for (i in seq_along(count_series)) {
    
    count_series[[i]] <- is_na_precision(count_series[[i]], precision_cols, 
                                         na_rm)
  }
  
  
  ## Remove series ----
  
  n_rows <- unlist(lapply(count_series, function(x) nrow(x)))
  n_rows <- n_rows[n_rows == 0]
  
  if (length(n_rows)) {
    
    series_to_del <- names(n_rows)
    count_series <- count_series[-which(names(count_series) %in% series_to_del)]
    
    if (length(n_rows) < length(count_series)) {
      usethis::ui_oops(paste0("Removing the following series without valid ", 
                              "precision measures: ", 
                              usethis::ui_value(paste0(series_to_del, 
                                                       collapse = ", "))))
    }
  }
  
  
  n_rows <- unlist(lapply(count_series, function(x) nrow(x)))
  n_rows <- n_rows[n_rows < 4]
  
  if (length(n_rows)) {
    
    series_to_del <- names(n_rows)
    
    if (!na_rm) {
      
      stop("The following count series have not enough data (< 4):\n", 
           usethis::ui_value(paste0(series_to_del, 
                                    collapse = ", ")),
           "\nRemove these count series or use 'na_rm = TRUE'.")
      
    } else {
      
      count_series <- count_series[-which(names(count_series) %in% 
                                            series_to_del)]  
    }
    
    if (length(n_rows) < length(count_series)) {
      usethis::ui_oops(paste0("Removing the following series without enough ", 
                              "data (< 4) ", 
                              usethis::ui_value(paste0(series_to_del, 
                                                       collapse = ", "))))
    }
  }
  
  if (length(count_series) == 0) {
    stop("All your series have been removed. Please check your data.")
  }
  
  
  # COMPUTE CI BOUNDARIES ----
  
  for (i in seq_along(count_series)) {
    
    count_series[[i]] <- compute_ci(count_series[[i]], precision_cols)
  }
  
  
  # CONVERT COUNTS ----
  
  for (i in seq_along(count_series)) {
  
    count_series[[i]] <- convert_counts(count_series[[i]], field_method)
  }
  
  
  # ZERO COUNTS ----
  
  for (i in seq_along(count_series)) {
    
    count_series[[i]] <- zero_counts(count_series[[i]], na_rm)
  }
  
  n_rows <- unlist(lapply(count_series, function(x) nrow(x)))
  n_rows <- n_rows[n_rows == 0]
  
  if (length(n_rows)) {
    
    series_to_del <- names(n_rows)
    count_series <- count_series[-which(names(count_series) %in% series_to_del)]
    
    if (length(n_rows) < length(count_series)) {
      usethis::ui_oops(paste0("Removing the following series without valid ", 
                              "precision measures: ", 
                              usethis::ui_value(paste0(series_to_del, 
                                                       collapse = ", "))))
    }
  }
  
  if (length(count_series) == 0) {
    stop("All your series have been removed. Please check your data.")
  }
  
  
  # FINAL CHECKS ----
  
  for (i in seq_along(count_series)) {
    
    if (any(count_series[[i]][ , "lower_ci_conv"] ==
            count_series[[i]][ , "upper_ci_conv"])) {
      stop(paste0("Lower and upper CI bounds cannot be strictly equal for ", 
                  "the series.", 
                  usethis::ui_value(names(count_series)[i])))
    }
  }
  
  
  # CREATE FINAL LIST ----
  
  data_series    <- list()
  
  for (i in seq_len(nrow(series_infos))) {
    
    id <- names(count_series)[i]
    
    data_sub <- count_series[[i]][order(count_series[[i]]$"date",
                                        decreasing = FALSE), ]
    rownames(data_sub) <- NULL
    
    if (!is.null(field_method)) {
      
      pref_field_method <- unique(data_sub$"pref_field_method")
      conversion_A2G    <- unique(data_sub$"conversion_A2G")
      
      field_methods <- sort(unique(data_sub[ , "field_method"]))
      field_methods <- field_methods[!is.na(field_methods)]
      
    } else {
      
      pref_field_method <- NULL
      conversion_A2G    <- NULL
      field_methods     <- NULL
    }
      
    rmax <- unique(data_sub$"rmax")
      
      
    data_series[[id]] <- list(
      "location"          = unique(data_sub$"location"),
      "species"           = unique(data_sub$"species"),
      "dates"             = data_sub[ , "date"],
      "n_dates"           = length(unique(data_sub[ , "date"])),
      "stat_methods"      = sort(unique(data_sub[ , "stat_method"])),
      "field_methods"     = field_methods,
      "pref_field_method" = pref_field_method,
      "conversion_A2G"    = conversion_A2G,
      "rmax"              = rmax,
      "data_original"     = data_sub[ , -grep("_conv", colnames(data_sub))],
      "data_converted"    = data_sub[ , -grep("_orig", colnames(data_sub))])
    
    
    ## Export sub-list ----
    
    species_path <- file.path(path, id)
    dir.create(species_path, showWarnings = FALSE)
    
    formatted_data <- data_series[id]
    save(formatted_data, file = file.path(species_path,
                                          paste0(id, "_data.RData")))
  }
  
    
  usethis::ui_done(paste0("Detecting {usethis::ui_value(length(", 
                          "data_series))} count series"))
  
  data_series
}



#' Remove rows or return error if NA counts detected
#'
#' @param data a `data.frame`
#' @param col a `character` string (column to inspect)
#' @param na_rm a `logical`. If `TRUE`, deletes rows. Otherwise, returns an 
#'   error.
#'
#' @return A `data.frame` (same as `data` with possibly some rows removed).
#' 
#' @noRd

is_na_counts <- function(data, col, na_rm) {
  
  if (any(is.na(data[ , col]))) {
    
    if (!na_rm) {
      
      stop("The column '", col, "' cannot contain NA. If you want to ", 
           "remove missing counts, please use 'na_rm = TRUE'.", call. = FALSE)
      
    } else {
      
      pos <- which(is.na(data[ , col]))
      usethis::ui_info(paste0("Removing {usethis::ui_value(length(pos))} ",
                              "rows with NA values in '", col, "' field."))
      
      data <- data[-pos, ]
    }
  }
  
  data
}



#' Check precision data for sampling counts
#'
#' - Check for missing precision values: if neither ci, sd, var, cv are found, 
#'   returns an error (`na_rm = FALSE`) or deletes rows (`na_rm = TRUE`).
#' - Check for CI boundaries: lower and upper boundaries are required. If not, 
#'   returns an error.
#' - Check CI boundaries values: if lower_ci > count or upper_ci < count,
#'   returns an error.
#'
#' @param data a `data.frame`
#' @param precision_cols a `character` vector (column names of precision 
#'   information, i.e. `'lower_ci_orig'`, `'sd_orig'`, etc.)
#' @param na_rm a `logical`. If `TRUE` delete rows. Otherwise return an error.
#'
#' @return A data frame (same as `data`).
#' 
#' @noRd

is_na_precision <- function(data, precision_cols, na_rm) {
  
  
  ## Check for missing precision values ----
  
  sampling_rows <- which(data[ , "stat_method"] == "S")
  
  if (length(sampling_rows)) {
    
    is_na <- apply(data[sampling_rows, precision_cols], 1, function(x) {
      x <- sum(ifelse(is.na(x), 0, 1))
      ifelse(x == 0, TRUE, FALSE)
    })
  
  
    ## Remove or stop if missing precision values for S ----
    
    if (sum(is_na)) {
      
      if (!na_rm) {
        
        stop("Precision column(s) cannot all be NA for sampling counts. If ", 
             "you want to remove counts missing precision information, please ",
             "use 'na_rm = TRUE'.")
        
      } else {
        
        data <- data[-sampling_rows[which(is_na)], ]
      }
    }
  }
  
  if (nrow(data)) {
    
    
    ## Check for CI bounds (require both) for S ----
    
    sampling_rows <- which(data[ , "stat_method"] == "S")
    
    if (length(sampling_rows)) {
      
      if ("lower_ci_orig" %in% colnames(data)) {
        
        is_na_lower <- is.na(data[sampling_rows, "lower_ci_orig"])
        is_na_upper <- is.na(data[sampling_rows, "upper_ci_orig"])
        
        pos <- which((is_na_upper + is_na_lower) == 1)
        
        if (length(pos)) {
          
          ci_cols <- which(precision_cols %in% c("lower_ci_orig", 
                                                 "upper_ci_orig"))
          
          tmp <- as.data.frame(data[sampling_rows[pos], 
                                    precision_cols[-ci_cols]])
          
          if (ncol(tmp)) {
            
            is_na <- apply(tmp, 1, function(x) {
              x <- sum(ifelse(is.na(x), 0, 1))
              ifelse(x == 0, TRUE, FALSE)
            })
            
            if (sum(is_na)) {
              stop("Unless another type of precision information is provided, ",
                   "both lower and upper CI bounds are required.")
            }
            
          } else {
            
            stop("Unless another type of precision information is provided, ", 
                 "both lower and upper CI bounds are required.")
          }
        }
      }
    }
   
    
    ## Check CI bounds values for S ----
    
    if (length(sampling_rows)) {
      
      if ("lower_ci_orig" %in% colnames(data)) {
        
        pos <- which(data[sampling_rows, "lower_ci_orig"] > 
                       data[sampling_rows, "count_orig"])
        
        if (length(pos)) {
          stop("At least one CI lower bound is greater than the corresponding ",
               "count.")
        }
        
        
        pos <- which(data[sampling_rows, "lower_ci_orig"] < 0)
        
        if (length(pos)) {
          stop("CI lower bounds must be positive.")
        }
      }
      
      
      if ("upper_ci_orig" %in% colnames(data)) {
        
        pos <- which(data[sampling_rows, "upper_ci_orig"] < 
                       data[sampling_rows, "count_orig"])
        
        if (length(pos)) {
          stop("At least one CI upper bound is smaller than the corresponding ",
               "count.")
        }
        
        
        pos <- which(data[sampling_rows, "upper_ci_orig"] < 0)
        
        if (length(pos)) {
          stop("Upper CI values must be positive.")
        }
      }
    }
  }
  
  data
}



#' Compute 95% confident interval boundaries
#'
#' - Always computes CI bounds for stat methods T and G
#' - Derives CI bounds for stat method S from SD, VAR, or CV, unless CI bounds 
#'   are provided
#'
#' @param data a `data.frame`
#' @param precision_cols a `character` vector (names of columns with precision 
#'   information, i.e. `'lower_ci_orig'`, `'sd_orig'`, etc.)
#'
#' @return A `data.frame` identical to `data` with two additional columns: 
#'   `lower_ci_conv` and `upper_ci_conv`.
#' 
#' @noRd

compute_ci <- function(data, precision_cols) {
  
  data$"count_conv"    <- data$"count_orig"
  data$"lower_ci_conv" <- NA
  data$"upper_ci_conv" <- NA
  
  
  ## Compute CI boundaries for Total counts ----
  
  pos <- which(data$"stat_method" == "T")
  
  if (length(pos)) {
    data[pos, "lower_ci_conv"] <- data[pos, "count_orig"] * 0.95
    data[pos, "upper_ci_conv"] <- data[pos, "count_orig"] * 1.20
  }
  
  
  ## Compute CI boundaries for Guesstimates ----
  
  pos <- which(data$"stat_method" == "X")
  
  if (length(pos)) {
    data[pos, "lower_ci_conv"] <- data[pos, "count_orig"] * 0.80
    data[pos, "upper_ci_conv"] <- data[pos, "count_orig"] * 1.20
  }
  
  
  ## Compute CI boundaries for Sampling counts ----
  
  pos <- which(data$"stat_method" == "S")
  
  if (length(pos)) {
    
    for (i in pos) {
      
      found <- 0
      
      if (found == 0) {
        if ("lower_ci_orig" %in% precision_cols) {
          if (!is.na(data[i, "lower_ci_orig"])) {          # CI bounds
            data[i, "lower_ci_conv"] <- data[i, "lower_ci_orig"]
            data[i, "upper_ci_conv"] <- data[i, "upper_ci_orig"]
            found <- 1
          }
        }
      }
      
      if (found == 0) {
        if ("sd_orig" %in% precision_cols) {
          if (!is.na(data[i, "sd_orig"])) {                # Standard deviation
            data[i, "lower_ci_conv"] <- 
              data[i, "count_orig"] - 1.96 * data[i, "sd_orig"]
            data[i, "upper_ci_conv"] <- 
              data[i, "count_orig"] + 1.96 * data[i, "sd_orig"]
            found <- 1
          }
        }
      }
      
      if (found == 0) {
        if ("var_orig" %in% precision_cols) {
          if (!is.na(data[i, "var_orig"])) {               # Variance
            data[i, "lower_ci_conv"] <- 
              data[i, "count_orig"] - 1.96 * sqrt(data[i, "var_orig"])
            data[i, "upper_ci_conv"] <- 
              data[i, "count_orig"] + 1.96 * sqrt(data[i, "var_orig"])
            found <- 1
          }
        }
      }
      
      if (found == 0) {
        if ("cv_orig" %in% precision_cols) {
          if (!is.na(data[i, "cv_orig"])) {                # Coeff of variation
            data[i, "lower_ci_conv"] <- 
              data[i, "count_orig"] * (1 - 1.96 * data[i, "cv_orig"])
            data[i, "upper_ci_conv"] <- 
              data[i, "count_orig"] * (1 + 1.96 * data[i, "cv_orig"])
            found <- 1
          }
        }
      }
    }
  }
  
  data
}



#' Convert counts
#'
#' This function converts counts (and 95% CI lower and upper bounds) based on
#' the preferred field method and the species conversion factor.
#' 
#' **Important:** if the preferred field method is provided and the 
#' `field_method` column is present in `data`, counts are always converted 
#' toward the preferred field method.
#'
#' @param data a `data.frame`. Counts dataset.
#' 
#' @param field_method a `character` string. The column name in `data`.
#'
#' @return A `data.frame` (same as `data`).
#' 
#' @noRd

convert_counts <- function(data, field_method) {
  
  if (!is.null(field_method)) {
    
    method_pref <- unique(data$"pref_field_method")
    conv_fact   <- unique(data$"conversion_A2G")
    
    methods_used <- data$"field_method"
    
    # for guesstimates (only)
    methods_used <- ifelse(is.na(methods_used), method_pref, methods_used)
    
    conv_fact   <- ifelse(method_pref == "A", 1 / conv_fact, conv_fact)
    conv_fact   <- ifelse(methods_used == method_pref, 1, conv_fact)
    
    data[ , "count_conv"]   <- 
      data[ , "count_conv"]    * conv_fact
    data[ , "lower_ci_conv"] <- 
      data[ , "lower_ci_conv"] * conv_fact
    data[ , "upper_ci_conv"] <- 
      data[ , "upper_ci_conv"] * conv_fact
    
    data[ , "field_method_conv"] <- method_pref
  }
  
  data
}



#' Special cases: zero counts
#'
#' For a count series, identify zero counts.
#' 
#' If there are only zero counts, returns an error (`na_rm = FALSE`) or delete 
#' series (`na_rm = TRUE`). 
#' 
#' If there are zero and non-zero counts, replaces 0 counts by the smaller 
#' non-zero count (and replaces `lower_ci_conv` and `upper_ci_conv` by 
#' the corresponding values).
#'
#' @param data a `data.frame`
#' 
#' @param na_rm a `logical`. If `TRUE` delete series with all 0 counts. 
#'   Otherwise, return an error.
#'
#' @return A `data.frame` (same as `data`).
#' 
#' @noRd

zero_counts <- function(data, na_rm) {
  
  pos <- which(data[ , "count_conv"] == 0)
  
  if (length(pos)) {
    
    if (length(pos) == nrow(data)) {
      
      if (!na_rm) {
        
        stop("Some series have only zero counts. Check your data or ", 
             "'use na_rm = TRUE'.")
        
      } else {
        
        return(data[-pos, ])
      }
      
    } else {
      
      non_zero_counts  <- data[-pos, ]
      which_min_counts <- which.min(non_zero_counts[ , "count_conv"])[1]
      
      data[pos, "count_conv"]    <- non_zero_counts[which_min_counts, 
                                                    "count_conv"]
      data[pos, "lower_ci_conv"] <- non_zero_counts[which_min_counts, 
                                                    "lower_ci_conv"]
      data[pos, "upper_ci_conv"] <- non_zero_counts[which_min_counts, 
                                                    "upper_ci_conv"]
    }
  }
  
  data
}

Try the popbayes package in your browser

Any scripts or data that you put into this service are public.

popbayes documentation built on July 9, 2023, 7:27 p.m.