Nothing
#' Format count series
#'
#' @description
#' This function provides an easy way to get count series ready to be analyzed
#' by the package `popbayes`. It must be used prior to all other functions.
#'
#' This function formats the count series (passed through the argument
#' `data`) by selecting and renaming columns, checking columns format and
#' content, and removing missing data (if `na_rm = TRUE`). It converts the
#' original data frame into a list of count series that will be analyzed later
#' by the function [fit_trend()] to estimate population trends.
#'
#' To be usable for the estimation of population trends, counts must be
#' accompanied by information on precision. The population trend model requires
#' a 95% confident interval (CI).
#' If estimates are total counts or guesstimates, this function will construct
#' boundaries of the 95% CI by applying the rules set out in
#' \url{https://frbcesab.github.io/popbayes/articles/popbayes.html}.
#' If counts were estimated by a sampling method the user needs to specify a
#' measure of precision. Precision is preferably provided in the form of a 95%
#' CI by means of two fields: `lower_ci` and `upper_ci`. It may also be given
#' in the form of a standard deviation (`sd`), a variance (`var`), or a
#' coefficient of variation (`cv`). If the fields `lower_ci` and `upper_ci` are
#' both absent (or `NA`), fields `sd`, `var`, and `cv` are examined in this
#' order. When one is found valid (no missing value), a 95% CI is derived
#' assuming a normal distribution.
#' The field `stat_method` must be present in `data` to indicate
#' if counts are **total counts** (`'T'`), **sampling** (`'S'`), or
#' **guesstimate** (`'X'`).
#'
#' If a series mixes aerial and ground counts, a field `field_method` must
#' also be present and must contain either `'A'` (aerial counts), or `'G'`
#' (ground counts). As all counts must eventually refer to the same field
#' method for a correct estimation of trend, a conversion will be performed to
#' homogenize counts. This conversion is based on a **preferred field method**
#' and a **conversion factor** both specific to a species/category.
#' The preferred field method specifies the conversion direction. The
#' conversion factor is the multiplicative factor that must be applied to an
#' aerial count to get an equivalent ground count (note that if the preferred
#' field method is `'A'`, ground counts will be divided by the conversion
#' factor to get the equivalent aerial count).
#'
#' The argument `rmax` represents the maximum change in log population size
#' between two dates (i.e. the relative rate of increase). It will be used
#' by [fit_trend()] but must be provided in this function.
#'
#' These three parameters, named `pref_field_method`, `conversion_A2G`, and
#' `rmax` can be present in `data` or in a second `data.frame`
#' (passed through the argument `info`).
#' Alternatively, the package `popbayes` provides their values for some
#' African large mammals.
#'
#' **Note:** If the field `field_method` is absent in `data`, counts are
#' assumed to be obtained with one field method.
#'
#'
#'
#' @param data a `data.frame` with at least five columns: `location`,
#' `species`, `date`, `count`, and `stat_method`.
#'
#' The `stat_method` field indicates the method used to estimate counts. It
#' can contain: `T` (total counts), `X` (guesstimate), and/or `S` (sampling).
#'
#' If individual counts were estimated by **sampling**, additional column(s)
#' providing a measure of precision is also required (e.g. `lower_ci` and
#' `upper_ci`, or `sd`, `cv`, `var`). Precision metrics can be different
#' between counts. For instance, some sampling counts can have a `sd` value
#' and others `lower_ci` and `upper_ci`. In that case three columns are
#' required (`lower_ci`, `upper_ci`, and `sd`). See above section
#' **Description** for further information on the computation of the 95%
#' confident interval of estimates.
#'
#' If the individuals were counted by different methods, an additional field
#' `field_method` is also required. It can contain: `G` (ground counts)
#' and/or `A` (aerial counts). See above section **Description** for further
#' information on the counts conversion.
#'
#' Others fields can be present either in `data` or `info` (see below).
#'
#' @param info (optional) a `data.frame` with species in rows and the following
#' columns: `species` (species name), `pref_field_method`,
#' `conversion_A2G`, and `rmax`. See above section **Description** for
#' further information on these fields.
#' Default is `NULL` (i.e. these information must be present in `data`
#' if not available in `popbayes`).
#'
#' @param location a `character` string. The column name in `data` of the
#' site. This field is used to distinguish count series from different sites
#' (if required) and to create an unique series name.
#' Default is `'location'`.
#'
#' @param species a `character` string. The column name in `data` (and
#' in `info` if provided) of the species. This field is used to distinguish
#' count series for different species (if required) and to create an unique
#' series name.
#' Default is `'species'`.
#'
#' @param date a `character` string. The column name in `data` of the date.
#' This column `date` must be in a numerical form with possibly a decimal
#' part.
#' Default is `'date'`.
#'
#' @param count a `character` string. The column name in `data` of the
#' number of individuals. This column must be numerical.
#' Default is `'count'`.
#'
#' @param stat_method a `character` string. The column name in `data` of
#' the method used to estimate individuals counts. It can contain `'T'`
#' (total counts), `'X'` (guesstimate), and/or `'S'` (sampling). If some
#' counts are coded as `'S'`, precision column(s) must also be provided (see
#' below).
#' Default is `'stat_method'`.
#'
#' @param lower_ci (optional) a `character` string. The column name in `data`
#' of the lower boundary of the 95% CI of the estimate (i.e. `count`). If
#' provided, the upper boundary of the 95% CI (argument `upper_ci`) must be
#' also provided. This argument is only required if some counts have been
#' estimated by a sampling method. But user may prefer use other precision
#' measures, e.g. standard deviation (argument `sd`), variance (argument
#' `var`), or coefficient of variation (argument `cv`).
#' Default is `'lower_ci'`.
#'
#' @param upper_ci (optional) a `character` string. The column name in `data`
#' of the upper boundary of the 95% CI of the estimate (i.e. `count`). If
#' provided, the lower boundary of the 95% CI (argument `lower_ci`) must be
#' also provided.
#' Default is `'upper_ci'`.
#'
#' @param sd (optional) a `character` string. The column name in `data` of
#' the standard deviation of the estimate.
#' Default is `NULL`.
#'
#' @param var (optional) a `character` string. The column name in `data` of
#' the variance of the estimate.
#' Default is `NULL`.
#'
#' @param cv (optional) a `character` string. The column name in `data` of
#' the coefficient of variation of the estimate.
#' Default is `NULL`.
#'
#' @param field_method (optional) a `character` string. The column name in
#' `data` of the field method used to count individuals. Counts can be ground
#' counts (coded as `'G'`) or aerial counts (coded as `'A'`). This argument
#' is optional if individuals have been counted by the same method. See above
#' section **Description** for further information on the count conversion.
#' Default is `NULL`.
#'
#' @param pref_field_method (optional) a `character` string. The column name
#' in `data` of the preferred field method of the species. This argument is
#' only required is `field_method` is not `NULL` (i.e. individuals have been
#' counted by different methods). Alternatively, this value can be passed in
#' `info` (or internally retrieved if the species is listed in the package).
#' See above section **Description** for further information on the count
#' conversion.
#' Default is `NULL`.
#'
#' @param conversion_A2G (optional) a `character` string. The column name
#' in `data` of the count conversion factor of the species. This argument is
#' only required if `field_method` is not `NULL` (i.e. individuals have been
#' counted by different methods). Alternatively this value can be passed in
#' `info` (or internally retrieved if the species is listed in the package).
#' See above section **Description** for further information on the count
#' conversion.
#' Default is `NULL`.
#'
#' @param rmax (optional) a `character` string. The column name in `data` of
#' the species demographic potential (i.e. the relative rate of increase of
#' the population). This is the change in log population size between two
#' dates and will be used later by [fit_trend()].
#' Default is `NULL`.
#'
#' @param path a `character` string. The directory to save formatted data.
#' This directory must exist and can be an absolute or a relative path.
#' Default is the current working directory.
#'
#' @param na_rm a `logical.` If `TRUE`, counts with `NA` values will be
#' removed.
#' Default is `FALSE` (returns an error to inform user if `NA` are detected).
#'
#'
#'
#' @return An n-elements `list` (where `n` is the number of count series). The
#' name of each element of this list is a combination of location and
#' species. Each element of the list is a `list` with the following content:
#' \itemize{
#' \item \code{location} a `character` string. The name of the series site.
#' \item \code{species} a `character` string. The name of the series species.
#' \item \code{date} a `numerical` vector. The sequence of dates of the
#' series.
#' \item \code{n_dates} an `integer.` The number of unique dates.
#' \item \code{stat_methods} a `character` vector. The different stat methods
#' of the series.
#' \item \code{field_methods} (optional) a `character` vector. The different
#' field methods of the series.
#' \item \code{pref_field_method} (optional) a `character` string. The
#' preferred field method of the species (`'A'` or `'G'`).
#' \item \code{conversion_A2G} (optional) a `numeric`. The conversion factor
#' of the species used to convert counts to its preferred field method.
#' \item \code{rmax} a `numeric`. The maximum population growth rate of the
#' species.
#' \item \code{data_original} a `data.frame`. Original data of the series
#' with renamed columns. Some rows may have been deleted
#' (if `na_rm = TRUE`).
#' \item \code{data_converted} a `data.frame`. Data containing computed
#' boundaries of the 95% CI (`lower_ci_conv` and `upper_ci_conv`). If
#' counts have been obtained by different field methods, contains also
#' converted counts (`count_conv`) based on the preferred field method and
#' conversion factor of the species. This `data.frame` will be used by the
#' function [fit_trend()] to fit population models.
#' }
#'
#' **Note:** Some original series can be discarded if one of these two
#' conditions is met: 1) the series contains only zero counts, and 2) the
#' series contains only a few dates (< 4 dates).
#'
#' @export
#'
#' @examples
#' ## Load Garamba raw dataset ----
#' file_path <- system.file("extdata", "garamba_survey.csv",
#' package = "popbayes")
#'
#' garamba <- read.csv(file = file_path)
#'
#' ## Create temporary folder ----
#' temp_path <- tempdir()
#'
#' ## Format dataset ----
#' garamba_formatted <- popbayes::format_data(
#' data = garamba,
#' path = temp_path,
#' field_method = "field_method",
#' pref_field_method = "pref_field_method",
#' conversion_A2G = "conversion_A2G",
#' rmax = "rmax")
#'
#' ## Number of count series ----
#' length(garamba_formatted)
#'
#' ## Retrieve count series names ----
#' popbayes::list_series(path = temp_path)
#'
#' ## Print content of the first count series ----
#' names(garamba_formatted[[1]])
#'
#' ## Print original data ----
#' garamba_formatted[[1]]$"data_original"
#'
#' ## Print converted data ----
#' garamba_formatted[[1]]$"data_converted"
format_data <- function(data, info = NULL, date = "date", count = "count",
location = "location", species = "species",
stat_method = "stat_method", lower_ci = "lower_ci",
upper_ci = "upper_ci", sd = NULL, var = NULL,
cv = NULL, field_method = NULL,
pref_field_method = NULL,
conversion_A2G = NULL, rmax = NULL,
path = ".", na_rm = FALSE) {
# CHECKS ----
## Check argument 'path' ----
if (!is.character(path) || length(path) != 1) {
stop("Argument 'path' must be a character string.")
}
if (!dir.exists(path)) {
stop("The directory '", path, "' does not exist.")
}
## Check argument 'data' ----
if (missing(data)) {
stop("Argument 'data' is required.")
}
if (!is.data.frame(data)) {
stop("Argument 'data' must be a data.frame.")
}
## Coerce to data.frame (tibble) ----
data <- as.data.frame(data)
## Check argument 'location' ----
if (!is.character(location) || length(location) != 1) {
stop("Argument 'location' must be a column name (character string).")
}
if (!(location %in% colnames(data))) {
stop("The column '", location, "' (argument location) is absent from ",
"'data'. Please check the spelling.")
}
data[ , location] <- as.character(data[ , location])
if (any(is.na(data[ , location]))) {
stop("The column '", location, "' cannot contain NA.")
}
## Check argument 'species' ----
if (!is.character(species) || length(species) != 1) {
stop("Argument 'species' must be a column name (character string).")
}
if (!(species %in% colnames(data))) {
stop("The column '", species, "' (argument species) is absent from ",
"'data'. Please check the spelling.")
}
data[ , species] <- as.character(data[ , species])
if (any(is.na(data[ , species]))) {
stop("The column '", species, "' cannot contain NA.")
}
## Check argument 'date' ----
if (!is.character(date) || length(date) != 1) {
stop("Argument 'date' must be a column name (character string).")
}
if (!(date %in% colnames(data))) {
stop("The column '", date, "' (argument date) is absent from 'data'. ",
"Please check the spelling.")
}
if (!is.numeric(data[ , date])) {
stop("The column '", date, "' must be numeric.")
}
if (any(is.na(data[ , date]))) {
stop("The column '", date, "' cannot contain NA.")
}
## Check argument 'count' ----
if (!is.character(count) || length(count) != 1) {
stop("Argument 'count' must be a column name (character string).")
}
if (!(count %in% colnames(data))) {
stop("The column '", count, "' (argument count) is absent from 'data'. ",
"Please check the spelling.")
}
if (!is.numeric(data[ , count])) {
stop("The column '", count, "' must be numeric.")
}
if (length(which(data[ , count] < 0))) {
stop("The column '", count, "' must be positive (or zero).")
}
## Check argument 'na_rm' ----
if (!is.logical(na_rm) || length(na_rm) != 1) {
stop("Argument 'na_rm' must be TRUE or FALSE.")
}
## Check argument 'stat_method' ----
if (!is.character(stat_method) || length(stat_method) != 1) {
stop("Argument 'stat_method' must be a column name (character string).")
}
if (!(stat_method %in% colnames(data))) {
stop("The column '", stat_method, "' (argument stat_method) is absent ",
"from 'data'. Please check the spelling.")
}
valid_stat_methods <- c("T", "X", "S")
valid_stat_methods_msg <- paste0(valid_stat_methods, collapse = "' or '")
valid_stat_methods_msg <- paste0("'", valid_stat_methods_msg, "'")
data[ , stat_method] <- as.character(data[ , stat_method])
if (any(is.na(data[ , stat_method]))) {
stop("The column '", stat_method, "' cannot contain NA.")
}
if (any(!(data[ , stat_method] %in% valid_stat_methods))) {
stop("Invalid value(s) for 'stat_method' in 'data'. ",
"Allowed values are: ", valid_stat_methods_msg, ".")
}
## Check argument 'lower_ci' ----
if (!is.null(lower_ci)) {
if (!is.character(lower_ci) || length(lower_ci) != 1) {
stop("Argument 'lower_ci' must be a column name (character string).")
}
if (!(lower_ci %in% colnames(data))) {
stop("The column '", lower_ci, "' (argument lower_ci) is absent ",
"from 'data'. Please check the spelling.")
}
if (!is.numeric(data[ , lower_ci])) {
stop("The column '", lower_ci, "' must be numeric.")
}
if (length(which(data[ , lower_ci] < 0))) {
stop("The column '", lower_ci, "' must be positive (or zero).")
}
}
## Check argument 'upper_ci' ----
if (!is.null(upper_ci)) {
if (!is.character(upper_ci) || length(upper_ci) != 1) {
stop("Argument 'upper_ci' must be a column name (character string).")
}
if (!(upper_ci %in% colnames(data))) {
stop("The column '", upper_ci, "' (argument upper_ci) is absent ",
"from 'data'. Please check the spelling.")
}
if (!is.numeric(data[ , upper_ci])) {
stop("The column '", upper_ci, "' must be numeric.")
}
if (length(which(data[ , upper_ci] < 0))) {
stop("The column '", upper_ci, "' must be positive (or zero).")
}
}
## Check argument 'sd' ----
if (!is.null(sd)) {
if (!is.character(sd) || length(sd) != 1) {
stop("Argument 'sd' must be a column name (character string).")
}
if (!(sd %in% colnames(data))) {
stop("The column '", sd, "' (argument sd) is absent from 'data'. ",
"Please check the spelling.")
}
if (!is.numeric(data[ , sd])) {
stop("The column '", sd, "' must be numeric.")
}
if (length(which(data[ , sd] <= 0))) {
stop("The column '", sd, "' must be strictly positive.")
}
}
## Check argument 'var' ----
if (!is.null(var)) {
if (!is.character(var) || length(var) != 1) {
stop("Argument 'var' must be a column name (character string).")
}
if (!(var %in% colnames(data))) {
stop("The column '", var, "' (argument var) is absent from 'data'. ",
"Please check the spelling.")
}
if (!is.numeric(data[ , var])) {
stop("The column '", var, "' must be numeric.")
}
if (length(which(data[ , var] <= 0))) {
stop("The column '", var, "' must be strictly positive.")
}
}
## Check argument 'cv' ----
if (!is.null(cv)) {
if (!is.character(cv) || length(cv) != 1) {
stop("Argument 'cv' must be a column name (character string).")
}
if (!(cv %in% colnames(data))) {
stop("The column '", cv, "' (argument cv) is absent from 'data'. ",
"Please check the spelling.")
}
if (!is.numeric(data[ , cv])) {
stop("The column '", cv, "' must be numeric.")
}
if (length(which(data[ , cv] <= 0))) {
stop("The column '", cv, "' must be strictly positive.")
}
}
## Check argument 'field_method' ----
if (!is.null(field_method)) {
if (!is.character(field_method) || length(field_method) != 1) {
stop("Argument 'field_method' must be a column name (character of ",
"length 1).")
}
if (!(field_method %in% colnames(data))) {
stop("The column '", field_method, "' (argument field_method) is ",
"absent from 'data'. Please check the spelling.")
}
valid_field_methods <- c("G", "A")
valid_field_methods_msg <- paste0(valid_field_methods, collapse = "' or '")
valid_field_methods_msg <- paste0("'", valid_field_methods_msg, "'")
data[ , field_method] <- as.character(data[ , field_method])
data_not_x <- data[data[ , stat_method] != "X", ]
if (nrow(data_not_x)) {
if (any(is.na(data_not_x[ , field_method]))) {
stop("The column '", field_method, "' cannot contain NA ",
"(except for count with stat_method = 'X').")
}
if (any(!(data_not_x[ , field_method] %in% valid_field_methods))) {
stop("Invalid value(s) for 'field_method' in 'data'. ",
"Allowed values are: ", valid_field_methods_msg, ".")
}
}
}
## Check argument 'pref_field_method' ----
if (!is.null(pref_field_method)) {
if (!is.character(pref_field_method) || length(pref_field_method) != 1) {
stop("Argument 'pref_field_method' must be a column name (character of ",
"length 1).")
}
if (!(pref_field_method %in% colnames(data))) {
stop("The column '", pref_field_method, "' (argument pref_field_method) ",
"is absent from 'data'. Please check the spelling.")
}
valid_field_methods <- c("G", "A")
valid_field_methods_msg <- paste0(valid_field_methods, collapse = "' or '")
valid_field_methods_msg <- paste0("'", valid_field_methods_msg, "'")
data[ , pref_field_method] <- as.character(data[ , pref_field_method])
data_not_na <- data[!is.na(data[ , pref_field_method]), ]
if (nrow(data_not_na)) {
if (any(!(data_not_na[ , pref_field_method] %in% valid_field_methods))) {
stop("Invalid value(s) for 'field_method' in 'data'. ",
"Allowed values are: ", valid_field_methods_msg, ".")
}
}
}
## Check argument 'conversion_A2G' ----
if (!is.null(conversion_A2G)) {
if (!is.character(conversion_A2G) || length(conversion_A2G) != 1) {
stop("Argument 'conversion_A2G' must be a column name (character of ",
"length 1).")
}
if (!(conversion_A2G %in% colnames(data))) {
stop("The column '", conversion_A2G, "' (argument conversion_A2G) ",
"is absent from 'data'. Please check the spelling.")
}
if (!is.numeric(data[ , conversion_A2G])) {
stop("The column '", conversion_A2G, "' must be a numeric.")
}
if (length(which(data[ , conversion_A2G] <= 0))) {
stop("The column '", conversion_A2G, "' must be strictly positive.")
}
}
## Check argument 'rmax' ----
if (!is.null(rmax)) {
if (!is.character(rmax) || length(rmax) != 1) {
stop("Argument 'rmax' must be a column name (character of ",
"length 1).")
}
if (!(rmax %in% colnames(data))) {
stop("The column '", rmax, "' (argument rmax) ",
"is absent from 'data'. Please check the spelling.")
}
if (!is.numeric(data[ , rmax])) {
stop("The column '", rmax, "' must be a numeric.")
}
if (length(which(data[ , rmax] <= 0))) {
stop("The column '", rmax, "' must be strictly positive.")
}
}
## Check argument 'info' ----
if (!is.null(info)) {
if (!is.data.frame(info)) {
stop("Argument 'info' must be a data frame.")
}
valid_info_colnames <- c("species", "pref_field_method", "conversion_A2G",
"rmax")
valid_info_colnames_msg <- paste0(valid_info_colnames, collapse = "' and '")
valid_info_colnames_msg <- paste0("'", valid_info_colnames_msg, "'")
if (any(!(valid_info_colnames %in% colnames(info)))) {
stop("Invalid columns in 'info'. ",
"Required variables are: ", valid_info_colnames_msg, ".")
}
if (any(is.na(info[ , valid_info_colnames]))) {
stop("The dataset 'info' cannot contain NA.")
}
if (length(which(duplicated(info$"species"))) > 0) {
stop("The 'info' data.frame cannot contain duplicated species.")
}
}
## Check precision information ----
if ("S" %in% data[ , stat_method]) {
if (is.null(lower_ci) && is.null(upper_ci) && is.null(sd) && is.null(cv) &&
is.null(var)) {
stop("No valid measure of precision is available for sampling counts. ",
"Add 'lower_ci' and 'upper_ci' and/or 'sd', 'var', 'cv' ",
"information.")
}
if (!is.null(lower_ci) && is.null(upper_ci)) {
stop("You must provide both lower and upper CI.")
}
if (is.null(lower_ci) && !is.null(upper_ci)) {
stop("You must provide both lower and upper CI.")
}
}
# NA COUNTS ----
## Handle NA in counts ----
data <- is_na_counts(data, count, na_rm)
if (nrow(data) == 0) {
stop("All counts are NA. Please check your data.")
}
# FORMAT DATA ----
## Detect precision(s) columns ----
precision_cols <- NULL
if ("S" %in% data[ , stat_method]) {
if (!is.null(lower_ci)) {
precision_cols <- c(precision_cols, lower_ci)
names(precision_cols)[length(precision_cols)] <- "lower_ci"
}
if (!is.null(upper_ci)) {
precision_cols <- c(precision_cols, upper_ci)
names(precision_cols)[length(precision_cols)] <- "upper_ci"
}
if (!is.null(sd)) {
precision_cols <- c(precision_cols, sd)
names(precision_cols)[length(precision_cols)] <- "sd"
}
if (!is.null(var)) {
precision_cols <- c(precision_cols, var)
names(precision_cols)[length(precision_cols)] <- "var"
}
if (!is.null(cv)) {
precision_cols <- c(precision_cols, cv)
names(precision_cols)[length(precision_cols)] <- "cv"
}
}
## Rename columns ----
data_renamed <- data[ , c(location, species, date, stat_method)]
colnames(data_renamed) <- c("location", "species", "date", "stat_method")
if (!is.null(field_method)) {
data_renamed <- data.frame(data_renamed,
"field_method" = data[ , field_method])
}
if (!is.null(pref_field_method)) {
data_renamed <- data.frame(data_renamed,
"pref_field_method" = data[ , pref_field_method])
}
if (!is.null(conversion_A2G)) {
data_renamed <- data.frame(data_renamed,
"conversion_A2G" = data[ , conversion_A2G])
}
if (!is.null(rmax)) {
data_renamed <- data.frame(data_renamed,
"rmax" = data[ , rmax])
}
data_renamed <- data.frame(data_renamed,
"count_orig" = data[ , count])
if (!is.null(precision_cols)) {
precision_data <- data[ , precision_cols]
precision_cols <- paste0(names(precision_cols), "_orig")
colnames(precision_data) <- precision_cols
data_renamed <- data.frame(data_renamed, precision_data)
}
# CREATE SERIES ----
## Detect series ----
series_infos <- get_series(data_renamed, quiet = TRUE)
## Split data by series ----
count_series <- list()
for (i in seq_len(nrow(series_infos))) {
id <- series_infos[i, "id"]
sel_rows <- which(data_renamed$"location" == series_infos[i, "location"] &
data_renamed$"species" == series_infos[i, "species"])
series <- data_renamed[sel_rows, ]
rownames(series) <- NULL
count_series[[id]] <- series
}
# SPECIES INFO ----
for (i in seq_along(count_series)) {
if (!is.null(field_method)) {
## Retrieve 'pref_field_method' ----
if (!is.null(pref_field_method)) {
pref_field_method_data <- unique(count_series[[i]]$"pref_field_method")
pref_field_method_data <- pref_field_method_data[
!is.na(pref_field_method_data)]
if (length(pref_field_method_data) > 1) {
stop("Multiple values for 'pref_field_method' detected in ",
names(count_series)[i], ".")
}
if (length(pref_field_method_data) == 0) {
if (!is.null(info)) {
if (unique(count_series[[i]]$"species") %in% info$"species") {
pos <- info$"species" %in% count_series[[i]]$"species"[1]
pref_field_method_data <- info[pos, "pref_field_method"]
} else {
if (unique(count_series[[i]]$"species") %in%
species_info$"species") {
pos <- species_info$"species" %in%
count_series[[i]]$"species"[1]
pref_field_method_data <- species_info[pos, "pref_field_method"]
} else {
stop("Unable to retrieve 'pref_field_method' for the series ",
names(count_series)[i])
}
}
} else {
if (unique(count_series[[i]]$"species") %in%
species_info$"species") {
pos <- species_info$"species" %in% count_series[[i]]$"species"[1]
pref_field_method_data <- species_info[pos, "pref_field_method"]
} else {
stop("Unable to retrieve 'pref_field_method' for the series ",
names(count_series)[i])
}
}
}
} else {
if (!is.null(info)) {
if (unique(count_series[[i]]$"species") %in% info$"species") {
pos <- info$"species" %in% count_series[[i]]$"species"[1]
pref_field_method_data <- info[pos, "pref_field_method"]
} else {
if (unique(count_series[[i]]$"species") %in%
species_info$"species") {
pos <- species_info$"species" %in% count_series[[i]]$"species"[1]
pref_field_method_data <- species_info[pos, "pref_field_method"]
} else {
stop("Unable to retrieve 'pref_field_method' for the series ",
names(count_series)[i])
}
}
} else {
if (unique(count_series[[i]]$"species") %in%
species_info$"species") {
pos <- species_info$"species" %in% count_series[[i]]$"species"[1]
pref_field_method_data <- species_info[pos, "pref_field_method"]
} else {
stop("Unable to retrieve 'pref_field_method' for the series ",
names(count_series)[i])
}
}
}
count_series[[i]]$"pref_field_method" <- pref_field_method_data
## Retrieve 'conversion_A2G' ----
if (!is.null(conversion_A2G)) {
conversion_A2G_data <- unique(count_series[[i]]$"conversion_A2G")
conversion_A2G_data <- conversion_A2G_data[
!is.na(conversion_A2G_data)]
if (length(conversion_A2G_data) > 1) {
stop("Multiple values for 'conversion_A2G' detected in ",
names(count_series)[i], ".")
}
if (length(conversion_A2G_data) == 0) {
if (!is.null(info)) {
if (unique(count_series[[i]]$"species") %in% info$"species") {
pos <- info$"species" %in% count_series[[i]]$"species"[1]
conversion_A2G_data <- info[pos, "conversion_A2G"]
} else {
if (unique(count_series[[i]]$"species") %in%
species_info$"species") {
pos <- species_info$"species" %in%
count_series[[i]]$"species"[1]
conversion_A2G_data <- species_info[pos, "conversion_A2G"]
} else {
stop("Unable to retrieve 'conversion_A2G' for the series ",
names(count_series)[i])
}
}
} else {
if (unique(count_series[[i]]$"species") %in%
species_info$"species") {
pos <- species_info$"species" %in% count_series[[i]]$"species"[1]
conversion_A2G_data <- species_info[pos, "conversion_A2G"]
} else {
stop("Unable to retrieve 'conversion_A2G' for the series ",
names(count_series)[i])
}
}
}
} else {
if (!is.null(info)) {
if (unique(count_series[[i]]$"species") %in% info$"species") {
pos <- info$"species" %in% count_series[[i]]$"species"[1]
conversion_A2G_data <- info[pos, "conversion_A2G"]
} else {
if (unique(count_series[[i]]$"species") %in%
species_info$"species") {
pos <- species_info$"species" %in% count_series[[i]]$"species"[1]
conversion_A2G_data <- species_info[pos, "conversion_A2G"]
} else {
stop("Unable to retrieve 'conversion_A2G' for the series ",
names(count_series)[i])
}
}
} else {
if (unique(count_series[[i]]$"species") %in%
species_info$"species") {
pos <- species_info$"species" %in% count_series[[i]]$"species"[1]
conversion_A2G_data <- species_info[pos, "conversion_A2G"]
} else {
stop("Unable to retrieve 'conversion_A2G' for the series ",
names(count_series)[i])
}
}
}
count_series[[i]]$"conversion_A2G" <- conversion_A2G_data
}
## Retrieve 'rmax' ----
if (!is.null(rmax)) {
rmax_data <- unique(count_series[[i]]$"rmax")
rmax_data <- rmax_data[!is.na(rmax_data)]
if (length(rmax_data) > 1) {
stop("Multiple values for 'rmax' detected in ",
names(count_series)[i], ".")
}
if (length(rmax_data) == 0) {
if (!is.null(info)) {
if (unique(count_series[[i]]$"species") %in% info$"species") {
pos <- info$"species" %in% count_series[[i]]$"species"[1]
rmax_data <- info[pos, "rmax"]
} else {
if (unique(count_series[[i]]$"species") %in%
species_info$"species") {
pos <- species_info$"species" %in% count_series[[i]]$"species"[1]
rmax_data <- species_info[pos, "rmax"]
} else {
stop("Unable to retrieve 'rmax' for the series ",
names(count_series)[i])
}
}
} else {
if (unique(count_series[[i]]$"species") %in%
species_info$"species") {
pos <- species_info$"species" %in% count_series[[i]]$"species"[1]
rmax_data <- species_info[pos, "rmax"]
} else {
stop("Unable to retrieve 'rmax' for the series ",
names(count_series)[i])
}
}
}
} else {
if (!is.null(info)) {
if (unique(count_series[[i]]$"species") %in% info$"species") {
pos <- info$"species" %in% count_series[[i]]$"species"[1]
rmax_data <- info[pos, "rmax"]
} else {
if (unique(count_series[[i]]$"species") %in%
species_info$"species") {
pos <- species_info$"species" %in% count_series[[i]]$"species"[1]
rmax_data <- species_info[pos, "rmax"]
} else {
stop("Unable to retrieve 'rmax' for the series ",
names(count_series)[i])
}
}
} else {
if (unique(count_series[[i]]$"species") %in%
species_info$"species") {
pos <- species_info$"species" %in% count_series[[i]]$"species"[1]
rmax_data <- species_info[pos, "rmax"]
} else {
stop("Unable to retrieve 'rmax' for the series ",
names(count_series)[i])
}
}
}
count_series[[i]]$"rmax" <- rmax_data
}
# CHECK PRECISION MEASURES ----
for (i in seq_along(count_series)) {
count_series[[i]] <- is_na_precision(count_series[[i]], precision_cols,
na_rm)
}
## Remove series ----
n_rows <- unlist(lapply(count_series, function(x) nrow(x)))
n_rows <- n_rows[n_rows == 0]
if (length(n_rows)) {
series_to_del <- names(n_rows)
count_series <- count_series[-which(names(count_series) %in% series_to_del)]
if (length(n_rows) < length(count_series)) {
usethis::ui_oops(paste0("Removing the following series without valid ",
"precision measures: ",
usethis::ui_value(paste0(series_to_del,
collapse = ", "))))
}
}
n_rows <- unlist(lapply(count_series, function(x) nrow(x)))
n_rows <- n_rows[n_rows < 4]
if (length(n_rows)) {
series_to_del <- names(n_rows)
if (!na_rm) {
stop("The following count series have not enough data (< 4):\n",
usethis::ui_value(paste0(series_to_del,
collapse = ", ")),
"\nRemove these count series or use 'na_rm = TRUE'.")
} else {
count_series <- count_series[-which(names(count_series) %in%
series_to_del)]
}
if (length(n_rows) < length(count_series)) {
usethis::ui_oops(paste0("Removing the following series without enough ",
"data (< 4) ",
usethis::ui_value(paste0(series_to_del,
collapse = ", "))))
}
}
if (length(count_series) == 0) {
stop("All your series have been removed. Please check your data.")
}
# COMPUTE CI BOUNDARIES ----
for (i in seq_along(count_series)) {
count_series[[i]] <- compute_ci(count_series[[i]], precision_cols)
}
# CONVERT COUNTS ----
for (i in seq_along(count_series)) {
count_series[[i]] <- convert_counts(count_series[[i]], field_method)
}
# ZERO COUNTS ----
for (i in seq_along(count_series)) {
count_series[[i]] <- zero_counts(count_series[[i]], na_rm)
}
n_rows <- unlist(lapply(count_series, function(x) nrow(x)))
n_rows <- n_rows[n_rows == 0]
if (length(n_rows)) {
series_to_del <- names(n_rows)
count_series <- count_series[-which(names(count_series) %in% series_to_del)]
if (length(n_rows) < length(count_series)) {
usethis::ui_oops(paste0("Removing the following series without valid ",
"precision measures: ",
usethis::ui_value(paste0(series_to_del,
collapse = ", "))))
}
}
if (length(count_series) == 0) {
stop("All your series have been removed. Please check your data.")
}
# FINAL CHECKS ----
for (i in seq_along(count_series)) {
if (any(count_series[[i]][ , "lower_ci_conv"] ==
count_series[[i]][ , "upper_ci_conv"])) {
stop(paste0("Lower and upper CI bounds cannot be strictly equal for ",
"the series.",
usethis::ui_value(names(count_series)[i])))
}
}
# CREATE FINAL LIST ----
data_series <- list()
for (i in seq_len(nrow(series_infos))) {
id <- names(count_series)[i]
data_sub <- count_series[[i]][order(count_series[[i]]$"date",
decreasing = FALSE), ]
rownames(data_sub) <- NULL
if (!is.null(field_method)) {
pref_field_method <- unique(data_sub$"pref_field_method")
conversion_A2G <- unique(data_sub$"conversion_A2G")
field_methods <- sort(unique(data_sub[ , "field_method"]))
field_methods <- field_methods[!is.na(field_methods)]
} else {
pref_field_method <- NULL
conversion_A2G <- NULL
field_methods <- NULL
}
rmax <- unique(data_sub$"rmax")
data_series[[id]] <- list(
"location" = unique(data_sub$"location"),
"species" = unique(data_sub$"species"),
"dates" = data_sub[ , "date"],
"n_dates" = length(unique(data_sub[ , "date"])),
"stat_methods" = sort(unique(data_sub[ , "stat_method"])),
"field_methods" = field_methods,
"pref_field_method" = pref_field_method,
"conversion_A2G" = conversion_A2G,
"rmax" = rmax,
"data_original" = data_sub[ , -grep("_conv", colnames(data_sub))],
"data_converted" = data_sub[ , -grep("_orig", colnames(data_sub))])
## Export sub-list ----
species_path <- file.path(path, id)
dir.create(species_path, showWarnings = FALSE)
formatted_data <- data_series[id]
save(formatted_data, file = file.path(species_path,
paste0(id, "_data.RData")))
}
usethis::ui_done(paste0("Detecting {usethis::ui_value(length(",
"data_series))} count series"))
data_series
}
#' Remove rows or return error if NA counts detected
#'
#' @param data a `data.frame`
#' @param col a `character` string (column to inspect)
#' @param na_rm a `logical`. If `TRUE`, deletes rows. Otherwise, returns an
#' error.
#'
#' @return A `data.frame` (same as `data` with possibly some rows removed).
#'
#' @noRd
is_na_counts <- function(data, col, na_rm) {
if (any(is.na(data[ , col]))) {
if (!na_rm) {
stop("The column '", col, "' cannot contain NA. If you want to ",
"remove missing counts, please use 'na_rm = TRUE'.", call. = FALSE)
} else {
pos <- which(is.na(data[ , col]))
usethis::ui_info(paste0("Removing {usethis::ui_value(length(pos))} ",
"rows with NA values in '", col, "' field."))
data <- data[-pos, ]
}
}
data
}
#' Check precision data for sampling counts
#'
#' - Check for missing precision values: if neither ci, sd, var, cv are found,
#' returns an error (`na_rm = FALSE`) or deletes rows (`na_rm = TRUE`).
#' - Check for CI boundaries: lower and upper boundaries are required. If not,
#' returns an error.
#' - Check CI boundaries values: if lower_ci > count or upper_ci < count,
#' returns an error.
#'
#' @param data a `data.frame`
#' @param precision_cols a `character` vector (column names of precision
#' information, i.e. `'lower_ci_orig'`, `'sd_orig'`, etc.)
#' @param na_rm a `logical`. If `TRUE` delete rows. Otherwise return an error.
#'
#' @return A data frame (same as `data`).
#'
#' @noRd
is_na_precision <- function(data, precision_cols, na_rm) {
## Check for missing precision values ----
sampling_rows <- which(data[ , "stat_method"] == "S")
if (length(sampling_rows)) {
is_na <- apply(data[sampling_rows, precision_cols], 1, function(x) {
x <- sum(ifelse(is.na(x), 0, 1))
ifelse(x == 0, TRUE, FALSE)
})
## Remove or stop if missing precision values for S ----
if (sum(is_na)) {
if (!na_rm) {
stop("Precision column(s) cannot all be NA for sampling counts. If ",
"you want to remove counts missing precision information, please ",
"use 'na_rm = TRUE'.")
} else {
data <- data[-sampling_rows[which(is_na)], ]
}
}
}
if (nrow(data)) {
## Check for CI bounds (require both) for S ----
sampling_rows <- which(data[ , "stat_method"] == "S")
if (length(sampling_rows)) {
if ("lower_ci_orig" %in% colnames(data)) {
is_na_lower <- is.na(data[sampling_rows, "lower_ci_orig"])
is_na_upper <- is.na(data[sampling_rows, "upper_ci_orig"])
pos <- which((is_na_upper + is_na_lower) == 1)
if (length(pos)) {
ci_cols <- which(precision_cols %in% c("lower_ci_orig",
"upper_ci_orig"))
tmp <- as.data.frame(data[sampling_rows[pos],
precision_cols[-ci_cols]])
if (ncol(tmp)) {
is_na <- apply(tmp, 1, function(x) {
x <- sum(ifelse(is.na(x), 0, 1))
ifelse(x == 0, TRUE, FALSE)
})
if (sum(is_na)) {
stop("Unless another type of precision information is provided, ",
"both lower and upper CI bounds are required.")
}
} else {
stop("Unless another type of precision information is provided, ",
"both lower and upper CI bounds are required.")
}
}
}
}
## Check CI bounds values for S ----
if (length(sampling_rows)) {
if ("lower_ci_orig" %in% colnames(data)) {
pos <- which(data[sampling_rows, "lower_ci_orig"] >
data[sampling_rows, "count_orig"])
if (length(pos)) {
stop("At least one CI lower bound is greater than the corresponding ",
"count.")
}
pos <- which(data[sampling_rows, "lower_ci_orig"] < 0)
if (length(pos)) {
stop("CI lower bounds must be positive.")
}
}
if ("upper_ci_orig" %in% colnames(data)) {
pos <- which(data[sampling_rows, "upper_ci_orig"] <
data[sampling_rows, "count_orig"])
if (length(pos)) {
stop("At least one CI upper bound is smaller than the corresponding ",
"count.")
}
pos <- which(data[sampling_rows, "upper_ci_orig"] < 0)
if (length(pos)) {
stop("Upper CI values must be positive.")
}
}
}
}
data
}
#' Compute 95% confident interval boundaries
#'
#' - Always computes CI bounds for stat methods T and G
#' - Derives CI bounds for stat method S from SD, VAR, or CV, unless CI bounds
#' are provided
#'
#' @param data a `data.frame`
#' @param precision_cols a `character` vector (names of columns with precision
#' information, i.e. `'lower_ci_orig'`, `'sd_orig'`, etc.)
#'
#' @return A `data.frame` identical to `data` with two additional columns:
#' `lower_ci_conv` and `upper_ci_conv`.
#'
#' @noRd
compute_ci <- function(data, precision_cols) {
data$"count_conv" <- data$"count_orig"
data$"lower_ci_conv" <- NA
data$"upper_ci_conv" <- NA
## Compute CI boundaries for Total counts ----
pos <- which(data$"stat_method" == "T")
if (length(pos)) {
data[pos, "lower_ci_conv"] <- data[pos, "count_orig"] * 0.95
data[pos, "upper_ci_conv"] <- data[pos, "count_orig"] * 1.20
}
## Compute CI boundaries for Guesstimates ----
pos <- which(data$"stat_method" == "X")
if (length(pos)) {
data[pos, "lower_ci_conv"] <- data[pos, "count_orig"] * 0.80
data[pos, "upper_ci_conv"] <- data[pos, "count_orig"] * 1.20
}
## Compute CI boundaries for Sampling counts ----
pos <- which(data$"stat_method" == "S")
if (length(pos)) {
for (i in pos) {
found <- 0
if (found == 0) {
if ("lower_ci_orig" %in% precision_cols) {
if (!is.na(data[i, "lower_ci_orig"])) { # CI bounds
data[i, "lower_ci_conv"] <- data[i, "lower_ci_orig"]
data[i, "upper_ci_conv"] <- data[i, "upper_ci_orig"]
found <- 1
}
}
}
if (found == 0) {
if ("sd_orig" %in% precision_cols) {
if (!is.na(data[i, "sd_orig"])) { # Standard deviation
data[i, "lower_ci_conv"] <-
data[i, "count_orig"] - 1.96 * data[i, "sd_orig"]
data[i, "upper_ci_conv"] <-
data[i, "count_orig"] + 1.96 * data[i, "sd_orig"]
found <- 1
}
}
}
if (found == 0) {
if ("var_orig" %in% precision_cols) {
if (!is.na(data[i, "var_orig"])) { # Variance
data[i, "lower_ci_conv"] <-
data[i, "count_orig"] - 1.96 * sqrt(data[i, "var_orig"])
data[i, "upper_ci_conv"] <-
data[i, "count_orig"] + 1.96 * sqrt(data[i, "var_orig"])
found <- 1
}
}
}
if (found == 0) {
if ("cv_orig" %in% precision_cols) {
if (!is.na(data[i, "cv_orig"])) { # Coeff of variation
data[i, "lower_ci_conv"] <-
data[i, "count_orig"] * (1 - 1.96 * data[i, "cv_orig"])
data[i, "upper_ci_conv"] <-
data[i, "count_orig"] * (1 + 1.96 * data[i, "cv_orig"])
found <- 1
}
}
}
}
}
data
}
#' Convert counts
#'
#' This function converts counts (and 95% CI lower and upper bounds) based on
#' the preferred field method and the species conversion factor.
#'
#' **Important:** if the preferred field method is provided and the
#' `field_method` column is present in `data`, counts are always converted
#' toward the preferred field method.
#'
#' @param data a `data.frame`. Counts dataset.
#'
#' @param field_method a `character` string. The column name in `data`.
#'
#' @return A `data.frame` (same as `data`).
#'
#' @noRd
convert_counts <- function(data, field_method) {
if (!is.null(field_method)) {
method_pref <- unique(data$"pref_field_method")
conv_fact <- unique(data$"conversion_A2G")
methods_used <- data$"field_method"
# for guesstimates (only)
methods_used <- ifelse(is.na(methods_used), method_pref, methods_used)
conv_fact <- ifelse(method_pref == "A", 1 / conv_fact, conv_fact)
conv_fact <- ifelse(methods_used == method_pref, 1, conv_fact)
data[ , "count_conv"] <-
data[ , "count_conv"] * conv_fact
data[ , "lower_ci_conv"] <-
data[ , "lower_ci_conv"] * conv_fact
data[ , "upper_ci_conv"] <-
data[ , "upper_ci_conv"] * conv_fact
data[ , "field_method_conv"] <- method_pref
}
data
}
#' Special cases: zero counts
#'
#' For a count series, identify zero counts.
#'
#' If there are only zero counts, returns an error (`na_rm = FALSE`) or delete
#' series (`na_rm = TRUE`).
#'
#' If there are zero and non-zero counts, replaces 0 counts by the smaller
#' non-zero count (and replaces `lower_ci_conv` and `upper_ci_conv` by
#' the corresponding values).
#'
#' @param data a `data.frame`
#'
#' @param na_rm a `logical`. If `TRUE` delete series with all 0 counts.
#' Otherwise, return an error.
#'
#' @return A `data.frame` (same as `data`).
#'
#' @noRd
zero_counts <- function(data, na_rm) {
pos <- which(data[ , "count_conv"] == 0)
if (length(pos)) {
if (length(pos) == nrow(data)) {
if (!na_rm) {
stop("Some series have only zero counts. Check your data or ",
"'use na_rm = TRUE'.")
} else {
return(data[-pos, ])
}
} else {
non_zero_counts <- data[-pos, ]
which_min_counts <- which.min(non_zero_counts[ , "count_conv"])[1]
data[pos, "count_conv"] <- non_zero_counts[which_min_counts,
"count_conv"]
data[pos, "lower_ci_conv"] <- non_zero_counts[which_min_counts,
"lower_ci_conv"]
data[pos, "upper_ci_conv"] <- non_zero_counts[which_min_counts,
"upper_ci_conv"]
}
}
data
}
Any scripts or data that you put into this service are public.
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.