R/load.R
In ebirdst: Access and Analyze eBird Status and Trends Data Products

Documented in list_available_pis load_config load_fac_map_parameters load_pi load_ppm load_ranges load_raster load_regional_stats load_trends

#' Load eBird Status Data Products raster data
#'
#' Each of the eBird Status raster products is packaged as a GeoTIFF file
#' representing predictions on a regular grid. The core products are occurrence,
#' count, relative abundance, and proportion of population. This function loads
#' one of the available data products into R as a
#' [SpatRaster][terra::SpatRaster] object. Note that data must be download using
#' [ebirdst_download_status()] prior to loading it using this function.
#'
#' @param species character; the species to load data for, given as a scientific
#'   name, common name or six-letter species code (e.g. "woothr"). The full list
#'   of valid species is in the [ebirdst_runs] data frame included in this
#'   package. To download the example dataset, use `"yebsap-example"`.
#' @param product character; eBird Status raster product to load: occurrence,
#'   count, relative abundance, or proportion of population. See Details for a
#'   detailed explanation of each of these products.
#' @param period character; temporal period of the estimation. The eBird Status
#'   models make predictions for each week of the year; however, as a
#'   convenience, data are also provided summarized at the seasonal or annual
#'   ("full-year") level.
#' @param metric character; by default, the weekly products provide estimates of
#'   the median value (`metric = "median"`) and the summarized products are the
#'   cell-wise mean across the weeks within the season (`metric = "mean"`).
#'   However, additional variants exist for some of the products. For the weekly
#'   relative abundance, confidence intervals are provided: specify `metric =
#'   "lower"` to get the 10th quantile or `metric = "upper"` to get the 90th
#'   quantile. For the seasonal and annual products, the cell-wise maximum
#'   values across weeks can be obtained with `metric = "max"`.
#' @param resolution character; the resolution of the raster data to load. The
#'   default is to load the native 3 km resolution data; however, for some
#'   applications 9 km or 27 km data may be suitable.
#' @inheritParams ebirdst_download_status
#'
#' @details The core eBird Status data products provide weekly estimates across
#'   a regular spatial grid. They are packaged as rasters with 52 layers, each
#'   corresponding to estimates for a week of the year, and we refer to them as
#'   "cubes" (e.g. the "relative abundance cube"). All estimates are the median
#'   expected value for a standard 2 km, 1 hour eBird Traveling Count by an
#'   expert eBird observer at the optimal time of day and for optimal weather
#'   conditions to observe the given species. These products are:
#'
#' - `occurrence`: the expected probability (0-1) of occurrence of a species.
#' - `count`: the expected count of a species, conditional on its occurrence at
#' the given location.
#' - `abundance`: the expected relative abundance of a species, computed as the
#' product of the probability of occurrence and the count conditional on
#' occurrence.
#' - `proportion-population`: the proportion of the total relative abundance
#' within each cell. This is a derived product calculated by dividing each cell
#' value in the relative abundance raster by the total abundance summed across
#' all cells.
#'
#' In addition to these weekly data cubes, this function provides access to data
#' summarized over different periods. Seasonal cubes are produced by taking the
#' cell-wise mean or max across the weeks within each season. The boundary dates
#' for each season are species specific and are available in `ebirdst_runs`, and
#' if a season failed review no associated layer will be included in the cube.
#' In addition, full-year summaries provide the mean or max across all weeks of
#' the year that fall within a season that passed review. Note that this is not
#' necessarily all 52 weeks of the year. For example, if the estimates for the
#' non-breeding season failed expert review for a given species, the full-year
#' summary for that species will not include the weeks that would fall within
#' the non-breeding season.
#'
#' @return For the weekly cubes, a [SpatRaster][terra::SpatRaster] with 52
#'   layers for the given product, where the layer names are the dates
#'   (`YYYY-MM-DD` format) of the midpoint of each week. Seasonal cubes will
#'   have up to four layers named with the corresponding season. The full-year
#'   products will have a single layer.
#'
#' @export
#'
#' @examples
#' \dontrun{
#' # download example data if hasn't already been downloaded
#' ebirdst_download("yebsap-example")
#'
#' # weekly relative abundance
#' # note that only 27 km data are available for the example data
#' abd_weekly <- load_raster("yebsap-example", "abundance", resolution = "27km")
#'
#' # the weeks for each layer are stored in the layer names
#' names(abd_weekly)
#' # they can be converted to date objects with as.Date
#' as.Date(names(abd_weekly))
#'
#' # max seasonal abundance
#' abd_seasonal <- load_raster("yebsap-example", "abundance",
#'                             period = "seasonal", metric = "max",
#'                             resolution = "27km")
#' # available seasons in stack
#' names(abd_seasonal)
#' # subset to just breeding season abundance
#' abd_seasonal[["breeding"]]
#' }
load_raster <- function(species,
                        product = c("abundance",
                                    "count",
                                    "occurrence",
                                    "proportion-population"),
                        period = c("weekly",
                                   "seasonal",
                                   "full-year"),
                        metric = NULL,
                        resolution = c("3km", "9km", "27km"),
                        path = ebirdst_data_dir()) {

  stopifnot(is.character(species), length(species) == 1, dir.exists(path))
  product <- match.arg(product)
  period <- match.arg(period)
  resolution <- match.arg(resolution)

  species_code <- get_species(species)
  species_path <- get_species_path(species, path = path,
                                   check_downloaded = FALSE)
  if (!dir.exists(species_path)) {
    stop("No data found for the requested species. Ensure that the data were ",
         "downloaded using ebirdst_download_status() and that the 'path' ",
         "argument correctly points to the data download directory.")
  }

  # check that the geotiff driver is installed
  drv <- terra::gdal(drivers = TRUE)
  drv <- drv$name[stringr::str_detect(drv$can, "read")]
  if (!"GTiff" %in% drv) {
    stop("GDAL does not have GeoTIFF support. GeoTIFF support is required to ",
         "load Status and Trends raster data.")
  }

  # load config file
  p <- load_config(species = species_code, path = path)
  v <- p$srd_pred_year

  # only low res data available for example
  is_example <- stringr::str_detect(species_code, "-example")
  if (is_example && !resolution == "27km") {
    stop("The example data only contains 27 km estimates.")
  }

  # full year products only available for migrants
  if (p$is_resident && period == "full-year") {
    stop("Full-year products are not available for residents, use ",
         "period = 'seasonal' instead.")
  }

  # construct file name and path
  if (period == "weekly") {
    # assess which metric is being requested
    if (is.null(metric)) {
      metric <- "median"
    }
    if (product == "abundance") {
      if (!metric %in% c("median", "lower", "upper")) {
        stop("Valid metrics for weekly abundance data are 'median', 'lower', ",
             "or 'upper'")
      }
    } else {
      if (metric != "median") {
        stop("For this product, metric must be 'median'")
      }
    }

    # construct filename
    file <- stringr::str_glue("{species_code}_{product}_{metric}",
                              "_{resolution}_{v}.tif")
    file <- file.path(species_path, "weekly", file)
  } else {
    # assess which metric is being requested
    if (is.null(metric)) {
      metric <- "mean"
    }
    if (!metric %in% c("mean", "max")) {
      stop("Valid metrics for seasonal or full-year data are 'mean' or 'max.'")
    }

    # construct filename
    file <- stringr::str_glue("{species_code}_{product}_{period}_{metric}",
                              "_{resolution}_{v}.tif")
    file <- file.path(species_path, "seasonal", file)
  }

  # check existence of target file
  if (!file.exists(file)) {
    stop("The file for the requested product does not exist: \n  ", file)
  }

  # load and return raster stack
  return(terra::rast(file))
}


#' Load eBird Trends estimates for a set of species
#'
#' Load the relative abundance trend estimates for a single species or a set of
#' species. Trends are estimated on a 27 km by 27 km grid for a single season
#' per species (breeding, non-breeding, or resident).  Note that data must be
#' download using [ebirdst_download_trends()] prior to loading it using this
#' function.
#'
#' The trends in relative abundance are estimated using a double machine
#' learning model. To quantify uncertainty, an ensemble of 100 estimates is made
#' at each location, each based on a random subsample of eBird data. The
#' estimated trend is the median across the ensemble, and the 80% confidence
#' intervals are the lower 10th and upper 90th percentiles across the ensemble.
#' To access estimates from the individual folds making up the ensemble use
#' `fold_estimates = TRUE`. These fold-level estimates can be used to quantify
#' uncertainty, for example, when calculating the trend for a given region. For
#' further details on the methodology used to estimate trends consult Fink et
#' al. 2023.
#'
#' @inheritParams ebirdst_download_trends
#' @param fold_estimates logical; by default, the trends summarized across the
#'   100-fold ensemble are returned; however, by setting `fold_estimates = TRUE`
#'   the individual fold-level estimates are returned.
#'
#' @references
#'   Fink, D., Johnston, A., Strimas-Mackey, M., Auer, T., Hochachka, W. M.,
#'   Ligocki, S., Oldham Jaromczyk, L., Robinson, O., Wood, C., Kelling, S., &
#'   Rodewald, A. D. (2023). A Double machine learning trend model for citizen
#'   science data. Methods in Ecology and Evolution, 00, 1–14.
#'   https://doi.org/10.1111/2041-210X.14186
#'
#' @return A data frame containing the trends estimates for a set of species.
#'   The following columns are included:
#'   - `species_code`: the alphanumeric eBird species code uniquely identifying
#'   the species.
#'   - `season`:  season that the trend was estimated for: breeding,
#'   nonbreeding, or resident.
#'   - `start_year/end_year`: the start and end years of the trend time period.
#'   - `start_date/end_date`: the start and end dates (`MM-DD` format) of the
#'   season for which the trend was estimated.
#'   - `srd_id`: unique integer identifier for the grid cell.
#'   - `longitude/latitude`: longitude and latitude of the grid cell center.
#'   - `abd`: relative abundance estimate for the middle of the trend time
#'   period (e.g. 2014 for a 2007-2021 trend).
#'   - `abd_ppy`: the median estimated percent per year change in relative
#'   abundance.
#'   - `abd_ppy_lower/abd_ppy_upper`: the 80% confidence interval for the
#'   estimated percent per year change in relative abundance.
#'   - `abd_ppy_nonzero`: a logical (TRUE/FALSE) value indicating if the 80%
#'   confidence limits overlap zero (FALSE) or don't overlap zero (TRUE)
#'   - `abd_trend`: the median estimated cumulative change in relative
#'   abundance over the trend time period.
#'   - `abd_trend_lower/abd_trend_upper`: the 80% confidence interval for the
#'   estimated cumulative change in relative abundance over the trend time
#'   period.
#'
#'   If `fold_estimates = TRUE`, a data frame of fold-level trend estimates is
#'   returned with the following columns:
#'   - `species_code`: the alphanumeric eBird species code uniquely identifying
#'   the species.
#'   - `season`:  season that the trend was estimated for: breeding,
#'   nonbreeding, or resident.
#'   - `srd_id`: unique integer identifier for the grid cell.
#'   - `abd`: relative abundance estimate for the middle of the trend time
#'   period (e.g. 2014 for a 2007-2021 trend).
#'   - `abd_ppy`: the estimated percent per year change in relative abundance.
#'
#' @export
#'
#' @examples
#' \dontrun{
#' # download example trends data if it hasn't already been downloaded
#' ebirdst_download_trends("yebsap-example")
#'
#' # load trends
#' trends <- load_trends("yebsap-example")
#'
#' # load fold-level estimates
#' trends_folds <- load_trends("yebsap-example", fold_estimates = TRUE)
#' }
load_trends <- function(species,
                        fold_estimates = FALSE,
                        path = ebirdst_data_dir()) {
  stopifnot(is.character(species), !is.na(species), dir.exists(path))
  stopifnot(is_flag(fold_estimates))

  v <- ebirdst_version()[["version_year"]]

  # trends species and seaons
  species_code <- get_species(species)
  trends_runs <- ebirdst::ebirdst_runs[ebirdst::ebirdst_runs$has_trends, ]
  season <- trends_runs$trends_season[match(species_code,
                                            trends_runs$species_code)]
  if (any(is.na(season))) {
    stop("The following species do not have trends estimates:\n  ",
         paste(species[is.na(season)], collapse = ", "))
  }

  # get paths to trends parquet files
  trends_paths <- character()
  for (i in seq_along(species_code)) {
    p <- get_species_path(species_code[i],
                          path = path,
                          check_downloaded = FALSE)
    if (fold_estimates) {
      f <- stringr::str_glue("{species_code[i]}_{season[i]}_ebird-trends_",
                             "folds_{v}.parquet")
    } else {
      f <- stringr::str_glue("{species_code[i]}_{season[i]}_ebird-trends_",
                             "{v}.parquet")
    }
    trends_paths <- c(trends_paths, file.path(p, "trends", f))
  }

  if (!all(file.exists(trends_paths))) {
    stop("No trends data found for the following species. Ensure that the ",
         "data were downloaded using ebirdst_download_trends() and that the ",
         "'path' argument correctly points to the data download directory.\n  ",
         paste(species[file.exists(trends_paths)], collapse = ", "))
  }

  # load data
  return(dplyr::collect(arrow::open_dataset(trends_paths)))
}


#' Load seasonal eBird Status and Trends range polygons
#'
#' Range polygons are defined as the boundaries of non-zero seasonal relative
#' abundance estimates, which are then (optionally) smoothed to produce more
#' aesthetically pleasing polygons using the `smoothr` package.
#'
#' @inheritParams load_raster
#' @param resolution character; the raster resolution from which the range
#'   polygons were derived.
#' @param smoothed logical; whether smoothed or unsmoothed ranges should be
#'   loaded.
#'
#' @return An `sf` update containing the seasonal range boundaries, with each
#'   season provided as a different feature.
#' @export
#'
#' @examples
#' \dontrun{
#' # download example data if hasn't already been downloaded
#' ebirdst_download_status("yebsap-example")
#'
#' # load smoothed ranges
#' # note that only 27 km data are provided for the example data
#' ranges <- load_ranges("yebsap-example", resolution = "27km")
#' }
load_ranges <- function(species,
                        resolution = c("9km", "27km"), smoothed = TRUE,
                        path = ebirdst_data_dir()) {
  stopifnot(is.character(species), length(species) == 1, dir.exists(path))
  stopifnot(is.logical(smoothed), length(smoothed) == 1)
  resolution <- match.arg(resolution)

  species_code <- get_species(species)
  species_path <- get_species_path(species, path = path,
                                   check_downloaded = FALSE)
  if (!dir.exists(species_path)) {
    stop("No data found for the requested species. Ensure that the data were ",
         "downloaded using ebirdst_download_status() and that the 'path' ",
         "argument correctly points to the data download directory.")
  }

  # load config file
  p <- load_config(species = species_code, path = path)
  v <- p$srd_pred_year

  # only low res data available for example
  is_example <- stringr::str_detect(species_code, "-example")
  if (is_example && !resolution == "27km") {
    stop("The example data only contains 27 km estimates.")
  }

  # define filename
  label <- ifelse(smoothed, "smooth", "raw")
  file <- stringr::str_glue("{species_code}_range_{label}",
                            "_{resolution}_{v}.gpkg")
  file <- file.path(species_path, "ranges", file)

  # check existence of target file
  if (!file.exists(file)) {
    stop("The file for the requested product does not exist: \n  ", file)
  }

  # load polygons
  p <- sf::read_sf(dsn = file, layer = "range")

  return(p)
}


#' Load regional summary statistics
#'
#' Load seasonal summary statistics for regions consisting of countries and
#' states/provinces.
#'
#' @inheritParams load_raster
#'
#' @return A data frame containing regional summary statistics with columns:
#'   - `species_code`: alphanumeric eBird species code.
#'   - `region_type`: `country` for countries or `state` for states, provinces,
#'   or other sub-national regions.
#'   - `region_code`: alphanumeric code for the region.
#'   - `region_name`: English name of the region.
#'   - `season`: name of the season that the summary statistics were calculated
#'   for.
#'   - `abundance_mean`: mean relative abundance in the region.
#'   - `total_pop_percent`: proportion of the seasonal modeled population
#'   falling within the region.
#'   - `range_percent_occupied`: the proportion of the region occupied by the
#'   species during the given season.
#'   - `range_total_percent`: the proportion of the species seasonal range
#'   falling within the region.
#'   - `range_days_occupation`: number of days of the season that the region was
#'   occupied by this species.
#' @export
#'
#' @examples
#' \dontrun{
#' # download example data if hasn't already been downloaded
#' ebirdst_download_status("yebsap-example")
#'
#' # load configuration parameters
#' regional <- load_regional_stats("yebsap-example")
#' }
load_regional_stats <- function(species, path = ebirdst_data_dir()) {
  stopifnot(is.character(species), length(species) == 1, dir.exists(path))

  species_code <- get_species(species)
  species_path <- get_species_path(species, path = path,
                                   check_downloaded = FALSE)
  if (!dir.exists(species_path)) {
    stop("No data found for the requested species. Ensure that the data were ",
         "downloaded using ebirdst_download_status() and that the 'path' ",
         "argument correctly points to the data download directory.")
  }

  file <- file.path(species_path, "regional_stats.csv")
  if(!file.exists(file)) {
    stop("The regional summary stats file could not be found. To download ",
         "file, use `ebirst_download_status(download_regional = TRUE)`.")
  }
  # load stats
  stats <- dplyr::as_tibble(utils::read.csv(file, na = "", row.names = NULL))
  stats[["region_area_km2"]] <- NULL
  return(stats)
}


#' Load eBird Status Data Products configuration file
#'
#' Load the configuration file for an eBird Status run. This configuration file
#' is mostly for internal use and contains a variety of parameters used in the
#' modeling process.
#'
#' @inheritParams load_raster
#'
#' @return A list with the run configuration parameters.
#' @export
#'
#' @examples
#' \dontrun{
#' # download example data if hasn't already been downloaded
#' ebirdst_download_status("yebsap-example")
#'
#' # load configuration parameters
#' p <- load_config("yebsap-example")
#' }
load_config <- function(species, path = ebirdst_data_dir()) {
  stopifnot(is.character(species), length(species) == 1, dir.exists(path))

  species_code <- get_species(species)
  species_path <- get_species_path(species, path = path,
                                   check_downloaded = FALSE)
  if (!dir.exists(species_path)) {
    stop("No data found for the requested species. Ensure that the data were ",
         "downloaded using ebirdst_download_status() and that the 'path' ",
         "argument correctly points to the data download directory.")
  }

  cfg_file <- file.path(species_path, "config.json")
  if(!file.exists(cfg_file)) {
    stop("The file 'config.json' does not exist in: ", species_path)
  }
  # load configuration file
  p <- jsonlite::read_json(cfg_file, simplifyVector = TRUE)
  names(p) <- tolower(names(p))
  return(p)
}


#' Load full annual cycle map parameters
#'
#' Get the map parameters used on the eBird Status and Trends website to
#' optimally display the full annual cycle data. This includes bins for the
#' abundance data, a projection, and an extent to map. The extent is the spatial
#' extent of non-zero data across the full annual cycle and the projection is
#' optimized for this extent.
#'
#' @inheritParams load_raster
#'
#' @return A list containing elements:
#' - `custom_projection`: a custom projection optimized for the given species'
#'    full annual cycle
#' - `fa_extent`: a [SpatExtent][terra::ext()] object storing the spatial
#' extent of non-zero
#' data for the given species in the custom projection
#' - `res`: a numeric vector with 2 elements giving the target resolution of
#'    raster in the custom projection
#' - `fa_extent_sinu`: the extent in sinusoidal projection
#' - `weekly_bins`/`weekly_labels`: weekly abundance bins and labels for the
#' full annual cycle
#' - `seasonal_bins`/`seasonal_labels: seasonal abundance bins and labels for
#' the full annual cycle
#'
#' @export
#'
#' @examples
#' \dontrun{
#' # download example data if hasn't already been downloaded
#' ebirdst_download_status("yebsap-example")
#'
#' # load configuration parameters
#' load_fac_map_parameters(path)
#' }
load_fac_map_parameters <- function(species, path = ebirdst_data_dir()) {
  stopifnot(is.character(species), length(species) == 1, dir.exists(path))

  species_code <- get_species(species)
  species_path <- get_species_path(species, path = path,
                                   check_downloaded = FALSE)
  if (!dir.exists(species_path)) {
    stop("No data found for the requested species. Ensure that the data were ",
         "downloaded using ebirdst_download_status() and that the 'path' ",
         "argument correctly points to the data download directory.")
  }

  # load config file
  p <- load_config(species = species_code, path = path)
  ext_order <- unlist(p$bbox_sinu)[c("xmin", "xmax", "ymin", "ymax")]

  seasonal_bins <- list(custom_projection = p$projection$crs,
                        fa_extent = terra::ext(p$projection$extent),
                        res = p$projection$res,
                        fa_extent_sinu = terra::ext(ext_order),
                        weekly_bins = p$bins[["3km"]]$breaks,
                        weekly_labels = p$bins[["3km"]]$labels,
                        seasonal_bins = p$bins_seasonal[["3km"]]$breaks,
                        seasonal_labels = p$bins_seasonal[["3km"]]$labels)
}


#' Load predictor importance (PI) rasters
#'
#' The eBird Status models estimate the relative importance of each
#' environmental predictor used in the model. These predictor importance (PI)
#' data are converted to ranks (with a rank of 1 being the most important)
#' relative to the full suite of environmental predictors. The ranks are
#' summarized to a 27 km resolution raster grid for each predictor, where the
#' cell values are the average across all models in the ensemble contributing to
#' that cell. These data are available in raster format provided `download_pis =
#' TRUE` was used when calling [ebirdst_download_status()]. PI estimates are
#' available separately for both the occurrence and count sub-model and only the
#' 30 most important predictors are distributed. Use [list_available_pis()] to
#' see which predictors have PI data.
#'
#' @inheritParams load_raster
#' @param predictor character; the predictor that the PI data should be loaded
#'   for. The list of predictors that PI data are available for varies by
#'   species, use [list_available_pis()] to get the list for a given species.
#' @param response character; the model (occurrence or count) that the PI
#'   data should be loaded for.
#'
#' @return A [SpatRaster][terra::SpatRaster] object with the PI ranks for the
#'   given predictor. For migrants, the estimates are weekly and the raster will
#'   have 52 layers, where the layer names are the dates (`MM-DD` format) of the
#'   midpoint of each week. For residents, a single year round layer is
#'   returned.
#'
#' [list_available_pis()] returns a data frame listing the top 30 predictors for
#' which PI rasters can be loaded. In addition to the predictor names, the mean
#' range-wide rank (`rangewide_rank`) is given as well as the integer rank
#' (`rank`) relative to the other 29 predictors.
#'
#' @export
#'
#' @examples
#' \dontrun{
#' # download example data if hasn't already been downloaded
#' ebirdst_download_status("yebsap-example", download_pis = TRUE)
#'
#' # identify the top predictor
#' top_preds <- list_available_pis("yebsap-example")
#' print(top_preds[1, ])
#'
#' # load predictor importance raster of top predictor for occurrence
#' load_pi("yebsap-example", top_preds$predictor[1])
#' }
load_pi <- function(species, predictor, response = c("occurrence", "count"),
                    path = ebirdst_data_dir()) {
  stopifnot(is.character(species), length(species) == 1, dir.exists(path))
  response <- match.arg(response)

  species_code <- get_species(species)
  species_path <- get_species_path(species, path = path,
                                   check_downloaded = FALSE)
  if (!dir.exists(species_path)) {
    stop("No data found for the requested species. Ensure that the data were ",
         "downloaded using ebirdst_download_status() and that the 'path' ",
         "argument correctly points to the data download directory.")
  }

  # construct file name
  year <- load_config(species = species, path = path)[["srd_pred_year"]]
  p <- stringr::str_replace_all(predictor, "_", "-")
  tif <- stringr::str_glue("{species_code}_pi_{response}_{p}_27km_{year}.tif")
  tif <- file.path(species_path, "pis", tif)
  if (!file.exists(tif)) {
    stop("GeoTIFF for ", predictor, " PI could not be found. To download ",
         "PI data use ebirst_download_status(download_pis = TRUE). To list ",
         "predictors that have PI data use list_available_pis().")
  }
  return(terra::rast(tif))
}


#' @describeIn load_pi list the predictors that have PI information for this
#'   species.
#' @export
list_available_pis <- function(species, path = ebirdst_data_dir()) {
  stopifnot(is.character(species), length(species) == 1, dir.exists(path))

  species_code <- get_species(species)
  species_path <- get_species_path(species, path = path,
                                   check_downloaded = FALSE)
  if (!dir.exists(species_path)) {
    stop("No data found for the requested species. Ensure that the data were ",
         "downloaded using ebirdst_download_status() and that the 'path' ",
         "argument correctly points to the data download directory.")
  }

  csv_file <- file.path(species_path, "pis", "pi_rangewide-ranks.csv")
  if(!file.exists(csv_file)) {
    stop("The PI data could not be found. To download, use ",
         "`ebirst_download_status(download_pis = TRUE)`.")
  }
  # load ranks
  ranks <- utils::read.csv(csv_file, row.names = NULL, na = "")

  # available pis
  tifs <- list.files(file.path(species_path, "pis"), pattern = "*.tif")
  tifs <- tifs[!stringr::str_detect(tifs, "n-folds")]
  preds <- stringr::str_remove(tifs, "^[^_]+_pi_(occurrence|count)_")
  preds <- stringr::str_extract(preds, "[-a-z0-9]+")
  preds <- unique(stringr::str_replace_all(preds, "-", "_"))
  preds <- preds[preds %in% ranks$predictor]

  # return ranks
  top_ranks <- dplyr::as_tibble(ranks[ranks$predictor %in% preds, ])
  top_ranks <- dplyr::arrange(top_ranks, .data$rank)

  return(top_ranks)
}


#' Load predictive performance metric (PPM) rasters
#'
#' eBird Status models are evaluated against a test set of eBird data not used
#' during model training and a suite of predictive performance metrics (PPMs)
#' are calculated. The PPMs for each base model are summarized to a 27 km
#' resolution raster grid, where the cell values are the average across all
#' models in the ensemble contributing to that cell. These data are available in
#' raster format provided `download_ppms = TRUE` was used when calling
#' [ebirdst_download_status()].
#'
#' @inheritParams load_raster
#' @param ppm character; the name of a single metric to load data for. See
#'   Details for definitions of each metric.
#'
#' @details
#' Eight predictive performance metrics are provided:
#' - `binary_f1`: F1-score comparing the model predictions converted to binary
#' with the observed detection/non-detection for the test checklists.
#' - `binary_pr_auc`: the area on the precision-recall curve generated by
#' comparing the model predictions converted to binary with the observed
#' detection/non-detection for the test checklists.
#' - `occ_bernoulli_dev`: Bernoulli deviance comparing the predicted occurrence
#' with the observed detection/non-detection for the test checklists.
#' - `count_spearman`: Spearman's rank correlation coefficient comparing the
#' predicted count with the observed count for the subset of test checklists
#' on which the species was detected.
#' - `log_count_pearson`: Pearson correlation coefficient comparing the
#' logarithm of the predicted count with the logarithm of the observed count
#' for the subset of test checklists on which the species was detected.
#' - `abd_poisson_dev`: Poisson deviance comparing the predicted relative
#' abundance with the observed count for the full set of test checklists.
#' - `abd_spearman`: Spearman's rank  correlation coefficient comparing the
#' predicted relative abundance with the observed count for the full set of
#' test checklists.
#' - `log_abd_pearson`: Pearson correlation coefficient comparing the logarithm
#' of the predicted relative abundance with the logarithm of the observed
#' count for the full set of test checklists.
#'
#' @return A [SpatRaster][terra::SpatRaster] object with the PPM data. For
#'   migrants, rasters are weekly with  52 layers, where the layer names are the
#'   dates (`MM-DD` format) of the midpoint of each week. For residents, a
#'   single year round layer is returned.
#'
#' @export
#'
#' @examples
#' \dontrun{
#' # download example data if hasn't already been downloaded
#' ebirdst_download_status("yebsap-example", download_ppms = TRUE)
#'
#' # load area under the precision-recall curve PPM raster
#' load_ppm("yebsap-example", ppm = "binary_pr_auc")
#' }
load_ppm <- function(species,
                     ppm = c("binary_f1",
                             "binary_pr_auc",
                             "occ_bernoulli_dev",
                             "count_spearman",
                             "log_count_pearson",
                             "abd_poisson_dev",
                             "abd_spearman",
                             "log_abd_pearson"),
                     path = ebirdst_data_dir()) {
  stopifnot(is.character(species), length(species) == 1, dir.exists(path))
  ppm <- match.arg(ppm)

  species_code <- get_species(species)
  species_path <- get_species_path(species, path = path,
                                   check_downloaded = FALSE)
  if (!dir.exists(species_path)) {
    stop("No data found for the requested species. Ensure that the data were ",
         "downloaded using ebirdst_download_status() and that the 'path' ",
         "argument correctly points to the data download directory.")
  }

  # construct file name
  year <- load_config(species = species, path = path)[["srd_pred_year"]]
  p <- stringr::str_replace_all(ppm, "_", "-")
  tif <- stringr::str_glue("{species_code}_ppm_{p}_27km_{year}.tif")
  tif <- file.path(species_path, "ppms", tif)
  if (!file.exists(tif)) {
    stop("GeoTIFF for ", ppm, " PPM could not be found. To download ",
         "PPM data use ebirst_download_status(download_ppm = TRUE).")
  }
  return(terra::rast(tif))
}
Any scripts or data that you put into this service are public.
ebirdst documentation built on Nov. 16, 2023, 5:07 p.m.
rdrr.io home R language documentation Run R code online
CRAN packages Bioconductor packages R-Forge packages GitHub packages
Note that we can't provide technical support on individual packages. You should contact the package authors for that.
ebirdst
Access and Analyze eBird Status and Trends Data Products

R/load.R
In ebirdst: Access and Analyze eBird Status and Trends Data Products

Defines functions load_ppm list_available_pis load_pi load_fac_map_parameters load_config load_regional_stats load_ranges load_trends load_raster

Documented in list_available_pis load_config load_fac_map_parameters load_pi load_ppm load_ranges load_raster load_regional_stats load_trends

Try the ebirdst package in your browser

R Package Documentation

Browse R Packages

We want your feedback!

ebirdst Access and Analyze eBird Status and Trends Data Products

R/load.R In ebirdst: Access and Analyze eBird Status and Trends Data Products

Defines functions load_ppm list_available_pis load_pi load_fac_map_parameters load_config load_regional_stats load_ranges load_trends load_raster

Documented in list_available_pis load_config load_fac_map_parameters load_pi load_ppm load_ranges load_raster load_regional_stats load_trends

Try the ebirdst package in your browser

R Package Documentation

Browse R Packages

We want your feedback!

ebirdst
Access and Analyze eBird Status and Trends Data Products

R/load.R
In ebirdst: Access and Analyze eBird Status and Trends Data Products