R/atlas_occurrences.R

Defines functions atlas_occurrences

Documented in atlas_occurrences

#' Collect a set of occurrences
#'
#' The most common form of data stored by living atlases are observations of
#' individual life forms, known as 'occurrences'. This function allows the
#' user to search for occurrence records that match their specific criteria,
#' and return them as a `tibble` for analysis. Optionally,
#' the user can also request a DOI for a given download to facilitate citation
#' and re-use of specific data resources.
#'
#' @param request optional `data_request` object: generated by a call to
#' [galah_call()].
#' @param identify `data.frame`: generated by a call to
#' [galah_identify()].
#' @param filter `data.frame`: generated by a call to
#' [galah_filter()]
#' @param geolocate `string`: generated by a call to
#' [galah_geolocate()]
#' @param data_profile `string`: generated by a call to
#' [galah_apply_profile()]
#' @param select `data.frame`: generated by a call to
#' [galah_select()] 
#' @param mint_doi `logical`: by default no DOI will be generated. Set to
#' `TRUE` if you intend to use the data in a publication or similar.
#' @param doi `string`: (optional) DOI to download. If provided overrides
#' all other arguments. Only available for the ALA.
#' @param file `string`: (Optional) file name. If not given, will be set to 
#' `data` with date and time added. The file path (directory) is always given by 
#' `galah_config()$package$directory`. 
#' @details
#' Note that unless care is taken, some queries can be particularly large.
#' While most cases this will simply take a long time to process, if the number
#' of requested records is >50 million, the call will not return any data. Users
#' can test whether this threshold will be reached by first calling
#' [atlas_counts()] using the same arguments that they intend to pass to
#' `atlas_occurrences()`. It may also be beneficial when requesting a large
#' number of records to show a progress bar by setting `verbose = TRUE` in
#' [galah_config()], or to use `compute()` to run the call before collecting
#' it later with `collect()`.
#' @return An object of class `tbl_df` and `data.frame` (aka a tibble) of 
#' occurrences, containing columns as specified by [galah_select()]. 
#' @examples \dontrun{
#' # Download occurrence records for a specific taxon
#' galah_config(email = "your_email_here")
#' galah_call() |>
#'   galah_identify("Reptilia") |>
#'   atlas_occurrences()
#'
#' # Download occurrence records in a year range
#' galah_call() |>
#'   galah_identify("Litoria") |>
#'   galah_filter(year >= 2010 & year <= 2020) |>
#'   atlas_occurrences()
#'   
#' # Or identically with alternative syntax
#' request_data() |>
#'   identify("Litoria") |>
#'   filter(year >= 2010 & year <= 2020) |>
#'   collect()
#'   
#' # Download occurrences records in a WKT-specified area
#' polygon <- "POLYGON((146.24960 -34.05930,
#'                      146.37045 -34.05930,
#'                      146.37045 -34.152549,
#'                      146.24960 -34.15254,
#'                      146.24960 -34.05930))"
#' galah_call() |> 
#'   galah_identify("Reptilia") |>
#'   galah_filter(year >= 2010, year <= 2020) |>
#'   galah_geolocate(polygon) |>
#'   atlas_occurrences()
#'   
#' }
#' @export
atlas_occurrences <- function(request = NULL,
                              identify = NULL,
                              filter = NULL,
                              geolocate = NULL,
                              data_profile = NULL,
                              select = NULL,
                              mint_doi = FALSE,
                              doi = NULL,
                              file = NULL
                              ) {
  if(!is.null(doi)){
    request_data() |>
      filter(doi == doi) |>
      collect(file = file)
  }else{
    args <- as.list(environment()) # capture supplied arguments
    check_atlas_inputs(args) |> # convert to `data_request` object
      collect(wait = TRUE,
              file = file)
  }
}

Try the galah package in your browser

Any scripts or data that you put into this service are public.

galah documentation built on Nov. 20, 2023, 9:07 a.m.