R/atlas_counts.R

Defines functions count.data_request atlas_counts

Documented in atlas_counts count.data_request

#' Return a count of records
#'
#' Prior to downloading data it is often valuable to have some estimate of how
#' many records are available, both for deciding if the query is feasible,
#' and for estimating how long it will take to download. Alternatively, for some kinds
#' of reporting, the count of observations may be all that is required, for example
#' for understanding how observations are growing or shrinking in particular
#' locations, or for particular taxa. To this end, `atlas_counts()` takes
#' arguments in the same format as [atlas_occurrences()], and
#' provides either a total count of records matching the criteria, or a
#' `data.frame` of counts matching the criteria supplied to the `group_by`
#' argument.
#'
#' @param request optional `data_request` object: generated by a call to
#' [galah_call()].
#' @param identify `data.frame`: generated by a call to
#' [galah_identify()].
#' @param filter `data.frame`: generated by a call to
#' [galah_filter()].
#' @param geolocate `string`: generated by a call to
#' [galah_geolocate()].
#' @param data_profile `string`: generated by a call to
#' [galah_apply_profile()].
#' @param group_by `data.frame`: An object of class `galah_group_by`,
#' as returned by [galah_group_by()]. Alternatively a vector of field
#' names (see `search_all(fields)` and `show_all(fields)`).
#' @param limit `numeric`: maximum number of categories to return, defaulting to 100.
#' If limit is NULL, all results are returned. For some categories this will
#' take a while.
#' @param type `string`: one of `c("occurrences", "species")`.
#' Defaults to `"occurrences"`, which returns the number of records
#' that match the selected criteria; `"species"` instead returns the number
#' of species. Internally, `count()` converts these values to
#' `"occurrences-count"` and `"species-count"`. The outdated value `"record"`
#' is converted to `"occurrences"` when it appears in a `data_request`.
#' @return
#' An object of class `tbl_df` and `data.frame` (aka a tibble) returning: 
#'  * A single number, if `group_by` is not specified or,
#'  * A summary of counts grouped by field(s), if `group_by` is specified
#'
#' @examples \dontrun{
#' # classic syntax:
#' galah_call() |>
#'   galah_filter(year == 2015) |>
#'   atlas_counts()
#' 
#' # synonymous with:
#' request_data() |>
#'   filter(year == 2015) |>
#'   count() |>
#'   collect()
#' }
#' @export
atlas_counts <- function(request = NULL, 
                         identify = NULL, 
                         filter = NULL, 
                         geolocate = NULL,
                         data_profile = NULL,
                         group_by = NULL, 
                         limit = NULL,
                         type = c("occurrences", "species")
                         ) {
  # capture supplied arguments
  args <- as.list(environment())
  args$type <- match.arg(type)
  dr <- check_atlas_inputs(args) # convert to `data_request` object
  # check for outdated naming conventions
  if(dr$type == "record"){dr$type <- "occurrences"}
  # pass to collect etc
  dr |> 
    count() |>
    slice_head(n = limit) |>
    collect()
}

#' @rdname atlas_counts
#' @param x An object of class `data_request`, created using [galah_call()]
#' @param wt currently ignored
#' @param ... currently ignored
#' @param sort currently ignored
#' @param name currently ignored
#' @importFrom dplyr count
#' @importFrom rlang abort
#' @export
count.data_request <- function(x, 
                               ..., 
                               wt, 
                               sort, 
                               name){
  # convert the user-facing type to its internal "-count" form;
  # any other type is rejected with an error
  x$type <- switch(x$type, 
                   "occurrences" = "occurrences-count",
                   "species" = "species-count",
                   "media" = abort("type = 'media' is not supported by `count()`"),
                   abort("`count()` only supports `type = 'occurrences'` or `'species'`"))
  x
}
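
The two functions above rely on a pair of base R idioms: capturing all formal arguments with `as.list(environment())` followed by `match.arg()`, and using `switch()` with an unnamed final argument as a fall-through error. The following is a minimal sketch of both, not the package's implementation. The names `capture_args()` and `map_count_type()` are hypothetical, and base `stop()` is swapped in for `rlang::abort()` so that the example runs without extra packages.

```r
# Hypothetical stand-in for the argument capture in atlas_counts().
capture_args <- function(group_by = NULL,
                         limit = NULL,
                         type = c("occurrences", "species")) {
  # as.list(environment()) returns every formal argument by name,
  # including those left at their NULL default
  args <- as.list(environment())
  # match.arg() collapses the vector of choices to a single value;
  # when `type` is not supplied, the first choice is used
  args$type <- match.arg(type)
  args
}

# Hypothetical stand-in for the type conversion in count.data_request().
# Assumption: stop() replaces rlang::abort() to keep this base-R only.
map_count_type <- function(type) {
  switch(type,
         "occurrences" = "occurrences-count",
         "species" = "species-count",
         "media" = stop("type = 'media' is not supported by `count()`",
                        call. = FALSE),
         # the unnamed final argument is the default branch
         stop("`count()` only supports `type = 'occurrences'` or `'species'`",
              call. = FALSE))
}

args <- capture_args(limit = 10)
internal_type <- map_count_type(args$type)
print(internal_type)
```

Because `switch()` evaluates only the matching branch, the error calls run only when an unsupported type is actually supplied.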

galah documentation built on Nov. 20, 2023, 9:07 a.m.