R/galah-package.R

#' Biodiversity Data from the GBIF Node Network
#'
#' @description
#' The Global Biodiversity Information Facility (GBIF; <https://www.gbif.org>)
#' provides tools to enable users to find, access, combine and visualise 
#' biodiversity data. `galah` is a `dplyr` extension package that enables the R 
#' community to directly access data and resources hosted by GBIF and several 
#' of it's subsidiary organisations (known as 'nodes') using `dplyr` verbs.
#' 
#' The basic unit of data stored by these infrastructures is 
#' an **occurrence** record, which is an observation of a biological entity at
#' a specific time and place. However, `galah` also facilitates access to 
#' taxonomic information, or associated media such images or sounds, 
#' all while restricting their queries to particular taxa or locations. Users 
#' can specify which columns are returned by a query, or restrict their results 
#' to observations that meet particular quality-control criteria.  
#' 
#' For those outside Australia, 'galah' is the common name of
#' *Eolophus roseicapilla*, a widely-distributed Australian bird species.
#' @name galah
#' @docType package
#' @section Functions:
#' 
#' **Getting Started**
#' 
#'   * [galah_config()] Set package configuration options
#'   * [galah_call()]/\code{\link[=request_data]{request_()}} Start to build a request
#'   
#' **Update a request object**
#' 
#'   * [apply_profile()] Restrict to data that pass predefined checks
#'   * \code{\link[=arrange.data_request]{arrange()}} Arrange rows of a query on the server side
#'   * [authenticate()] Authenticate your request via OAUTH in the browser
#'   * \code{\link[=count.data_request]{count()}} Request counts of the specified data type
#'   * \code{\link[=distinct.data_request]{distinct()}} Keep distinct/unique rows
#'   * \code{\link[=filter.data_request]{filter()}} Filter records (see also \code{\link[=filter_object_classes]{filter_object_classes}}))
#'   * [geolocate()] Spatial filtering of a query
#'   * \code{\link[=glimpse]{glimpse()}} Get a glimpse of your data
#'   * \code{\link[=group_by.data_request]{group_by()}} Group counts by one or more fields
#'   * \code{\link[=identify.data_request]{identify()}} Search for taxonomic identifiers (see also \code{\link[=taxonomic_searches]{taxonomic_searches}})
#'   * \code{\link[=select.data_request]{select()}} Fields to report information for
#'   * \code{\link[=slice_head.data_request]{slice_head()}} Choose the first n rows of a download
#'   * [unnest()] Expand metadata for `fields`, `lists`, `profiles` or `taxa`
#'
#' **Create and execute a query**
#' 
#'   * [capture()] Convert a request into a `prequery` or `query`
#'   * [compound()] Convert an object into a `query_set` showing all calls needed for evaluation
#'   * \code{\link[=collapse.data_request]{collapse()}} Convert an object to a valid `query` 
#'   * \code{\link[=compute.data_request]{compute()}} Compute a query
#'   * \code{\link[=collect.data_request]{collect()}} Retrieve a database query
#' 
#' **Wrappers for accessing data**
#' 
#'   * [show_all()] & [search_all()] Data for generating filter queries
#'   * [show_values()] & [search_values()] Show or search for values _within_ `fields`, `profiles`, `lists`, `collections`, `datasets` or `providers`
#'   * [atlas_occurrences()] Download occurrence data
#'   * [atlas_counts()] Get a summary of the number of records or species
#'   * [atlas_species()] Download occurrences grouped by `speciesID`
#'   * [atlas_taxonomy()] Download taxonomic trees
#'   * [atlas_media()] Download media metadata linked to occurrences
#'   * [collect_media()] Download media (images and sounds)
#' 
#' **Miscellaneous functions**
#' 
#'   * [atlas_citation()] Get a citation for a dataset
#'   * [read_zip()] To read data from an earlier download
#'   * \code{\link[=print.data_request]{print()}} Print functions for galah objects
#'
#' @section Terminology:
#'
#' To get the most value from `galah`, it is helpful to understand some
#' terminology. Each occurrence record contains taxonomic
#' information, and usually some information about the observation itself, such
#' as its location. In addition to this record-specific information, the living 
#' atlases append contextual information to each record, particularly data from 
#' spatial **layers** reflecting climate gradients or political boundaries. They
#' also run a number of quality checks against each record, resulting in
#' **assertions** attached to the record. Each piece of information
#' associated with a given occurrence record is stored in a **field**,
#' which corresponds to a **column** when imported to an
#' `tibble`. See `show_all(fields)` to view valid fields,
#' layers and assertions, or conduct a search using `search_all(fields)`.
#'
#' Data fields are important because they provide a means to **filter**
#' occurrence records;  i.e. to return only the information that you need, and
#' no more. Consequently, much of the architecture of `galah` has been
#' designed to make filtering as simple as possible. The easiest way to do this 
#' is to start a pipe with [galah_call()] and follow it with the relevant 
#' `dplyr` function; starting with \code{\link[=filter.data_request]{filter()}},
#' but also including \code{\link[=select.data_request]{select()}},
#' \code{\link[=group_by.data_request]{group_by()}} or others. Functions without
#' a relevant `dplyr` synonym include 
#' \code{\link[=identify.data_request]{identify()}} for choosing a taxon, or 
#' [geolocate()] for choosing a specific location. By combining different filters, 
#' it is possible to build complex queries to return only the most valuable 
#' information for a given problem.
#'
#' A notable extension of the filtering approach is to remove records with low
#' 'quality'. All living atlases perform quality control checks on all records 
#' that they store. These checks are used to generate new fields, that can then 
#' be used to filter out records that are unsuitable for particular applications. 
#' However, there are many possible data quality checks, and it is not always 
#' clear which are most appropriate in a given instance. Therefore, `galah` 
#' supports data quality **profiles**, which can be passed to 
#' [apply_profile()] to quickly remove undesirable records. A full list of 
#' data quality profiles is returned by `show_all(profiles)`.
#'
#' @importFrom rlang caller_env
#' @importFrom rlang .data
#' @importFrom rlang `:=`
#' @keywords internal
"_PACKAGE"

#' @importFrom lifecycle badge
NULL

Try the galah package in your browser

Any scripts or data that you put into this service are public.

galah documentation built on Feb. 11, 2026, 9:11 a.m.