galah: Biodiversity Data from the GBIF Node Network

Description

The Global Biodiversity Information Facility (GBIF; https://www.gbif.org) provides tools to enable users to find, access, combine and visualise biodiversity data. galah is a dplyr extension package that enables the R community to directly access data and resources hosted by GBIF and several of its subsidiary organisations (known as 'nodes') using dplyr verbs.

The basic unit of data stored by these infrastructures is an occurrence record, which is an observation of a biological entity at a specific time and place. However, galah also facilitates access to taxonomic information and to associated media such as images or sounds, while letting users restrict their queries to particular taxa or locations. Users can specify which columns are returned by a query, or restrict their results to observations that meet particular quality-control criteria.

For those outside Australia, 'galah' is the common name of Eolophus roseicapilla, a widely distributed Australian bird species.

Functions

Getting Started

  • galah_config() Set package configuration options

  • galah_call()/request_() Start to build a request

Update a request object

  • apply_profile() Restrict to data that pass predefined checks

  • arrange() Arrange rows of a query on the server side

  • authenticate() Authenticate your request via OAuth in the browser

  • count() Request counts of the specified data type

  • distinct() Keep distinct/unique rows

  • filter() Filter records (see also filter_object_classes)

  • geolocate() Spatial filtering of a query

  • glimpse() Get a glimpse of your data

  • group_by() Group counts by one or more fields

  • identify() Search for taxonomic identifiers (see also taxonomic_searches)

  • select() Choose which fields to return

  • slice_head() Choose the first n rows of a download

  • unnest() Expand metadata for fields, lists, profiles or taxa

Create and execute a query

  • capture() Convert a request into a prequery or query

  • compound() Convert an object into a query_set showing all calls needed for evaluation

  • collapse() Convert an object to a valid query

  • compute() Compute a query

  • collect() Retrieve a database query

Wrappers for accessing data

  • show_all() & search_all() Data for generating filter queries

  • show_values() & search_values() Show or search for values within fields, profiles, lists, collections, datasets or providers

  • atlas_occurrences() Download occurrence data

  • atlas_counts() Get a summary of the number of records or species

  • atlas_species() Download occurrences grouped by speciesID

  • atlas_taxonomy() Download taxonomic trees

  • atlas_media() Download media metadata linked to occurrences

  • collect_media() Download media (images and sounds)

Miscellaneous functions

  • atlas_citation() Get a citation for a dataset

  • read_zip() Read data from an earlier download

  • print() Print functions for galah objects

Terminology

To get the most value from galah, it is helpful to understand some terminology. Each occurrence record contains taxonomic information, and usually some information about the observation itself, such as its location. In addition to this record-specific information, the living atlases append contextual information to each record, particularly data from spatial layers reflecting climate gradients or political boundaries. They also run a number of quality checks against each record, resulting in assertions attached to the record. Each piece of information associated with a given occurrence record is stored in a field, which corresponds to a column when imported into a tibble. See show_all(fields) to view valid fields, layers and assertions, or conduct a search using search_all(fields).

Data fields are important because they provide a means to filter occurrence records, i.e. to return only the information that you need, and no more. Consequently, much of the architecture of galah has been designed to make filtering as simple as possible. The easiest way to do this is to start a pipe with galah_call() and follow it with the relevant dplyr verbs: most often filter(), but also select(), group_by() and others. Functions without a dplyr equivalent include identify() for choosing a taxon and geolocate() for choosing a specific location. By combining different filters, it is possible to build complex queries that return only the most useful information for a given problem.
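The pipe described above can be sketched as follows. This is a minimal example, assuming galah 2.x with its default atlas (the Atlas of Living Australia); the taxon name, the year threshold and the object name `req` are illustrative only. The code only assembles a request; no records are downloaded until it is passed to collect().

```r
library(galah)

# Build (but do not yet send) a request for the number of galah records
# from 2020 onwards, split by year.
# Assumption: the default atlas is in use; change it with galah_config().
req <- galah_call() |>
  identify("Eolophus roseicapilla") |>  # restrict to a single taxon
  filter(year >= 2020) |>               # keep records from 2020 onwards
  group_by(year) |>                     # one count per year
  count()                               # ask for counts rather than records

# Inspect the request before sending it
print(req)
```

Piping `req` into `collect()` sends the query to the atlas and should return a tibble with one row per year.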

A notable extension of the filtering approach is to remove records of low 'quality'. All living atlases perform quality-control checks on every record that they store. These checks generate new fields that can then be used to filter out records that are unsuitable for particular applications. However, there are many possible data quality checks, and it is not always clear which are most appropriate in a given instance. Therefore, galah supports data quality profiles, which can be passed to apply_profile() to quickly remove undesirable records. A full list of data quality profiles is returned by show_all(profiles).
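A hedged sketch of the profile workflow is shown below. It assumes the Atlas of Living Australia, which offers a profile with the short name `ALA`; other atlases may use different names or none at all, so check show_all(profiles) first. The year and the object name `req` are illustrative.

```r
library(galah)

# Count 2020 records that pass a predefined data quality profile.
# Assumption: "ALA" is a valid profile name for the selected atlas.
req <- galah_call() |>
  filter(year == 2020) |>
  apply_profile(ALA) |>  # the profile name is passed unquoted
  count()

print(req)
```

Comparing `collect(req)` with the result of the same request without apply_profile() gives a rough sense of how many records the profile excludes.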

Author(s)

Maintainer: Martin Westgate martin.westgate@csiro.au

galah documentation built on Feb. 11, 2026, 9:11 a.m.