galah is an R interface to biodiversity data hosted by the Atlas of Living Australia (ALA). The ALA is a repository of biodiversity
data, focussed primarily on observations of individual life forms. Like the
Global Biodiversity Information Facility (GBIF), the
basic unit of data at ALA is an occurrence record, based on the 'Darwin Core' data standard.
galah enables users to locate and download species observations, taxonomic
information, or associated media such as images or sounds, and to restrict their
queries to particular taxa or locations. Users can specify which columns are
returned by a query, or restrict their results to observations that meet
particular quality-control criteria. All functions return a data.frame as
their standard format.
Functions in galah are designed according to a nested architecture. Users
that require data should begin by locating the relevant ala_ function (see
the downloading data section); the arguments within that function then call
correspondingly-named select_ functions; and finally the specific values
that can be interpreted by those functions are given by find_ functions.
Install from CRAN:
Install the development version from GitHub:
See the README for system requirements.
Load the package
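The installation and loading steps above might look like the following. The GitHub repository path and the use of the remotes package are assumptions, not taken from this document; check the README for the current location.

```r
# Install the released version from CRAN
install.packages("galah")

# Install the development version from GitHub
# (assumed: the remotes package and the AtlasOfLivingAustralia/galah repository)
install.packages("remotes")
remotes::install_github("AtlasOfLivingAustralia/galah")

# Load the package
library(galah)
```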
Each occurrence record contains taxonomic information, and also some
information about the observation itself, such as its location and the date
of the observation. Each piece of information associated with a
given occurrence is stored in a field, which corresponds to a column
when imported to an R data.frame.
Data fields are important because they provide a means to filter
occurrence records; i.e. to return only the information that you need, and
no more. Consequently, much of the architecture of galah has been
designed to make filtering as simple as possible, by using functions with the
select_ prefix.
select_taxa() enables users to search for taxonomic names and check the results
are 'correct' before using the result to download data.
The function allows both free-text searches and searches where the rank(s) are
specified. Specifying the rank can be useful when names are ambiguous.
# free text search
taxa_filter <- select_taxa("Eolophus")
# specifying ranks
select_taxa(query = list(genus = "Eolophus", kingdom = "Animalia"))
select_taxa() can optionally provide information about child concepts, and
counts of the number of records held by the ALA for the specified taxa.
select_taxa(query = "Eolophus", children = TRUE, counts = TRUE)
This shows that there is only one species in the genus Eolophus.
Users can provide an sf object or a Well-Known Text (WKT) string to
select_locations() to restrict a query to a particular area.
locations <- select_locations(query = sf::st_read("act_rect.shp"))
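When no shapefile is at hand, a WKT string can be built directly. The sketch below uses a hypothetical helper, bbox_to_wkt(), and approximate bounding coordinates chosen for illustration; neither comes from galah. The final call is left commented out because it needs the package to be loaded.

```r
# Hypothetical helper (not part of galah): build a WKT rectangle from a
# bounding box given in decimal degrees (x = longitude, y = latitude).
bbox_to_wkt <- function(xmin, ymin, xmax, ymax) {
  stopifnot(xmin < xmax, ymin < ymax)
  # WKT polygons list "x y" pairs and must end on the starting vertex
  corners <- rbind(
    c(xmin, ymin),
    c(xmax, ymin),
    c(xmax, ymax),
    c(xmin, ymax),
    c(xmin, ymin)
  )
  pairs <- paste(corners[, 1], corners[, 2])
  paste0("POLYGON((", paste(pairs, collapse = ", "), "))")
}

# Illustrative coordinates only: a rough box around Canberra
act_wkt <- bbox_to_wkt(148.9, -35.5, 149.3, -35.1)
act_wkt

# The string can then be passed in place of an sf object:
# locations <- select_locations(query = act_wkt)
```

Building the string by hand avoids a dependency on sf for simple rectangular queries; irregular regions are still easier to handle as sf objects.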
As mentioned above, all occurrence records in the ALA contain additional
information about the record, stored in fields. Field-based filters are
specified with select_filters(), which takes individual filters, in the form
field = value, and/or a data quality profile.
To find available fields and corresponding valid values, field lookup
functions are provided. For finding field names, use search_fields(); for
finding valid field values, use find_field_values().
search_fields("basis")
field_values <- find_field_values("basisOfRecord")
Build a field filter
filters <- select_filters(basisOfRecord = "HumanObservation")
By default, a filter is included. To negate a filter, wrap its value in exclude():
filters <- select_filters(basisOfRecord = "HumanObservation", occurrenceStatus = exclude("absent"))
A notable extension of the filtering approach is to remove records with low
'quality'. ALA performs quality control checks on all records that it stores.
These checks are used to generate new fields, that can then be used to filter
out records that are unsuitable for particular applications. However, there
are many possible data quality checks, and it is not always clear which are
most appropriate in a given instance. Therefore, galah supports ALA
data quality profiles, which can be passed to select_filters() to
remove undesirable records. A full list of data quality profiles is returned by
find_profiles().
profiles <- find_profiles()
View filters included in a profile
Include a profile in the filters
filters <- select_filters(basisOfRecord = "HumanObservation", profile = "ALA")
Functions that return data from ALA are named with the prefix ala_,
followed by a suffix describing the information that they provide.
By combining different filter functions, it is possible to build complex
queries to return only the most valuable information for a given problem.
Once you have retrieved taxon information, you can use this to search for
occurrence records with ala_occurrences(). However, it is
also possible to download data on species via ala_species(),
or media content (largely images) via ala_media().
Alternatively, users can retrieve record counts using ala_counts().
In addition to the filter functions above, when downloading
occurrence data users can specify which columns are returned using
select_columns(). Individual column names and/or column groups can be
specified. To view the fields for each group, see the documentation for
select_columns(). To view the list of available fields, run search_fields().
cols <- select_columns("institutionID", group = "basic")
To download occurrence data you will need to specify your email in
ala_config(). This email must be associated with an active ALA account. See
more information in the config section.
ala_config(email = "firstname.lastname@example.org")
ala_config(email = your_email_here, profile_path = path_to_profile)
occ <- read.csv("eolophus_roseicapilla.csv")
Download occurrence records for Eolophus roseicapilla
occ <- ala_occurrences(
  taxa = select_taxa("Eolophus roseicapilla"),
  filters = select_filters(
    stateProvince = "Australian Capital Territory",
    year = seq(2010, 2020),
    profile = "ALA"
  ),
  columns = select_columns("institutionID", group = "basic")
)
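Because galah functions return a data.frame, a download can be checked with base R straight away. The sketch below uses a small mock data.frame in place of a real download; the column names (scientificName, decimalLatitude, decimalLongitude, eventDate) and all values are assumptions made for illustration, so inspect names(occ) on real output before adapting it.

```r
# Mock stand-in for a download (values invented for illustration only)
occ_mock <- data.frame(
  scientificName   = rep("Eolophus roseicapilla", 4),
  decimalLatitude  = c(-35.28, -35.31, NA, -35.25),
  decimalLongitude = c(149.13, 149.10, 149.08, 149.15),
  eventDate        = c("2010-03-14", "2010-11-02", "2015-06-21", "2020-01-05"),
  stringsAsFactors = FALSE
)

# Drop records without complete coordinates
has_coords <- !is.na(occ_mock$decimalLatitude) &
  !is.na(occ_mock$decimalLongitude)
occ_clean <- occ_mock[has_coords, ]

# Count records per year, taking the year from the ISO date string
occ_clean$year <- as.integer(substr(occ_clean$eventDate, 1, 4))
records_per_year <- table(occ_clean$year)
records_per_year
```

A quick tabulation like this is a cheap way to confirm that the year filter and any data quality profile behaved as intended before further analysis.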
A common use case of the ALA is to identify which species occur in a specified
region, time period, or taxonomic group.
ala_species() enables the user to
look up this information, using the common set of filter functions.
ala_config(cache_directory = tempdir())
# List rodent species in the NT
species <- ala_species(
  taxa = select_taxa("Rodentia"),
  filters = select_filters(stateProvince = "Northern Territory")
)
head(species)
ala_counts() provides summary counts on records in the ALA, without needing
to download all the records. In addition to the filter arguments, it has a
group_by argument, which provides counts binned by the requested field.
# Total number of records in the ALA
ala_counts()
# Total number of records, broken down by kingdom
ala_counts(group_by = "kingdom")
In addition to text data describing individual occurrences and their attributes, ALA stores images, sounds and videos associated with a given record. These can be downloaded using
ala_media() and the same
set of filters as the other data download functions.
# Download media attached to 2020 records of Eolophus roseicapilla
media_data <- ala_media(
  taxa = select_taxa("Eolophus roseicapilla"),
  filters = select_filters(year = 2020),
  download_dir = "media"
)
Various aspects of the galah package can be customized. To preserve
configuration for future sessions, set
profile_path to the location of a .Rprofile file.
To download occurrence records, you will need to provide an email address registered with the ALA. You can create an account here. Once an email is registered with the ALA, it should be stored in the config:
galah can cache most results to local files. This means that if the same code
is run multiple times, the second and subsequent iterations will be faster.
Caching is turned off by default and is turned on through ala_config(). Once enabled, caching is session-based by default, meaning that the local files are stored in a temporary directory that is automatically deleted when the R session ends. Caching can be made permanent by setting the cache directory to a non-temporary location.
If things aren't working as expected, more detail (particularly about web requests and caching behaviour) can be obtained by setting the
verbose configuration option in ala_config().
ALA requires that you provide a reason when downloading occurrence data (via the galah
ala_occurrences() function). The reason is set as "scientific research" by default, but you can change this in ala_config(). See
find_reasons() for valid download reasons.
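The configuration options described in this section could be set as follows. The argument names caching and download_reason_id, the reason ID, and the cache path are assumptions based on the galah release that provides the ala_ functions; confirm them with ?ala_config and find_reasons() before relying on them.

```r
library(galah)

# Email registered with the ALA (placeholder address)
ala_config(email = "firstname.lastname@example.org")

# Turn caching on and keep cached files beyond the current session
# (assumed argument name: caching; the directory is a hypothetical path)
cache_dir <- file.path(path.expand("~"), "galah_cache")
dir.create(cache_dir, showWarnings = FALSE, recursive = TRUE)
ala_config(caching = TRUE, cache_directory = cache_dir)

# Print more detail about web requests and caching
ala_config(verbose = TRUE)

# List valid download reasons, then set one by its ID
# (assumed argument name: download_reason_id;
#  4 is only an example value, use an ID listed by find_reasons())
find_reasons()
ala_config(download_reason_id = 4)
```

Setting each option in its own call keeps the script easy to comment out line by line while troubleshooting.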