galah is an R interface to biodiversity data hosted by the Global Biodiversity Information Facility (GBIF) and its subsidiary node organisations. GBIF and its partner nodes collate and store observations of individual life forms using the 'Darwin Core' data standard.
To install from CRAN:
install.packages("galah")
Or install the development version from GitHub:
install.packages("remotes")
remotes::install_github("AtlasOfLivingAustralia/galah")
Load the package:
library(galah)
By default, galah downloads information from the Atlas of Living Australia
(ALA). To show the full list of organisations currently supported by galah,
use show_all(atlases).
show_all(atlases)
## # A tibble: 10 × 4
##    region         institution                                                             acronym url
##    <chr>          <chr>                                                                   <chr>   <chr>
##  1 Australia      Atlas of Living Australia                                               ALA     https://www.ala.org.au
##  2 Austria        Biodiversitäts-Atlas Österreich                                         BAO     https://biodiversityat…
##  3 Brazil         Sistemas de Informações sobre a Biodiversidade Brasileira               SiBBr   https://sibbr.gov.br
##  4 France         Portail français d'accès aux données d'observation sur les espèces      OpenObs https://openobs.mnhn.fr
##  5 Global         Global Biodiversity Information Facility                                GBIF    https://gbif.org
##  6 Guatemala      Sistema Nacional de Información sobre Diversidad Biológica de Guatemala SNIBgt  https://snib.conap.gob…
##  7 Portugal       GBIF Portugal                                                           GBIF.pt https://www.gbif.pt
##  8 Spain          GBIF Spain                                                              GBIF.es https://gbif.es
##  9 Sweden         Swedish Biodiversity Data Infrastructure                                SBDI    https://biodiversityda…
## 10 United Kingdom National Biodiversity Network                                           NBN     https://nbn.org.uk
Use galah_config() to set the node organisation using its region, name, or
acronym. Once set, galah will automatically populate the server configuration
for your selected GBIF node. To download occurrence records from your chosen
GBIF node, you will need to register an account with that node (using its
website), then provide your registration email to galah.
To download from GBIF itself, you will need to provide your email, username,
and password.
galah_config(atlas = "GBIF", username = "user1", email = "email@email.com", password = "my_password")
You can find a full list of configuration options by running ?galah_config.
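For a node other than GBIF, the text above implies that only the registered email is needed. A minimal sketch, assuming you have registered with the Atlas of Living Australia; the email address is a placeholder, not a working account:

```r
library(galah)

# Select the node by region (its name or acronym should work as well)
# and supply the email registered on that node's website.
# "your-email@example.com" is a placeholder; replace it with your own.
galah_config(atlas = "Australia", email = "your-email@example.com")
```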
The standard method to construct queries in {galah} is via piped functions.
Pipes in galah start with the galah_call() function, and typically end with
collect(), though collapse() and compute() are also supported. The
development team use the base pipe by default (|>), but the {magrittr} pipe
(%>%) should work too.
galah_config(atlas = "ALA", verbose = FALSE)
galah_call() |>
  count() |>
  collect()
## # A tibble: 1 × 1
##       count
##       <int>
## 1 146185520
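The same query can be written with the {magrittr} pipe. This is a sketch that assumes {magrittr} is installed (it usually arrives as a dependency of {dplyr}); the count returned depends on the atlas contents at the time of the query, so no output is shown.

```r
library(galah)
library(magrittr)  # provides %>%; assumed to be installed

# Same record count as above, using %>% instead of the base pipe
total_records <- galah_call() %>%
  count() %>%
  collect()

total_records
```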
To pass more complex queries, you can use additional {dplyr} functions such as
filter(), select(), and group_by().
galah_call() |> filter(year >= 2020) |> count() |> collect()
## # A tibble: 1 × 1
##      count
##      <int>
## 1 40200358
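group_by() can be combined with count() to split a total into categories. A minimal sketch, assuming the default ALA configuration and that year is a valid grouping field there; results will change as new records are added, so no output is shown.

```r
library(galah)

# Count records from 2020 onwards, with one row per year
counts_by_year <- galah_call() |>
  filter(year >= 2020) |>
  group_by(year) |>
  count() |>
  collect()

counts_by_year
```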
Each GBIF node allows you to query using its own set of in-built fields. You
can investigate which fields are available using show_all() and search_all():
search_all(fields, "australian states")
## # A tibble: 2 × 3
##   id     description                               type
##   <chr>  <chr>                                     <chr>
## 1 cl2013 ASGS Australian States and Territories    fields
## 2 cl22   Australian States and Territories         fields
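Where search_all() filters by a search term, show_all() returns the complete list. A short sketch, assuming the default ALA configuration; the variable name all_fields is only illustrative.

```r
library(galah)

# Retrieve every field available for the currently configured atlas
all_fields <- show_all(fields)

# Inspect the first few rows
head(all_fields)
```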
To narrow your search to a particular taxonomic group, use identify(). Note
that this function only accepts scientific names and is not case sensitive.
It's good practice to first use search_taxa() to check that the taxa you
provide return the correct taxonomic results.
search_taxa("reptilia") # Check whether taxonomic info is correct
## # A tibble: 1 × 9
##   search_term scientific_name taxon_concept_id                               rank  match_type kingdom phylum class issues
##   <chr>       <chr>           <chr>                                          <chr> <chr>      <chr>   <chr>  <chr> <chr>
## 1 reptilia    REPTILIA        https://biodiversity.org.au/afd/taxa/682e1228… class exactMatch Animal… Chord… Rept… noIss…
galah_call() |> identify("reptilia") |> filter(year >= 2020) |> count() |> collect()
## # A tibble: 1 × 1
##    count
##    <int>
## 1 338434
If you want to query something other than the number of records, modify the
type argument in galah_call(). Here we'll query the number of species:
galah_call(type = "species") |> identify("reptilia") |> filter(year >= 2020) |> count() |> collect()
## # A tibble: 1 × 1
##   count
##   <int>
## 1   883
To download records, rather than find how many records are available, simply
remove the count() function from your pipe.
result <- galah_call() |>
  identify("Litoria") |>
  filter(year >= 2020, cl22 == "Tasmania") |>
  select(basisOfRecord, group = "basic") |>
  collect()
## Retrying in 1 seconds.
result |> head()
## # A tibble: 6 × 9
##   recordID            scientificName taxonConceptID decimalLatitude decimalLongitude eventDate           occurrenceStatus
##   <chr>               <chr>          <chr>                    <dbl>            <dbl> <dttm>              <chr>
## 1 00052544-d943-42e9… Litoria ewing… https://biodi…           -42.9             147. 2022-09-19 00:00:00 PRESENT
## 2 00168ca6-84d0-4af1… Litoria ranif… https://biodi…           -41.2             146. 2023-12-21 10:20:19 PRESENT
## 3 001a43fe-8586-4064… Litoria ewing… https://biodi…           -43.0             147. 2021-08-07 00:00:00 PRESENT
## 4 00250163-ec50-4eda… Litoria ranif… https://biodi…           -41.2             147. 2023-08-23 11:49:28 PRESENT
## 5 003e0f63-9f95-4af9… Litoria ewing… https://biodi…           -42.9             148. 2022-12-24 06:27:00 PRESENT
## 6 0070521f-bb45-46fb… Litoria ewing… https://biodi…           -43.1             147. 2023-12-20 14:29:23 PRESENT
## # ℹ 2 more variables: dataResourceName <chr>, basisOfRecord <chr>
Check out our other vignettes for more detail on how to use these functions.