Each occurrence record contains taxonomic information and information about the observation itself, such as its location and the date of observation. These pieces of information are recorded and categorised into their respective fields. When you import data using galah, the columns of the resulting tibble correspond to these fields.
Data fields are important because they provide a means to narrow and refine queries so that they return only the information you need, and no more. Consequently, much of the architecture of galah has been designed to make narrowing as simple as possible. For legacy reasons, there are both dplyr-style verbs and galah-specific versions of these functions, but they are largely synonymous. They include:
- identify() or galah_identify()
- filter() or galah_filter()
- select() or galah_select()
- group_by() or galah_group_by()
- geolocate() or galah_geolocate()
Below we discuss each of these functions in turn.
search_taxa & identify
Perhaps unsurprisingly, search_taxa() searches for taxonomic information. It uses fuzzy matching to work much like the search bar on the Atlas of Living Australia website, and you can use it to search for taxa by their scientific name. Finding your desired taxon with search_taxa() is an important step in using this taxonomic information to download data. For example, to search for reptiles, we first need to check whether our query matches the correct taxon:
```r
search_taxa("Reptilia")
## # A tibble: 1 × 9
##   search_term scientific_name taxon_concept_id                               rank  match_type kingdom phylum class issues
##   <chr>       <chr>           <chr>                                          <chr> <chr>      <chr>   <chr>  <chr> <chr>
## 1 Reptilia    REPTILIA        https://biodiversity.org.au/afd/taxa/682e1228… class exactMatch Animal… Chord… Rept… noIss…
```
If we want to be more specific, we can provide a tibble (or data.frame) containing additional taxonomic information.
```r
search_taxa(tibble(genus = "Eolophus", class = "Aves"))
## # A tibble: 1 × 13
##   search_term  scientific_name scientific_name_auth…¹ taxon_concept_id rank  match_type kingdom phylum class order family
##   <chr>        <chr>           <chr>                  <chr>            <chr> <chr>      <chr>   <chr>  <chr> <chr> <chr>
## 1 Eolophus_Av… Eolophus        Bonaparte, 1854        https://biodive… genus exactMatch Animal… Chord… Aves  Psit… Cacat…
## # ℹ abbreviated name: ¹scientific_name_authorship
## # ℹ 2 more variables: genus <chr>, issues <chr>
```
Once we know that our search matches the correct taxon or taxa, we can use identify() to narrow the results of our query.
```r
galah_call() |>
  identify("Reptilia") |>
  atlas_counts()
## # A tibble: 1 × 1
##     count
##     <int>
## 1 1841182
```
If you're using an international atlas, search_taxa() will automatically switch to using the local name-matching service. For example, Portugal uses the GBIF taxonomic backbone, but integrates seamlessly with our standard workflow.
```r
galah_config(atlas = "Portugal")
## Atlas selected: GBIF Portugal (GBIF.pt) [Portugal]
```
```r
galah_call() |>
  identify("Lepus") |>
  group_by(species) |>
  atlas_counts()
## # A tibble: 5 × 2
##   species           count
##   <chr>             <int>
## 1 Lepus granatensis  1378
## 2 Lepus microtis       64
## 3 Lepus europaeus      10
## 4 Lepus saxatilis       2
## 5 Lepus capensis        1
```
Conversely, the UK's National Biodiversity Network (NBN) has its own taxonomic backbone, but is supported using the same workflow.
```r
galah_config(atlas = "United Kingdom")
## Atlas selected: National Biodiversity Network (NBN) [United Kingdom]
```
```r
galah_call() |>
  filter(genus == "Bufo") |>
  group_by(species) |>
  atlas_counts()
## # A tibble: 3 × 2
##   species       count
##   <chr>         <int>
## 1 Bufo bufo     77009
## 2 Bufo spinosus   143
## 3 Bufo marinus      1
```
Perhaps the most important function in galah is filter(), which is used to filter the rows of queries.
```r
galah_config(atlas = "Australia")
## Atlas selected: Atlas of Living Australia (ALA) [Australia]
```
```r
# Get total record count for records after 2000
galah_call() |>
  filter(year > 2000) |>
  atlas_counts()
## # A tibble: 1 × 1
##       count
##       <int>
## 1 104768572
```
```r
# Get total record count for iNaturalist Australia after 2000
galah_call() |>
  filter(
    year > 2000,
    dataResourceName == "iNaturalist Australia"
  ) |>
  atlas_counts()
## # A tibble: 1 × 1
##     count
##     <int>
## 1 8085678
```
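Separate conditions passed to filter() are combined with a logical AND, as in dplyr, so the query above counts only records that are both from iNaturalist Australia and from after 2000. The base R sketch below applies the same logic to a small, made-up data frame (the records and the "Museum X" resource are hypothetical). It only illustrates how the conditions combine, not how galah translates them into an API query.

```r
# Hypothetical toy records; not real ALA data
records <- data.frame(
  year = c(1998, 2005, 2012, 2021, 2021),
  dataResourceName = c("iNaturalist Australia", "iNaturalist Australia",
                       "Museum X", "iNaturalist Australia", "Museum X")
)

# Comma-separated conditions in filter() behave like `&`:
# a record is kept only if every condition is TRUE
keep <- records$year > 2000 &
  records$dataResourceName == "iNaturalist Australia"

inat_after_2000 <- records[keep, ]
nrow(inat_after_2000)
```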
To find available fields and corresponding valid values, use the field lookup functions show_all(fields), search_all(fields) & show_values().
galah_filter() can also be used to make more complex taxonomic queries than are possible using search_taxa(). By using the taxonConceptID field, it is possible, for example, to build queries that exclude certain taxa. This can be useful to filter for paraphyletic concepts such as invertebrates.
```r
galah_call() |>
  filter(
    taxonConceptID == search_taxa("Animalia")$taxon_concept_id,
    taxonConceptID != search_taxa("Chordata")$taxon_concept_id
  ) |>
  group_by(class) |>
  atlas_counts()
## # A tibble: 70 × 2
##    class          count
##    <chr>          <int>
##  1 Insecta      6636702
##  2 Gastropoda   1079236
##  3 Arachnida     880799
##  4 Maxillopoda   701466
##  5 Malacostraca  667094
##  6 Polychaeta    278997
##  7 Bivalvia      238787
##  8 Anthozoa      228733
##  9 Cephalopoda   150198
## 10 Demospongiae  119207
## # ℹ 60 more rows
```
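Conceptually, the invertebrate query takes a set difference (records in Animalia minus records in Chordata) and then counts what remains per class. The base R sketch below reproduces that logic on a small, made-up data frame. A phylum column stands in for the taxonConceptID lineage that the atlas actually uses, so this is an illustration of the set logic only, not of how galah or the ALA evaluate the query.

```r
# Hypothetical toy records with simplified taxonomy (not real ALA data)
records <- data.frame(
  kingdom = c("Animalia", "Animalia", "Animalia", "Animalia", "Plantae"),
  phylum  = c("Arthropoda", "Chordata", "Mollusca", "Arthropoda", "Tracheophyta"),
  class   = c("Insecta", "Aves", "Gastropoda", "Insecta", "Magnoliopsida")
)

# "Invertebrates" as a set difference: in Animalia AND not in Chordata
invertebrates <- subset(records, kingdom == "Animalia" & phylum != "Chordata")

# Count records per class, largest first
# (a local analogue of group_by(class) |> atlas_counts())
counts <- sort(table(invertebrates$class), decreasing = TRUE)
counts
```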
In addition to single filters, some atlases (currently Australia, Sweden & Spain) also support 'data profiles'. These are effectively pre-formed sets of filters that are designed to remove records that are suspect in some way. This feature has its own function, apply_profile():
```r
galah_call() |>
  filter(year > 2000) |>
  apply_profile(ALA) |>
  atlas_counts()
## # A tibble: 1 × 1
##      count
##      <int>
## 1 91982400
```
To see a full list of data profiles, use show_all(profiles).
Use group_by() to group and summarise record counts by specified fields.
```r
# Get record counts from 2016 to 2020, grouped by year and basis of record
galah_call() |>
  filter(year > 2015 & year <= 2020) |>
  group_by(year, basisOfRecord) |>
  atlas_counts()
## # A tibble: 35 × 3
##    year  basisOfRecord         count
##    <chr> <chr>                 <int>
##  1 2020  HUMAN_OBSERVATION   6859463
##  2 2020  OCCURRENCE           188090
##  3 2020  PRESERVED_SPECIMEN    87730
##  4 2020  MACHINE_OBSERVATION   39642
##  5 2020  OBSERVATION            4417
##  6 2020  MATERIAL_SAMPLE        2104
##  7 2020  LIVING_SPECIMEN          62
##  8 2019  HUMAN_OBSERVATION   6104069
##  9 2019  PRESERVED_SPECIMEN   166446
## 10 2019  OCCURRENCE            93853
## # ℹ 25 more rows
```
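Grouping by two fields returns one row per combination of values that actually occurs, each with its record count. As a rough local analogue, the base R sketch below counts made-up records by year and basisOfRecord with aggregate(), then orders the result by year (newest first) and count, mirroring the ordering of the output above. The data are hypothetical, and the sort order is a choice made here rather than something galah is documented to guarantee.

```r
# Hypothetical toy records (not real ALA data)
records <- data.frame(
  year = c(2019, 2019, 2020, 2020, 2020),
  basisOfRecord = c("HUMAN_OBSERVATION", "PRESERVED_SPECIMEN",
                    "HUMAN_OBSERVATION", "HUMAN_OBSERVATION", "OCCURRENCE")
)

# One row per year/basisOfRecord combination present, with its count
counts <- aggregate(
  count ~ year + basisOfRecord,
  data = transform(records, count = 1L),
  FUN = sum
)

# Newest year first, then largest count within each year
counts <- counts[order(-counts$year, -counts$count), ]
counts
```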
Use select() to choose which columns are returned when downloading records.
```r
# Return only the columns `kingdom`, `species` & `eventDate`
occurrences <- galah_call() |>
  identify("reptilia") |>
  filter(year == 1930) |>
  select(kingdom, species, eventDate) |>
  atlas_occurrences()

occurrences |> head()
## # A tibble: 6 × 3
##   kingdom  species               eventDate
##   <chr>    <chr>                 <dttm>
## 1 Animalia Drysdalia coronoides  1930-06-16 00:00:00
## 2 Animalia Antaresia maculosa    1930-01-01 00:00:00
## 3 Animalia NA                    1930-04-23 00:00:00
## 4 Animalia Stegonotus australis  1930-01-01 00:00:00
## 5 Animalia Oxyuranus scutellatus 1930-01-01 00:00:00
## 6 Animalia Lerista wilkinsi      1930-01-01 00:00:00
```
You can also use the selection helpers that work within dplyr::select(), such as starts_with() and ends_with().
```r
occurrences <- galah_call() |>
  identify("reptilia") |>
  filter(year == 1930) |>
  select(starts_with("accepted") | ends_with("record")) |>
  atlas_occurrences()
```
```r
occurrences |> head()
## # A tibble: 6 × 6
##   acceptedNameUsage acceptedNameUsageID basisOfRecord      raw_basisOfRecord OCCURRENCE_STATUS_INFE…¹ userDuplicateRecord
##   <chr>             <lgl>               <chr>              <chr>             <lgl>                    <lgl>
## 1 <NA>              NA                  HUMAN_OBSERVATION  HumanObservation  FALSE                    FALSE
## 2 <NA>              NA                  PRESERVED_SPECIMEN PreservedSpecimen FALSE                    FALSE
## 3 <NA>              NA                  PRESERVED_SPECIMEN PreservedSpecimen FALSE                    FALSE
## 4 <NA>              NA                  HUMAN_OBSERVATION  HumanObservation  FALSE                    FALSE
## 5 <NA>              NA                  PRESERVED_SPECIMEN PreservedSpecimen FALSE                    FALSE
## 6 <NA>              NA                  PRESERVED_SPECIMEN PreservedSpecimen FALSE                    FALSE
## # ℹ abbreviated name: ¹OCCURRENCE_STATUS_INFERRED_FROM_BASIS_OF_RECORD
```
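The output above suggests that these helpers match regardless of case: ends_with("record") picks up both basisOfRecord and OCCURRENCE_STATUS_INFERRED_FROM_BASIS_OF_RECORD. That is consistent with tidyselect's default of ignore.case = TRUE for starts_with() and ends_with(). The base R sketch below reproduces the same case-insensitive matching over a vector of field names, which can help you preview what a selection will return. The field list is hypothetical apart from the six names that appear in the output.

```r
# Hypothetical set of available field names; the first six are the
# columns returned in the output above
fields <- c("acceptedNameUsage", "acceptedNameUsageID", "basisOfRecord",
            "raw_basisOfRecord", "OCCURRENCE_STATUS_INFERRED_FROM_BASIS_OF_RECORD",
            "userDuplicateRecord", "eventDate", "species", "recordedBy")

# Case-insensitive prefix/suffix matching, mimicking
# starts_with("accepted") | ends_with("record") with ignore.case = TRUE
lower <- tolower(fields)
matched <- fields[startsWith(lower, "accepted") | endsWith(lower, "record")]
matched
```

Note that recordedBy is not matched: it starts with "record" but does not end with it.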
Use geolocate() to specify a geographic area or region to limit your search.
```r
# Get a list of Perameles species in the area specified
# (Note: the area can also be specified using a shapefile)
wkt <- "POLYGON((131.36328125 -22.506468769126,135.23046875 -23.396716654542,134.17578125 -27.287832521411,127.40820312499 -26.661206402316,128.111328125 -21.037340349154,131.36328125 -22.506468769126))"

galah_call() |>
  identify("perameles") |>
  geolocate(wkt) |>
  atlas_species()
## # A tibble: 1 × 11
##   taxon_concept_id species_name scientific_name_auth…¹ taxon_rank kingdom phylum class order family genus vernacular_name
##   <chr>            <chr>        <chr>                  <chr>      <chr>   <chr>  <chr> <chr> <chr>  <chr> <chr>
## 1 https://biodive… Perameles e… Spencer, 1897          species    Animal… Chord… Mamm… Pera… Peram… Pera… Desert Bandico…
## # ℹ abbreviated name: ¹scientific_name_authorship
```
geolocate() also accepts shapefiles. More complex shapefiles may need to be simplified first (e.g., using rmapshaper::ms_simplify()).
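The WKT polygon passed to geolocate() above lists its vertices as longitude-latitude pairs and closes the ring by repeating the first vertex at the end, as the WKT format requires. The sketch below builds such a string from coordinate vectors in base R. coords_to_wkt() is a hypothetical helper, not part of galah, and the coordinates are a rounded version of the polygon above; the resulting string could then be passed to geolocate() in the same way as wkt.

```r
# Hypothetical helper (not part of galah): build a WKT POLYGON string from
# longitude/latitude vectors, closing the ring if the last vertex differs
# from the first
coords_to_wkt <- function(lon, lat) {
  stopifnot(length(lon) == length(lat), length(lon) >= 3)
  if (lon[1] != lon[length(lon)] || lat[1] != lat[length(lat)]) {
    lon <- c(lon, lon[1])
    lat <- c(lat, lat[1])
  }
  # "lon lat" pairs separated by commas
  vertices <- paste(lon, lat, collapse = ",")
  paste0("POLYGON((", vertices, "))")
}

# Rounded version of the polygon used above
wkt <- coords_to_wkt(
  lon = c(131.36, 135.23, 134.18, 127.41, 128.11),
  lat = c(-22.51, -23.40, -27.29, -26.66, -21.04)
)
wkt
```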