Biodiversity queries to the ALA usually require some spatial filtering. Let's see how spatial data are stored in the ALA and a few different methods to spatially filter data with galah and other packages.
Most records in the ALA contain location data in the form of two key fields: decimalLatitude
and decimalLongitude
. As expected, these fields are the decimal coordinates of each occurrence, with south and west values denoted by negatives. While there may be some uncertainty in these values (see field coordinateUncertaintyInMeters
), they are generally very accurate.
While these are very important and useful fields, very rarely will we encounter situations that require queries directly calling decimalLatitude
and decimalLongitude
. Instead, the ALA and galah have a number of features that make spatial queries simpler.
Often we want to filter results down to some commonly defined spatial regions, such as states, LGAs or IBRA/IMCRA regions. The ALA contains a large range (>100) of contextual and spatial layers, in-built as searchable and queriable fields. They are denoted by names beginning with "cl"
, followed by an identifying number that may be up to 6 digits long. These fields are each based on shapefiles, and contain the names of the regions in these layers that each record lies in.
We strongly recommend using search_fields()
to check whether a contextual layer already exists in the ALA that matches what you require before proceeding with other methods of spatial filtering. These fields are all able to be queried with galah_filter()
and so they are generally easier to use.
Suppose we are interested in querying records of the Red-Necked Avocet (Recurvirostra novaehollandiae) in a particular protected wetlands, the Coorong wetlands in South Australia. We can search the ALA fields for wetlands.
library(galah) library(tidyverse) library(gt) library(sf)
galah_config(email = "your_email_here", verbose = FALSE)
search_fields("wetlands")
## # A tibble: 1 × 3 ## id description type ## <chr> <chr> <chr> ## 1 cl901 Directory of Important Wetlands fields
Our search identifies that layer cl901
seems to match what we are looking for. We can then either view all possible values in the field with show_values()
, or search again for our particular field.
search_fields("cl901") |> search_values("coorong")
## • Showing values for 'cl901'.
## # A tibble: 1 × 1 ## cl901 ## <chr> ## 1 The Coorong, Lake Alexandrina & Lake Albert
We can filter all occurrences for exact matches with this value, "Lake Eyre"
. Our galah query can be built as follows:
galah_call() |> galah_identify("Recurvirostra novaehollandiae") |> galah_filter(cl901 == "The Coorong, Lake Alexandrina & Lake Albert") |> atlas_occurrences() |> head(5) |> gt::gt()
## Retrying in 1 seconds. ## Retrying in 2 seconds. ## Retrying in 4 seconds.
recordID | scientificName | taxonConceptID | decimalLatitude | decimalLongitude | eventDate | occurrenceStatus | dataResourceName |
---|---|---|---|---|---|---|---|
00630f58-ee78-4f69-ba39-718b0a1c3356 | Recurvirostra novaehollandiae | https://biodiversity.org.au/afd/taxa/c69e7308-527a-429d-a80d-143bd20b5100 | -35.59293 | 139.0218 | 2008-11-18 00:00:00 | PRESENT | SA Fauna |
00705077-abc2-4f28-aedd-11a479ec2d82 | Recurvirostra novaehollandiae | https://biodiversity.org.au/afd/taxa/c69e7308-527a-429d-a80d-143bd20b5100 | -35.52833 | 138.8089 | 2016-03-06 00:00:00 | PRESENT | BirdLife Australia, Birdata |
0071444e-ddf2-45dc-9a10-c6399686e306 | Recurvirostra novaehollandiae | https://biodiversity.org.au/afd/taxa/c69e7308-527a-429d-a80d-143bd20b5100 | -35.57719 | 138.9923 | 2008-01-01 00:00:00 | PRESENT | SA Fauna |
008a2e6c-26b4-4f4e-a428-b0330db53466 | Recurvirostra novaehollandiae | https://biodiversity.org.au/afd/taxa/c69e7308-527a-429d-a80d-143bd20b5100 | -35.52831 | 138.8280 | 2017-11-19 17:15:00 | PRESENT | eBird Australia |
00b11f13-0100-4471-b18b-7e9736ddeeaf | Recurvirostra novaehollandiae | https://biodiversity.org.au/afd/taxa/c69e7308-527a-429d-a80d-143bd20b5100 | -36.18001 | 139.6589 | 2017-08-05 13:25:00 | PRESENT | eBird Australia |
While server-side spatial information is useful, there are likely to be cases where the shapefile or region you wish to query will not be pre-loaded as a contextual layer in the ALA. In this case, shapefiles can be introduced to the filtering process using the {sf} package and the galah_geolocate()
function. Shapefiles can be provided as an sf
object, whether that is by importing them with sf::st_read()
or taking a POLYGON
or MULTIPOLYGON
character string and transforming them with sf::st_as_sfc()
.
For instance, we might interested in species occurrences in King George Square, Brisbane. We can take the MULTIPOLYGON
object for the square (as sourced from the Brisbane City Council) and transform it into sfc
and then sf
objects.
king_george_sq <- "MULTIPOLYGON(((153.0243 -27.46886, 153.0242 -27.46896, 153.0236 -27.46837, 153.0239 -27.46814, 153.0239 -27.46813, 153.0242 -27.46789, 153.0244 -27.46805, 153.0245 -27.46821, 153.0246 -27.46828, 153.0247 -27.46835, 153.0248 -27.46848, 153.0246 -27.4686, 153.0246 -27.46862, 153.0245 -27.46871, 153.0243 -27.46886)))" |> sf::st_as_sfc() |> sf::st_as_sf()
We can provide this MULTIPOLYGON
in our filter as the argument of galah_geolocate()
to assess which species have been recorded in King George Square.
galah_call() |> galah_filter() |> galah_geolocate(king_george_sq) |> galah_select(decimalLatitude, decimalLongitude, eventDate, scientificName, vernacularName) |> atlas_occurrences() |> head(10) |> gt::gt()
decimalLatitude | decimalLongitude | eventDate | scientificName | vernacularName |
---|---|---|---|---|
-27.46851 | 153.0238 | 2023-04-08 07:49:00 | Entomyzon cyanotis | Blue-faced Honeyeater |
-27.46842 | 153.0241 | 2022-08-30 01:29:00 | Threskiornis moluccus | Black-necked Ibis |
-27.46833 | 153.0241 | 2022-07-24 14:43:00 | Threskiornis moluccus | Black-necked Ibis |
There is a second argument of galah_geolocate()
called type
, which defaults to value "polygon"
. By setting the type
argument to "bbox"
, the provided POLYGON
or MULTIPOLYGON
will be converted into the smallest bounding box (rectangle) that contains the POLYGON
. In this case, records will be included that may not exactly lie inside the provided shape.
galah_call() |> galah_filter() |> galah_geolocate(king_george_sq, type = "bbox") |> galah_select(decimalLatitude, decimalLongitude, eventDate, scientificName, vernacularName) |> atlas_occurrences() |> head(10) |> gt::gt()
## Data returned for bounding box: ## xmin = 153.0236 xmax = 153.0248 ymin = -27.46896 ymax = -27.46789 ## Retrying in 1 seconds.
decimalLatitude | decimalLongitude | eventDate | scientificName | vernacularName |
---|---|---|---|---|
-27.46882 | 153.0236 | 2019-04-21 04:40:00 | Threskiornis moluccus | Black-necked Ibis |
-27.46882 | 153.0236 | 2005-08-28 12:18:00 | Threskiornis moluccus | Black-necked Ibis |
-27.46851 | 153.0238 | 2023-04-08 07:49:00 | Entomyzon cyanotis | Blue-faced Honeyeater |
-27.46842 | 153.0241 | 2022-08-30 01:29:00 | Threskiornis moluccus | Black-necked Ibis |
-27.46833 | 153.0241 | 2022-07-24 14:43:00 | Threskiornis moluccus | Black-necked Ibis |
-27.46806 | 153.0239 | NA | Syntrichia laevipila | NA |
The type
argument with option "bbox"
is provided because sf
objects with >500 vertices will not be accepted by the ALA. In the event you have a large shapefile, using type = "bbox"
will at least enable an initial reduction of the data that is downloaded, before finer filtering to the actual shapefile will obtain the desired set of occurrences. Alternatively, one can also perform the "bbox"
reduction before passing the shape to galah_geolocate()
by using sf::st_bbox()
.
A common situation for this to occur is when a shapefile with multiple shapes is provided, where we are interested in grouping our results by each shape. Here is a mock workflow using a subset of a shapefile of all 2,184 Brisbane parks.
Let's say we are interested in knowing which parks in the Brisbane postcode 4075 have the most occurrences of the Scaly-Breasted Lorikeet, Trichoglossus chlorolepidotus, since 2020. We can download the entire shapefile from the above link, and perform our filtering and summarising as follows:
brisbane_parks <- sf::st_read("path/to/Park___Locations.shp") |> sf::st_make_valid() |> filter(POST_CODE == 4075)
# Convert shapefile to a bounding box brisbane_parks_bbox <- brisbane_parks |> sf::st_bbox() # Find all occurrences of Trichoglossus chlorolepidotus in the bounding box in 2022 lorikeet_brisbane <- galah_call() |> galah_filter(scientificName == "Trichoglossus chlorolepidotus", year >= 2020) |> galah_geolocate(brisbane_parks_bbox, type = "bbox") |> atlas_occurrences()
## Data returned for bounding box: ## xmin = 152.96331 xmax = 152.99668 ymin = -27.57737 ymax = -27.51606 ## Retrying in 1 seconds.
# Filter records down to only those in the shapefile polygons lorikeet_brisbane |> # Create a point geometry based on the occurrence coordinates sf::st_as_sf(coords = c("decimalLongitude", "decimalLatitude"), crs = sf::st_crs(brisbane_parks), remove = FALSE) |> # identify which park each occurrence sits in with st_intersects() mutate(intersection = sf::st_intersects(geometry, brisbane_parks) |> as.integer(), park = ifelse(is.na(intersection), NA, brisbane_parks$PARK_NAME[intersection])) |> # Filter out occurrences that did not occur in a park filter(!is.na(park)) |> # Drop the geometry column sf::st_drop_geometry() |> # Summarise the top 10 parks for lorikeet sightings in 2022 group_by(park) |> summarise(counts = n()) |> arrange(desc(counts)) |> head(10)
## # A tibble: 10 × 2 ## park counts ## <chr> <int> ## 1 SHERWOOD ARBORETUM 276 ## 2 NYUNDARE-BA PARK 271 ## 3 FAULKNER PARK 51 ## 4 STRICKLAND TERRACE PARK 33 ## 5 FORT ROAD BUSHLAND 31 ## 6 GRACEVILLE RIVERSIDE PARKLANDS 31 ## 7 BENARRAWA RESERVE 19 ## 8 NOSWORTHY PARK 16 ## 9 HORACE WINDOW RESERVE 9 ## 10 GRACEVILLE AVENUE PARK 6
Some shapefiles cover large geographic areas with the caveat that even the bounding box doesn't restrict the number of records to a value that can be downloaded easily. In this case, we recommend more nuances and detailed methods that can be performed using looping techniques. One of our ALA Labs blog posts, Hex maps for species occurrence data, has been written detailing how to approach larger problems such as this.
Any scripts or data that you put into this service are public.
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.