Biodiversity queries to the ALA usually require some spatial filtering. Let's see how spatial data are stored in the ALA and a few different methods to spatially filter data with galah and other packages.
Most records in the ALA contain location data in the form of two key fields:
decimalLatitude
and decimalLongitude
. As expected, these fields are the
decimal coordinates of each occurrence, with south and west values denoted by
negatives. While there may be some uncertainty in these values (see field
coordinateUncertaintyInMeters
), they are generally very accurate.
While these are very important and useful fields, very rarely will we encounter
situations that require queries directly calling decimalLatitude
and
decimalLongitude
. Instead, the ALA and galah have a number of features that
make spatial queries simpler.
Often we want to filter results down to some commonly defined spatial regions,
such as states, LGAs or IBRA/IMCRA regions. The ALA contains a large range
(>100) of contextual and spatial layers, in-built as searchable and queriable
fields. They are denoted by names beginning with "cl"
, followed by an
identifying number that may be up to 6 digits long. These fields are each based
on shapefiles, and contain the names of the regions in these layers that each
record lies in.
We strongly recommend using search_fields()
to check whether a contextual
layer already exists in the ALA that matches what you require before proceeding
with other methods of spatial filtering. These fields are all able to be queried
with filter()
and so they are generally easier to use.
Suppose we are interested in querying records of the Red-Necked Avocet (Recurvirostra novaehollandiae) in the Coorong wetlands in South Australia. We can search the ALA fields for wetlands.
library(galah) library(dplyr) library(gt) library(sf)
galah_config(email = "your_email_here", verbose = FALSE)
search_fields("wetlands")
## # A tibble: 2 × 3 ## id description type ## <chr> <chr> <chr> ## 1 cl901 Directory of Important Wetlands fields ## 2 cl11192 Ramsar_Wetlands_of_AustraliaWGS84 fields
Our search identifies that layer cl901
seems to match what we are looking for.
We can then either view all possible values in the field with show_values()
,
or search again for our particular field.
search_fields("cl901") |> search_values("coorong")
## • Showing values for 'cl901'.
## # A tibble: 1 × 1 ## cl901 ## <chr> ## 1 The Coorong, Lake Alexandrina & Lake Albert
We can filter all occurrences for exact matches with this value, "Lake Eyre"
.
Our galah
query can be built as follows:
galah_call() |> identify("Recurvirostra novaehollandiae") |> filter(cl901 == "The Coorong, Lake Alexandrina & Lake Albert") |> collect() |> head(5) |> gt::gt()
## Retrying in 1 seconds. ## Retrying in 2 seconds. ## Retrying in 4 seconds.
recordID | scientificName | taxonConceptID | decimalLatitude | decimalLongitude | eventDate | occurrenceStatus | dataResourceName |
---|---|---|---|---|---|---|---|
00630f58-ee78-4f69-ba39-718b0a1c3356 | Recurvirostra novaehollandiae | https://biodiversity.org.au/afd/taxa/c69e7308-527a-429d-a80d-143bd20b5100 | -35.59293 | 139.0218 | 2008-11-18 | PRESENT | SA Fauna |
00705077-abc2-4f28-aedd-11a479ec2d82 | Recurvirostra novaehollandiae | https://biodiversity.org.au/afd/taxa/c69e7308-527a-429d-a80d-143bd20b5100 | -35.52833 | 138.8089 | 2016-03-06 | PRESENT | BirdLife Australia, Birdata |
0071444e-ddf2-45dc-9a10-c6399686e306 | Recurvirostra novaehollandiae | https://biodiversity.org.au/afd/taxa/c69e7308-527a-429d-a80d-143bd20b5100 | -35.57719 | 138.9923 | 2008-01-01 | PRESENT | SA Fauna |
008a2e6c-26b4-4f4e-a428-b0330db53466 | Recurvirostra novaehollandiae | https://biodiversity.org.au/afd/taxa/c69e7308-527a-429d-a80d-143bd20b5100 | -35.52831 | 138.8280 | 2017-11-19 17:15:00 | PRESENT | eBird Australia |
00b11f13-0100-4471-b18b-7e9736ddeeaf | Recurvirostra novaehollandiae | https://biodiversity.org.au/afd/taxa/c69e7308-527a-429d-a80d-143bd20b5100 | -36.18001 | 139.6589 | 2017-08-05 13:25:00 | PRESENT | eBird Australia |
While server-side spatial information is useful, there are likely to be cases
where the shapefile or region you wish to query will not be pre-loaded as a
contextual layer in the ALA. In this case, shapefiles can be introduced to the
filtering process using the {sf} package and the geolocate()
function.
Shapefiles can be provided as an sf
object, whether that is by importing them
with sf::st_read()
or taking a POLYGON
or MULTIPOLYGON
character string and
transforming them with sf::st_as_sfc()
.
For instance, we might interested in species occurrences in King George Square,
Brisbane. We can take the MULTIPOLYGON
object for the square (as sourced from
the Brisbane City Council) and transform it into sfc
and then sf
objects.
king_george_sq <- "MULTIPOLYGON(((153.0243 -27.46886, 153.0242 -27.46896, 153.0236 -27.46837, 153.0239 -27.46814, 153.0239 -27.46813, 153.0242 -27.46789, 153.0244 -27.46805, 153.0245 -27.46821, 153.0246 -27.46828, 153.0247 -27.46835, 153.0248 -27.46848, 153.0246 -27.4686, 153.0246 -27.46862, 153.0245 -27.46871, 153.0243 -27.46886)))" |> sf::st_as_sfc() |> sf::st_as_sf()
We can provide this MULTIPOLYGON
in our filter as the argument of
geolocate()
to assess which species have been recorded in King George
Square.
galah_call() |> geolocate(king_george_sq) |> select(decimalLatitude, decimalLongitude, eventDate, scientificName, vernacularName) |> collect() |> head(10) |> gt::gt()
decimalLatitude | decimalLongitude | eventDate | scientificName | vernacularName |
---|---|---|---|---|
-27.46862 | 153.0243 | 2006-03-30 09:32:00 | Threskiornis moluccus | Australian White Ibis |
-27.46861 | 153.0244 | 2023-12-02 09:12:02 | Cosmophasis baehrae | NA |
-27.46855 | 153.0239 | 2024-02-03 20:45:00 | Mediastinia | NA |
-27.46851 | 153.0238 | 2023-04-08 17:49:00 | Entomyzon cyanotis | Blue-faced Honeyeater |
-27.46842 | 153.0241 | 2022-08-30 11:29:00 | Threskiornis moluccus | Australian White Ibis |
-27.46836 | 153.0241 | 2024-08-28 10:31:54 | Vitellus | NA |
-27.46833 | 153.0241 | 2022-07-24 14:43:00 | Threskiornis moluccus | Australian White Ibis |
-27.46831 | 153.0241 | 2024-10-13 14:34:00 | Threskiornis moluccus | Australian White Ibis |
-27.46830 | 153.0240 | 2024-10-13 14:35:00 | Gymnorhina tibicen | Australian Magpie |
-27.46817 | 153.0240 | 2024-07-23 10:00:53 | Threskiornis moluccus | Australian White Ibis |
There is a second argument of geolocate()
called type
, which defaults
to value "polygon"
. By setting the type
argument to "bbox"
, the provided
POLYGON
or MULTIPOLYGON
will be converted into the smallest bounding box
(rectangle) that contains the POLYGON
. In this case, records will be included
that may not exactly lie inside the provided shape.
galah_call() |> geolocate(king_george_sq, type = "bbox") |> select(decimalLatitude, decimalLongitude, eventDate, scientificName, vernacularName) |> collect() |> head(10) |> gt::gt()
## Data returned for bounding box: ## xmin = 153.0236 xmax = 153.0248 ymin = -27.46896 ymax = -27.46789
decimalLatitude | decimalLongitude | eventDate | scientificName | vernacularName |
---|---|---|---|---|
-27.46889 | 153.0244 | 2015-10-09 | Burhinus (Burhinus) grallarius | Bush Stone-curlew |
-27.46882 | 153.0236 | 2019-04-21 14:40:00 | Threskiornis moluccus | Australian White Ibis |
-27.46882 | 153.0236 | 2005-08-28 12:18:00 | Threskiornis moluccus | Australian White Ibis |
-27.46868 | 153.0238 | 2024-07-15 15:00:18 | Pseudanapaea denotata | NA |
-27.46868 | 153.0238 | 2024-01-13 15:07:42 | Tenodera australasiae | Purplewinged Mantid |
-27.46868 | 153.0238 | 2024-01-13 15:04:53 | Pristhesancus plagipennis | Bee-killer |
-27.46868 | 153.0238 | 2024-01-13 15:06:26 | Toxorhynchites (Toxorhynchites) speciosus | NA |
-27.46868 | 153.0238 | 2024-01-13 12:20:00 | Xixuthrus | NA |
-27.46868 | 153.0238 | 2024-07-15 12:12:27 | Columba (Columba) livia | Rock Dove |
-27.46868 | 153.0238 | 2024-01-13 15:07:42 | Lyramorpha (Lyramorpha) rosea | Litchi Stink Bug |
The type
argument with option "bbox"
is provided because sf
objects with
>500 vertices will not be accepted by the ALA. In the event you have a large
shapefile, using type = "bbox"
will at least enable an initial reduction of
the data that is downloaded, before finer filtering to the actual shapefile will
obtain the desired set of occurrences. Alternatively, one can also perform the
"bbox"
reduction before passing the shape to geolocate()
by using
sf::st_bbox()
.
A common situation for this to occur is when a shapefile with multiple shapes is provided, where we are interested in grouping our results by each shape. Here is a mock workflow using a subset of a shapefile of all 2,184 Brisbane parks.
Let's say we are interested in knowing which parks in the Brisbane postcode 4075 have the most occurrences of the Scaly-Breasted Lorikeet (Trichoglossus chlorolepidotus) since 2020. We can download the entire shapefile from the link above, and perform our filtering and summarising as follows:
brisbane_parks <- sf::st_read("path/to/Park___Locations.shp") |> sf::st_make_valid() |> filter(POST_CODE == 4075)
# Convert shapefile to a bounding box brisbane_parks_bbox <- brisbane_parks |> sf::st_bbox() # Find all occurrences of Trichoglossus chlorolepidotus in the bounding box in 2022 lorikeet_brisbane <- galah_call() |> filter(scientificName == "Trichoglossus chlorolepidotus", year >= 2020) |> geolocate(brisbane_parks_bbox, type = "bbox") |> collect()
## Data returned for bounding box: ## xmin = 152.96331 xmax = 152.99668 ymin = -27.57737 ymax = -27.51606 ## Retrying in 1 seconds.
# Filter records down to only those in the shapefile polygons lorikeet_brisbane |> # Create a point geometry based on the occurrence coordinates sf::st_as_sf(coords = c("decimalLongitude", "decimalLatitude"), crs = sf::st_crs(brisbane_parks), remove = FALSE) |> # identify which park each occurrence sits in with st_intersects() mutate(intersection = sf::st_intersects(geometry, brisbane_parks) |> as.integer(), park = ifelse(is.na(intersection), NA, brisbane_parks$PARK_NAME[intersection])) |> # Filter out occurrences that did not occur in a park filter(!is.na(park)) |> # Drop the geometry column sf::st_drop_geometry() |> # Summarise the top 10 parks for lorikeet sightings in 2022 group_by(park) |> summarise(counts = n()) |> arrange(desc(counts)) |> head(10)
## # A tibble: 10 × 2 ## park counts ## <chr> <int> ## 1 NYUNDARE-BA PARK 552 ## 2 SHERWOOD ARBORETUM 472 ## 3 FAULKNER PARK 64 ## 4 FORT ROAD BUSHLAND 46 ## 5 STRICKLAND TERRACE PARK 44 ## 6 BENARRAWA RESERVE 34 ## 7 GRACEVILLE RIVERSIDE PARKLANDS 32 ## 8 NOSWORTHY PARK 17 ## 9 HORACE WINDOW RESERVE 14 ## 10 GRACEVILLE AVENUE PARK 7
Some shapefiles cover large geographic areas with the caveat that even the bounding box doesn't restrict the number of records to a value that can be downloaded easily. In this case, we recommend more nuances and detailed methods that can be performed using looping techniques. One of our ALA Labs blog posts, Hex maps for species occurrence data, has been written detailing how to approach larger problems such as this.
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.