galah has been designed to support a piped workflow that mimics workflows made popular by tidyverse packages such as dplyr. Although piping in galah is optional, it can make things much easier to understand, and so we use it in (nearly) all our examples.
To see what we mean, let's look at an example of how dplyr::filter() works. Notice how dplyr::filter() and galah_filter() both take logical expressions as arguments, written here with the == operator:
library(dplyr)
mtcars |>
  filter(mpg == 21)
##               mpg cyl disp  hp drat    wt  qsec vs am gear carb
## Mazda RX4      21   6  160 110  3.9 2.620 16.46  0  1    4    4
## Mazda RX4 Wag  21   6  160 110  3.9 2.875 17.02  0  1    4    4
library(galah)
galah_call() |>
  galah_filter(year == 2021) |>
  atlas_counts()
## # A tibble: 1 × 1
##     count
##     <int>
## 1 8204330
As another example, notice how galah_group_by() + atlas_counts() works very similarly to dplyr::group_by() + dplyr::count():
mtcars |>
  group_by(vs) |>
  count()
## # A tibble: 2 × 2
## # Groups:   vs [2]
##      vs     n
##   <dbl> <int>
## 1     0    18
## 2     1    14
galah_call() |>
  galah_group_by(biome) |>
  atlas_counts()
## # A tibble: 2 × 2
##   biome           count
##   <chr>           <int>
## 1 TERRESTRIAL 120083089
## 2 MARINE        5189286
We made this move towards tidy evaluation to make it possible to use piping when building queries to the Atlas of Living Australia. In practice, this means that data queries can be filtered just as you would filter a data.frame with the tidyverse suite of functions.
galah_call()
You may have noticed in the above examples that dplyr pipes begin with some data, while galah pipes all begin with galah_call() (be sure to add the parentheses!). This function tells galah that you will be using pipes to construct your query. Follow this with your preferred pipe (|> from base or %>% from magrittr). You can then narrow your query line-by-line using galah_ functions. Finally, end with an atlas_ function to identify what type of data you want from your query.
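As a minimal sketch of the point about pipe choice, the year filter used earlier could be written with the magrittr pipe instead of the base pipe. This assumes the magrittr package is installed and that the query runs with an internet connection. The variable name counts_2021 is only illustrative, and the count returned may differ from the value shown above.

```r
library(galah)
library(magrittr)  # provides the %>% pipe

# Same structure as before: galah_call() first, a galah_ function
# to narrow the query, and an atlas_ function last.
counts_2021 <- galah_call() %>%
  galah_filter(year == 2021) %>%
  atlas_counts()

counts_2021
```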
Here is an example using counts of bandicoot records:
galah_call() |>
  galah_identify("perameles") |>
  galah_filter(year >= 2020) |>
  galah_group_by(species, year) |>
  atlas_counts()
## # A tibble: 15 × 3
##    species                year  count
##    <chr>                  <chr> <int>
##  1 Perameles nasuta       2021   3337
##  2 Perameles nasuta       2020   1573
##  3 Perameles nasuta       2022   1515
##  4 Perameles nasuta       2023    625
##  5 Perameles gunnii       2023     95
##  6 Perameles gunnii       2021     72
##  7 Perameles gunnii       2022     64
##  8 Perameles gunnii       2020     49
##  9 Perameles bougainville 2021     84
## 10 Perameles bougainville 2022     72
## 11 Perameles bougainville 2020      1
## 12 Perameles pallescens   2022     30
## 13 Perameles pallescens   2021     25
## 14 Perameles pallescens   2023     20
## 15 Perameles pallescens   2020     11
A second example downloads occurrence records of bandicoots from 2021, and also requests information on which records had zero coordinates:
galah_call() |>
  galah_identify("perameles") |>
  galah_filter(year == 2021) |>
  galah_select(group = "basic", ZERO_COORDINATE) |>
  atlas_occurrences() |>
  head()
## Retrying in 1 seconds.
## # A tibble: 6 × 8
##   recordID              decimalLatitude decimalLongitude eventDate           scientificName taxonConceptID dataResourceName occurrenceStatus
##   <chr>                           <dbl>            <dbl> <dttm>              <chr>          <chr>          <chr>            <chr>
## 1 00108221-afc6-4246-a…           -28.8             153. 2021-09-29 00:00:00 Perameles nas… https://biodi… NSW BioNet Atlas PRESENT
## 2 001e914d-0281-41cb-b…           -33.8             151. 2021-04-19 00:00:00 Perameles nas… https://biodi… NSW BioNet Atlas PRESENT
## 3 00233c1e-66df-4d9c-8…           -33.8             151. 2021-02-27 00:00:00 Perameles nas… https://biodi… NSW BioNet Atlas PRESENT
## 4 003064b3-490a-49b5-a…           -27.5             152. 2021-11-05 12:06:00 Perameles nas… https://biodi… iNaturalist Aus… PRESENT
## 5 004fd28b-a899-4a97-8…           -33.8             151. 2021-07-24 00:00:00 Perameles nas… https://biodi… NSW BioNet Atlas PRESENT
## 6 0068547b-b091-4a86-8…           -33.8             151. 2021-01-28 00:00:00 Perameles nas… https://biodi… NSW BioNet Atlas PRESENT
Note that the order in which galah_ functions are added doesn't matter, as long as galah_call() goes first, and an atlas_ function comes last.
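To illustrate the point about ordering, the sketch below rebuilds the bandicoot count query from earlier with the middle galah_ steps rearranged. Going by the statement above, both forms are expected to describe the same query; this has not been verified against a live atlas here. The variable name reordered is only illustrative, and the counts returned may differ from those shown earlier.

```r
library(galah)

# Same steps as the earlier bandicoot example, with the galah_
# functions in a different order. galah_call() still comes first
# and atlas_counts() still comes last.
reordered <- galah_call() |>
  galah_group_by(species, year) |>
  galah_filter(year >= 2020) |>
  galah_identify("perameles") |>
  atlas_counts()

reordered
```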
dplyr functions in galah
As of version 1.5.1, it is possible to call dplyr functions natively within galah to amend how queries are processed. For example:
# galah syntax
galah_call() |>
  galah_filter(year >= 2020) |>
  galah_group_by(year) |>
  atlas_counts()
## # A tibble: 4 × 2
##   year    count
##   <chr>   <int>
## 1 2022  8353115
## 2 2021  8204330
## 3 2020  7116059
## 4 2023  2997676
# dplyr syntax
galah_call() |>
  filter(year >= 2020) |>
  group_by(year) |>
  count()
## Object of type `data_request` containing:
## • type occurrences-count
## • filter year >= 2020
## • group_by year
The full list of masked functions is:
identify() ({graphics}) as a synonym for galah_identify()
select() ({dplyr}) as a synonym for galah_select()
group_by() ({dplyr}) as a synonym for galah_group_by()
slice_head() ({dplyr}) as a synonym for the limit argument in atlas_counts()
st_crop() ({sf}) as a synonym for galah_polygon()
count() ({dplyr}) as a synonym for atlas_counts()