The default method for building queries in galah is to first use galah_call() to create a query object called a "data_request". This object class is specific to galah.
galah_call() |> filter(genus == "Crinia") |> class()
## [1] "data_request"
When a piped object is of class data_request, galah can trigger functions to use specific methods for this object class, even if a function name is used by another package. For example, users can use the filter() and group_by() functions from dplyr instead of galah_filter() and galah_group_by() to construct a query. Consequently, the following queries are equivalent:
galah_call() |>
  galah_filter(genus == "Crinia", year == 2020) |>
  galah_group_by(species) |>
  atlas_counts()

galah_call() |>
  filter(genus == "Crinia", year == 2020) |>
  group_by(species) |>
  atlas_counts()
## # A tibble: 16 × 2
##    species                 count
##    <chr>                   <int>
##  1 Crinia signifera        42621
##  2 Crinia parinsignifera    8664
##  3 Crinia glauerti          3111
##  4 Crinia georgiana         1509
##  5 Crinia remota             718
##  6 Crinia sloanei            682
##  7 Crinia insignifera        530
##  8 Crinia tinnula            291
##  9 Crinia deserticola        253
## 10 Crinia pseudinsignifera   223
## 11 Crinia tasmaniensis       181
## 12 Crinia bilingua            74
## 13 Crinia subinsignifera      46
## 14 Crinia riparia             10
## 15 Crinia flindersensis        3
## 16 Crinia nimba                1
Thanks to object-oriented programming, galah "masks" the filter() and group_by() functions to use methods defined for data_request objects instead. The full list of masked functions is:
arrange() ({dplyr})
count() ({dplyr})
identify() ({graphics}) as a synonym for galah_identify()
select() ({dplyr}) as a synonym for galah_select()
group_by() ({dplyr}) as a synonym for galah_group_by()
slice_head() ({dplyr}) as a synonym for the limit argument in atlas_counts()
st_crop() ({sf}) as a synonym for galah_polygon()
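The masking described above relies on R's S3 method dispatch: a generic function picks a method based on the class of its first argument. The following is a minimal base R sketch of that idea, not galah's implementation. The generic keep(), the constructor new_request() and the filters slot are hypothetical names invented for illustration; only the class name data_request comes from the text.

```r
# Toy constructor: the class name comes from the text above, but this
# structure is hypothetical and far simpler than galah's real object.
new_request <- function() {
  structure(list(filters = character()), class = "data_request")
}

# Hypothetical generic standing in for dplyr::filter().
keep <- function(.data, ...) UseMethod("keep")

# Method for data frames: evaluate each condition straight away.
keep.data.frame <- function(.data, ...) {
  env <- parent.frame()
  conditions <- eval(substitute(alist(...)))
  for (cond in conditions) {
    .data <- .data[eval(cond, .data, env), , drop = FALSE]
  }
  .data
}

# Method for the toy request class: store the condition as text and
# return the amended object without evaluating anything.
keep.data_request <- function(.data, ...) {
  conditions <- eval(substitute(alist(...)))
  .data$filters <- c(.data$filters, vapply(conditions, deparse, character(1)))
  .data
}

frogs <- data.frame(genus = c("Crinia", "Litoria"), year = c(2020, 2019))
filtered <- keep(frogs, genus == "Crinia")         # rows are filtered now
request  <- keep(new_request(), genus == "Crinia") # condition is only recorded
class(request)
request$filters
```

The same function name behaves differently depending on the class it receives: the data frame is filtered immediately, while the toy request object only records what was asked for.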
Note that these functions are all evaluated lazily; they amend the underlying object, but no data are requested or changed until the call is evaluated. To actually build and run the query, we'll need to use one or more of a different set of dplyr verbs: collapse(), compute() and collect().
The usual way to begin a query to request data in galah is using galah_call(). However, this function now calls one of three types of request_ functions. If you prefer, you can begin your pipe with one of these dedicated request_ functions (rather than galah_call()) depending on the type of data you want to collect.
For example, if you want to download occurrences, use request_data():
x <- request_data("occurrences") |> # note that "occurrences" is the default `type`
  filter(species == "Crinia tinnula", year == 2010) |>
  collect()
You'll notice that this query differs slightly from the query structure used in earlier versions of galah. The desired data type, "occurrences", is specified at the beginning of the query within request_data() rather than at the end using atlas_occurrences(). Specifying the data type at the start allows users to make use of advanced query building using three newly implemented stages of query building: collapse(), compute() and collect().
These stages mirror existing functions in dplyr for querying databases, and act in the following way:
collapse() converts the object to a query. This allows users to inspect the query before it is sent.
compute() is intended to send the query in question to the requested API for processing. This is particularly important for occurrences, where it can be useful to submit a query and retrieve it at a later time. If the compute() stage is not required, however, compute() simply converts the query to a new class (computed_query).
collect() retrieves the requested data into your workspace, returning a tibble.
We can use these in sequence, or just leap ahead to the stage we want:
x <- request_data() |>
  filter(genus == "Crinia", year == 2020) |>
  group_by(species) |>
  arrange(species) |>
  count()

collapse(x)
## Object of class query with type data/occurrences-count-groupby
## url: https://api.ala.org.au/occurrences/occurrences/facets?fq=%28genus%3A%2...
## arrange: species (ascending)
compute(x)
## Object of class computed_query with type data/occurrences-count-groupby
## url: https://api.ala.org.au/occurrences/occurrences/facets?fq=%28genus%3A%2...
## arrange: species (ascending)
collect(x) |> head()
## # A tibble: 6 × 2
##   species              count
##   <chr>                <int>
## 1 Crinia bilingua         74
## 2 Crinia deserticola     253
## 3 Crinia flindersensis     3
## 4 Crinia georgiana      1509
## 5 Crinia glauerti       3111
## 6 Crinia insignifera     530
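To make the staging concrete, here is a toy base R sketch in which each stage hands a differently classed object to the next. It is an illustration only and does not reproduce galah's internals: the functions toy_collapse(), toy_compute() and toy_collect(), the filters slot and the example.org URL are all hypothetical, and nothing is sent over the network.

```r
# A toy request; the filter strings and class layout are hypothetical.
request <- structure(
  list(filters = c("genus:Crinia", "year:2020")),
  class = "data_request"
)

# Stage 1: turn the request into an inspectable query (nothing is sent).
toy_collapse <- function(x) {
  stopifnot(inherits(x, "data_request"))
  url <- paste0(
    "https://example.org/occurrences?fq=", # placeholder, not a real API
    paste(x$filters, collapse = "&fq=")
  )
  structure(list(type = "data/occurrences", url = url), class = "query")
}

# Stage 2: pretend to submit the query; here only the class changes.
toy_compute <- function(x) {
  stopifnot(inherits(x, "query"))
  class(x) <- "computed_query"
  x
}

# Stage 3: pretend to download; a real client would fetch from x$url.
toy_collect <- function(x) {
  stopifnot(inherits(x, "computed_query"))
  data.frame(source = x$url, rows_fetched = 0L)
}

query    <- toy_collapse(request)
computed <- toy_compute(query)
result   <- toy_collect(computed)
c(class(query), class(computed), class(result))
```

Because each stage checks the class it receives, the intermediate objects can be printed and inspected, or stored and picked up later, before moving on to the next stage.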
The benefit of using collapse(), compute() and collect() is that queries are more modular. This is particularly useful for large data requests in galah. Users can send their query using compute() and download the data later with collect() once the query has finished, rather than waiting for the request to finish within R.
# Create and send query to be calculated server-side
request <- request_data() |>
  identify("perameles") |>
  filter(year > 1900) |>
  compute()

# Download data
request |>
  collect()
Additionally, functions that are more modular are generally easier to
interrogate and debug. Previously some functions did several different things,
making it difficult to know which APIs were being called, when, and for what
purpose. Partitioning queries into three distinct stages is much more transparent,
and allows users to check their query construction prior to sending a request.
For example, the query above is constructed with the following information, returned by collapse().
request_data() |>
  identify("perameles") |>
  filter(year > 1900) |>
  collapse()
## Object of class query with type data/occurrences
## url: https://api.ala.org.au/occurrences/occurrences/offline/download?fq=%28...
The collapse() stage includes an additional argument (.expand) that, when set to TRUE, shows all the APIs called to construct the user-requested query. This is especially useful for debugging.
Under the hood, the different query-building verbs each amend the supplied object to a new class:
collapse() returns class query, which is a list containing a type slot and one or more urls
compute() returns a single object of class computed_query
collect() returns a tibble
These can be called directly, or via the method and type arguments of galah_call(), which specify which dedicated request_ function and data type to return. To demonstrate what we mean, take the following calls, which, despite using different syntax, all return the number of records available for the year 2020:
# new syntax
request_data() |>
  filter(year == 2020) |>
  count() |>
  collect()

# similar, but using `galah_call()`
galah_call(method = "data", type = "occurrences-count") |>
  filter(year == 2020) |>
  collect()

# original syntax
galah_call() |>
  galah_filter(year == 2020) |>
  atlas_counts()
Another example is to list available fields in the selected atlas:
request_metadata(type = "fields") |>
  collect()

galah_call(method = "metadata", type = "fields") |>
  collect()

show_all(fields)
Or to show values for states and territories:
request_metadata() |>
  filter(field == "cl22") |>
  unnest() |>
  collect()

galah_call(method = "metadata", type = "fields-unnest") |>
  galah_filter(id == "cl22") |>
  collect()

search_all(fields, "cl22") |>
  show_values()
While request_metadata() is more modular than show_all(), there is little benefit to using it for most applications. However, in some cases, larger databases like GBIF return huge data.frames of metadata when called via show_all(). Using request_metadata() allows users to specify a slice_head() line within their pipe to get around this issue.
Despite these benefits, we have no plans to require users to call masked functions. Functions prefixed with galah_ or atlas_ are not going away. Indeed, while there is perfect redundancy between old and new syntax in some cases, in others they serve different purposes. In atlas_media() for example, several calls are made and joined in a way that reduces the number of steps required by the user. Under the hood, however, all atlas_ functions are now entirely built using the above syntax.