| filter.data_request | R Documentation |
The filter() function is used to subset data, retaining all rows that
satisfy your conditions. To be retained, the row must produce a value of
TRUE for all conditions. Unlike 'local' filters that act on a tibble,
the galah implementations work by amending a query which is then enacted
by collect() or one of the atlas_ family of functions (such as
atlas_counts() or atlas_occurrences()).
## S3 method for class 'data_request'
filter(.data, ...)
## S3 method for class 'metadata_request'
filter(.data, ...)
## S3 method for class 'files_request'
filter(.data, ...)
galah_filter(..., profile = NULL)
.data |
An object of class |
... |
Expressions that return a logical value, and are defined in terms
of the variables in the selected atlas (and checked using |
profile |
Syntax
filter.data_request() and galah_filter() use non-standard evaluation
(NSE), and are designed to be as compatible as possible with
dplyr::filter() syntax. Permissible examples include:
== (e.g. year == 2020) but not = (for consistency with dplyr)
!= (e.g. year != 2020)
> or >= (e.g. year >= 2020)
< or <= (e.g. year <= 2020)
OR statements (e.g. year == 2018 | year == 2020)
AND statements (e.g. year >= 2000 & year <= 2020)
Some general tips:
Separating statements with a comma is equivalent to an AND statement;
ergo filter(year >= 2010 & year < 2020) is the same as
filter(year >= 2010, year < 2020).
All statements must include the field name; so
filter(year == 2010 | year == 2021) works, as does
filter(year == c(2010, 2021)), but filter(year == 2010 | 2021)
fails.
It is possible to use an object to specify required values, e.g.
year_value <- 2010; filter(year > year_value).
Solr supports range queries on text as well as numbers; so
filter(cl22 >= "Tasmania") is valid.
It is possible to filter by 'assertions', which are statements about data
validity, such as filter(assertions != c("INVALID_SCIENTIFIC_NAME", "COORDINATE_INVALID")).
Valid assertions can be found using show_all(assertions).
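One way to see why every clause needs its own field name is to evaluate the same expressions locally. The following is a minimal sketch in plain base R, not galah's own parser: the vector year is a hypothetical stand-in for the year field, and nothing here contacts an atlas. In base R the incomplete form does not fail; it silently matches everything, which is the kind of ambiguity the rule above avoids. Note also that, per the text above, galah accepts year == c(2010, 2021), whereas plain R would recycle the shorter vector in that comparison, so %in% is used as the local equivalent.

```r
# Plain base R, no galah required: a local vector stands in for the `year` field.
year <- c(2009, 2010, 2015, 2021)

# Every clause names the field: only 2010 and 2021 match.
explicit_or <- year == 2010 | year == 2021

# Field name omitted from the second clause: base R coerces the bare
# number 2021 to TRUE, so every element passes the test.
missing_field <- year == 2010 | 2021

# Local equivalent of matching against a set of values.
in_set <- year %in% c(2010, 2021)

# AND condition using a value stored in an object.
year_value <- 2010
in_range <- year >= year_value & year < 2020

print(year[explicit_or])
print(year[missing_field])
print(year[in_range])
```

Comparing year[explicit_or] with year[missing_field] shows the difference: the first keeps two values, the second keeps all four.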
Exceptions
When querying occurrences, species, or their respective counts (i.e. all of
the above examples), field names are checked internally against
show_all(fields). There are some cases where bespoke field names are
required, as follows.
When requesting a data download from a DOI, the field doi is valid, i.e.:
galah_call() |> filter(doi == "a-long-doi-string") |> collect()
For taxonomic metadata, the taxa field is valid:
request_metadata() |> filter(taxa == "Chordata") |> unnest()
For building taxonomic trees, the rank field is valid:
request_data() |>
identify("Chordata") |>
filter(rank == "class") |>
atlas_taxonomy()
Media queries are more involved, and break two rules: they accept the media
field, and they accept a tibble on the right-hand side of the expression.
For example, users wishing to break down media queries into their
respective API calls should begin with an occurrence query:
occurrences <- galah_call() |>
identify("Litoria peronii") |>
select(group = c("basic", "media")) |>
collect()
They can then use the media field to request media metadata:
media_metadata <- galah_call("metadata") |>
filter(media == occurrences) |>
collect()
And finally, the metadata tibble can be used to request files:
galah_call("files") |>
filter(media == media_metadata) |>
collect()
A tibble containing filter values.
See also select(), group_by() and geolocate() for other ways to amend the
information returned by atlas_() functions. Use search_all(fields) to find
fields that you can filter by, and show_values() to find what values of
those filters are available.
## Not run:
galah_call() |>
filter(year >= 2019,
basisOfRecord == "HumanObservation") |>
count() |>
collect()
## End(Not run)