filter.data_request | R Documentation |
The filter() function is used to subset data, retaining all rows that
satisfy your conditions. To be retained, the row must produce a value of
TRUE for all conditions. Unlike 'local' filters that act on a tibble,
the galah implementations work by amending a query which is then enacted
by collect() or one of the atlas_ family of functions (such as
atlas_counts() or atlas_occurrences()).
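The "amend a query, then enact it" pattern described above can be sketched in base R. This is a toy illustration under stated assumptions, not galah's implementation: toy_request(), toy_filter() and toy_collect() are hypothetical helpers, and the records data frame is made up for the example.

```r
# Toy illustration only: these helpers are hypothetical and are not galah code.
toy_request <- function() {
  structure(list(filters = list()), class = "toy_request")
}

# "Filtering" only records the unevaluated conditions on the request object
toy_filter <- function(request, ...) {
  new_conditions <- as.list(substitute(list(...)))[-1]
  request$filters <- c(request$filters, new_conditions)
  request
}

# Only "collecting" evaluates the stored conditions against some data
toy_collect <- function(request, data) {
  keep <- rep(TRUE, nrow(data))
  for (condition in request$filters) {
    keep <- keep & eval(condition, data, parent.frame())
  }
  data[keep, , drop = FALSE]
}

# Made-up records standing in for data held by an atlas
records <- data.frame(
  year = c(2015, 2019, 2021),
  basisOfRecord = c("HumanObservation", "PreservedSpecimen", "HumanObservation")
)

query <- toy_filter(toy_request(),
                    year >= 2019,
                    basisOfRecord == "HumanObservation")
result <- toy_collect(query, records)
print(result)
```

Nothing is evaluated when toy_filter() is called; the conditions are applied only in toy_collect(), which loosely mirrors how a galah query is built first and run later.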
## S3 method for class 'data_request'
filter(.data, ...)
## S3 method for class 'metadata_request'
filter(.data, ...)
## S3 method for class 'files_request'
filter(.data, ...)
galah_filter(..., profile = NULL)
.data |
An object of class |
... |
Expressions that return a logical value, and are defined in terms
of the variables in the selected atlas (and checked using |
profile |
Syntax
filter.data_request() and galah_filter() use non-standard evaluation
(NSE), and are designed to be as compatible as possible with
dplyr::filter() syntax. Permissible examples include:
== (e.g. year == 2020) but not = (for consistency with dplyr)
!= (e.g. year != 2020)
> or >= (e.g. year >= 2020)
< or <= (e.g. year <= 2020)
OR statements (e.g. year == 2018 | year == 2020)
AND statements (e.g. year >= 2000 & year <= 2020)
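As a rough illustration of what non-standard evaluation means here, the sketch below captures filter-style expressions without evaluating them, using only base R. The helper capture_filters() is hypothetical and is not part of galah; it only shows why field names such as year do not need to exist as R objects when the expression is written.

```r
# Hypothetical helper (not part of galah): capture filter expressions
# without evaluating them, as NSE-based functions typically do.
capture_filters <- function(...) {
  # substitute() returns the arguments as unevaluated language objects
  exprs <- as.list(substitute(list(...)))[-1]
  # deparse() turns each expression back into text for inspection
  vapply(exprs, function(e) paste(deparse(e), collapse = " "), character(1))
}

# Neither `year` nor `basisOfRecord` exists as an R object here,
# yet no error is raised because nothing is evaluated.
captured <- capture_filters(year >= 2000 & year <= 2020,
                            basisOfRecord == "HumanObservation")
print(captured)

# The field names referenced by an expression can be listed with all.vars()
fields_used <- all.vars(quote(year >= 2000 & year <= 2020))
print(fields_used)
```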
Some general tips:
Separating statements with a comma is equivalent to an AND statement;
ergo filter(year >= 2010 & year < 2020) is the same as
filter(year >= 2010, year < 2020).
All statements must include the field name; so
filter(year == 2010 | year == 2021) works, as does
filter(year == c(2010, 2021)), but filter(year == 2010 | 2021) fails.
It is possible to use an object to specify required values, e.g.
year_value <- 2010; filter(year > year_value).
Solr supports range queries on text as well as numbers; so
filter(cl22 >= "Tasmania") is valid.
It is possible to filter by 'assertions', which are statements about data
validity, such as
filter(assertions != c("INVALID_SCIENTIFIC_NAME", "COORDINATE_INVALID")).
Valid assertions can be found using show_all(assertions).
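Some of the tips above follow from how R itself parses logical expressions. The sketch below uses a plain local vector rather than a galah query, so it shows base R semantics only; the vector year is made up for the example. In plain R, the malformed form does not raise an error but silently matches everything, which is one reason every statement needs its own field name.

```r
# A made-up local vector standing in for the `year` field
year <- c(2009, 2010, 2015, 2021)

# Two complete comparisons joined by OR: matches 2010 and 2021
both_named <- year == 2010 | year == 2021

# Field name missing on the right: R parses this as (year == 2010) | 2021,
# and the bare number 2021 is coerced to TRUE, so every element matches
missing_name <- year == 2010 | 2021

# Joining conditions with & is the local analogue of
# separating filter statements with a comma
in_2010s <- year >= 2010 & year < 2020

# A value held in an object can be used in the comparison
year_value <- 2010
after_value <- year > year_value

print(both_named)
print(missing_name)
print(in_2010s)
print(after_value)
```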
Exceptions
When querying occurrences, species, or their respective counts (i.e. all of
the above examples), field names are checked internally against
show_all(fields). There are some cases where bespoke field names are
required, as follows.
When requesting a data download from a DOI, the field doi is valid, i.e.:
galah_call() |> filter(doi == "a-long-doi-string") |> collect()
For taxonomic metadata, the taxa field is valid:
request_metadata() |> filter(taxa == "Chordata") |> unnest()
For building taxonomic trees, the rank field is valid:
request_data() |> identify("Chordata") |> filter(rank == "class") |> atlas_taxonomy()
Media queries are more involved, and break two rules: they accept the media
field, and they accept a tibble on the right-hand side of the equation. For
example, users wishing to break down media queries into their respective API
calls should begin with an occurrence query:
occurrences <- galah_call() |> identify("Litoria peronii") |> select(group = c("basic", "media")) |> collect()
They can then use the media field to request media metadata:
media_metadata <- galah_call("metadata") |> filter(media == occurrences) |> collect()
And finally, the metadata tibble can be used to request files:
galah_call("files") |> filter(media == media_metadata) |> collect()
A tibble containing filter values.
See select(), group_by() and geolocate() for other ways to amend the
information returned by atlas_() functions. Use search_all(fields) to find
fields that you can filter by, and show_values() to find what values of
those filters are available.
## Not run:
galah_call() |>
  filter(year >= 2019,
         basisOfRecord == "HumanObservation") |>
  count() |>
  collect()
## End(Not run)