filter.data_request: Keep rows that match a condition
In galah: Biodiversity Data from the GBIF Node Network

filter.data_request

R Documentation

Keep rows that match a condition

Description

The filter() function is used to subset data, retaining all rows that satisfy your conditions. To be retained, a row must produce a value of TRUE for all conditions. Unlike 'local' filters that act on a tibble, the galah implementations work by amending a query, which is then enacted by collect() or one of the atlas_ family of functions (such as atlas_counts() or atlas_occurrences()).
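The following is a minimal sketch of that two-step pattern, assuming galah is installed and attached. The field names (year, basisOfRecord) are borrowed from the Examples section. The lines that contact the atlas are left commented out, so the block only builds the query object.

```r
library(galah)

# Step 1: build and amend the query. This describes the request;
# it does not return occurrence records.
query <- galah_call() |>
  filter(year >= 2019,
         basisOfRecord == "HumanObservation")

# The result is a request object rather than a tibble of rows.
class(query)

# Step 2: enact the query. These calls contact the atlas,
# so they are shown here without being run:
# query |> count() |> collect()
# query |> atlas_counts()
```

Keeping the two steps separate means one filtered request can be reused, for instance to check a count before requesting the occurrences themselves.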

Usage

## S3 method for class 'data_request'
filter(.data, ...)

## S3 method for class 'metadata_request'
filter(.data, ...)

## S3 method for class 'files_request'
filter(.data, ...)

galah_filter(..., profile = NULL)

Arguments

`.data`	An object of class `data_request`, `metadata_request` or `files_request`, created using `galah_call()` or related functions.
`...`	Expressions that return a logical value, and are defined in terms of the variables in the selected atlas (which can be checked using `show_all(fields)`). If multiple expressions are included, they are combined with the & operator. Only rows for which all conditions evaluate to `TRUE` are kept.
`profile`	Use `galah_apply_profile` instead.

Details

Syntax

filter.data_request() and galah_filter() use non-standard evaluation (NSE), and are designed to be as compatible as possible with dplyr::filter() syntax. Permissible examples include:

== (e.g. year == 2020) but not = (for consistency with dplyr)
!= (e.g. year != 2020)
> or >= (e.g. year >= 2020)
< or <= (e.g. year <= 2020)
OR statements (e.g. year == 2018 | year == 2020)
AND statements (e.g. year >= 2000 & year <= 2020)

Some general tips:

Separating statements with a comma is equivalent to an AND statement; ergo filter(year >= 2010 & year < 2020) is the same as filter(year >= 2010, year < 2020).
All statements must include the field name; so filter(year == 2010 | year == 2021) works, as does filter(year == c(2010, 2021)), but filter(year == 2010 | 2021) fails.
It is possible to use an object to specify required values, e.g. year_value <- 2010; filter(year > year_value).
Solr supports range queries on text as well as numbers; so filter(cl22 >= "Tasmania") is valid.
It is possible to filter by 'assertions', which are statements about data validity, such as filter(assertions != c("INVALID_SCIENTIFIC_NAME", "COORDINATE_INVALID")). Valid assertions can be found using show_all(assertions).
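Several of the tips above can be sketched together as follows. This is an illustration only, assuming galah is attached; the object names (with_comma, with_and, after_2010, two_years) are hypothetical, and no request is sent to the atlas because none of the queries is collected.

```r
library(galah)

# Comma-separated conditions are combined with AND, so these two
# requests are intended to describe the same filter.
with_comma <- galah_call() |> filter(year >= 2010, year < 2020)
with_and   <- galah_call() |> filter(year >= 2010 & year < 2020)

# A value can be supplied from an object in the calling environment.
year_value <- 2010
after_2010 <- galah_call() |> filter(year > year_value)

# Each side of an OR statement names its field explicitly.
two_years <- galah_call() |> filter(year == 2010 | year == 2021)
```

Any of these objects can then be passed on to count() and collect(), or to an atlas_ function, to enact the query.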

Exceptions

When querying occurrences, species, or their respective counts (i.e. all of the above examples), field names are checked internally against show_all(fields). There are some cases where bespoke field names are required, as follows.

When requesting a data download from a DOI, the field doi is valid, i.e.:

galah_call() |> 
  filter(doi == "a-long-doi-string") |> 
  collect()

For taxonomic metadata, the taxa field is valid:

request_metadata() |> 
  filter(taxa == "Chordata") |> 
  unnest()

For building taxonomic trees, the rank field is valid:

request_data() |>
  identify("Chordata") |>
  filter(rank == "class") |>
  atlas_taxonomy()

Media queries are more involved, and break two rules: they accept the media field, and they accept a tibble on the right-hand side of the expression. For example, users wishing to break down media queries into their respective API calls should begin with an occurrence query:

occurrences <- galah_call() |> 
   identify("Litoria peronii") |> 
   select(group = c("basic", "media")) |> 
   collect()

They can then use the media field to request media metadata:

media_metadata <- galah_call("metadata") |>
  filter(media == occurrences) |>
  collect()

And finally, the metadata tibble can be used to request files:

galah_call("files") |>
  filter(media == media_metadata) |>
  collect()

Value

A tibble containing filter values.

Examples

## Not run: 
galah_call() |>
  filter(year >= 2019,
         basisOfRecord == "HumanObservation") |>
  count() |>
  collect()

## End(Not run)

galah documentation built on June 12, 2025, 5:09 p.m.
