distinct.data_request: Keep distinct/unique rows
In galah: Biodiversity Data from the GBIF Node Network

distinct.data_request

R Documentation

Keep distinct/unique rows

Description

Keep only unique/distinct rows from a data frame. This is similar to unique.data.frame() but considerably faster. It is evaluated lazily.

Usage

## S3 method for class 'data_request'
distinct(.data, ..., .keep_all = FALSE)

Arguments

`.data`	A data frame, data frame extension (e.g. a tibble), or a lazy data frame (e.g. from dbplyr or dtplyr). See Methods, below, for more details.
`...`	Variables to use when determining uniqueness. Unlike the `dplyr` implementation, this must be set for the function to do anything, and only a single variable is used.
`.keep_all`	If `TRUE`, keep all variables in `.data`. Defaults to `FALSE`.

Details

This function has several potential uses. In its default mode, it simply shows the unique values for a supplied field:

galah_call() |>
  distinct(basisOfRecord) |> 
  collect()

# A tibble: 9 × 1
  basisOfRecord      
  <chr>              
1 HUMAN_OBSERVATION  
2 PRESERVED_SPECIMEN 
3 OCCURRENCE         
4 MACHINE_OBSERVATION
5 OBSERVATION        
6 MATERIAL_SAMPLE    
7 LIVING_SPECIMEN    
8 FOSSIL_SPECIMEN    
9 MATERIAL_CITATION

This is the same result as you would get using show_values():

search_all(fields, "basisOfRecord") |> 
  show_values()

Using distinct() is somewhat more reliable, however, as it doesn't rely on searching the tibble returned by show_all(fields). It is also more efficient, particularly when caching is turned off. If the goal is to retrieve the number of levels of a factor, use:

galah_call() |>
  distinct(basisOfRecord) |> 
  count() |>
  collect()

# A tibble: 1 × 1
  count
  <int>
1     9

When the variable passed to distinct() in the above example is speciesID, this is identical to calling:

atlas_counts(type = "species")
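
As a rough local analogy for the two queries above, the sketch below applies the same logic to a small in-memory data frame using base R only. The `occurrences` table and its values are made up for illustration and do not come from any atlas; galah itself resolves these queries through the API rather than by downloading rows.

```r
# Toy occurrence table; every value here is hypothetical.
occurrences <- data.frame(
  speciesID     = c("sp1", "sp1", "sp1", "sp2", "sp3", "sp3"),
  basisOfRecord = c("HUMAN_OBSERVATION", "HUMAN_OBSERVATION",
                    "PRESERVED_SPECIMEN", "HUMAN_OBSERVATION",
                    "HUMAN_OBSERVATION", "MACHINE_OBSERVATION"),
  stringsAsFactors = FALSE
)

# Local analogue of distinct(basisOfRecord): one row per unique value.
distinct_basis <- unique(occurrences["basisOfRecord"])

# Local analogue of distinct(basisOfRecord) |> count(): number of unique values.
n_basis <- nrow(distinct_basis)

# Local analogue of distinct(speciesID) |> count(): number of species.
n_species <- length(unique(occurrences$speciesID))

print(distinct_basis)
print(n_basis)
print(n_species)
```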

You can also add group_by() to find the number of distinct values per level of a second variable:

galah_call() |>
  identify("Perameles") |>
  distinct(speciesID) |> 
  group_by(basisOfRecord) |>
  count() |>
  collect()

# A tibble: 8 × 2
  basisOfRecord       count
  <chr>               <int>
1 Human observation       7
2 Preserved specimen      9
3 Machine observation     2
4 Observation             3
5 Occurrence              3
6 Material Sample         4
7 Fossil specimen         1
8 Living specimen         1
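
To make the grouped count concrete, here is a minimal base R sketch of the same idea on a toy table: duplicate (basisOfRecord, speciesID) pairs are dropped first, and the remaining rows are counted per level of basisOfRecord. The `occurrences` data are hypothetical, and this is only an analogy for what the API computes, not a description of how galah implements it.

```r
# Toy occurrence table; every value here is hypothetical.
occurrences <- data.frame(
  speciesID     = c("sp1", "sp1", "sp1", "sp2", "sp3", "sp3"),
  basisOfRecord = c("HUMAN_OBSERVATION", "HUMAN_OBSERVATION",
                    "PRESERVED_SPECIMEN", "HUMAN_OBSERVATION",
                    "HUMAN_OBSERVATION", "MACHINE_OBSERVATION"),
  stringsAsFactors = FALSE
)

# Keep each (basisOfRecord, speciesID) combination once.
pairs <- unique(occurrences[c("basisOfRecord", "speciesID")])

# Count the distinct species within each basisOfRecord level.
species_per_basis <- as.data.frame(
  table(basisOfRecord = pairs$basisOfRecord),
  responseName = "count",
  stringsAsFactors = FALSE
)

print(species_per_basis)
```

Without the `unique()` step, the repeated sp1 human observation would be counted twice, which is the difference between counting records and counting distinct species.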

By setting .keep_all = TRUE, we get more information on each record. Due to limits on the APIs, this is not a perfect analogy for running dplyr::distinct() on raw occurrences, but it does allow us to generalise atlas_species() to use any taxonomic identifier. For example, we might choose to show data by family instead of species:

galah_call() |>
  identify("Coleoptera") |>
  distinct(familyID, .keep_all = TRUE) |> 
  collect()

Using group_by() is also valid:

galah_call() |>
  filter(year == 2024,
         genus == "Crinia") |>
  group_by(speciesID) |>
  distinct(.keep_all = TRUE) |>
  collapse()

In this case, collect() and atlas_species() are synonymous, with the exception that the latter does not require you to set the .keep_all argument to TRUE. So you could instead use:

galah_call() |>
  identify("Coleoptera") |>
  distinct(familyID) |> 
  atlas_species()
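
For readers more familiar with local data frames, the sketch below shows what the .keep_all distinction looks like when applied to raw rows in base R. The `occurrences` table, its columns and its values are all hypothetical. As noted above, galah's behaviour is only loosely analogous: the extra columns it returns are determined by the API, not by picking the first matching occurrence row.

```r
# Toy occurrence table; every value here is hypothetical.
occurrences <- data.frame(
  familyID  = c("fam1", "fam1", "fam2", "fam3", "fam3"),
  speciesID = c("sp1", "sp2", "sp3", "sp4", "sp4"),
  year      = c(2019L, 2021L, 2020L, 2018L, 2022L),
  stringsAsFactors = FALSE
)

# Analogue of .keep_all = FALSE: only the familyID column is returned.
families_only <- unique(occurrences["familyID"])

# Analogue of .keep_all = TRUE: one row per family, all columns retained
# (here, the first row encountered for each familyID).
families_all <- occurrences[!duplicated(occurrences$familyID), ]

print(families_only)
print(families_all)
```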

Examples

## Not run: 
galah_call() |>
  distinct(basisOfRecord) |>
  count() |>
  collect()

## End(Not run)

galah documentation built on Feb. 11, 2026, 9:11 a.m.
