identifier_functions: Create unique identifier columns

composite_id    R Documentation

Create unique identifier columns

Description

A unique identifier is a pattern of words, letters and/or numbers that is unique to a single record within a dataset. Unique identifiers are useful because they identify individual observations, and make it possible to change, amend or delete observations over time. They also prevent accidental deletion when more than one record contains the same information (and would otherwise be considered a duplicate).

The identifier functions in corella make it easier to generate columns with unique identifiers in a dataset. These functions can be used within set_events(), set_occurrences(), or (equivalently) dplyr::mutate().

Usage

composite_id(..., sep = "-")

sequential_id(width)

random_id()

Arguments

...

Zero or more variable names from the tibble being mutated (unquoted), and/or zero or more _id functions, separated by commas.

sep

Character used to separate field values. Defaults to "-".

width

(Integer) Number of characters in the resulting string. Defaults to one plus the order of magnitude of the largest number.
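To make the width argument more concrete, the base R sketch below zero-pads a run of row numbers to a fixed number of characters. It is not corella's implementation: pad_sequence() is a hypothetical helper, and the default shown is only one reading of "one plus the order of magnitude of the largest number" (here taken as floor(log10()) + 1), so sequential_id() may behave differently.

```r
# Hypothetical helper, NOT part of corella: zero-pad 1..n to a fixed width.
pad_sequence <- function(n, width = NULL) {
  stopifnot(n >= 1)
  ids <- seq_len(n)
  if (is.null(width)) {
    # Assumed reading of the documented default:
    # one plus the order of magnitude of the largest number.
    width <- floor(log10(max(ids))) + 1
  }
  # flag = "0" pads with leading zeros up to `width` characters
  formatC(ids, width = width, format = "d", flag = "0")
}

padded_default <- pad_sequence(15)         # two-character strings
padded_wide <- pad_sequence(3, width = 4)  # four-character strings
```

Under this reading, fixed-width strings sort in the same order as the underlying numbers, which is one reason to pad sequential identifiers at all.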

Details

Generally speaking, it is better to use existing information from a dataset to generate identifiers. For this reason we recommend using composite_id() to aggregate existing fields, if no such composite is already present within the dataset. Composite IDs are more meaningful and stable; they are easier to check and harder to overwrite.

It is possible to call sequential_id() or random_id() within composite_id() to combine existing and new columns.
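As a rough illustration of why a composite of existing fields can serve as an identifier, the base R sketch below joins two columns and checks that the result is unique per row. It does not call corella: paste() stands in for composite_id() purely for illustration, and the data frame is made up.

```r
# Illustrative data only, not taken from a real dataset
df <- data.frame(
  site = c("A01", "A01", "A02"),
  eventDate = c("2020-01-01", "2021-01-01", "2020-01-01")
)

# Join existing fields with a separator, mirroring the default sep = "-"
candidate_id <- paste(df$site, df$eventDate, sep = "-")

# anyDuplicated() returns 0 when every value is unique
is_unique <- anyDuplicated(candidate_id) == 0
```

If is_unique were FALSE, the chosen fields would not distinguish every record, and adding another field (or a sequential or random component) would be one way to resolve the clash.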

Value

An amended tibble containing a column with identifiers in the requested format.

Examples

library(corella)

df <- tibble::tibble(
  eventDate = paste0(rep(2020:2024, 3), "-01-01"),
  basisOfRecord = "humanObservation",
  site = rep(c("A01", "A02", "A03"), each = 5)
)

# Add composite ID using a random ID, site name and eventDate
df |>
  set_occurrences(
    occurrenceID = composite_id(random_id(),
                                site,
                                eventDate)
    )

# Add composite ID using a sequential number, site name and eventDate
df |>
  set_occurrences(
    occurrenceID = composite_id(sequential_id(),
                                site,
                                eventDate)
    )
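The Description states that the identifier functions can also be used within dplyr::mutate(). The sketch below shows what that might look like, assuming both corella and dplyr are installed. The result name with_ids is introduced here only for illustration, and no output is shown because the exact identifier format is not specified on this page.

```r
library(corella)

# Same example data as above
df <- tibble::tibble(
  eventDate = paste0(rep(2020:2024, 3), "-01-01"),
  basisOfRecord = "humanObservation",
  site = rep(c("A01", "A02", "A03"), each = 5)
)

# Build the identifier with dplyr::mutate() instead of set_occurrences()
with_ids <- df |>
  dplyr::mutate(
    occurrenceID = composite_id(sequential_id(),
                                site,
                                eventDate)
  )

# Confirm that every row received a distinct identifier
anyDuplicated(with_ids$occurrenceID) == 0
```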

corella documentation built on April 4, 2025, 12:20 a.m.