reference_rule: Define a Relational Reference Rule

View source: R/data_column.R

reference_rule R Documentation

Description

Creates a rule that checks whether each value in a column of the primary dataset exists in a column of a referenced dataset. To use it with check_data(), supply x as a named list of datasets and identify the primary dataset either by setting data_name in ruleset() or by placing it first in the list.
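The check described above is, conceptually, a membership test. The following base R sketch illustrates the idea only; it does not call the package and is not its implementation. The data frames and the variable names in_ref and unmatched are made up for illustration.

```r
# Toy data (illustrative values, not taken from the package).
flights <- data.frame(carrier = c("AA", "BB", NA_character_))
carriers <- data.frame(carrier_id = "AA")

# Row-wise membership test: is each local value present in the referenced column?
in_ref <- flights$carrier %in% carriers$carrier_id

# Rows of the primary dataset whose value has no match in the referenced column.
unmatched <- flights[!in_ref, , drop = FALSE]
```

In base R, %in% reports a missing value as not found unless the referenced column itself contains NA, so the NA row counts as unmatched in this sketch. In the package, the treatment of missing values is controlled by allow_na.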

Usage

reference_rule(
  local_col,
  ref_dataset,
  ref_col,
  name = NA,
  allow_na = FALSE,
  negate = FALSE,
  ...
)

Arguments

local_col

name of the column in the primary dataset whose values are checked.

ref_dataset

name of the referenced dataset, as it appears in the named list passed as x to check_data().

ref_col

name of the column in the referenced dataset against which the values of local_col are matched.

name

optional display name for the rule.

allow_na

logical; if TRUE, missing values in local_col are treated as passing.

negate

logical; if TRUE, inverts the rule (values must not be in the referenced column).

...

additional fields attached to the rule object.
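One way to picture how allow_na and negate could combine is the base R sketch below. The helper sketch_reference_check() is hypothetical and is not part of the package. It assumes that missing values are decided by allow_na alone, regardless of negate; the package may handle that combination differently.

```r
# Hypothetical helper, for illustration only (not the package's implementation).
sketch_reference_check <- function(local, ref, allow_na = FALSE, negate = FALSE) {
  found <- local %in% ref
  # negate = TRUE flips the requirement: values must NOT be found in ref.
  pass <- if (negate) !found else found
  # Assumption: missing local values pass exactly when allow_na is TRUE.
  pass[is.na(local)] <- allow_na
  pass
}

local <- c("AA", "BB", NA)

strict  <- sketch_reference_check(local, "AA")
lenient <- sketch_reference_check(local, "AA", allow_na = TRUE)
negated <- sketch_reference_check(local, c("XX", "YY"),
                                  negate = TRUE, allow_na = TRUE)
```

Under these assumptions, strict is TRUE, FALSE, FALSE; lenient is TRUE, FALSE, TRUE; and negated is TRUE for all three values.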

Value

A reference_rule object that can be included in ruleset().

Examples

flights <- data.frame(carrier = c("AA", "BB", NA_character_))
carriers <- data.frame(carrier_id = c("AA"))

rs <- ruleset(
  reference_rule(
    local_col = "carrier",
    ref_dataset = "carriers",
    ref_col = "carrier_id",
    allow_na = TRUE
  ),
  data_name = "flights"
)

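# "BB" is absent from carriers$carrier_id, so that row violates the rule;
# the NA row is treated as passing because allow_na = TRUE
check_data(list(flights = flights, carriers = carriers), rs)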

# negated relation: value must NOT exist in blacklist
blacklist <- data.frame(carrier_id = c("XX", "YY"))
rs_neg <- ruleset(
  reference_rule(
    local_col = "carrier",
    ref_dataset = "blacklist",
    ref_col = "carrier_id",
    negate = TRUE,
    allow_na = TRUE
  ),
  data_name = "flights"
)

check_data(list(flights = flights, blacklist = blacklist), rs_neg)

dataverifyr documentation built on April 11, 2026, 1:06 a.m.