check_data: Checks if a dataset conforms to a given set of rules

View source: R/check_data.R

check_data    R Documentation

Checks if a dataset conforms to a given set of rules

Description

Checks whether a dataset conforms to a given set of rules and returns the results with one row per rule.

Usage

check_data(
  x,
  rules,
  xname = deparse(substitute(x)),
  stop_on_fail = FALSE,
  stop_on_warn = FALSE,
  stop_on_error = FALSE,
  stop_on_schema_fail = FALSE,
  extra_columns = c("ignore", "warn", "fail")
)

Arguments

x

a dataset: a data.frame, dplyr::tibble, data.table::data.table, arrow::arrow_table, arrow::open_dataset, or dplyr::tbl (SQL connection). Can also be a named list of datasets when the rules include reference rules.

rules

a list of rules, for example a ruleset created with ruleset()

xname

optional, a name for the x variable (only used in error messages)

stop_on_fail

if TRUE, throw an error with stop() when any of the rules fail

stop_on_warn

if TRUE, throw an error with stop() when a warning occurs while the rules are executed

stop_on_error

if TRUE, throw an error with stop() when an error occurs while the rules are executed

stop_on_schema_fail

if TRUE, throw an error with stop() when any schema check fails

extra_columns

how to treat columns in x that are not declared in optional data_columns attached to a ruleset. One of "ignore" (default), "warn", or "fail".

Value

a data.frame-like object with one row for each rule and its results

See Also

detect_backend()

Examples

rs <- ruleset(
  rule(mpg > 10),
  rule(cyl %in% c(4, 6)), # fails on purpose: mtcars also contains cyl == 8
  rule(qsec >= 14.5 & qsec <= 22.9)
)
rs

check_data(mtcars, rs)

# schema + relation checks in one output
orders <- data.frame(
  order_id = 1:3,
  customer_id = c(10, 99, NA),
  amount = c(10, -5, 20)
)
customers <- data.frame(customer_id = c(10, 11))

rs2 <- ruleset(
  rule(amount >= 0, name = "amount non-negative"),
  reference_rule(
    local_col = "customer_id",
    ref_dataset = "customers",
    ref_col = "customer_id",
    allow_na = TRUE
  ),
  data_columns = list(
    data_column("order_id", type = "int", optional = FALSE),
    data_column("customer_id", type = "double", optional = FALSE),
    data_column("amount", type = "double", optional = FALSE)
  ),
  data_name = "orders"
)

check_data(list(orders = orders, customers = customers), rs2)
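
The stop_on_* arguments turn a result that would otherwise only be reported in the returned table into an R error. The following is a minimal sketch, assuming the dataverifyr package is installed and that stop_on_fail = TRUE behaves as described in the Arguments section. It reuses mtcars and the deliberately failing cyl rule from the first example; the variable names rs_strict and msg are illustrative only.

```r
library(dataverifyr)

# One passing rule and one rule that fails on mtcars (cyl == 8 exists).
rs_strict <- ruleset(
  rule(mpg > 10),
  rule(cyl %in% c(4, 6))
)

# With stop_on_fail = TRUE the failing rule is expected to raise an error
# instead of returning the result table, so catch it and keep the message.
msg <- tryCatch(
  check_data(mtcars, rs_strict, stop_on_fail = TRUE),
  error = function(e) conditionMessage(e)
)

print(msg)
```

Catching the error with tryCatch() is only one way to handle it. In a non-interactive pipeline it may be preferable to let the error propagate so that the run stops at the first dataset that violates its rules.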

dataverifyr documentation built on April 11, 2026, 1:06 a.m.