confront_tbl_sparse: Create a sparse confrontation query

confront_tbl_sparse R Documentation

Create a sparse confrontation query

Description

Create a sparse confrontation query. Only errors and missing values are stored. This stores all results of a tbl validation in a table with length(rules) columns and nrow(tbl) rows. Note that the result of this function is a (lazy) query object that still needs to be executed in the database, e.g. with dplyr::collect(), dplyr::collapse() or dplyr::compute().

Usage

confront_tbl_sparse(tbl, x, key, union_all = TRUE, check_rules = TRUE)

Arguments

tbl

dbplyr::tbl_dbi() table in a database, retrieved with tbl()

x

validate::validator() object with validation rules.

key

character with the name of the key column; must be specified.

union_all

if FALSE, each rule is a separate query.

check_rules

if TRUE, it is checked which rules 'work' on the database.

Details

The return value of the function is a list with:

  • $query: A dbplyr::tbl_dbi() object that refers to the confrontation query.

  • $errors: The validation rules that are not working on the database.

  • $working: A logical vector indicating which expressions are working on the database.

  • $exprs: All validation expressions.
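
The following is a minimal sketch of calling the function directly and inspecting the list described above. It assumes that confront_tbl_sparse() is exported by validatedb, that RSQLite is installed (needed for dplyr's in-memory database), and it reuses the toy income table from the Examples section; the object names res and sparse_result are illustrative only.

```r
library(validate)
library(validatedb)

# toy table in an in-memory SQLite database (same data as in the Examples)
income <- data.frame(id = letters[1:2], age = c(12, 35), salary = c(1000, NA))
con <- dbplyr::src_memdb()
tbl_income <- dplyr::copy_to(con, income, overwrite = TRUE)

# two record-wise rules
rules <- validator(is_adult = age >= 18, has_income = salary > 0)

# build the sparse confrontation query; nothing is executed in the database yet
res <- confront_tbl_sparse(tbl_income, rules, key = "id")

# the elements listed in Details
names(res)
res$working

# execute the lazy query and bring the result into R
sparse_result <- dplyr::collect(res$query)
print(sparse_result)
```

Using dplyr::compute() instead of dplyr::collect() would keep the result in the database rather than loading it into R, which may be preferable for large tables.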

Value

An object with the necessary information: see Details.

See Also

Other validation: tbl_validation-class, values,tbl_validation-method

Examples

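# load validate (for validator()) and validatedb (for confront() on database tables)
library(validate)
library(validatedb)

# create a table in a database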
income <- data.frame(id = letters[1:2], age=c(12,35), salary = c(1000,NA))
con <- dbplyr::src_memdb()
tbl_income <- dplyr::copy_to(con, income, overwrite=TRUE)
print(tbl_income)

# Let's define a rule set and confront the table with it:
rules <- validator( is_adult   = age >= 18
                  , has_income = salary > 0
                  , mean_age   = mean(age,na.rm=TRUE) > 20
                  )

# and confront! (we have to use a key, because a db...)
cf <- confront(tbl_income, rules, key = "id")
print(cf)
summary(cf)

# Values (i.e. validations on the table) can be retrieved like in `validate`
# with `type = "matrix"` (simplify = TRUE)
values(cf, type = "matrix")

# But often this is handier:
values(cf, type = "tbl")

# We can see the SQL code by using `show_query`:
show_query(cf)

# identical
show_query(values(cf, type = "tbl"))

# sparse results in db (that is the default)
values(cf, type="tbl", sparse=TRUE)

# or, if you prefer a data.frame
values(cf, type="data.frame", sparse=TRUE)

data-cleaning/validatedb documentation built on June 11, 2022, 4:33 p.m.