knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>",
  fig.path = "man/figures/README-",
  out.width = "100%"
)

validatedb

CRAN status R build status Codecov test
coverage Mentioned in Awesome Official Statistics

validatedb executes validation checks written with R package validate on a database. This allows for checking the validity of records in a database.

Installation

You can install a development version with

remotes::install_github("data-cleaning/validatedb")

Example

library(validatedb)

First we setup a table in a database (for demo purpose)

# create a table in a database
income <- data.frame(id=1:2, age=c(12,35), salary = c(1000,NA))
con <- DBI::dbConnect(RSQLite::SQLite())
DBI::dbWriteTable(con, "income", income)

We retrieve a reference/handle to the table in the DB with dplyr

tbl_income <- tbl(con, "income")
print(tbl_income)

Let's define a rule set and confront the table with it:

rules <- validator( is_adult   = age >= 18
                  , has_income = salary > 0
                  , mean_age   = mean(age,na.rm=TRUE) > 24
                  , has_values = is_complete(age, salary)
                  )

# and confront!
cf <- confront(tbl_income, rules, key = "id")

print(cf)

summary(cf)

Values (i.e. validations on the table) can be retrieved like in validate with type="matrix" or type="list"

values(cf, type = "matrix")

But often this seems more handy:

values(cf, type = "tbl")

or

values(cf, type = "tbl", sparse=TRUE)

We can see the sql code by using show_query:

show_query(cf)

Or write the sql to a file for documentation (and inspiration)

dump_sql(cf, "validation.sql")
```r
dump_sql(cf)
### Aggregate example

```r

validate specific functions

Added:

Todo

Some newly added validate utility functions are (still) missing from validatedb.



data-cleaning/validatedb documentation built on June 11, 2022, 4:33 p.m.