knitr::opts_chunk$set( collapse = TRUE, comment = "#>", fig.path = "man/figures/README-", out.width = "100%" )
validatedb
executes validation checks written with R package validate
on a
database. This allows for checking the validity of records in a database.
You can install a development version with
remotes::install_github("data-cleaning/validatedb")
library(validatedb)
First we setup a table in a database (for demo purpose)
# create a table in a database income <- data.frame(id=1:2, age=c(12,35), salary = c(1000,NA)) con <- DBI::dbConnect(RSQLite::SQLite()) DBI::dbWriteTable(con, "income", income)
We retrieve a reference/handle to the table in the DB with dplyr
tbl_income <- tbl(con, "income") print(tbl_income)
Let's define a rule set and confront the table with it:
rules <- validator( is_adult = age >= 18 , has_income = salary > 0 , mean_age = mean(age,na.rm=TRUE) > 24 , has_values = is_complete(age, salary) ) # and confront! cf <- confront(tbl_income, rules, key = "id") print(cf) summary(cf)
Values (i.e. validations on the table) can be retrieved like in validate
with
type="matrix"
or type="list"
values(cf, type = "matrix")
But often this seems more handy:
values(cf, type = "tbl")
or
values(cf, type = "tbl", sparse=TRUE)
We can see the sql code by using show_query
:
show_query(cf)
Or write the sql to a file for documentation (and inspiration)
dump_sql(cf, "validation.sql")
```r
dump_sql(cf)
### Aggregate example ```r
is_complete
, all_complete
is_unique
, all_unique
exists_any
, exists_one
do_by
, sum_by
, mean_by
, min_by
, max_by
Some newly added validate
utility functions are (still) missing from validatedb
.
contains_exactly
is_linear_sequence
hierachy
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.