library(validatetools)


validatetools

validatetools is a utility package for managing validation rule sets that are defined with the validate package. In production systems, validation rule sets tend to grow organically and accumulate redundant or (partially) contradictory rules. validatetools helps to identify problems in large rule sets and includes simplification methods for resolving them.

Installation

validatetools is available from CRAN and can be installed with

install.packages("validatetools")

The latest beta version of validatetools can be installed with

install.packages("validatetools", repos = "https://data-cleaning.github.io/drat")

The adventurous can install an (unstable) development version of validatetools from GitHub with:

# install.packages("devtools")
devtools::install_github("data-cleaning/validatetools")

Examples

Check for feasibility

rules <- validator( x > 0)
is_infeasible(rules)

rules <- validator( rule1 = x > 0
                  , rule2 = x < 0
                  )
is_infeasible(rules)

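# which rules should be removed to make the rule set feasible?
detect_infeasible_rules(rules)
# remove them to obtain a feasible rule set
make_feasible(rules)

# find out which rules conflict with rule1
is_contradicted_by(rules, "rule1")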
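A rule set is infeasible when no record can satisfy all of its rules at once. The toy sketch below makes that concrete for the simplest case only: rules of the form `x <op> constant` on a single numeric variable. The helper `bounds_feasible()` is hypothetical and is not how validatetools works internally; validatetools handles far more general rule sets. The sketch keeps the tightest lower and upper bound and checks whether any value lies between them.

```r
# Toy feasibility check (hypothetical helper, not part of validatetools):
# only rules of the form `x <op> constant` with op in >, >=, <, <= are handled.
bounds_feasible <- function(rules) {
  lo <- -Inf; lo_strict <- FALSE
  hi <- Inf;  hi_strict <- FALSE
  for (e in rules) {
    op <- as.character(e[[1]])
    v <- as.numeric(e[[3]])
    strict <- op %in% c(">", "<")
    if (op %in% c(">", ">=")) {
      # keep the tightest lower bound; at equal values a strict bound is tighter
      if (v > lo || (v == lo && strict)) { lo <- v; lo_strict <- strict }
    } else if (op %in% c("<", "<=")) {
      # keep the tightest upper bound, same tie-breaking rule
      if (v < hi || (v == hi && strict)) { hi <- v; hi_strict <- strict }
    } else {
      stop("unsupported operator: ", op)
    }
  }
  # feasible if at least one value of x lies between the tightest bounds
  lo < hi || (lo == hi && !lo_strict && !hi_strict)
}

bounds_feasible(list(quote(x > 0)))                  # TRUE
bounds_feasible(list(quote(x > 0), quote(x < 0)))    # FALSE
bounds_feasible(list(quote(x >= 0), quote(x <= 0)))  # TRUE: only x == 0 is left
```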

Simplifying

The function simplify_rules combines most of the simplification methods of validatetools to simplify a rule set. For example, it reduces the following rule set to a simpler form:

rules <- validator( if (age < 16) income == 0
                  , job %in% c("yes", "no")
                  , if (job == "yes") income > 0
                  )
simplify_rules(rules, age = 13)
# or
simplify_rules(rules, job = "yes")
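What "simpler form" means for `age = 13` can be checked by brute force. The sketch below is only an illustration, not how simplify_rules works: it enumerates a small grid of hypothetical `job` and `income` values with age fixed at 13 and keeps the combinations that satisfy all three rules. The helper `implies()` is introduced here to express `if (a) b`.

```r
# Brute-force illustration: with age = 13, which combinations of job and
# income satisfy all three rules?
implies <- function(a, b) !a | b   # `if (a) b` is equivalent to `!a | b`

age <- 13
grid <- expand.grid(job = c("yes", "no", "maybe"),
                    income = seq(-5, 5, by = 0.5),
                    stringsAsFactors = FALSE)
ok <- with(grid,
           implies(age < 16, income == 0) &
           job %in% c("yes", "no") &
           implies(job == "yes", income > 0))
grid[ok, ]   # only job == "no" with income == 0 survives
```

Given age = 13, the first rule forces income to 0, which in turn rules out job == "yes". The grid is only a sanity check, but the same reasoning holds for any income value.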

simplify_rules combines the following simplification and substitution methods:

Value substitution

rules <- validator( rule1 = height > 5
                  , rule2 = max_height >= height
                  , rule3 = if (gender == "male") weight > 100
                  , rule4 = gender %in% c("male", "female")
                  )
substitute_values(rules, height = 6, gender = "male")
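The idea behind value substitution can be sketched with base R's `substitute()`: plug in the known values, then evaluate whatever no longer contains unknown variables. `substitute_rule()` is a hypothetical helper, not validatetools' implementation, and it only handles the rule shapes used in this example.

```r
# Hypothetical helper (not validatetools' implementation): plug known values
# into one rule and simplify what can be decided.
substitute_rule <- function(rule, values) {
  # replace each known variable by its value, e.g. height > 5 becomes 6 > 5
  e <- do.call("substitute", list(rule, values))
  # conditional rule whose condition is now fully known
  if (is.call(e) && identical(e[[1]], as.name("if")) &&
      length(all.vars(e[[2]])) == 0) {
    # TRUE condition: only the consequent remains; FALSE: the rule always holds
    return(if (eval(e[[2]])) substitute_rule(e[[3]], values) else TRUE)
  }
  # no unknown variables left: the rule evaluates to TRUE or FALSE
  if (length(all.vars(e)) == 0) return(eval(e))
  e
}

values <- list(height = 6, gender = "male")
rule_exprs <- list(
  rule1 = quote(height > 5),
  rule2 = quote(max_height >= height),
  rule3 = quote(if (gender == "male") weight > 100),
  rule4 = quote(gender %in% c("male", "female"))
)
result <- lapply(rule_exprs, substitute_rule, values = values)
result
```

In this sketch rule1 and rule4 become TRUE and could be dropped, rule2 becomes `max_height >= 6` and rule3 reduces to `weight > 100`.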

Finding fixed values

rules <- validator( x >= 0, x <= 0)
detect_fixed_variables(rules)
simplify_fixed_variables(rules)

rules <- validator( rule1 = x1 + x2 + x3 == 0
                  , rule2 = x1 + x2 >= 0
                  , rule3 = x3 >= 0
                  )
simplify_fixed_variables(rules)
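The second example can be worked out by hand. Combining the three rules shows that x3, and the sum x1 + x2, can only take the value 0. A variable pinned to a single value like this is the kind of fixed value that simplify_fixed_variables looks for:

$$
\begin{aligned}
\text{rule1:}\quad & x_1 + x_2 = -x_3 \\
\text{rule3:}\quad & x_3 \ge 0 \;\Rightarrow\; x_1 + x_2 \le 0 \\
\text{rule2:}\quad & x_1 + x_2 \ge 0 \;\Rightarrow\; x_1 + x_2 = 0 \;\Rightarrow\; x_3 = 0
\end{aligned}
$$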

Simplifying conditional statements

# non-relaxing clause
rules <- validator( r1 = if (income > 0) age >= 16
                  , r2 = age < 12
                  )
# given r2, age >= 16 is always FALSE, so r1 reduces to income <= 0
simplify_conditional(rules)


# non-constraining clause: income >= 0 holds in both branches,
# so the condition of the second rule is not needed
rules <- validator( if (age < 16) income == 0
                  , if (age >= 16) income >= 0
                  )
simplify_conditional(rules)
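Both simplifications rest on reading `if (A) B` as `!A | B`. The sketch below is an illustration, not validatetools' method: it compares each original rule set with a hand-simplified version on a grid of hypothetical age and income values. Identical verdicts everywhere on the grid support, but do not prove, that the two versions are equivalent.

```r
# Grid-based sanity check of the two conditional simplifications.
implies <- function(a, b) !a | b   # `if (a) b` is equivalent to `!a | b`
grid <- expand.grid(age = seq(0, 30, by = 0.5), income = seq(-10, 10, by = 0.5))

# non-relaxing clause: r1 and r2 versus income <= 0 and r2
nr_original   <- with(grid, implies(income > 0, age >= 16) & age < 12)
nr_simplified <- with(grid, income <= 0 & age < 12)

# non-constraining clause: drop the condition age >= 16 from the second rule
nc_original   <- with(grid, implies(age < 16, income == 0) &
                                implies(age >= 16, income >= 0))
nc_simplified <- with(grid, implies(age < 16, income == 0) & income >= 0)

all(nr_original == nr_simplified)   # TRUE
all(nc_original == nc_simplified)   # TRUE
```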

Removing redundant rules

rules <- validator( rule1 = age > 12
                  , rule2 = age > 18
                  )

# rule1 is superfluous: it is implied by rule2
remove_redundancy(rules)

rules <- validator( rule1 = age > 12
                  , rule2 = age > 12
                  )

# rule1 and rule2 are equivalent: the first rule is kept
remove_redundancy(rules)

# note that detect_redundancy flags both rules as redundant
detect_redundancy(rules)
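A rule is redundant when the other rules already imply it. For rules of the form `age > constant` this comes down to comparing constants: `age > 18` implies `age > 12`. The hypothetical helper below keeps only the tightest such bound and, among equal bounds, the first one, mirroring the behaviour described above. It is a toy restricted to one variable and one operator, not validatetools' implementation.

```r
# Toy redundancy removal for rules of the form `var > constant` on a single
# variable (hypothetical helper, not part of validatetools).
drop_redundant_lower <- function(rules) {
  bounds <- vapply(rules, function(e) as.numeric(e[[3]]), numeric(1))
  keep <- logical(length(rules))
  for (i in seq_along(rules)) {
    # rule i is implied by a rule with a higher bound, or by an earlier rule
    # with the same bound ("first rule wins")
    implied <- bounds > bounds[i] |
      (bounds == bounds[i] & seq_along(bounds) < i)
    keep[i] <- !any(implied)
  }
  rules[keep]
}

drop_redundant_lower(list(rule1 = quote(age > 12), rule2 = quote(age > 18)))  # keeps rule2
drop_redundant_lower(list(rule1 = quote(age > 12), rule2 = quote(age > 12)))  # keeps rule1
```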


data-cleaning/validate.simplify documentation built on Oct. 11, 2023, 12:15 a.m.