validate: Data Validation Infrastructure

validate    R Documentation

Description

Data often suffer from errors and missing values. A necessary step before data analysis is verifying and validating your data. Package validate is a toolbox for creating validation rules and checking data against these rules.

Getting started

The easiest way to get started is through the examples given in check_that.
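As a first taste, a call to check_that might look like the sketch below. It assumes the validate package is installed and uses the women dataset that ships with base R; the two rules are illustrative only and are not taken from the package examples.

```r
library(validate)

# Check the built-in 'women' data against two illustrative rules in one call.
cf <- check_that(women, height > 0, weight > 0)

# One row per rule, with counts of passes, fails and missing results.
summary(cf)
```

check_that combines rule definition and confrontation in a single step, which is convenient for quick interactive checks.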

The general workflow in validate follows this pattern:

  • Define a set of rules or quality indicators using validator or indicator.

  • Confront the data with the rules or indicators using confront.

  • Examine the results, either graphically or by summary.
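The three steps above can be sketched as follows. This is a minimal illustration, assuming the validate package is installed; the women dataset comes with base R, and the rules and indicator names (mean_height, mean_weight) are made up for the example.

```r
library(validate)

# Step 1: define rules (validator) and quality indicators (indicator).
rules <- validator(
  height > 0,
  weight > 0,
  height / weight < 0.5
)
ind <- indicator(
  mean_height = mean(height),
  mean_weight = mean(weight)
)

# Step 2: confront the data with the rules and with the indicators.
out_rules <- confront(women, rules)
out_ind   <- confront(women, ind)

# Step 3: examine the results by summary ...
summary(out_rules)
summary(out_ind)

# ... or graphically (draws on the active graphics device).
plot(out_rules)

# Per-record results are also available as a logical matrix,
# one row per record and one column per rule.
head(values(out_rules))
```

Keeping the rules in their own object, separate from the data, means the same rule set can be reused against other data sets.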

There are several convenience functions that allow one to define rules from the command line or through a (free-form or YAML) file, and to investigate and maintain the rules themselves. See the cookbook for a comprehensive introduction.
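Reading rules from a free-form file might look like the sketch below. It assumes the .file argument of validator; the temporary file and the rules written to it are hypothetical and created only to keep the example self-contained.

```r
library(validate)

# Write a small free-form rule file: one R expression per rule,
# with ordinary R comments allowed.
rules_file <- tempfile(fileext = ".R")
writeLines(c(
  "# heights and weights must be positive",
  "height > 0",
  "weight > 0"
), rules_file)

# Read the rules from the file instead of typing them at the command line.
rules <- validator(.file = rules_file)

# Inspect the rule set, then use it like any other validator object.
rules
summary(confront(women, rules))
```

Storing rules in a file keeps them under version control and lets several scripts share one rule set.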

References

An overview of this package, its underlying ideas, and many examples can be found in M.P.J. van der Loo and E. de Jonge (2018). Statistical Data Cleaning with Applications in R. John Wiley & Sons.

Please use citation("validate") to get a citation for (scientific) publications.


validate documentation built on March 31, 2023, 6:27 p.m.