syntax: Syntax to define validation or indicator rules

syntax    R Documentation

Syntax to define validation or indicator rules

Description

A concise overview of the validate syntax.

Basic syntax

The basic rule is that an R-statement that evaluates to a logical is a validating statement. This is established by static code inspection when validator reads a (set of) user-defined validation rule(s).
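As a minimal sketch of this rule, the following defines two statements that each evaluate to a logical and confronts a small data set with them. The data frame and its column names (age, income) are invented for illustration.

```r
library(validate)

# Hypothetical toy data, invented for this illustration.
dat <- data.frame(age = c(25, -3, 40), income = c(1000, 2000, NA))

# Each statement evaluates to a logical, so each is accepted as a rule.
rules <- validator(age >= 0, income > 500)

# Confront the data with the rules and tabulate the results per rule.
cf <- confront(dat, rules)
s <- summary(cf)
s
```

The summary has one row per rule, with counts of passes, fails and missing results.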

Comparisons

All basic comparisons, including >, >=, ==, !=, <=, <, %in% are validating statements. When executing a validating statement, the %in% operator is replaced with %vin%.
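A small sketch of a set-membership rule follows. The column name sex and the allowed categories are assumptions made up for the example.

```r
library(validate)

# Hypothetical data: the third record holds a category that is not allowed.
dat <- data.frame(sex = c("male", "female", "unknown"),
                  stringsAsFactors = FALSE)

# Written with %in%; validate swaps in %vin% when the rule is executed.
rules <- validator(sex %in% c("male", "female"))

cf <- confront(dat, rules)
s <- summary(cf)
s
```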

Logical operations

The unary logical operator '!' and the functions all() and any() define validating statements. Binary logical operations, including &, &&, | and ||, are validating when P and Q in e.g. P & Q are validating. (Note that the short-circuit operators && and || are not vectorized: when P and/or Q in P && Q are vectors, older versions of R use only the first logical value, and R 4.3.0 and later signal an error.) Binary logical implication P ⇒ Q (P implies Q) is implemented as if ( P ) Q. The latter is interpreted as !(P) | Q.
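To illustrate the implication syntax, the sketch below writes the same condition twice: once as if ( P ) Q and once in the equivalent form !(P) | Q. The data and the rule names (minors_no_income, same_thing) are hypothetical.

```r
library(validate)

# Hypothetical data: the third record is a minor with a nonzero income.
dat <- data.frame(age = c(12, 30, 10), income = c(0, 2500, 800))

rules <- validator(
  minors_no_income = if (age < 16) income == 0,   # P implies Q
  same_thing       = !(age < 16) | income == 0    # the equivalent form
)

cf <- confront(dat, rules)
s <- summary(cf)
s[, c("name", "passes", "fails")]
```

Both rules should report the same counts, since the first is interpreted as the second.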

Type checking

Any function starting with is. (e.g. is.numeric) is a validating expression.

Text search

grepl is a validating expression.
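The following sketch combines a type check with a text search. The columns id and weight and the regular expression are assumptions chosen for the example.

```r
library(validate)

# Hypothetical data: the third id does not follow the "A-<digits>" pattern.
dat <- data.frame(
  id     = c("A-001", "A-002", "B17"),
  weight = c(70.5, 82.1, 64.0),
  stringsAsFactors = FALSE
)

rules <- validator(
  is.numeric(weight),        # type check: one result for the whole column
  grepl("^A-[0-9]+$", id)    # text search: one result per record
)

cf <- confront(dat, rules)
s <- summary(cf)
s
```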

Functional dependencies

Armstrong's functional dependencies, of the form A + B → C + D, are represented using the ~ operator, e.g. A + B ~ C + D. For example, postcode ~ city means that when two records have the same value for postcode, they must have the same value for city.
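A minimal sketch of the postcode example, using invented data in which one postcode occurs with two different cities. Which of the conflicting records is flagged is left to the package, so no per-record output is claimed here.

```r
library(validate)

# Hypothetical data: postcode "5678" occurs with two different cities.
dat <- data.frame(
  postcode = c("1234", "1234", "5678", "5678"),
  city     = c("Delft", "Delft", "Leiden", "Utrecht"),
  stringsAsFactors = FALSE
)

# Records with equal postcode must have equal city.
rules <- validator(postcode ~ city)

cf <- confront(dat, rules)
s <- summary(cf)
s
```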

Reference the dataset as a whole

Metadata such as the number of rows, the number of columns, column names and so on can be tested by referencing the whole data set with the '.'. For example, the rule nrow(.) == 15 checks whether there are 15 rows in the dataset at hand.
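As a sketch of the dot syntax, the rules below test the shape of a small data set and the presence of a column. The data frame and its column names are hypothetical.

```r
library(validate)

# Hypothetical data set with two rows and two columns.
dat <- data.frame(height = c(1.7, 1.8), weight = c(70, 85))

rules <- validator(
  nrow(.) == 2,             # number of records
  ncol(.) == 2,             # number of variables
  "weight" %in% names(.)    # a required column is present
)

cf <- confront(dat, rules)
s <- summary(cf)
s
```

Each of these rules yields a single result for the data set as a whole rather than one result per record.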

Uniqueness, completeness

These can in principle be tested with the 'dot' syntax. However, there are some convenience functions: is_complete, all_complete, is_unique, all_unique.
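The four convenience functions can be sketched side by side as follows. The data are invented: id contains a duplicate and name contains a missing value.

```r
library(validate)

# Hypothetical data: a duplicated id and a missing name.
dat <- data.frame(
  id   = c(1, 2, 2),
  name = c("ann", NA, "bob"),
  stringsAsFactors = FALSE
)

rules <- validator(
  is_unique(id),       # one result per record
  all_unique(id),      # a single result for the whole column
  is_complete(name),   # one result per record
  all_complete(name)   # a single result for the whole column
)

cf <- confront(dat, rules)
s <- summary(cf)
s[, c("name", "items", "passes", "fails")]
```

The is_ variants point at the offending records, while the all_ variants give one overall verdict.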

Local, transient assignment

The operator ':=' can be used to set up local variables (during, for example, validation) to save time (the right-hand side of an assignment is computed only once) or to make your validation code more maintainable. Assignments work more or less like common R assignments: they are only valid for statements coming after the assignment and they may be overwritten. The result of computing the right-hand side is not part of a confrontation with data.
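A short sketch of a transient assignment: the median is stored in a local variable and reused in two rules. The variable name med, the column turnover and the thresholds are assumptions for illustration.

```r
library(validate)

# Hypothetical data: the third turnover is far above the median of 200.
dat <- data.frame(turnover = c(100, 200, 1000))

rules <- validator(
  med := median(turnover),   # local variable, not itself a rule result
  turnover < 4 * med,
  turnover > med / 4
)

cf <- confront(dat, rules)
s <- summary(cf)
s
```

The summary reports on the two comparisons only; the assignment itself does not show up as a result.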

Groups

Often the same constraints/rules are valid for groups of variables. validate allows for compact notation. Variable groups can be used in-statement or by defining them with the := operator.

validator( var_group(a,b) > 0 )

is equivalent to

validator(G := var_group(a,b), G > 0)

is equivalent to

validator(a > 0, b > 0)

Using two groups results in the cartesian product of checks. So the statement

validator( f := var_group(c,d), g := var_group(a,b), g > f)

is equivalent to

validator(a > c, b > c, a > d, b > d)
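The equivalence between a grouped rule and its written-out form can be checked with a sketch like the one below, which confronts the same invented data with both versions and compares the failure counts. The object names grouped and explicit are hypothetical.

```r
library(validate)

# Hypothetical data: the second record has a negative value for a.
dat <- data.frame(a = c(1, -1), b = c(2, 3))

# Compact notation with a variable group ...
grouped  <- validator(G := var_group(a, b), G > 0)
# ... and the same checks written out per variable.
explicit <- validator(a > 0, b > 0)

s1 <- summary(confront(dat, grouped))
s2 <- summary(confront(dat, explicit))

s1[, c("passes", "fails", "expression")]
s2[, c("passes", "fails", "expression")]
```

Both summaries should contain one row per variable in the group.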

File parsing

Please see the cookbook on how to read rules from and write rules to file:

vignette("cookbook",package="validate")


validate documentation built on July 4, 2024, 9:07 a.m.