contains_exactly: Check records using a predefined table of (im)possible values

View source: R/genericrules.R

contains_exactly    R Documentation

Check records using a predefined table of (im)possible values


Given a set of keys or key combinations, check whether all those combinations occur, or check that they do not occur. Supports globbing and regular expressions.


contains_exactly(keys, by = NULL, allow_duplicates = FALSE)

contains_at_least(keys, by = NULL)

contains_at_most(keys, by = NULL)




keys: A data frame or bare (unquoted) name of a data frame passed as a reference to confront (see examples). The column names of keys must also occur in the columns of the data under scrutiny.


by: A bare (unquoted) variable or list of variable names that occur in the data under scrutiny. The data will be split into groups according to these variables and the check is performed on each group.


allow_duplicates: [logical] toggle whether key combinations can occur more than once.


contains_exactly: The dataset contains exactly the key set, no more, no less.
contains_at_least: The dataset contains at least the given keys.
contains_at_most: All keys in the dataset are contained in the given keys.
does_not_contain: The keys are interpreted as forbidden key combinations.
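
The four checks above can be read as set comparisons between the key combinations in the data and those in keys. The following is a minimal base-R sketch of that reading, not the package's implementation; the helper as_key and the toy data are hypothetical and exist only for illustration.

```r
# Hypothetical helper: collapse the key columns of a data frame into one
# string per row, so that key combinations can be compared as a set.
as_key <- function(df) do.call(paste, c(unname(as.list(df)), list(sep = "|")))

# Toy key set and toy data (assumptions made for this illustration)
keys <- data.frame(year = c("2018", "2018"), quarter = c("Q1", "Q2"))
data <- data.frame(year = c("2018", "2018", "2018"),
                   quarter = c("Q1", "Q2", "Q3"))

k <- as_key(keys)
d <- as_key(data[names(keys)])

at_least <- all(k %in% d)   # every given key occurs in the data
at_most  <- all(d %in% k)   # the data holds no key outside the key set
# 'exactly' here assumes duplicates are not allowed
exactly  <- at_least && at_most && anyDuplicated(d) == 0
# records matching a key; under does_not_contain these would be violations
forbidden_hit <- d %in% k
```

In this toy case the data has one extra combination (2018, Q3), so the at-least reading holds while the at-most and exact readings fail.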


For contains_exactly, contains_at_least, and contains_at_most a logical vector with one entry for each record in the dataset. Any group not conforming to the test keys will have FALSE assigned to each record in the group (see examples).

For contains_at_least: a logical vector with length equal to the number of records under scrutiny. It is FALSE where key combinations do not match any value in keys.

For does_not_contain: a logical vector with length equal to the number of records under scrutiny. It is FALSE where a key combination matches one of the forbidden combinations in keys.
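
To illustrate how a group-level outcome is spread over the records of that group, here is a small base-R sketch. It mimics the described behaviour with split and vapply and is not taken from the package source; the data and the names required, group_ok and result are made up for this illustration.

```r
# Hypothetical data: the "export" group lacks quarter Q2
data <- data.frame(
  variable = c("import", "import", "export"),
  quarter  = c("Q1", "Q2", "Q1")
)
required <- c("Q1", "Q2")

# One logical per group: TRUE only if the group's quarters equal the required set
group_ok <- vapply(
  split(data$quarter, data$variable),
  function(q) setequal(q, required),
  logical(1)
)

# Give every record the outcome of its group
result <- unname(group_ok[data$variable])
```

The vector result has one entry per record, and the single export record is FALSE because its group is incomplete.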


Globbing is a simple method of defining string patterns where the asterisk (*) is used as a wildcard. For example, the globbing pattern "abc*" stands for any string starting with "abc".
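
The meaning of a globbing pattern can be tried out in base R, where utils::glob2rx() translates a glob into a regular expression. Whether this package uses the same translation internally is not stated here, so the sketch below only illustrates what the pattern "abc*" matches; the test strings are made up.

```r
# Translate the glob pattern into a regular expression
pattern <- utils::glob2rx("abc*")

# Hypothetical strings to test against the pattern
strings <- c("abc", "abcdef", "xabc", "ab")

# TRUE for strings that start with "abc"
matches <- grepl(pattern, strings)
```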

See Also

Other cross-record-helpers: do_by(), exists_any(), hb(), hierarchy(), is_complete(), is_linear_sequence(), is_unique()


## Check that data is present for all quarters in 2018-2019
dat <- data.frame(
    year    = rep(c("2018","2019"),each=4)
  , quarter = rep(sprintf("Q%d",1:4), 2)
  , value   = sample(20:50,8)
)

# Method 1: creating a data frame in-place (only for simple cases)
rule <- validator(contains_exactly(
           expand.grid(year=c("2018","2019"), quarter=c("Q1","Q2","Q3","Q4"))
         ))
out <- confront(dat, rule)

# Method 2: pass the keyset to 'confront', and reference it in the rule.
# this scales to larger key sets but it needs a 'contract' between the
# rule definition and how 'confront' is called.

keyset <- expand.grid(year=c("2018","2019"), quarter=c("Q1","Q2","Q3","Q4"))
rule <- validator(contains_exactly(all_keys))
out <- confront(dat, rule, ref=list(all_keys = keyset))

## Globbing (use * as a wildcard)

# transaction data 
transactions <- data.frame(
    sender   = c("S21", "X34", "S45","Z22")
  , receiver = c("FG0", "FG2", "DF1","KK2")
  , value    = sample(70:100,4)
)

# forbidden combinations: if the sender starts with "S",
# the receiver cannot start with "FG"
forbidden <- data.frame(sender="S*",receiver = "FG*")

rule <- validator(does_not_contain(glob(forbidden_keys)))
out <- confront(transactions, rule, ref=list(forbidden_keys=forbidden))

## Quick interactive testing
# use 'with':
with(transactions, does_not_contain(glob(forbidden)))

## Grouping 

# data in 'long' format
dat <- expand.grid(
  year = c("2018","2019")
  , quarter = c("Q1","Q2","Q3","Q4")
  , variable = c("import","export")
)
dat$value <- sample(50:100,nrow(dat))

periods <- expand.grid(
  year = c("2018","2019")
  , quarter = c("Q1","Q2","Q3","Q4")
)

rule <- validator(contains_exactly(all_periods, by=variable))

out <- confront(dat, rule, ref=list(all_periods=periods))

# remove one export record

dat1 <- dat[-15,]
out1 <- confront(dat1, rule, ref=list(all_periods=periods))

validate documentation built on March 31, 2023, 6:27 p.m.