contains_exactly | R Documentation |
Given a set of keys or key combinations, check whether all those combinations occur, or check that they do not occur. Supports globbing and regular expressions.
contains_exactly(keys, by = NULL, allow_duplicates = FALSE)
contains_at_least(keys, by = NULL)
contains_at_most(keys, by = NULL)
does_not_contain(keys)
keys |
A data frame or bare (unquoted) name of a data frame passed as a reference to confront (see examples). |
by |
A bare (unquoted) variable or list of variable names that occur in the data under scrutiny. The data will be split into groups according to these variables and the check is performed on each group. |
allow_duplicates |
Logical. Whether key combinations may occur more than once. |
contains_exactly | dataset contains exactly the key set, no more, no less. |
contains_at_least | dataset contains at least the given keys. |
contains_at_most | all keys in the data set are contained in the given keys. |
does_not_contain | The keys are interpreted as forbidden key combinations. |
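The four checks differ mainly in which direction of set containment they test. A rough base R sketch of that idea follows. It is not the package's implementation, it ignores duplicates and grouping, and the key strings and variable names are made up for illustration.

```r
# Hypothetical keys, pasted into single strings for easy comparison
given_keys <- c("2018 Q1", "2018 Q2", "2018 Q3", "2018 Q4")
data_keys  <- c("2018 Q1", "2018 Q2", "2018 Q3")   # Q4 is missing from the data

# at least: every given key occurs in the data
at_least <- all(given_keys %in% data_keys)

# at most: every key in the data is one of the given keys
at_most <- all(data_keys %in% given_keys)

# exactly: both directions hold
exactly <- at_least && at_most

# does not contain: per record, TRUE when the record's key is not forbidden
forbidden     <- c("2018 Q3")
not_forbidden <- !(data_keys %in% forbidden)

print(c(at_least = at_least, at_most = at_most, exactly = exactly))
print(not_forbidden)
```

In this sketch the data pass the "at most" check but fail "at least" and "exactly", because one given key never occurs.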
For contains_exactly, contains_at_least, and contains_at_most: a logical vector with one entry for each record in the dataset. Any group not conforming to the test keys will have FALSE assigned to each record in the group (see examples).
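How a group-level result ends up as one value per record can be sketched in base R. This is only an illustration under assumed data, not the package's code; the data frame, the `required` vector, and the variable names are hypothetical.

```r
# Hypothetical long-format data: the 'export' group lacks Q2
dat <- data.frame(
  variable = c("import", "import", "export"),
  quarter  = c("Q1", "Q2", "Q1"),
  stringsAsFactors = FALSE
)
required <- c("Q1", "Q2")

# One TRUE/FALSE per group: does the group hold exactly the required keys?
group_ok <- vapply(
  split(dat$quarter, dat$variable),
  function(q) setequal(q, required),
  logical(1)
)

# Copy each group's result to all of its records
per_record <- unname(group_ok[dat$variable])
print(per_record)
```

Both 'import' records receive TRUE, while the single 'export' record receives FALSE because its group is incomplete.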
For contains_at_least: a logical vector with length equal to the number of records under scrutiny. It is FALSE where key combinations do not match any value in keys.
For does_not_contain: a logical vector with length equal to the number of records under scrutiny. It is FALSE where key combinations match a value in keys.
Globbing is a simple method of defining string patterns where the asterisk (*) is used as a wildcard. For example, the globbing pattern "abc*" stands for any string starting with "abc".
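To get a feel for what a glob pattern matches, base R's utils::glob2rx() can translate it into a regular expression. The snippet below is a small sketch using that function; how the package itself interprets glob patterns may differ, and the test strings are made up.

```r
# utils::glob2rx() translates a glob pattern into an anchored regular expression
pattern <- glob2rx("abc*")
print(pattern)

# Hypothetical strings to test against the pattern
x <- c("abcdef", "abc", "xabc")
starts_with_abc <- grepl(pattern, x)
print(starts_with_abc)
```

Only the strings that begin with "abc" match; "xabc" does not, because the pattern is anchored at the start of the string.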
Other cross-record-helpers: do_by(), exists_any(), hb(), hierarchy(), is_complete(), is_linear_sequence(), is_unique()
library(validate)

## Check that data is present for all quarters in 2018-2019
dat <- data.frame(
    year    = rep(c("2018","2019"), each=4)
  , quarter = rep(sprintf("Q%d",1:4), 2)
  , value   = sample(20:50,8)
)
# Method 1: creating a data frame in-place (only for simple cases)
rule <- validator(contains_exactly(
    expand.grid(year=c("2018","2019"), quarter=c("Q1","Q2","Q3","Q4"))
  )
)
out <- confront(dat, rule)
values(out)
# Method 2: pass the keyset to 'confront', and reference it in the rule.
# this scales to larger key sets but it needs a 'contract' between the
# rule definition and how 'confront' is called.
keyset <- expand.grid(year=c("2018","2019"), quarter=c("Q1","Q2","Q3","Q4"))
rule <- validator(contains_exactly(all_keys))
out <- confront(dat, rule, ref=list(all_keys = keyset))
values(out)
## Globbing (use * as a wildcard)
# transaction data
transactions <- data.frame(
    sender   = c("S21", "X34", "S45","Z22")
  , receiver = c("FG0", "FG2", "DF1","KK2")
  , value    = sample(70:100,4)
)
# forbidden combinations: if the sender starts with "S",
# the receiver cannot start with "FG"
forbidden <- data.frame(sender="S*",receiver = "FG*")
rule <- validator(does_not_contain(glob(forbidden_keys)))
out <- confront(transactions, rule, ref=list(forbidden_keys=forbidden))
values(out)
## Quick interactive testing
# use 'with':
with(transactions, does_not_contain(forbidden))
## Grouping
# data in 'long' format
dat <- expand.grid(
    year     = c("2018","2019")
  , quarter  = c("Q1","Q2","Q3","Q4")
  , variable = c("import","export")
)
dat$value <- sample(50:100,nrow(dat))
periods <- expand.grid(
    year    = c("2018","2019")
  , quarter = c("Q1","Q2","Q3","Q4")
)
rule <- validator(contains_exactly(all_periods, by=variable))
out <- confront(dat, rule, ref=list(all_periods=periods))
values(out)
# remove one export record
dat1 <- dat[-15,]
out1 <- confront(dat1, rule, ref=list(all_periods=periods))
values(out1)