hierarchy: Check aggregates defined by a hierarchical code list

View source: R/genericrules.R

hierarchy R Documentation

Check aggregates defined by a hierarchical code list

Description

Check all aggregates defined by a code hierarchy.

Usage

hierarchy(
  values,
  labels,
  hierarchy,
  by = NULL,
  tol = 1e-08,
  na_value = TRUE,
  aggregator = sum,
  ...
)

Arguments

values

bare (unquoted) name of a variable that holds values that must aggregate according to the hierarchy.

labels

bare (unquoted) name of the variable holding the code (from a hierarchical code list) that labels each value.

hierarchy

[data.frame] defining a hierarchical code list. The first column must contain (child) codes, and the second column contains their corresponding parents.

by

A bare (unquoted) variable or list of variable names that occur in the data under scrutiny. The data will be split into groups according to these variables and the check is performed on each group.

tol

[numeric] tolerance for equality checking

na_value

[logical] or NA. Value assigned to elements that do not occur in any check.

aggregator

[function] that aggregates children to their parents.

...

arguments passed to aggregator (e.g. na.rm=TRUE).
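
As an illustration of how hierarchy, by and tol interact, here is a minimal sketch. The code list h, the data frame d and the column names region, code and amount are made up for this example and are not part of the package. It assumes that, in interactive use, by accepts a plain vector in the same way values and labels do.

```r
library(validate)

# Hypothetical code list: column 1 holds child codes, column 2 their parents.
h <- data.frame(
  code   = c("A1", "A2"),
  parent = c("A",  "A"),
  stringsAsFactors = FALSE
)

# Hypothetical data: the same three codes reported for two regions.
d <- data.frame(
  region = c("N", "N", "N", "S", "S", "S"),
  code   = c("A", "A1", "A2", "A", "A1", "A2"),
  amount = c(10, 4, 6, 10, 4, 6.001),
  stringsAsFactors = FALSE
)

# Default tolerance: the children in region "S" sum to 10.001, not 10.
strict <- hierarchy(d$amount, labels = d$code, hierarchy = h, by = d$region)

# A looser tolerance accepts the 0.001 discrepancy in region "S".
loose <- hierarchy(d$amount, labels = d$code, hierarchy = h, by = d$region,
                   tol = 0.01)

print(cbind(d, strict, loose))
```

Splitting with by keeps the two regions from being summed together, so each region's aggregate is compared only with its own children.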

Value

A logical vector of length length(values). Every element involved in an aggregation error is labeled FALSE (both the aggregate and the elements aggregated into it). Elements involved only in correct aggregations are set to TRUE. Elements that are not involved in any check get the value na_value (by default: TRUE).
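
One possible way to use the returned vector is sketched below. The code list h and the data frame d are hypothetical. Setting na_value = NA makes it possible to tell unchecked records apart from records that passed.

```r
library(validate)

# Hypothetical code list: column 1 holds child codes, column 2 their parents.
h <- data.frame(
  code   = c("A1", "A2"),
  parent = c("A",  "A"),
  stringsAsFactors = FALSE
)

# Hypothetical data: "Z" does not occur in the code list.
d <- data.frame(
  code   = c("A", "A1", "A2", "Z"),
  amount = c(10, 4, 5, 7),
  stringsAsFactors = FALSE
)

res <- hierarchy(d$amount, labels = d$code, hierarchy = h, na_value = NA)

# Records involved in a failed aggregation (the aggregate and its children).
failed <- d[!is.na(res) & !res, ]

# Records that took part in no check at all.
unchecked <- d[is.na(res), ]

print(failed)
print(unchecked)
```

With the default na_value = TRUE, the record labeled "Z" would be indistinguishable from a record that passed its check.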

See Also

Other cross-record-helpers: contains_exactly(), do_by(), exists_any(), hb(), is_complete(), is_linear_sequence(), is_unique()

Examples

# We check some data against the built-in NACE revision 2 classification.
library(validate)
data(nace_rev2)
head(nace_rev2[1:4]) # columns 3 and 4 contain the child-parent relations.

d <- data.frame(
     nace   = c("01","01.1","01.11","01.12", "01.2")
   , volume = c(100 ,70    , 30    ,40     , 25    )
)
# It is possible to perform checks interactively
d$nacecheck <- hierarchy(d$volume, labels = d$nace, hierarchy=nace_rev2[3:4])
# we have that "01.1" == "01.11" + "01.12", but not "01" == "01.1" +  "01.2"
print(d)

# Usage as a validation rule is as follows
rules <- validator(hierarchy(volume, labels = nace, hierarchy=validate::nace_rev2[3:4]))
confront(d, rules)

# You can also pass the hierarchy as reference data, for example:

rules <- validator(hierarchy(volume, labels = nace, hierarchy=ref$nacecodes))
out <- confront(d, rules, ref=list(nacecodes=nace_rev2[3:4]))
summary(out)

# Set the output to NA when a code does not occur in the code list.
d <- data.frame(
     nace   = c("01","01.1","01.11","01.12", "01.2", "foo")
   , volume = c(100 ,70    , 30    ,40     , 25     , 60)
)

d$nacecheck <- hierarchy(d$volume, labels = d$nace, hierarchy=nace_rev2[3:4]
                         , na_value = NA)
# As before, "01.1" == "01.11" + "01.12", but not "01" == "01.1" +  "01.2".
# The code "foo" does not occur in the code list, so its result is NA.
print(d)


validate documentation built on March 31, 2023, 6:27 p.m.