library(rmarkdown)
library(sdcHierarchies)
The sdcHierarchies package allows you to create, modify and export nested hierarchies that are used, for example, to define tables in statistical disclosure control software such as sdcTable.
Before it can be used, the package needs to be loaded:
library(sdcHierarchies)
hier_create() creates a hierarchy. Argument root specifies the name of the root node. Optionally, nodes can be added to the top level by listing their names in argument nodes. hier_display() shows the hierarchical structure of the current tree as shown below:
h <- hier_create(root = "Total", nodes = LETTERS[1:5])
hier_display(h)
Once such an object is created, it can be modified by the following functions:
- hier_add(): adds nodes to the hierarchy
- hier_delete(): deletes nodes from the tree
- hier_rename(): renames nodes
These functions can be applied as shown below:
# adding nodes below the node specified in argument `root`
h <- hier_add(h, root = "A", nodes = c("a1", "a2"))
h <- hier_add(h, root = "B", nodes = c("b1", "b2"))
h <- hier_add(h, root = "b1", nodes = c("b1_a", "b1_b"))
# deleting one or more nodes from the hierarchy
h <- hier_delete(h, nodes = c("a1", "b2"))
h <- hier_delete(h, nodes = c("a2"))
# renaming nodes (names are the old labels, values the new ones)
h <- hier_rename(h, nodes = c("C" = "X", "D" = "Y"))
hier_display(h)
We note that the underlying data.tree package allows objects to be modified by reference, so explicit assignment of the result is not strictly required.
Function hier_info() returns information about the nodes that are specified in argument nodes.
# about specific nodes
info <- hier_info(h, nodes = c("b1", "E"))
info is a named list where each list element refers to a queried node. The results for level b1 can be extracted as shown below:
info$b1
Information about all nodes can be extracted by not specifying argument nodes.
Function hier_convert() takes a hierarchy and converts the network-based structure to different formats, while hier_export() does the conversion and writes the results to a file on disk. The following formats are currently supported:
- df: a "@;label"-based format that can be used in sdcTable
- dt: the same as df, but the result is returned as a data.table
- argus: also a "@;label"-based format that is used to create hrc-files suitable for $\tau$-argus
- json: a json-encoded string
- code: the required code to re-build the current hierarchy
- sdc: a list which is a suitable input for sdcTable
# conversion to a "@;label"-based format
res_df <- hier_convert(h, as = "df")
print(res_df)
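To make the "@;label" idea more concrete, the following base R sketch builds a small table by hand and recovers each node's parent from it. It assumes that the number of "@" characters encodes the depth of a node (the root being "@") and that rows are listed in tree order; the column names level and name and the toy labels are illustrative assumptions, not output of the package.

```r
# Assumed layout: one row per node, number of "@" = depth of the node.
toy <- data.frame(
  level = c("@", "@@", "@@@", "@@@", "@@", "@@"),
  name  = c("Total", "A", "a1", "a2", "B", "C"),
  stringsAsFactors = FALSE
)
toy$depth <- nchar(toy$level)

# The parent of a node is the closest preceding row that is one level higher.
toy$parent <- vapply(seq_len(nrow(toy)), function(i) {
  cand <- which(toy$depth[seq_len(i - 1)] == toy$depth[i] - 1)
  if (length(cand) == 0) NA_character_ else toy$name[max(cand)]
}, character(1))

toy
```

The root has no parent, so its entry is NA; all other rows point to the node they are nested under.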
The required code to create this hierarchy could be computed using:
code <- hier_convert(h, as = "code"); cat(code, sep = "\n")
Using hier_export() one can write the results to a file. This is useful, for example, if one wants to create hrc-files that can be used as input for $\tau$-argus, which can be achieved as follows:
# tempdir() is an existing directory; tempfile() would only return a file name
hier_export(h, as = "argus", path = file.path(tempdir(), "hierarchy.hrc"))
hier_import() returns a network-based hierarchy given either a data.frame (in @;labs-format), json format, code or a tau-argus compatible hrc-file. For example, if we want to create a hierarchy based on res_df, which was previously created using hier_convert(), the code is as simple as:
n_df <- hier_import(inp = res_df, from = "df")
hier_display(n_df)
Using hier_import(inp = "hierarchy.hrc", from = "argus") one could create an sdc hierarchy object directly from an hrc-file.
Often the nested hierarchy information is encoded in a string. Function hier_compute() transforms such strings into hierarchy objects. One can distinguish two cases: in the first case all input codes have the same length, while in the second case the length of the codes differs. Let's assume we have a geographic code given in geo_m where digits 1-2 refer to the first level, digit 3 to the second and digits 4-5 to the third level of the hierarchy.
geo_m <- c(
  "01051", "01053", "01054", "01055", "01056", "01057", "01058", "01059",
  "01060", "01061", "01062", "02000", "03151", "03152", "03153", "03154",
  "03155", "03156", "03157", "03158", "03251", "03252", "03254", "03255",
  "03256", "03257", "03351", "03352", "03353", "03354", "03355", "03356",
  "03357", "03358", "03359", "03360", "03361", "03451", "03452", "03453",
  "03454", "03455", "03456", "10155")
Function hier_compute() takes a character vector and creates a hierarchy from it. Argument method offers two ways of specifying the encoded levels:
- endpos: an integerish vector must be specified in argument dim_spec holding the end position of each level
- len: an integerish vector must be specified in argument dim_spec containing, for each level, the number of digits required
If the overall total is not encoded in the input, argument root can be used to give it a name. Additionally, the desired output format can be set in parameter as. In the example below, setting as = "df" returns the result as a data.frame in "@; key"-format. The two methods of defining the positions of the levels are interchangeable and lead to the same hierarchy as shown below:
v1 <- hier_compute(
  inp = geo_m,
  dim_spec = c(2, 3, 5),
  root = "Tot",
  method = "endpos",
  as = "df"
)
v2 <- hier_compute(
  inp = geo_m,
  dim_spec = c(2, 1, 2),
  root = "Tot",
  method = "len",
  as = "df"
)
identical(v1, v2)
hier_display(v1)
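The relationship between the two specifications can be checked in plain base R: the end positions are simply the cumulative sums of the level lengths. The sketch below also splits a few codes into their level prefixes. The helper split_levels() is hypothetical and written only for illustration; it is not part of sdcHierarchies and does not claim to mirror how hier_compute() works internally.

```r
# Hypothetical helper: cut fixed-width codes at the given end positions
# and return one column of prefixes per hierarchy level.
split_levels <- function(codes, endpos) {
  out <- sapply(endpos, function(e) substr(codes, 1, e))
  matrix(
    out,
    nrow = length(codes),
    dimnames = list(codes, paste0("level", seq_along(endpos)))
  )
}

len <- c(2, 1, 2)      # digits per level, as used with method = "len"
endpos <- cumsum(len)  # end positions, as used with method = "endpos"
endpos

lv <- split_levels(c("01051", "03151", "10155"), endpos)
lv
```

Each row holds the codes of the first, second and third level that a given input code belongs to.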
If the total is contained in the string, let's say in the first 3 positions of the input values, the hierarchy can be computed as follows:
geo_m_with_tot <- paste0("Tot", geo_m)
head(geo_m_with_tot)
v3 <- hier_compute(
  inp = geo_m_with_tot,
  dim_spec = c(3, 2, 1, 2),
  method = "len"
)
hier_display(v3)
The result is the same as v1 and v2 previously generated.
hier_compute() can also deal with inputs of different lengths, as shown in the next example.
# second example, unequal strings; overall total not included in input
yae_h <- c(
  "1.1.1.", "1.1.2.", "1.2.1.", "1.2.2.", "1.2.3.", "1.2.4.", "1.2.5.",
  "1.3.1.", "1.3.2.", "1.3.3.", "1.3.4.", "1.3.5.", "1.4.1.", "1.4.2.",
  "1.4.3.", "1.4.4.", "1.4.5.", "1.5.", "1.6.", "1.7.", "1.8.", "1.9.",
  "2.", "3.")
v1 <- hier_compute(
  inp = yae_h,
  dim_spec = c(2, 2, 2),
  root = "Tot",
  method = "len"
)
hier_display(v1)
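One way to think about codes of unequal length is that a shorter code simply stops at a higher level of the hierarchy. The base R sketch below lists, for a few such codes, the prefixes that fit within each code. It is an illustration of that reading under the assumption that each level uses two characters; it is not a description of the package's internals.

```r
codes <- c("1.1.1.", "1.2.1.", "1.5.", "2.")
endpos <- cumsum(c(2, 2, 2))  # two characters per level

# For each code, keep only the end positions that lie within the code
# and extract the corresponding prefixes.
prefixes <- lapply(codes, function(x) {
  stops <- endpos[endpos <= nchar(x)]
  substring(x, 1, stops)
})
names(prefixes) <- codes
prefixes
```

Here "2." yields a single prefix (a first-level node without children), while "1.5." yields two.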
We also note that there is another way to specify the inputs in hier_compute(). Setting argument method = "list" creates a hierarchy from a given named list. In such a list, the name of a list element is interpreted as the name of the parent node of all codes in that list element. An example is shown below:
yae_ll <- list()
yae_ll[["Total"]] <- c("1.", "2.", "3.")
yae_ll[["1."]] <- paste0("1.", 1:9, ".")
yae_ll[["1.1."]] <- paste0("1.1.", 1:2, ".")
yae_ll[["1.2."]] <- paste0("1.2.", 1:5, ".")
yae_ll[["1.3."]] <- paste0("1.3.", 1:5, ".")
yae_ll[["1.4."]] <- paste0("1.4.", 1:6, ".")
d <- hier_compute(inp = yae_ll, root = "Total", method = "list")
hier_display(d)
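The parent-child reading of such a named list can be made explicit with a few lines of base R. The sketch below flattens a small list into a table of edges and derives the leaf nodes from it; the object names ll, edges and leaves are illustrative and the code does not use any sdcHierarchies function.

```r
ll <- list(
  "Total" = c("1.", "2.", "3."),
  "1."    = c("1.1.", "1.2.")
)

# One row per parent-child pair: each list name is repeated once
# for every code stored in the corresponding list element.
edges <- data.frame(
  parent = rep(names(ll), lengths(ll)),
  child  = unlist(ll, use.names = FALSE),
  stringsAsFactors = FALSE
)
edges

# Codes that never appear as a parent are leaves of the hierarchy.
leaves <- setdiff(edges$child, edges$parent)
leaves
```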
Using hier_grid() it is possible to compute all combinations of codes given several hierarchies. This is useful to build a complete table (e.g. for merging purposes). The functionality of hier_grid is shown below. First, we need to specify some hierarchies.
h1 <- hier_create("Total", nodes = LETTERS[1:3])
h1 <- hier_add(h1, root = "A", nodes = "a1")
h1 <- hier_add(h1, root = "a1", nodes = "aa1")
h2 <- hier_create("Total", letters[1:5])
h2 <- hier_add(h2, root = "b", nodes = "b1")
h2 <- hier_add(h2, root = "d", nodes = "d1")
Note that we deliberately added some "bogus" codes to both h1 and h2: codes a1 and aa1 in h1 and codes b1 and d1 in h2 are identical to their respective parent categories. Applying hier_grid is as simple as
hier_grid(h1, h2)
separating all target hierarchies with a comma. hier_grid then computes all combinations of codes from hierarchies h1 and h2. With the default options, these bogus codes are included in the output data.table. Setting argument add_dups = FALSE removes all rows containing such bogus codes. Setting option add_levs = TRUE adds columns labeled levs_v{n} to the output data set. Each of these columns contains the hierarchy level of the corresponding code given in variable v{n} in the same row of the table, as shown below.
hier_grid(h1, h2, add_dups = FALSE, add_levs = TRUE)
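The core idea of combining all codes of several hierarchies corresponds to a Cartesian product, which base R provides through expand.grid(). The sketch below only illustrates that idea with two hand-written code vectors (the vectors are assumptions made up for this example); it does not handle duplicated codes or hierarchy levels the way hier_grid() is described to do.

```r
codes_v1 <- c("Total", "A", "B", "C")
codes_v2 <- c("Total", "a", "b")

# All combinations of the two code vectors, one row per combination.
grid <- expand.grid(v1 = codes_v1, v2 = codes_v2, stringsAsFactors = FALSE)
nrow(grid)
head(grid)
```

The number of rows equals the product of the number of codes in each dimension, which is why such grids grow quickly when many hierarchies are combined.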
The package also contains a shiny-based interactive app that can be started using hier_app(). The app accepts as input either a character vector (that should be converted into a hierarchy) or an existing hierarchy. Given the hierarchy previously generated using hier_compute(), it can be started as follows:
d <- hier_app(d)
If a character vector is passed to hier_app(), the interface allows the arguments for hier_compute() to be specified. Once a hierarchy is created, the interface changes and the tree can be modified dynamically by dragging nodes around. Furthermore, it is possible to add, remove or rename nodes. The code required to construct the current hierarchy is displayed and can be saved to disk. There is also functionality to undo the last step and to export results either to the R session or to a file. This is especially helpful if one wants to create, for example, an hrc-file as input for $\tau$-argus. Please note that hier_app() returns the modified hierarchy in addition to being able to save results to disk. In order to continue working, one may assign the result to a new object as shown in the code above.
In case you have any suggestions or improvements, please feel free to file an issue at our issue tracker or contribute to the package by filing a pull request against the master branch.