options(rmarkdown.html_vignette.check_title = FALSE)
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)
htmltables <- TRUE
if (htmltables) {
  source("GaussKable.R")
  source("KableMagnitudeTable.R")
  P <- function(...) G(timevar = "geo", ...)
  M <- function(...) KableMagnitudeTable(..., numVar = "value", timevar = "geo", singletonMethod = "none")
} else {
  P <- function(...) cat("Formatted table not available")
  M <- P
}
The `GaussSuppression` package contains several easy-to-use wrapper functions and in this vignette we will look at the `SuppressFewContributors` and `SuppressDominantCells` functions.
In these functions, primary suppression is based on the number of contributors or on dominance rules.
Then, as always in this package, secondary suppression is performed using the Gauss method.
We begin by loading a dataset to be used below.
library(GaussSuppression)
dataset <- SSBtoolsData("magnitude1")
dataset
We can imagine the figures in the variable `"value"` represent sales value to different sectors from different companies.
In the first examples, we will not use the `"company"` variable, but instead assume that each row represents the contribution of a unique company.
Our input data can then be reformatted and illustrated like this:
\
M(caption = '**Table 1**: Input data with the 20 contributions.', dataset, formula = ~sector4:geo-1)
\
In the first example, we use `SuppressFewContributors` with `maxN = 1`.
This means that cells based on a single contributor are primary suppressed.
SuppressFewContributors(data=dataset, numVar = "value", dimVar= c("sector4", "geo"), maxN=1)
In the output, the number of contributors is in columns `nRule` and `nAll`.
The two columns are equal under normal usage.
A formatted version of this output is given in Table 2 below. Primary suppressed cells are underlined and labeled in red, while the secondary suppressed cells are labeled in purple.
\
P(caption = '**Table 2**: Output from `SuppressFewContributors` with `maxN = 1` (number of contributors in parenthesis)', data=dataset, numVar = "value", dimVar= c("sector4", "geo"), maxN = 1, fun = SuppressFewContributors, print_expr = 'paste0(value, " (",nAll ,") ")')
\
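The few-contributors rule can be illustrated with a minimal base R sketch. It does not use the package, it covers only the inner cells (no margins and no secondary suppression), and the data frame `toy` and its values are hypothetical, not the `magnitude1` data.

```r
# Hypothetical toy data: one row per contribution (not the magnitude1 data).
toy <- data.frame(
  sector = c("A", "A", "A", "B", "B", "C"),
  geo    = c("X", "X", "Y", "X", "Y", "Y"),
  value  = c(10, 5, 7, 3, 8, 2)
)

maxN <- 1

# Sum the values and count the contributions for every sector-by-geo cell.
cells <- aggregate(value ~ sector + geo, data = toy, FUN = sum)
cells$n <- aggregate(value ~ sector + geo, data = toy, FUN = length)$value

# A cell is flagged as primary suppressed when it has at most maxN contributors.
cells$primary <- cells$n <= maxN
cells
```

Only the cell with two contributions (sector A, geo X) escapes primary suppression in this toy example.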
In the second example, we use `SuppressDominantCells` with `n = 1` and `k = 80`.
This means that aggregates are primary suppressed whenever the largest contribution exceeds 80% of the cell total.
SuppressDominantCells(data=dataset, numVar = "value", dimVar= c("sector4", "geo"), n = 1, k = 80, allDominance = TRUE)
To include the dominance percentage in the output (here, the share of the largest contribution), the parameter `allDominance = TRUE` was used.
A formatted version of this output is given in Table 3 below.
\
P(caption = '**Table 3**: Output from `SuppressDominantCells` <br> with `n = 1` and `k = 80` <br> (percentage from largest contribution in parenthesis)', data=dataset, numVar = "value", dimVar= c("sector4", "geo"), n=1, k=80, allDominance = TRUE, fun = SuppressDominantCells, print_expr = 'paste0(value, " (",round(100*`primary.1:80`) ,"%) ")')
Note that this table, as well as Table 2, is discussed below in the section on the singleton problem. \ \
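As a rough illustration of the rule with `n = 1` and `k = 80`, the share of the largest contribution can be computed per cell in base R. This is only a sketch on hypothetical data (`toy` is not the `magnitude1` data), and the strict comparison `>` is an assumption based on the wording "exceeds".

```r
# Hypothetical contributions to three cells (not the magnitude1 data).
toy <- data.frame(
  cell  = c("a", "a", "a", "b", "b", "c"),
  value = c(90, 6, 4, 50, 50, 12)
)

k <- 80  # threshold in percent

# Share (in percent) of the largest contribution within each cell.
largest_share <- tapply(toy$value, toy$cell, function(x) 100 * max(x) / sum(x))

# A cell is flagged as primary suppressed when that share exceeds k.
primary <- largest_share > k

data.frame(cell    = names(largest_share),
           share   = round(as.numeric(largest_share)),
           primary = as.logical(primary))
```

Cell `c` has a single contribution, so its share is 100% and it is flagged together with cell `a`.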
Here we use `SuppressDominantCells` with `n = 1:2` and `k = c(80, 99)`.
This means that aggregates are primary suppressed whenever the largest contribution exceeds 80% of the cell total or when the two largest contributions exceed 99% of the cell total.
In addition, the example below is made even more advanced by including the variables "sector2" and "eu".
output <- SuppressDominantCells(data=dataset, numVar = "value", dimVar= c("sector4", "sector2", "geo", "eu"), n = 1:2, k = c(80, 99))
head(output)
\
P(caption = '**Table 4**: Output from `SuppressDominantCells` <br> with `n = 1:2` and `k = c(80, 99)`', data=dataset, numVar = "value", dimVar= c("sector4", "sector2", "geo", "eu"), n = 1:2, k = c(80, 99), fun = SuppressDominantCells, print_expr = 'value')
\
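Combining several dominance rules amounts to checking each (n, k) pair and suppressing when any of them is violated. The helper `dominance_violated` below is a hypothetical name introduced for illustration, not a package function, and it assumes non-negative contributions within a single cell.

```r
# TRUE when any (n, k) rule is violated for one cell, that is, when the
# n largest contributions together exceed k percent of the cell total.
# Assumes non-negative contributions; not the package's implementation.
dominance_violated <- function(x, n = 1:2, k = c(80, 99)) {
  x <- sort(x, decreasing = TRUE)
  total <- sum(x)
  if (total == 0) return(FALSE)
  # Cumulative share of the n largest contributions (n capped at length(x)).
  top_share <- 100 * cumsum(x)[pmin(n, length(x))] / total
  any(top_share > k)
}

dominance_violated(c(70, 29.5, 0.5))  # two largest together give 99.5%
dominance_violated(c(60, 30, 10))     # 60% and 90%: neither rule is violated
dominance_violated(c(85, 10, 5))      # largest alone gives 85%
```

The first and third calls return `TRUE`, the second returns `FALSE`.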
As described in the define-tables vignette, hierarchies are here detected automatically. The same output is obtained if we first generate hierarchies by:
dimlists <- SSBtools::FindDimLists(dataset[c("sector4", "sector2", "geo", "eu")])
dimlists
And thereafter run SuppressDominantCells with these hierarchies as input:
output <- SuppressDominantCells(data=dataset, numVar = "value", hierarchies = dimlists, n = 1:2, k = c(80, 99))
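The idea behind automatic detection is that one variable forms a hierarchy with another when every code of the finer variable belongs to exactly one code of the coarser variable. The sketch below shows that check in base R. It is not how `FindDimLists` is implemented, and both the helper `is_nested` and the data frame `toy` are hypothetical.

```r
# Hypothetical classification data (not the magnitude1 data).
toy <- data.frame(
  sector4 = c("Agriculture", "Industry", "Entertainment", "Governmental"),
  sector2 = c("private", "private", "private", "public"),
  geo     = c("Iceland", "Portugal", "Spain", "Iceland")
)

# `fine` is nested in `coarse` when each fine code maps to exactly one coarse code.
is_nested <- function(fine, coarse) {
  all(tapply(coarse, fine, function(z) length(unique(z))) == 1)
}

is_nested(toy$sector4, toy$sector2)  # TRUE: every sector4 code has one sector2 code
is_nested(toy$geo, toy$sector2)      # FALSE: Iceland occurs with both sector2 codes
```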
\ \
Using the formula interface is one way to achieve fewer cells in the output.
Below we use `SuppressFewContributors` with `maxN = 2`.
This means that table cells based on one or two contributors are primary suppressed.
output <- SuppressFewContributors(data=dataset, numVar = "value", formula = ~sector2*geo + sector4*eu, maxN=2, removeEmpty = FALSE)
head(output)
tail(output)
In the formatted version of this output, blank cells are cells that are not included in the output.
\
P(caption = '**Table 5**: Output from `SuppressFewContributors` with `maxN = 2` <br> (number of contributors in parenthesis)', data=dataset, numVar = "value", formula = ~sector2*geo + sector4*eu, maxN=2, removeEmpty = FALSE, fun = SuppressFewContributors, print_expr = 'paste0(value, " (",nAll ,") ")')
\
Please note that in order to include the three empty cells with no contributors, the `removeEmpty` parameter was set to `FALSE`. By default, this parameter is set to `TRUE` when using the formula interface.
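To see why the formula above yields fewer cells, it can help to list the terms it expands to with base R's `terms()`. This sketch only inspects the formula. Reading each term as one group of output cells is a simplified description of the formula interface, not a statement about its internals.

```r
f <- ~ sector2 * geo + sector4 * eu

# Each term label corresponds to one group of output cells;
# the overall total comes from the intercept.
term_labels <- attr(terms(f), "term.labels")
term_labels

# The crossing sector4:geo is not among the terms, so the most detailed
# sector-by-country cells are not generated.
"sector4:geo" %in% term_labels
```

The formula expands to the four main effects plus `sector2:geo` and `sector4:eu`.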
\
\
contributorVar = "company"
According to the `"company"` variable in the data set, there are only four contributing companies (A, B, C and D).
We specify this using the `contributorVar` parameter, which corresponds to a variable within the dataset, in this case, `"company"`.
In general, this variable identifies the holding, that is, the unit regarded as a single contributor by the suppression method.
When this is taken into account, the primary suppression rules will be applied to data that,
within each cell, is aggregated within each contributor.
Our example data aggregated in this way is shown below.
\
M(caption = '**Table 6**: The "value" data aggregated according to hierarchy and contributor', dataset, dimVar = c("sector4", "sector2", "geo", "eu"), contributorVar = "company")
\
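The effect of aggregating within each contributor before applying the rules can be sketched in base R. The data frame `toy` is hypothetical (not the `magnitude1` data) and the code only mimics the idea, not the package's implementation.

```r
# Hypothetical contributions to two cells (not the magnitude1 data).
toy <- data.frame(
  cell    = c("a", "a", "a", "b", "b", "b"),
  company = c("A", "A", "B", "A", "B", "C"),
  value   = c(40, 45, 15, 30, 30, 40)
)

# Step 1: within each cell, sum the rows that belong to the same company.
by_company <- aggregate(value ~ cell + company, data = toy, FUN = sum)

# Step 2: apply the rules to the aggregated contributions.
n_contributors <- tapply(by_company$value, by_company$cell, length)
largest_share  <- tapply(by_company$value, by_company$cell,
                         function(x) 100 * max(x) / sum(x))

n_contributors  # a has 2 contributors, b has 3
largest_share   # a: 85, b: 40
```

Without step 1, cell `a` would appear to have three contributors with a largest share of 45%. After aggregation, company A accounts for 85% of the cell, so a rule with `k = 80` would flag it.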
Below we take into account contributor IDs when using few contributors primary suppression.
output <- SuppressFewContributors(data=dataset, numVar = "value", dimVar = c("sector4", "sector2", "geo", "eu"), maxN=2, contributorVar = "company")
head(output)
\
P(caption = '**Table 7**: Output from `SuppressFewContributors` with `maxN = 2` and with `contributorVar = "company"` (number contributors in parenthesis)', data=dataset, numVar = "value", dimVar = c("sector4", "sector2", "geo", "eu"), maxN=2, contributorVar = "company", fun = SuppressFewContributors, print_expr = 'paste0(value, " (",nAll ,") ")')
\
Below we take into account contributor IDs when using dominant cell primary suppression.
output <- SuppressDominantCells(data=dataset, numVar = "value", formula = ~sector2*geo + sector4*eu, contributorVar = "company", n = 1:2, k = c(80, 99))
head(output)
Here we have also made use of the formula interface.
\
P(caption = '**Table 8**: Output from `SuppressDominantCells` with `n = 1:2` and <br> `k = c(80, 99)` and with `contributorVar = "company"`', data=dataset, numVar = "value", formula = ~sector2*geo + sector4*eu, contributorVar = "company", n = 1:2, k = c(80, 99), fun = SuppressDominantCells, print_expr = 'value')
\ \
Below, the data is suppressed in the same way as in Table 8, but with a different formula. \
output <- SuppressDominantCells(data=dataset, numVar = "value", formula = ~sector4*geo + sector2*eu, contributorVar = "company", n = 1:2, k = c(80, 99))
head(output)
\
P(caption = '**Table 9**: Output from `SuppressDominantCells` with `n = 1:2` and `k = c(80, 99)` and with `contributorVar = "company"`', data=dataset, numVar = "value", formula = ~sector4*geo + sector2*eu, contributorVar = "company", n = 1:2, k = c(80, 99), fun = SuppressDominantCells, print_expr = 'value')
\
By using `singletonMethod = "none"` in this case, Entertainment:Spain will not be suppressed.
This cell is suppressed due to the default handling of the singleton problem.
The reason is that Entertainment:Iceland has a single contributor. This contributor can reveal Entertainment:Portugal if Entertainment:Spain is not suppressed.
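The disclosure risk described here is plain subtraction. The sketch below uses hypothetical figures (not the values in the tables above) for a row in which only the Iceland and Portugal cells are suppressed.

```r
# Hypothetical figures for one table row (not taken from the tables above).
published   <- c(Spain = 120, Total = 200)  # visible to everyone
own_iceland <- 30                           # known only to the single Iceland contributor

# With only Iceland and Portugal suppressed, the Iceland contributor can
# subtract everything it knows from the row total to recover Portugal.
revealed_portugal <- published[["Total"]] - published[["Spain"]] - own_iceland
revealed_portugal
```

Suppressing the Spain cell as well removes the second known term, so the subtraction is no longer possible.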
Here it might appear that the table contains another issue: Entertainment:Iceland can reveal Industry:Iceland. However, this can be considered acceptable in this case. Industry:Iceland is secondary suppressed and the only reason for this is to protect Entertainment:Iceland.
Nevertheless, in most cases, secondary suppressed cells introduce further complexity to the handling of singletons. A part of the singleton handling of magnitude tables is to add virtual primary suppressed cells prior to the secondary suppression algorithm. Secondary suppressed cells cannot, therefore, be treated in this way. However, another part of the singleton handling solves many of the remaining problems. This is done within the suppression algorithm.
Here we can observe the effect of this in Tables 2 and 3.
By using `singletonMethod = "none"` in Table 2, Industry:Portugal will not be suppressed.
In that case, Industry:Spain can reveal Industry:Iceland and consequently Entertainment:Iceland.
In Table 3, Governmental:Total and Industry:Total are suppressed due to advanced singleton handling.
In fact, by using `singletonMethod = "none"`, all tables above will be suppressed differently.