Linked table suppression"

options(rmarkdown.html_vignette.check_title = FALSE)
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)
htmltables = TRUE
if(htmltables){
  source("GaussKable.R")
  P = function(..., timevar= "geo", fun = SuppressSmallCounts) G(fun = fun, timevar = timevar, ...)
} else { 
  P = function(...) cat("Formatted table not avalable")
}

SuppressSmallCounts1 <- function(withinArg, item, formula, ...){
  SuppressLinkedTables(..., fun = SuppressSmallCounts, withinArg = withinArg)[[item]]
}

Introduction

By using the formula parameter, it is already possible to protect linked tables with the functions described in the other vignettes. The result is the strictest form of protection, which we call global protection.

This vignette illustrates alternative methods for linked tables. A common method for such protection is back-tracking where one iterates until a consistent solution is found. In the functions described below, such a method can be achieved by specifying linkedGauss = "back-tracking". With the GaussSuppression package, one can find such a consistent solution using an improved approach, avoiding the need for iteration.

Below we start with some examples of protected tables with alternative methods. Then we show in more detail different function calls that achieve this. We also discuss the parameters recordAware and collapseAware.

Finally, an example with interval protection is also shown.

Input data and examples

We use a modified version of the example 1 dataset used elsewhere.

library(GaussSuppression)
dataset <- SSBtoolsData("example1")
dataset <- dataset[c(1, 2, 4, 6, 8, 10, 12, 13, 14, 15), ]
dataset$freq = c(6, 8, 9, 1, 2, 4, 3, 7, 2, 2)
print(dataset)

In the examples, we work with two linked tables:
- a three-way table where age, eu, and year are crossed, and
- a two-way table where geo and year are crossed.

In this example, small counts (1s and 2s) are protected. All zeros are treated as known structural zeros and are omitted from both the input and the output.

As in the other vignettes, primary suppressed cells are underlined and labeled in red, while the secondary suppressed cells are labeled in purple.

We first illustrate local protection, where the tables are protected separately without any coordination between them.

f1 <- ~age*eu*year
f2 <- ~geo*year

\

Table 1: Linked suppressed tables by
linkedGauss = "local"

P(data = dataset, formula = f1, freqVar = "freq", maxN = 2, extend0 = FALSE, timevar = "eu")
P(data = dataset, formula = f2, freqVar = "freq", maxN = 2, extend0 = FALSE,timevar = "geo")

\

Clearly, this is not a satisfactory solution. The totals for 2015 and 2016 are suppressed in one table, but not in the other. Furthermore, there is also an inconsistency for Iceland-2014, which is the same as nonEU-2014.

We continue with consistent protection.

\

Table 2: Linked suppressed tables by
linkedGauss = "consistent"

P(data = dataset, fun = SuppressSmallCounts1,
  withinArg = list(list(formula = f1), list(formula = f2)), 
  freqVar = "freq", maxN = 2, extend0 = FALSE, linkedGauss = "consistent",
  formula = f1, item = 1, timevar = "eu")
P(data = dataset, fun = SuppressSmallCounts1,
  withinArg = list(list(formula = f1), list(formula = f2)), 
  freqVar = "freq", maxN = 2, extend0 = FALSE, linkedGauss = "consistent",
  formula = f2, item = 2, timevar = "geo")

\

The inconsistency problems are now avoided.

However, a remaining problem with this solution is that Spain-2015 can be derived from EU-2015 and Portugal-2015.

Finally, we illustrate an improved form of consistent protection, denoted as super-consistent, which also avoids this problem.

\

Table 3: Linked suppressed tables by
linkedGauss = "super-consistent"

P(data = dataset, fun = SuppressSmallCounts1,
  withinArg = list(list(formula = f1), list(formula = f2)), 
  freqVar = "freq", maxN = 2, extend0 = FALSE, linkedGauss = "super-consistent",
  formula = f1, item = 1, timevar = "eu")
P(data = dataset, fun = SuppressSmallCounts1,
  withinArg = list(list(formula = f1), list(formula = f2)), 
  freqVar = "freq", maxN = 2, extend0 = FALSE, linkedGauss = "super-consistent",
  formula = f2, item = 2, timevar = "geo")

\

The suppressed cells in each table correspond to related equations that cannot be solved. The super-consistent method makes use of the fact that common cells across tables must have the same value. Thus, the equations from the different tables can be combined when searching for solutions. The super-consistent method ensures that suppressed cells cannot be uniquely determined from the combined system of equations. However, the coordination is not as strict as in the global method, where the system of equations becomes even larger. In this particular case, the super-consistent solution turns out to be the same as the global one.

Function calls and output

To achieve both treating zeros as known structural zeros and omitting them from the output, we use the parameter settings extend0 = FALSE and removeEmpty = TRUE.

In SuppressLinkedTables(), the argument withinArg specifies which parameters may differ between the linked tables. In our examples, we choose this to be either dimVar, hierarchies, or formula.

The output from SuppressLinkedTables() is a list, with one element for each of the linked tables.

SuppressLinkedTables() with dimVar

output <- SuppressLinkedTables(data = dataset,
              fun = SuppressSmallCounts, 
              withinArg = list(table_1 = list(dimVar = c("age", "eu", "year")), 
                               table_2 = list(dimVar = c("geo", "year"))),
              freqVar = "freq", 
              maxN = 2,
              extend0 = FALSE, 
              removeEmpty = TRUE,
              linkedGauss = "super-consistent")

print(output[["table_1"]])
print(output[["table_2"]])

SuppressLinkedTables() with hierarchies

First, we need hierarchies for the input. Here, these are generated separately with SSBtools::FindDimLists().

h_age  <- SSBtools::FindDimLists(dataset["age"])[[1]]
h_geo  <- SSBtools::FindDimLists(dataset["geo"])[[1]]
h_eu   <- SSBtools::FindDimLists(dataset["eu"])[[1]]
h_year <- SSBtools::FindDimLists(dataset["year"])[[1]]

print(h_age)
print(h_geo)
print(h_eu)
print(h_year)

The output is identical to using dimVar, so we only show the code. Note that the only difference is the withinArg argument.

output <- SuppressLinkedTables(data = dataset,
              fun = SuppressSmallCounts, 
              withinArg = 
                list(table_1 = list(hierarchies = list(age = h_age, eu = h_eu, year = h_year)), 
                     table_2 = list(hierarchies = list(geo = h_geo, year = h_year))),
              freqVar = "freq", 
              maxN = 2,
              extend0 = FALSE, 
              removeEmpty = TRUE,
              linkedGauss = "super-consistent")

SuppressLinkedTables() with formula

When using formula, the output is similar to that obtained with dimVar or hierarchies. The only difference in the output is the ordering of rows, so we only show the code.

Again, the only difference in the code is the withinArg argument. However, note that we have omitted removeEmpty = TRUE here, since this is the default when a formula is used as input.

output <- SuppressLinkedTables(data = dataset,
              fun = SuppressSmallCounts, 
              withinArg = list(table_1 = list(formula = ~age*eu*year), 
                               table_2 = list(formula = ~geo*year)),
              freqVar = "freq", 
              maxN = 2,
              extend0 = FALSE,
              linkedGauss = "super-consistent")

SuppressSmallCounts() with formula and linkedGauss

Since only the formula parameter varies between the linked tables, one option is to run SuppressSmallCounts() directly with formula as a list and the linkedGauss parameter specified. Here we show 10 output rows.

output <- SuppressSmallCounts(data = dataset,
              formula = list(table_1 = ~age*eu*year, table_2 = ~geo*year),   
              freqVar = "freq",
              maxN = 2,
              extend0 = FALSE,
              linkedGauss = "super-consistent") 
print(output[c(1, 6:7, 12, 19, 23, 25:28), ])

tables_by_formulas() with formula and linkedGauss

Similar output can be obtained by tables_by_formulas(). In this case, the region variable is specified manually, and table membership variables are included in the output. Again, 10 output rows are shown.

output <-  tables_by_formulas(data = dataset,
              table_fun = SuppressSmallCounts,                
              table_formulas = list(table_1 = ~age*eu*year, table_2 = ~geo*year),   
              freqVar = "freq",
              maxN = 2,
              extend0 = FALSE,
              linkedGauss = "super-consistent",
              substitute_vars = list(region = c("geo", "eu"))) 

print(output[c(1, 6:7, 12, 19, 23, 25:28), ])

The parameters recordAware and collapseAware

An important issue is which cells are considered common cells. In the functions, the parameter recordAware is set to TRUE by default. In this case, common cells are determined based on whether they aggregate the same underlying records. This is similar to the use of cell keys, a well-known concept from the cell-key method of statistical disclosure control.

When recordAware = FALSE, common cells are instead identified by matching variable combinations. This does not always work well. For example, here recordAware = TRUE is necessary to capture that Iceland-2014 and nonEU-2014 are the same.

A related parameter is collapseAware, but it is not available when using SuppressLinkedTables(). When it is used, even more cells are treated as common cells. In particular, the suppression algorithm then automatically accounts for cells in one table that are sums of cells in another table. In our example, this means that the combination "consistent" and collapseAware = TRUE gives the same result as "super-consistent".

For more details on parameters and options, see the documentation for SuppressLinkedTables().

Interval protection

Intervals for the primary suppressed cells are computed whenever the lpPackage parameter is specified. When linkedGauss = "super-consistent", intervals can be calculated using this method as well.

There are several possibilities. See the documentation for the parameter linkedIntervals in the help page for SuppressLinkedTables().

If rangePercent and/or rangeMin are provided, further suppression is performed to ensure that the interval width requirements are met. See the help page for GaussSuppressionFromData(), under the description of the lpPackage parameter, for more details.

In the example below, the required interval width is 4. To achieve this, two additional cells are suppressed: Portugal-2015 and Spain-2015. Without this additional suppression, some intervals are as narrow as 3 (see variables lo_1 and up_1 below).

lpPackage <- "highs" 
if (!requireNamespace(lpPackage, quietly = TRUE)) {
  cat(paste0("Note: The final part of this vignette requires the suggested package '", lpPackage, "' which is not installed. That part has been skipped.\n"))
  knitr::knit_exit()
}

\

Table 4: Linked suppressed tables with intervals by
linkedGauss = "super-consistent", rangeMin = 4

P(data = dataset, fun = SuppressSmallCounts1,
  withinArg = list(list(formula = f1), list(formula = f2)), 
  freqVar = "freq", maxN = 2, extend0 = FALSE, linkedGauss = "super-consistent",
  lpPackage = "highs", rangeMin = 4,
  formula = f1, item = 1, timevar = "eu",
  print_expr = 'ifelse(is.na(lo), freq, paste0(freq, " [", lo, ", ", up, "]"))')
P(data = dataset, fun = SuppressSmallCounts1,
  withinArg = list(list(formula = f1), list(formula = f2)), 
  freqVar = "freq", maxN = 2, extend0 = FALSE, linkedGauss = "super-consistent",
  lpPackage = "highs", rangeMin = 4,
  formula = f2, item = 2, timevar = "geo",
  print_expr = 'ifelse(is.na(lo), freq, paste0(freq, " [", lo, ", ", up, "]"))')

\

This functionality can be used with all the function calls above. Below is shown SuppressLinkedTables() with dimVar.

output <- SuppressLinkedTables(data = dataset,
                               fun = SuppressSmallCounts, 
                               withinArg = list(table_1 = list(dimVar = c("age", "eu", "year")), 
                                                table_2 = list(dimVar = c("geo", "year"))),
                               freqVar = "freq", 
                               maxN = 2,
                               extend0 = FALSE, 
                               removeEmpty = TRUE,
                               linkedGauss = "super-consistent",
                               lpPackage = "highs", 
                               rangeMin = 4)

print(output[["table_1"]])
print(output[["table_2"]])


Try the GaussSuppression package in your browser

Any scripts or data that you put into this service are public.

GaussSuppression documentation built on Aug. 25, 2025, 5:12 p.m.