SuppressDominantCells: Suppress magnitude tables using dominance (n,k) or p% rule...

View source: R/SuppressDominantCells.R

SuppressDominantCellsR Documentation

Suppress magnitude tables using dominance ⁠(n,k)⁠ or p% rule for primary suppression.

Description

This function utilizes MagnitudeRule.

Usage

SuppressDominantCells(
  data,
  n = 1:length(k),
  k = NULL,
  pPercent = NULL,
  allDominance = FALSE,
  dominanceVar = NULL,
  numVar = NULL,
  dimVar = NULL,
  hierarchies = NULL,
  formula = NULL,
  contributorVar = NULL,
  sWeightVar = NULL,
  ...,
  candidatesVar = NULL,
  singletonZeros = FALSE,
  preAggregate = !is.null(contributorVar) & is.null(sWeightVar),
  spec = PackageSpecs("dominanceSpec")
)

Arguments

data

Input data, typically a data frame, tibble, or data.table. If data is not a classic data frame, it will be coerced to one internally unless preAggregate is TRUE and aggregatePackage is "data.table".

n

Parameter n in dominance rule. Default is 1:length(k).

k

Parameter k in dominance rule.

pPercent

Parameter in the p% rule, when non-NULL. Parameters n and k will then be ignored. Technically, calculations are performed internally as if n = 1:2. The results of these intermediate calculations can be viewed by setting allDominance = TRUE.

allDominance

Logical. If TRUE, additional information is included in the output, as described in MagnitudeRule.

dominanceVar

Numerical variable to be used in dominance rule. The first numVar variable will be used if it is not specified.

numVar

Numerical variable to be aggregated. Any dominanceVar and candidatesVar that are specified and not included in numVar will be aggregated accordingly.

dimVar

The main dimensional variables and additional aggregating variables. This parameter can be useful when hierarchies and formula are unspecified.

hierarchies

List of hierarchies, which can be converted by AutoHierarchies. Thus, the variables can also be coded by "rowFactor" or "", which correspond to using the categories in the data.

formula

A model formula

contributorVar

Extra variables to be used as grouping elements in the dominance rule. Typically, the variable contains the contributor IDs.

sWeightVar

Name of variable which represents sampling weights to be used in dominance rule

...

Further arguments to be passed to the supplied functions and to ModelMatrix (such as inputInOutput and removeEmpty).

candidatesVar

Variable to be used in the candidate function to prioritize cells for publication and thus not suppression. If not specified, the same variable that is used for the dominance rule will be applied (see dominanceVar and numVar).

singletonZeros

When negative values cannot occur, one can determine from a non-suppressed marginal cell with the value 0 that all underlying cells also have the value 0. The use of singletonZeros = TRUE is intended to prevent this phenomenon from causing suppressed cells to be revealable. It is the zeros in the dominanceVar variable that are examined. Specifically, the ordinary singleton method is combined with a method that is actually designed for frequency tables. This approach also works for magnitude tables when SingletonUniqueContributor0 is utilized.

preAggregate

Parameter to GaussSuppressionFromData. Necessary to include here since the specification in spec cannot take sWeightVar into account.

spec

NULL or a named list of arguments that will act as default values.

Value

data frame containing aggregated data and suppression information.

See Also

SSBtools::tables_by_formulas()

Examples

num <- c(100,
         90, 10,
         80, 20,
         70, 30,
         50, 25, 25,
         40, 20, 20, 20,
         25, 25, 25, 25)
v1 <- c("v1",
        rep(c("v2", "v3", "v4"), each = 2),
        rep("v5", 3),
        rep(c("v6", "v7"), each = 4))
sweight <- c(1, 2, 1, 2, 1, 2, 1, 2, 1, 1, 2, 1, 1, 1, 2, 1, 1, 1)
d <- data.frame(v1 = v1, num = num, sweight = sweight)

# basic use
SuppressDominantCells(d, n = c(1,2), k = c(80,70), numVar = "num", formula = ~v1 -1)
SuppressDominantCells(d, k = c(80,70), numVar = "num", formula = ~v1 -1) # same as above
SuppressDominantCells(d, pPercent = 7, numVar = "num", formula = ~v1 -1) 

# with weights
SuppressDominantCells(d, n = c(1,2), k = c(80,70), numVar = "num",
dimVar = "v1", sWeightVar = "sweight")

# overwriting some parameters in default spec
SuppressDominantCells(d, n = c(1,2), k = c(80,70), numVar = "num",
dimVar = "v1", sWeightVar = "sweight", domWeightMethod = "tauargus")

# using dominance and few contributors rule together, see second example compared to first
SuppressDominantCells(d, n = c(1,2), k = c(80,70), numVar = "num", formula = ~v1 -1,
primary = c(DominanceRule, NContributorsRule), maxN = 3, allDominance = TRUE)

SuppressDominantCells(d, n = c(1,2), k = c(80,70), numVar = "num", formula = ~v1 -1,
primary = c(DominanceRule, NContributorsRule), maxN = 4, allDominance = TRUE)


d2 <- SSBtoolsData("d2")[1:4]   # Data considered as microdata
set.seed(123)
d2$v <- rnorm(nrow(d2))^2

# Hierarchical region variables are detected automatically -> same output column
SuppressDominantCells(data = d2, n = c(1, 2), k = c(70, 95), numVar = "v", 
                      dimVar = c("region", "county", "k_group"), allDominance = TRUE)

# Formula. Hierarchical variables still detected automatically.
SuppressDominantCells(data = d2, n = c(1, 2), k = c(70, 95), numVar = "v", 
                      formula = ~main_income * k_group + region + county - k_group)

# With hierarchies created manually
ml <- data.frame(levels = c("@", "@@", "@@@", "@@@", "@@@", "@@"), 
        codes = c("Total", "not_assistance", "other", "pensions", "wages", "assistance"))
SuppressDominantCells(data = d2, n = c(1, 2), k = c(70, 95), numVar = "v", 
                      hierarchies = list(main_income = ml, k_group = "Total_Norway"))
                      
# With contributorVar and p% rule    
SuppressDominantCells(data= SSBtoolsData("magnitude1"), 
                      numVar = "value", 
                      dimVar= c("sector4", "geo"), 
                      contributorVar = "company",
                      pPercent = 10, 
                      allDominance = TRUE)                       
                      
                      
# Using formula followed by FormulaSelection                        
output <- SuppressDominantCells(data = SSBtoolsData("magnitude1"), 
                                numVar = "value", 
                                formula = ~sector2 * geo + sector4 * eu, 
                                contributorVar = "company", 
                                k = c(80, 99))
FormulaSelection(output, ~sector2 * geo) 
                      
                      
# This example is similar to the one in the documentation of tables_by_formulas,  
# but it uses SuppressDominantCells with the pPercent and contributorVar parameters.  
tables_by_formulas(SSBtoolsData("magnitude1"),
                   table_fun = SuppressDominantCells, 
                   table_formulas = list(table_1 = ~region * sector2, 
                                         table_2 = ~region1:sector4 - 1, 
                                         table_3 = ~region + sector4 - 1), 
                   substitute_vars = list(region = c("geo", "eu"), region1 = "eu"), 
                   collapse_vars = list(sector = c("sector2", "sector4")), 
                   dominanceVar  = "value", pPercent = 10, contributorVar = "company")                       
                      
                      

GaussSuppression documentation built on April 3, 2025, 7:54 p.m.