GaussSuppressionFromData: Cell suppression from input data containing inner cells

View source: R/GaussSuppressionFromData.R

GaussSuppressionFromDataR Documentation

Cell suppression from input data containing inner cells

Description

Aggregates are generated followed by primary suppression followed by secondary suppression by Gaussian elimination by GaussSuppression

Usage

GaussSuppressionFromData(
  data,
  dimVar = NULL,
  freqVar = NULL,
  ...,
  numVar = NULL,
  weightVar = NULL,
  charVar = NULL,
  hierarchies = NULL,
  formula = NULL,
  maxN = suppressWarnings(formals(c(primary)[[1]])$maxN),
  protectZeros = suppressWarnings(formals(c(primary)[[1]])$protectZeros),
  secondaryZeros = suppressWarnings(formals(candidates)$secondaryZeros),
  candidates = CandidatesDefault,
  primary = PrimaryDefault,
  forced = NULL,
  hidden = NULL,
  singleton = SingletonDefault,
  singletonMethod = ifelse(secondaryZeros, "anySumNOTprimary", "anySum"),
  printInc = TRUE,
  output = "publish",
  x = NULL,
  crossTable = NULL,
  preAggregate = is.null(freqVar),
  extraAggregate = preAggregate & !is.null(charVar),
  structuralEmpty = FALSE,
  extend0 = FALSE,
  spec = NULL,
  specLock = FALSE,
  freqVarNew = rev(make.unique(c(names(data), "freq")))[1],
  nUniqueVar = rev(make.unique(c(names(data), "nUnique")))[1],
  forcedInOutput = "ifNonNULL",
  unsafeInOutput = "ifForcedInOutput",
  lpPackage = NULL,
  aggregatePackage = "base",
  aggregateNA = TRUE,
  aggregateBaseOrder = FALSE,
  rowGroupsPackage = aggregatePackage
)

Arguments

data

Input data as a data frame

dimVar

The main dimensional variables and additional aggregating variables. This parameter can be useful when hierarchies and formula are unspecified.

freqVar

A single variable holding counts (name or number).

...

Further arguments to be passed to the supplied functions and to ModelMatrix (such as inputInOutput and removeEmpty).

numVar

Other numerical variables to be aggregated

weightVar

weightVar Weights (costs) to be used to order candidates for secondary suppression

charVar

Other variables possibly to be used within the supplied functions

hierarchies

List of hierarchies, which can be converted by AutoHierarchies. Thus, the variables can also be coded by "rowFactor" or "", which correspond to using the categories in the data.

formula

A model formula

maxN

Suppression parameter. Cells with frequency ⁠<= maxN⁠ are set as primary suppressed. Using the default primary function, maxN is by default set to 3. See details.

protectZeros

Suppression parameter. When TRUE, cells with zero frequency or value are set as primary suppressed. Using the default primary function, protectZeros is by default set to TRUE. See details.

secondaryZeros

Suppression parameter. When TRUE, cells with zero frequency or value are prioritized to be published so that they are not secondary suppressed. Using the default candidates function, secondaryZeros is by default set to FALSE. See details.

candidates

GaussSuppression input or a function generating it (see details) Default: CandidatesDefault

primary

GaussSuppression input or a function generating it (see details) Default: PrimaryDefault

forced

GaussSuppression input or a function generating it (see details)

hidden

GaussSuppression input or a function generating it (see details)

singleton

GaussSuppression input or a function generating it (see details) Default: SingletonDefault

singletonMethod

GaussSuppression input. The default value depends on parameter secondaryZeros which depends on candidates (see details).

printInc

GaussSuppression input

output

One of "publish" (default), "inner", "publish_inner", "publish_inner_x", "publish_x", "inner_x", "input2functions" (input to supplied functions), "inputGaussSuppression", "inputGaussSuppression_x", "outputGaussSuppression" "outputGaussSuppression_x", "primary", "secondary" and "all". Here "inner" means input data (possibly pre-aggregated) and "x" means dummy matrix (as input parameter x). All input to and output from GaussSuppression, except ..., are returned when "outputGaussSuppression_x". Excluding x and only input are also possible. The code "all" means all relevant output after all the calculations. Currently, this means the same as "publish_inner_x" extended with the matrices (or NULL) xExtraPrimary and unsafe. The former matrix is usually made by KDisclosurePrimary. This latter matrix contains the columns representing unsafe primary suppressions. In addition to x columns corresponding to unsafe in ordinary output (see parameter unsafeInOutput below), possible columns from xExtraPrimary may also be included in the unsafe matrix (see details).

x

x (modelMatrix) and crossTable can be supplied as input instead of generating it from ModelMatrix

crossTable

See above.

preAggregate

When TRUE, the data will be aggregated within the function to an appropriate level. This is defined by the dimensional variables according to dimVar, hierarchies or formula and in addition charVar.

extraAggregate

When TRUE, the data will be aggregated by the dimensional variables according to dimVar, hierarchies or formula. The aggregated data and the corresponding x-matrix will only be used as input to the singleton function and GaussSuppression. This extra aggregation is useful when parameter charVar is used. Supply "publish_inner", "publish_inner_x", "publish_x" or "inner_x" as output to obtain extra aggregated results. Supply "inner" or "input2functions" to obtain other results.

structuralEmpty

When TRUE, output cells with no contributing inner cells (only zeros in column of x) are forced to be not primary suppressed. Thus, these cells are considered as structural zeros. When structuralEmpty is TRUE, the following error message is avoided: Suppressed cells with empty input will not be protected. Extend input data with ⁠zeros?⁠. When removeEmpty is TRUE (see "..." below), structuralEmpty is superfluous

extend0

Data is automatically extended by Extend0 when TRUE. Can also be set to "all" which means that input codes in hierarchies are considered in addition to those in data. Parameter extend0 can also be specified as a list meaning parameter varGroups to Extend0.

spec

NULL or a named list of arguments that will act as default values.

specLock

When TRUE, arguments in spec cannot be changed.

freqVarNew

Name of new frequency variable generated when input freqVar is NULL and preAggregate is TRUE. Default is "freq" provided this is not found in names(data).

nUniqueVar

Name of variable holding the number of unique contributors. This variable will be generated in the extraAggregate step. Default is "nUnique" provided this is not found in names(data). If an existing variable is passed as input, this variable will apply only when preAggregate/extraAggregate is not done.

forcedInOutput

Whether to include forced as an output column. One of "ifNonNULL" (default), "always", "ifany" and "no". In addition, TRUE and FALSE are allowed as alternatives to "always" and "no".

unsafeInOutput

Whether to include usafe as an output column. One of "ifForcedInOutput" (default), "always", "ifany" and "no". In addition, TRUE and FALSE are allowed as alternatives to "always" and "no". see details.

lpPackage
  • lpPackage: When non-NULL, intervals by ComputeIntervals will be included in the output. See its documentation for valid parameter values for 'lpPackage'. If, additionally, at least one of the two RangeLimitsDefault parameters below is specified, further suppression will be performed to satisfy the interval width requirements. Then, the values in the output variable suppressed_integer means: no suppression (0), primary suppression (1), secondary suppression (2), additional suppression applied by an interval algorithm limited to linearly independent cells (3), and further suppression according to the final gauss algorithm (4). Intervals, ⁠[lo_1, up_1]⁠, are intervals calculated prior to additional suppression.

    • rangePercent: Required interval width expressed as a percentage

    • rangeMin: Minimum required width of the interval

               Please note that interval calculations may have a 
               different interface in future versions.
      
aggregatePackage

Package used to preAggregate/extraAggregate. Parameter pkg to aggregate_by_pkg.

aggregateNA

Whether to include NAs in the grouping variables while preAggregate/extraAggregate. Parameter include_na to aggregate_by_pkg.

aggregateBaseOrder

Parameter base_order to aggregate_by_pkg, used when preAggregate/extraAggregate. The parameter does not affect the ordering of ordinary output. Therefore, the default is set to FALSE to avoid unnecessary sorting operations. The parameter will have impact when, e.g output = "inner".

rowGroupsPackage

Parameter pkg to RowGroups. The parameter is input to Formula2ModelMatrix via ModelMatrix.

Details

The supplied functions for generating GaussSuppression input takes the following arguments: crossTable, x, freq, num, weight, maxN, protectZeros, secondaryZeros, data, freqVar, numVar, weightVar, charVar, dimVar aggregatePackage, aggregateNA, aggregateBaseOrder, rowGroupsPackage, and .... where the two first are ModelMatrix outputs (modelMatrix renamed to x). The vector, freq, is aggregated counts (t(x) %*% data[[freqVar]]). In addition, the supplied singleton function also takes nUniqueVar and (output from) primary as input.

Similarly, num, is a data frame of aggregated numerical variables. It is possible to supply several primary functions joined by c, e.g. (c(FunPrim1, FunPrim2)). All NAs returned from any of the functions force the corresponding cells not to be primary suppressed.

The effect of maxN , protectZeros and secondaryZeros depends on the supplied functions where these parameters are used. Their default values are inherited from the default values of the first primary function (several possible) or, in the case of secondaryZeros, the candidates function. When defaults cannot be inherited, they are set to NULL. In practice the function formals are still used to generate the defaults when primary and/or candidates are not functions. Then NULL is correctly returned, but suppressWarnings are needed.

Singleton handling can be turned off by singleton = NULL or singletonMethod = "none". Both of these choices are identical in the sense that singletonMethod is set to "none" whenever singleton is NULL and vice versa.

Information about uncertain primary suppressions due to forced cells can be found as described by parameters unsafeInOutput and output (⁠= "all"⁠). When forced cells affect singleton problems, this is not implemented. Some information can be seen from warnings. This can also be seen by choosing output = "secondary" together with unsafeInOutput = "ifany" or unsafeInOutput = "always". Then, negative indices from GaussSuppression using unsafeAsNegative = TRUE will be included in the output. Singleton problems may, however, be present even if it cannot be seen as warning/output. In some cases, the problems can be detected by GaussSuppressDec.

In some cases, cells that are forced, hidden, or primary suppressed can overlap. For these situations, forced has precedence over hidden and primary. That is, if a cell is both forced and hidden, it will be treated as a forced cell and thus published. Similarly, any primary suppression of a forced cell will be ignored (see parameter whenPrimaryForced to GaussSuppression). It is, however, meaningful to combine primary and hidden. Such cells will be protected while also being assigned the NA value in the suppressed output variable.

Value

Aggregated data with suppression information

Author(s)

Øyvind Langsrud and Daniel Lupp

Examples


z1 <- SSBtoolsData("z1")
GaussSuppressionFromData(z1, 1:2, 3)

z2 <- SSBtoolsData("z2")
GaussSuppressionFromData(z2, 1:4, 5, protectZeros = FALSE)


# Data as in GaussSuppression examples
df <- data.frame(values = c(1, 1, 1, 5, 5, 9, 9, 9, 9, 9, 0, 0, 0, 7, 7), 
                 var1 = rep(1:3, each = 5), var2 = c("A", "B", "C", "D", "E"))

GaussSuppressionFromData(df, c("var1", "var2"), "values")
GaussSuppressionFromData(df, c("var1", "var2"), "values", formula = ~var1 + var2, maxN = 10)
GaussSuppressionFromData(df, c("var1", "var2"), "values", formula = ~var1 + var2, maxN = 10,
      protectZeros = TRUE, # Parameter needed by SingletonDefault and default not in primary  
      primary = function(freq, crossTable, maxN, ...) 
                   which(freq <= maxN & crossTable[[2]] != "A" & crossTable[, 2] != "C"))
                   
# Combining several primary functions 
# Note that NA & c(TRUE, FALSE) equals c(NA, FALSE)                      
GaussSuppressionFromData(df, c("var1", "var2"), "values", formula = ~var1 + var2, maxN = 10, 
       primary = c(function(freq, maxN, protectZeros = TRUE, ...) freq >= 45,
                   function(freq, maxN, ...) freq <= maxN,
                   function(crossTable, ...) NA & crossTable[[2]] == "C",  
                   function(crossTable, ...) NA & crossTable[[1]]== "Total" 
                                                & crossTable[[2]]== "Total"))                    
                   
# Similar to GaussSuppression examples
GaussSuppressionFromData(df, c("var1", "var2"), "values", formula = ~var1 * var2, 
       candidates = NULL, singleton = NULL, protectZeros = FALSE, secondaryZeros = TRUE)
GaussSuppressionFromData(df, c("var1", "var2"), "values", formula = ~var1 * var2, 
       singleton = NULL, protectZeros = FALSE, secondaryZeros = FALSE)
GaussSuppressionFromData(df, c("var1", "var2"), "values", formula = ~var1 * var2, 
       protectZeros = FALSE, secondaryZeros = FALSE)

              
# Examples with zeros as singletons
z <- data.frame(row = rep(1:3, each = 3), col = 1:3, freq = c(0, 2, 5, 0, 0, 6:9))
GaussSuppressionFromData(z, 1:2, 3, singleton = NULL) 
GaussSuppressionFromData(z, 1:2, 3, singletonMethod = "none") # as above 
GaussSuppressionFromData(z, 1:2, 3)
GaussSuppressionFromData(z, 1:2, 3, protectZeros = FALSE, secondaryZeros = TRUE, singleton = NULL)
GaussSuppressionFromData(z, 1:2, 3, protectZeros = FALSE, secondaryZeros = TRUE)      

GaussSuppression documentation built on Sept. 24, 2024, 5:07 p.m.