SuppressFewContributors: Few contributors suppression

View source: R/SuppressFewContributors.R

SuppressFewContributorsR Documentation

Few contributors suppression

Description

This function provides functionality for suppressing volume tables based on the few contributors rule (NContributorsRule).

Usage

SuppressFewContributors(
  data,
  maxN,
  numVar = NULL,
  dimVar = NULL,
  hierarchies = NULL,
  formula = NULL,
  contributorVar = NULL,
  removeCodes = character(0),
  remove0 = TRUE,
  candidatesVar = NULL,
  ...,
  spec = PackageSpecs("fewContributorsSpec")
)

Arguments

data

Input data as a data frame

maxN

Suppression parameter. Cells with frequency ⁠<= maxN⁠ are set as primary suppressed. Using the default primary function, maxN is by default set to 3. See details.

numVar

Numerical variable to be aggregated. Any candidatesVar that is specified and not included in numVar will be aggregated accordingly. Additionally, if remove0 is specified as a variable name and it is not included in numVar, it will also be aggregated accordingly. See parameters candidatesVar and remove0 below.

dimVar

The main dimensional variables and additional aggregating variables. This parameter can be useful when hierarchies and formula are unspecified.

hierarchies

List of hierarchies, which can be converted by AutoHierarchies. Thus, the variables can also be coded by "rowFactor" or "", which correspond to using the categories in the data.

formula

A model formula

contributorVar

Extra variables to be used as grouping elements when counting contributors. Typically, the variable contains the contributor IDs.

removeCodes

Vector of codes to be omitted when counting contributors. With empty contributorVar row indices are assumed and conversion to integer is performed.

remove0

When set to TRUE (default), data rows in which the first numVar (if any) is zero are excluded from the count of contributors. Alternatively, remove0 can be specified as one or more variable names. In this case, all data rows with a zero in any of the specified variables are omitted from the contributor count. Specifying remove0 as variable name(s) is useful for avoiding warning when there are multiple numVar variables.

candidatesVar

Variable to be used in the candidate function to prioritize cells for publication and thus not suppression. The first numVar variable will be used if it is not specified.

...

Further arguments to be passed to the supplied functions and to ModelMatrix (such as inputInOutput and removeEmpty).

spec

NULL or a named list of arguments that will act as default values.

Value

data.frame containing aggregated data and supppression information. Columns nRule and nAll contain the number of contributors. In the former, removeCodes is taken into account.

Examples

num <- c(100,
         90, 10,
         80, 20,
         70, 30,
         50, 25, 25,
         40, 20, 20, 20,
         25, 25, 25, 25)
v1 <- c("v1",
        rep(c("v2", "v3", "v4"), each = 2),
        rep("v5", 3),
        rep(c("v6", "v7"), each = 4))
sweight <- c(1, 2, 1, 2, 1, 2, 1, 2, 1, 1, 2, 1, 1, 1, 2, 1, 1, 1)
d <- data.frame(v1 = v1, num = num, sweight = sweight)

SuppressFewContributors(d, formula = ~v1, maxN = 1, numVar = "num")
SuppressFewContributors(d, formula = ~v1, maxN = 2, numVar = "num")
SuppressFewContributors(d, formula = ~v1, maxN = 3, numVar = "num")


d2 <- SSBtoolsData("d2")[-5]
set.seed(123)
d2$v <- round(rnorm(nrow(d2))^2, 1)
d2$family_id <- round(2*as.integer(factor(d2$region)) + runif(nrow(d2)))

# Hierarchical region variables are detected automatically -> same output column
SuppressFewContributors(data = d2, maxN = 2, numVar = "v", contributorVar = "family_id",
                      dimVar = c("region", "county", "k_group"))

# Formula. Hierarchical variables still detected automatically.
# And codes 1:9 not counted 
SuppressFewContributors(data = d2, maxN = 1, numVar = "v", contributorVar = "family_id",
                      formula = ~main_income * k_group + region + county - k_group,
                      removeCodes = 1:9)

# With hierarchies created manually
ml <- data.frame(levels = c("@", "@@", "@@@", "@@@", "@@@", "@@"), 
        codes = c("Total", "not_assistance", "other", "pensions", "wages", "assistance"))
SuppressFewContributors(data = d2, maxN = 2, numVar = "v", contributorVar = "family_id",
                      hierarchies = list(main_income = ml, k_group = "Total_Norway"))
                      
                      

GaussSuppression documentation built on Sept. 24, 2024, 5:07 p.m.