RoundViaDummy: Small Count Rounding of Tabular Data
In SmallCountRounding: Small Count Rounding of Tabular Data

RoundViaDummy

R Documentation

Small Count Rounding of Tabular Data

Description

Small count rounding via a dummy matrix, using an algorithm inspired by partial least squares regression (PLS).

Usage

RoundViaDummy(
  data,
  freqVar,
  formula = NULL,
  roundBase = 3,
  singleRandom = FALSE,
  crossTable = TRUE,
  total = "Total",
  maxIterRows = 1000,
  maxIter = 1e+07,
  x = NULL,
  hierarchies = NULL,
  xReturn = FALSE,
  maxRound = roundBase - 1,
  zeroCandidates = FALSE,
  forceInner = FALSE,
  identifyNew = TRUE,
  step = 0,
  preRounded = NULL,
  leverageCheck = FALSE,
  easyCheck = TRUE,
  printInc = TRUE,
  rndSeed = 123,
  dimVar = NULL,
  plsWeights = NULL,
  preDifference = NULL,
  allSmall = FALSE,
  ...
)

Arguments

`data`	Input data as a data frame (inner cells)
`freqVar`	Variable holding counts (name or number)
`formula`	Model formula defining publishable cells. Will be used to calculate `x` (via `ModelMatrix`). When NULL, x must be supplied.
`roundBase`	Rounding base
`singleRandom`	Single random draw when TRUE (instead of algorithm)
`crossTable`	When TRUE, cross table in output and calculations via FormulaSums()
`total`	String used to name totals
`maxIterRows`	See details
`maxIter`	Maximum number of iterations
`x`	Dummy matrix defining publishable cells
`hierarchies`	List of hierarchies, which can be converted by `AutoHierarchies`. Thus, a single string as hierarchy input is assumed to be a total code. Exceptions are `"rowFactor"` or `""`, which correspond to only using the categories in the data.
`xReturn`	Dummy matrix in output when TRUE (as input parameter `x`)
`maxRound`	Inner cells contributing to original publishable cells equal to or less than maxRound will be rounded.
`zeroCandidates`	When TRUE, inner cells in input with zero count (and multiples of roundBase when maxRound is in use) contributing to publishable cells will be included as candidates to obtain roundBase value. With vector input, the rule is specified individually for each cell. This can be specified as a vector, a variable in data or a function generating it (see details).
`forceInner`	When TRUE, all inner cells will be rounded. Use vector input to force individual cells to be rounded. This can be specified as a vector, a variable in data or a function generating it (see details). Can be combined with parameter zeroCandidates to allow zeros and roundBase multiples to be rounded up.
`identifyNew`	When `TRUE`, new cells may be identified after initial rounding to ensure that all rounded publishable cells equal to or less than `maxRound` are `roundBase` multiples. Use `NA` for a less conservative behavior (old behavior). Then it is only ensured that no nonzero rounded publishable cells are smaller than `roundBase`. When `maxRound` is default, there is no difference between `TRUE` and `NA`.
`step`	When `step>1`, the original forward part of the algorithm is replaced by a kind of stepwise procedure. After `step` steps forward, backward steps may be performed. The `step` parameter is also used for backward-forward iteration at the end of the algorithm; `step` backward steps may be performed. For greater control, the `step` parameter can be specified as a vector. Additionally, it can be provided as a list to trigger a final re-run iteration. See details.
`preRounded`	A vector or a variable in data that contains a mixture of missing values and predetermined values of rounded inner cells. Can also be specified as a function generating it (see details).
`leverageCheck`	When TRUE, all inner cells that depend linearly on the published cells and have small frequencies (`<=maxRound`) will be rounded. The computation of leverages can be very time and memory consuming. The function `Reduce0exact` is called. The default leverage limit is `0.999999`. Another limit can be sent as input instead of `TRUE`. Checking is performed before and after rounding (since rounding can create new zeros). Extra iterations are performed when needed.
`easyCheck`	A light version of the above leverage checking. Checking is performed after rounding. Extra iterations are performed when needed. `Reduce0exact` is called with `reduceByLeverage=FALSE` and `reduceByColSums=TRUE`.
`printInc`	Printing iteration information to console when TRUE
`rndSeed`	If non-NULL, a random generator seed to be used locally within the function without affecting the random value stream in R.
`dimVar`	The main dimensional variables and additional aggregating variables. This parameter can be useful when hierarchies and formula are unspecified.
`plsWeights`	A vector of weights for each cell to be published or a function generating it (see details). For use in the algorithm criterion.
`preDifference`	A data.frame with differences already obtained from rounding another subset of data. There must be columns that match `crossTable`. Differences must be in the last column.
`allSmall`	When TRUE, all small inner cells (`<= maxRound`) are rounded. This parameter is a simplified alternative to specifying `forceInner` (see details).
`...`	Further parameters sent to `ModelMatrix`. In particular, one can specify `removeEmpty=TRUE` to omit empty combinations. The parameter `inputInOutput` can be used to specify whether to include codes from input. The parameter `avoidHierarchical` (`Formula2ModelMatrix`) can be combined with formula input.

Details

Small count rounding of the necessary inner cells is performed so that all small frequencies of cross-classifications to be published (publishable cells) are rounded. This is equivalent to changing micro data, since frequencies of unique combinations are changed. Thus, additivity and consistency are guaranteed. The matrix multiplication formula is: yPublish = t(x) %*% yInner, where x is the dummy matrix.
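
The relation above can be illustrated with base R on a toy table. This is only a minimal sketch: the data, the hand-built dummy matrix `x` and the hand-picked rounded values are made up for illustration and are not output from RoundViaDummy.

```r
# Toy inner cells: 2 regions x 2 years (hypothetical data)
inner <- data.frame(
  region = c("A", "A", "B", "B"),
  year   = c("2018", "2019", "2018", "2019"),
  freq   = c(1, 5, 2, 7)
)

# Dummy matrix: one row per inner cell, one column per publishable cell.
# Here the publishable cells are the total and the two region margins.
x <- cbind(
  Total    = 1,
  region_A = as.numeric(inner$region == "A"),
  region_B = as.numeric(inner$region == "B")
)

yInner   <- inner$freq
yPublish <- t(x) %*% yInner

# Changing the small inner cells (by hand here, not by the algorithm)
# and aggregating again keeps the published cells additive.
yInnerRounded   <- c(0, 5, 3, 7)  # hypothetical rounding with roundBase = 3
yPublishRounded <- t(x) %*% yInnerRounded

cbind(original = yPublish[, 1], rounded = yPublishRounded[, 1])
```

Because the published cells are always recomputed from the (changed) inner cells, the region margins still sum to the total after rounding.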

Parameters zeroCandidates, forceInner, preRounded and plsWeights can be specified as functions. The supplied functions take the following arguments: data, yPublish, yInner, crossTable, x, roundBase, maxRound, and ..., where yPublish and yInner are numeric vectors of original counts. When allSmall is TRUE, forceInner is set to `function(yInner, maxRound, ...) yInner <= maxRound`.
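
As a sketch of what such a function might look like, the base R code below writes out the allSmall rule and a variant of it. The names `forceSmall` and `forceSmallOrRegion`, the toy data and the `region` column are assumptions made for illustration only.

```r
# Same rule as allSmall = TRUE, written out as a function.
# Only the arguments that are needed are named; the rest fall into `...`.
forceSmall <- function(yInner, maxRound, ...) yInner <= maxRound

# Hypothetical variant: additionally force all inner cells in one region.
# Assumes that `data` has a column named `region`.
forceSmallOrRegion <- function(data, yInner, maxRound, ...) {
  yInner <= maxRound | data$region == "B"
}

# The functions can be tried on their own with toy input.
toy <- data.frame(region = c("A", "A", "B", "B"), freq = c(1, 5, 2, 7))
r1 <- forceSmall(yInner = toy$freq, maxRound = 2, roundBase = 3)
r2 <- forceSmallOrRegion(data = toy, yInner = toy$freq, maxRound = 2)
print(r1)
print(r2)

# With the package loaded, such a function would be passed as, e.g.,
#   RoundViaDummy(toy, "freq", forceInner = forceSmall)
```

Including `...` in the signature lets the function ignore the arguments it does not use.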

Details about the step parameter:

step as a numeric vector is converted to three parameters by
- step1 <- step[1]
- step2 <- ifelse(length(step)>=2, step[2], round(step/2))
- step3 <- ifelse(length(step)>=3, step[3], step[1])
After step1 steps forward, up to step2 backward steps may be performed. At the end of the algorithm, up to step3 backward steps may be executed repeatedly.
When step is provided as a list (of numeric vectors), it is adjusted to a length of 3 using rep_len(step, 3).
- step[[1]] is used in the main iterations.
- step[[2]], when non-NULL, is used in a final re-run iteration.
- step[[3]] is used in extra iterations caused by easyCheck or leverageCheck.
Setting step = list(0) will result in standard behavior, with the exception that an extra re-run iteration is performed. The most detailed setting is achieved by setting step to a length-3 list where each element has length 3.
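
The conversion rules above can be tried out in base R. The helper `step_to_three` below is hypothetical and only mirrors the documented formulas; it is not the package's internal code.

```r
# Hypothetical helper mirroring the documented conversion of a numeric step
step_to_three <- function(step) {
  step1 <- step[1]
  step2 <- if (length(step) >= 2) step[2] else round(step[1] / 2)
  step3 <- if (length(step) >= 3) step[3] else step[1]
  c(step1 = step1, step2 = step2, step3 = step3)
}

s1 <- step_to_three(10)           # step2 defaults to round(10 / 2)
s2 <- step_to_three(c(10, 3))     # step3 defaults to step[1]
s3 <- step_to_three(c(10, 3, 4))  # all three given explicitly
print(rbind(s1, s2, s3))

# List input is recycled to length 3 with rep_len()
l1 <- rep_len(list(0), 3)         # same setting for all three stages
l2 <- rep_len(list(c(10, 5)), 3)
str(l1)
str(l2)
```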

Value

A list where the first two elements are two-column matrices. The first matrix consists of inner cells and the second of cells to be published. In each matrix, the first and second columns contain the original and rounded values, respectively. By default, the cross table is the third element of the output list.

Note

Iterations are needed since new cells are identified after the initial rounding of identified cells. In cases with a high number of identified cells, the algorithm can be too memory consuming (unless singleRandom=TRUE). To avoid problems, no more than maxIterRows cells are rounded in each iteration. The iteration limit (maxIter) is set high by default, since a low value of maxIterRows may require a high number of iterations.

Examples

# See similar and related examples in PLSrounding documentation
RoundViaDummy(SmallCountData("e6"), "freq")
RoundViaDummy(SmallCountData("e6"), "freq", formula = ~eu * year + geo)
RoundViaDummy(SmallCountData("e6"), "freq", hierarchies = 
   list(geo = c("EU", "@Portugal", "@Spain", "Iceland"), year = c("2018", "2019")))

RoundViaDummy(SmallCountData('z2'), 
              'ant', ~region + hovedint + fylke*hovedint + kostragr*hovedint, 10)
mf <- ~region*mnd + hovedint*mnd + fylke*hovedint*mnd + kostragr*hovedint*mnd
a <- RoundViaDummy(SmallCountData('z3'), 'ant', mf, 5)
b <- RoundViaDummy(SmallCountData('sosialFiktiv'), 'ant', mf, 4)
print(cor(b[[2]]),digits=12) # Correlation between original and rounded

# Demonstrate parameter leverageCheck 
# The 42nd inner cell must be rounded since it can be revealed from the published cells.
mf2 <- ~region + hovedint + fylke * hovedint + kostragr * hovedint
RoundViaDummy(SmallCountData("z2"), "ant", mf2, leverageCheck = FALSE)$yInner[42, ]
RoundViaDummy(SmallCountData("z2"), "ant", mf2, leverageCheck = TRUE)$yInner[42, ]

## Not run: 
# Demonstrate parameters maxRound, zeroCandidates and forceInner 
# by tabulating the inner cells that have been changed.
z4 <- SmallCountData("sosialFiktiv")
for (forceInner in c("FALSE", "z4$ant < 10")) 
  for (zeroCandidates in c(FALSE, TRUE)) 
    for (maxRound in c(2, 5)) {
      set.seed(123)
      a <- RoundViaDummy(z4, "ant", formula = mf, maxRound = maxRound, 
                         zeroCandidates = zeroCandidates, 
                         forceInner = eval(parse(text = forceInner)))
      change <- a$yInner[, "original"] != a$yInner[, "rounded"]
      cat("\n\n---------------------------------------------------\n")
      cat("      maxRound:", maxRound, "\n")
      cat("zeroCandidates:", zeroCandidates, "\n")
      cat("    forceInner:", forceInner, "\n\n")
      print(table(original = a$yInner[change, "original"], rounded = a$yInner[change, "rounded"]))
      cat("---------------------------------------------------\n")
    }

## End(Not run)

SmallCountRounding documentation built on April 3, 2025, 6:03 p.m.