Small count rounding via a dummy matrix and by an algorithm inspired by PLS
RoundViaDummy( data, freqVar, formula = NULL, roundBase = 3, singleRandom = FALSE, crossTable = TRUE, total = "Total", maxIterRows = 1000, maxIter = 1e+07, x = NULL, hierarchies = NULL, xReturn = FALSE, maxRound = roundBase - 1, zeroCandidates = FALSE, forceInner = FALSE, identifyNew = TRUE, step = 0, preRounded = NULL, leverageCheck = FALSE, easyCheck = TRUE, printInc = TRUE, rndSeed = 123, dimVar = NULL, plsWeights = NULL, preDifference = NULL, allSmall = FALSE, ... )
data 
Input data as a data frame (inner cells) 
freqVar 
Variable holding counts (name or number) 
formula 
Model formula defining publishable cells. Will be used to calculate the dummy matrix x.
roundBase 
Rounding base 
singleRandom 
Single random draw when TRUE (instead of algorithm) 
crossTable 
When TRUE, cross table in output and calculations via FormulaSums() 
total 
String used to name totals 
maxIterRows 
See details 
maxIter 
Maximum number of iterations 
x 
Dummy matrix defining publishable cells 
hierarchies 
List of hierarchies, which can be converted by 
xReturn 
Dummy matrix in output when TRUE (as input parameter x) 
maxRound 
Inner cells contributing to original publishable cells equal to or less than maxRound will be rounded. 
zeroCandidates 
When TRUE, inner cells in input with zero count (and multiple of roundBase when maxRound is in use) contributing to publishable cells will be included as candidates to obtain roundBase value. With vector input, the rule is specified individually for each cell. This can be specified as a vector, a variable in data or a function generating it (see details). 
forceInner 
When TRUE, all inner cells will be rounded. Use vector input to force individual cells to be rounded. This can be specified as a vector, a variable in data or a function generating it (see details). Can be combined with parameter zeroCandidates to allow zeros and roundBase multiples to be rounded up. 
identifyNew 
When 
step 
When 
preRounded 
A vector or a variable in data that contains a mixture of missing values and predetermined values of rounded inner cells. Can also be specified as a function generating it (see details). 
leverageCheck 
When TRUE, all inner cells that depend linearly on the published cells and with small frequencies
( 
easyCheck 
A light version of the above leverage checking.
Checking is performed after rounding. Extra iterations are performed when needed.

printInc 
Printing iteration information to console when TRUE 
rndSeed 
If non-NULL, a random generator seed to be used locally within the function without affecting the random value stream in R. 
dimVar 
The main dimensional variables and additional aggregating variables. This parameter can be useful when hierarchies and formula are unspecified. 
plsWeights 
A vector of weights for each cell to be published or a function generating it (see details). For use in the algorithm criterion. 
preDifference 
A data.frame with differences already obtained from rounding another subset of data.
There must be columns that match 
allSmall 
When TRUE, all small inner cells (<= maxRound) are rounded (see details). 
... 
Further parameters sent to 
Small count rounding of necessary inner cells is performed so that all small frequencies of cross-classifications to be published
(publishable cells) are rounded. This is equivalent to changing micro data since frequencies of unique combinations are changed.
Thus, additivity and consistency are guaranteed. The matrix multiplication formula is:
yPublish = t(x) %*% yInner, where x is the dummy matrix.
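The matrix formula above can be illustrated with a minimal base R sketch. The toy data, the variable names region and sex, and the hand-picked "rounded" values are assumptions made for illustration only; they are not produced by RoundViaDummy.

```r
# Toy data: four inner cells defined by two hypothetical variables.
inner <- data.frame(
  region = c("A", "A", "B", "B"),
  sex    = c("F", "M", "F", "M"),
  freq   = c(1, 4, 2, 7)
)
yInner <- inner$freq

# Dummy matrix x: one row per inner cell, one column per publishable cell.
x <- cbind(
  Total    = 1,
  region_A = as.numeric(inner$region == "A"),
  region_B = as.numeric(inner$region == "B"),
  sex_F    = as.numeric(inner$sex == "F"),
  sex_M    = as.numeric(inner$sex == "M")
)

# Publishable cells are sums of inner cells.
yPublish <- t(x) %*% yInner
print(yPublish)

# Hand-picked (hypothetical) rounded inner cells, base 3: 1 -> 0 and 2 -> 3.
yInnerRounded <- c(0, 4, 3, 7)
yPublishRounded <- t(x) %*% yInnerRounded

# Because only inner cells were changed, the margins still add up.
yPublishRounded["region_A", 1] + yPublishRounded["region_B", 1] ==
  yPublishRounded["Total", 1]
```

Since every published value is recomputed from the same modified inner cells, any two tables built from them agree with each other, which is the additivity and consistency property described in the text.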
Parameters zeroCandidates, forceInner, preRounded and plsWeights can be specified as functions.
The supplied functions take the following arguments: data, yPublish, yInner, crossTable, x, roundBase, maxRound, and ...,
where yPublish and yInner are numeric vectors of original counts.
When allSmall is TRUE, forceInner is set to function(yInner, maxRound, ...) yInner <= maxRound.
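As a small sketch of the function form of these parameters, the rule quoted for allSmall can be written and tried on its own. The name forceSmall and the toy counts are hypothetical; the final call is left commented out because it needs the package that provides RoundViaDummy and SmallCountData.

```r
# Rule equivalent to what the text says allSmall = TRUE uses for forceInner.
# 'forceSmall' is a hypothetical name chosen for this sketch.
forceSmall <- function(yInner, maxRound, ...) yInner <= maxRound

# Called in isolation on toy counts; arguments the rule does not use
# (for example roundBase or data) are absorbed by '...'.
yInner <- c(0, 1, 2, 3, 7)
forced <- forceSmall(yInner = yInner, maxRound = 2, roundBase = 3, data = NULL)
print(forced)

# Possible use with the documented interface (not run here):
# RoundViaDummy(SmallCountData("e6"), "freq", forceInner = forceSmall)
```

Declaring only the arguments the rule needs and ending the signature with ... keeps the function valid even though more arguments are passed to it.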
A list where the first two elements are two-column matrices. The first matrix consists of inner cells and the second of cells to be published. In each matrix, the first and second columns contain the original and rounded values, respectively. By default, the cross table is the third element of the output list.
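The layout of the returned list can be explored with a hand-built stand-in. The element name yPublish and all numbers below are assumptions for illustration (the examples in this page only confirm yInner and the column names "original" and "rounded"); a real result would come from RoundViaDummy.

```r
# Hand-built stand-in for the output list (hypothetical values).
out <- list(
  yInner   = cbind(original = c(1, 4, 2, 7), rounded = c(0, 4, 3, 7)),
  yPublish = cbind(original = c(14, 5, 9),   rounded = c(14, 4, 10))
)

# Which inner cells were changed by the rounding?
changed <- out$yInner[, "original"] != out$yInner[, "rounded"]
nChanged <- sum(changed)

# Largest absolute deviation among the cells to be published.
maxDiff <- max(abs(out$yPublish[, "rounded"] - out$yPublish[, "original"]))

print(nChanged)
print(maxDiff)
```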
Iterations are needed because new cells are identified after the initial rounding of identified cells. When the number of identified cells is high, the algorithm can be too memory consuming (unless singleRandom=TRUE). To avoid problems, no more than maxIterRows cells are rounded in each iteration. The iteration limit (maxIter) is set high by default since a low value of maxIterRows may require a high number of iterations.
See the user-friendly wrapper PLSrounding, and see Round2 for rounding by another algorithm.
# See similar and related examples in PLSrounding documentation
library(SmallCountRounding)  # provides RoundViaDummy() and SmallCountData()

RoundViaDummy(SmallCountData("e6"), "freq")
RoundViaDummy(SmallCountData("e6"), "freq", formula = ~eu * year + geo)
RoundViaDummy(SmallCountData("e6"), "freq",
              hierarchies = list(geo = c("EU", "@Portugal", "@Spain", "Iceland"),
                                 year = c("2018", "2019")))

RoundViaDummy(SmallCountData('z2'), 'ant',
              ~region + hovedint + fylke*hovedint + kostragr*hovedint, 10)
mf <- ~region*mnd + hovedint*mnd + fylke*hovedint*mnd + kostragr*hovedint*mnd
a <- RoundViaDummy(SmallCountData('z3'), 'ant', mf, 5)
b <- RoundViaDummy(SmallCountData('sosialFiktiv'), 'ant', mf, 4)
print(cor(b[[2]]), digits = 12)  # Correlation between original and rounded

# Demonstrate parameter leverageCheck
# The 42nd inner cell must be rounded since it can be revealed from the published cells.
mf2 <- ~region + hovedint + fylke * hovedint + kostragr * hovedint
RoundViaDummy(SmallCountData("z2"), "ant", mf2, leverageCheck = FALSE)$yInner[42, ]
RoundViaDummy(SmallCountData("z2"), "ant", mf2, leverageCheck = TRUE)$yInner[42, ]

## Not run:
# Demonstrate parameters maxRound, zeroCandidates and forceInner
# by tabulating the inner cells that have been changed.
z4 <- SmallCountData("sosialFiktiv")
for (forceInner in c("FALSE", "z4$ant < 10"))
  for (zeroCandidates in c(FALSE, TRUE))
    for (maxRound in c(2, 5)) {
      set.seed(123)
      a <- RoundViaDummy(z4, "ant", formula = mf, maxRound = maxRound,
                         zeroCandidates = zeroCandidates,
                         forceInner = eval(parse(text = forceInner)))
      change <- a$yInner[, "original"] != a$yInner[, "rounded"]
      cat("\n\n\n")
      cat(" maxRound:", maxRound, "\n")
      cat("zeroCandidates:", zeroCandidates, "\n")
      cat(" forceInner:", forceInner, "\n\n")
      print(table(original = a$yInner[change, "original"],
                  rounded = a$yInner[change, "rounded"]))
      cat("\n")
    }
## End(Not run)
