PLSrounding | R Documentation |
Small count rounding of the necessary inner cells is performed so that all small frequencies of cross-classifications to be published (publishable cells) are rounded. The publishable cells can be defined from a model formula, from hierarchies, or automatically from the data.
PLSrounding(
  data,
  freqVar = NULL,
  roundBase = 3,
  hierarchies = NULL,
  formula = NULL,
  dimVar = NULL,
  maxRound = roundBase - 1,
  printInc = nrow(data) > 1000,
  output = NULL,
  preAggregate = is.null(freqVar),
  ...
)

PLSroundingInner(..., output = "inner")

PLSroundingPublish(..., output = "publish")
data |
Input data as a data frame (inner cells) |
freqVar |
Variable holding counts (inner cells frequencies). When |
roundBase |
Rounding base |
hierarchies |
List of hierarchies |
formula |
Model formula defining publishable cells |
dimVar |
The main dimensional variables and additional aggregating variables. This parameter can be useful when hierarchies and formula are unspecified. |
maxRound |
Inner cells contributing to original publishable cells equal to or less than maxRound will be rounded |
printInc |
Printing iteration information to console when TRUE |
output |
Possible non-NULL values are |
preAggregate |
When |
... |
Further parameters sent to |
This function is a user-friendly wrapper for RoundViaDummy with data frame output and with a computed summary of the results. See RoundViaDummy for more details.
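The maxRound rule in the argument list (inner cells contributing to small publishable cells are the ones that get rounded) can be illustrated with a base R sketch. This is not the package's algorithm: the toy data, the choice of one-way margins as publishable cells, and the decision to leave zero inner cells alone are all assumptions made for illustration.

```r
# Hypothetical inner cells (toy data, not shipped with the package)
inner <- data.frame(
  geo  = c("A", "A", "B", "B", "C", "C"),
  year = c("2018", "2019", "2018", "2019", "2018", "2019"),
  freq = c(1, 1, 6, 9, 0, 20)
)

roundBase <- 3
maxRound  <- roundBase - 1  # same default as in the usage above

# Assumption: the publishable cells are the one-way margins of geo and year
byGeo  <- aggregate(freq ~ geo,  data = inner, FUN = sum)
byYear <- aggregate(freq ~ year, data = inner, FUN = sum)

# Publishable cells counted as small: positive and not larger than maxRound
smallGeo  <- byGeo$geo[byGeo$freq > 0 & byGeo$freq <= maxRound]
smallYear <- byYear$year[byYear$freq > 0 & byYear$freq <= maxRound]

# Candidate inner cells: nonzero cells contributing to a small publishable cell
# (assumption: zero cells are not candidates in this sketch)
inner$candidate <- inner$freq > 0 &
  (inner$geo %in% smallGeo | inner$year %in% smallYear)

print(inner)
```

Here only the geo margin for "A" (sum 2) is small, so the two inner cells under "A" are flagged; how the flagged cells are then rounded up or down is handled by RoundViaDummy and is not reproduced here.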
Output is a four-element list with class attribute "PLSrounded" (to ensure informative printing).
inner |
Data frame corresponding to input data with the main dimensional variables and with cell frequencies (original, rounded, difference). |
publish |
Data frame of publishable data with the main dimensional variables and with cell frequencies (original, rounded, difference). |
metrics |
A named character vector of various statistics calculated from the two output data frames
(" |
freqTable |
Matrix of frequencies of cell frequencies and absolute differences.
For example, row " |
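To make the original, rounded and difference columns concrete, the following base R sketch computes a few summary statistics and a tabulation of absolute differences from two hypothetical vectors. The numbers are invented, the sign convention rounded minus original is an assumption, and the tabulation only loosely mirrors the idea behind freqTable rather than reproducing its layout.

```r
# Hypothetical publishable cell frequencies before and after rounding
original <- c(2, 15, 20, 7, 30)
rounded  <- c(3, 15, 20, 6, 32)

# Assumed sign convention: rounded minus original
difference <- rounded - original

# Summary statistics of the kind named in the examples
maxdiff        <- max(abs(difference))
meanAbsDiff    <- mean(abs(difference))
rootMeanSquare <- sqrt(mean(difference^2))

# How often each absolute difference occurs
absDiffCounts <- table(abs(difference))

print(c(maxdiff = maxdiff, meanAbsDiff = meanAbsDiff, rootMeanSquare = rootMeanSquare))
print(absDiffCounts)
```

The three statistics depend only on absolute or squared differences, so they are unaffected by the assumed sign convention.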
Langsrud, Ø. and Heldal, J. (2018): “An Algorithm for Small Count Rounding of Tabular Data”. Presented at: Privacy in statistical databases, Valencia, Spain. September 26-28, 2018. https://www.researchgate.net/publication/327768398_An_Algorithm_for_Small_Count_Rounding_of_Tabular_Data
RoundViaDummy, PLS2way, ModelMatrix
# Small example data set
z <- SmallCountData("e6")
print(z)

# Publishable cells by formula interface
a <- PLSrounding(z, "freq", roundBase = 5, formula = ~geo + eu + year)
print(a)
print(a$inner)
print(a$publish)
print(a$metrics)
print(a$freqTable)

# Recalculation of maxdiff, HDutility, meanAbsDiff and rootMeanSquare
max(abs(a$publish[, "difference"]))
HDutility(a$publish[, "original"], a$publish[, "rounded"])
mean(abs(a$publish[, "difference"]))
sqrt(mean((a$publish[, "difference"])^2))

# Six lines below produce equivalent results
# Ordering of rows can be different
PLSrounding(z, "freq")  # All variables except "freq" as dimVar
PLSrounding(z, "freq", dimVar = c("geo", "eu", "year"))
PLSrounding(z, "freq", formula = ~eu * year + geo * year)
PLSrounding(z[, -2], "freq", hierarchies = SmallCountData("eHrc"))
PLSrounding(z[, -2], "freq", hierarchies = SmallCountData("eDimList"))
PLSrounding(z[, -2], "freq", hierarchies = SmallCountData("eDimList"), formula = ~geo * year)

# Define publishable cells differently by making use of formula interface
PLSrounding(z, "freq", formula = ~eu * year + geo)

# Define publishable cells differently by making use of hierarchy interface
eHrc2 <- list(geo = c("EU", "@Portugal", "@Spain", "Iceland"), year = c("2018", "2019"))
PLSrounding(z, "freq", hierarchies = eHrc2)

# Also possible to combine hierarchies and formula
PLSrounding(z, "freq", hierarchies = SmallCountData("eDimList"), formula = ~geo + year)

# Single data frame output
PLSroundingInner(z, "freq", roundBase = 5, formula = ~geo + eu + year)
PLSroundingPublish(z, roundBase = 5, formula = ~geo + eu + year)

# Microdata input
PLSroundingInner(rbind(z, z), roundBase = 5, formula = ~geo + eu + year)

# Parameter avoidHierarchical (see RoundViaDummy and ModelMatrix)
PLSroundingPublish(z, roundBase = 5, formula = ~geo + eu + year, avoidHierarchical = TRUE)

# Package sdcHierarchies can be used to create hierarchies.
# The small example code below works if this package is available.
if (require(sdcHierarchies)) {
  z2 <- cbind(geo = c("11", "21", "22"), z[, 3:4], stringsAsFactors = FALSE)
  h2 <- list(
    geo = hier_compute(inp = unique(z2$geo), dim_spec = c(1, 1), root = "Tot", as = "df"),
    year = hier_convert(hier_create(root = "Total", nodes = c("2018", "2019")), as = "df"))
  PLSrounding(z2, "freq", hierarchies = h2)
}

# Use PLS2way to produce tables as in Langsrud and Heldal (2018) and to demonstrate
# parameters maxRound, zeroCandidates and identifyNew (see RoundViaDummy).
# Parameter rndSeed used to ensure same output as in reference.
exPSD <- SmallCountData("exPSD")
a <- PLSrounding(exPSD, "freq", 5, formula = ~rows + cols, rndSeed = 124)
PLS2way(a, "original")  # Table 1
PLS2way(a)  # Table 2
a <- PLSrounding(exPSD, "freq", 5, formula = ~rows + cols, identifyNew = FALSE, rndSeed = 124)
PLS2way(a)  # Table 3
a <- PLSrounding(exPSD, "freq", 5, formula = ~rows + cols, maxRound = 7)
PLS2way(a)  # Values in col1 rounded
a <- PLSrounding(exPSD, "freq", 5, formula = ~rows + cols, zeroCandidates = TRUE)
PLS2way(a)  # (row3, col4): original is 0 and rounded is 5