PLSrounding | R Documentation |

Small count rounding of the necessary inner cells is performed so that all small frequencies of cross-classifications to be published (publishable cells) are rounded. The publishable cells can be defined from a model formula, from hierarchies, or automatically from the data.

    PLSrounding(
      data,
      freqVar = NULL,
      roundBase = 3,
      hierarchies = NULL,
      formula = NULL,
      dimVar = NULL,
      maxRound = roundBase - 1,
      printInc = nrow(data) > 1000,
      output = NULL,
      preAggregate = is.null(freqVar),
      ...
    )

    PLSroundingInner(..., output = "inner")

    PLSroundingPublish(..., output = "publish")

`data` |
Input data as a data frame (inner cells) |

`freqVar` |
Variable holding counts (inner cells frequencies). When |

`roundBase` |
Rounding base |

`hierarchies` |
List of hierarchies |

`formula` |
Model formula defining publishable cells |

`dimVar` |
The main dimensional variables and additional aggregating variables. This parameter can be useful when hierarchies and formula are unspecified. |

`maxRound` |
Inner cells contributing to original publishable cells equal to or less than `maxRound` will be rounded. |

`printInc` |
Printing iteration information to console when TRUE |

`output` |
Possible non-NULL values are |

`preAggregate` |
When |

`...` |
Further parameters sent to |
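The interplay of `roundBase` and `maxRound` can be illustrated with a small base R sketch. It is not the package's implementation: the data frame `inner`, its counts, and the choice of publishable cells (totals by `geo` and by `year`) are made up for illustration, and the flagging logic is only one reading of the `maxRound` description above.

```r
# Hypothetical inner cells (made-up counts, for illustration only)
inner <- data.frame(
  geo  = c("A", "A", "B", "B"),
  year = c("2018", "2019", "2018", "2019"),
  freq = c(1, 6, 1, 7),
  stringsAsFactors = FALSE
)

roundBase <- 3
maxRound  <- roundBase - 1  # default taken from the usage section

# Assumed publishable cells: totals by geo and totals by year
byGeo  <- tapply(inner$freq, inner$geo, sum)   # A = 7, B = 8
byYear <- tapply(inner$freq, inner$year, sum)  # 2018 = 2, 2019 = 13

# Publishable cells that are small: positive and not above maxRound
smallGeo  <- names(byGeo)[byGeo > 0 & byGeo <= maxRound]
smallYear <- names(byYear)[byYear > 0 & byYear <= maxRound]

# Flag non-zero inner cells that contribute to a small publishable cell
inner$candidate <- inner$freq > 0 &
  (inner$geo %in% smallGeo | inner$year %in% smallYear)

print(inner)
```

Here only the 2018 total is small, so the two 2018 inner cells are flagged. Raising `maxRound` widens the set of publishable cells treated as small and therefore the set of inner cells that may be rounded.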

This function is a user-friendly wrapper for `RoundViaDummy` with data frame output and with a computed summary of the results. See `RoundViaDummy` for more details.

Output is a four-element list with class attribute "PLSrounded" (to ensure informative printing).

`inner` |
Data frame corresponding to input data with the main dimensional variables and with cell frequencies (original, rounded, difference). |

`publish` |
Data frame of publishable data with the main dimensional variables and with cell frequencies (original, rounded, difference). |

`metrics` |
A named character vector of various statistics calculated from the two output data frames
(" |

`freqTable` |
Matrix of frequencies of cell frequencies and absolute differences.
For example, row " |
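A short sketch of how the four elements might be inspected. It assumes the functions are provided by the SmallCountRounding package (the package name does not appear on this page) and only runs when that package is installed. The sign convention of the `difference` column is not stated here, so the check compares absolute values only.

```r
# Runs only if the (assumed) SmallCountRounding package is available
if (requireNamespace("SmallCountRounding", quietly = TRUE)) {
  library(SmallCountRounding)

  z <- SmallCountData("e6")
  a <- PLSrounding(z, "freq", roundBase = 5, formula = ~geo + eu + year)

  # Names of the list elements described above
  print(names(a))

  # First rows of the publishable cells
  pub <- a$publish
  print(head(pub))

  # The difference column should match rounded vs. original up to sign
  print(all(abs(pub[, "rounded"] - pub[, "original"]) == abs(pub[, "difference"])))
}
```

Working with `a$publish` and `a$inner` as ordinary data frames keeps downstream steps, such as merging with other tables or recomputing the metrics, in plain R.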

Langsrud, Ø. and Heldal, J. (2018): “An Algorithm for Small Count Rounding of Tabular Data”.
Presented at: *Privacy in statistical databases*, Valencia, Spain. September 26-28, 2018.
https://www.researchgate.net/publication/327768398_An_Algorithm_for_Small_Count_Rounding_of_Tabular_Data

`RoundViaDummy`, `PLS2way`, `ModelMatrix`

    # Small example data set
    z <- SmallCountData("e6")
    print(z)

    # Publishable cells by formula interface
    a <- PLSrounding(z, "freq", roundBase = 5, formula = ~geo + eu + year)
    print(a)
    print(a$inner)
    print(a$publish)
    print(a$metrics)
    print(a$freqTable)

    # Recalculation of maxdiff, HDutility, meanAbsDiff and rootMeanSquare
    max(abs(a$publish[, "difference"]))
    HDutility(a$publish[, "original"], a$publish[, "rounded"])
    mean(abs(a$publish[, "difference"]))
    sqrt(mean((a$publish[, "difference"])^2))

    # Six lines below produce equivalent results
    # Ordering of rows can be different
    PLSrounding(z, "freq")  # All variables except "freq" as dimVar
    PLSrounding(z, "freq", dimVar = c("geo", "eu", "year"))
    PLSrounding(z, "freq", formula = ~eu * year + geo * year)
    PLSrounding(z[, -2], "freq", hierarchies = SmallCountData("eHrc"))
    PLSrounding(z[, -2], "freq", hierarchies = SmallCountData("eDimList"))
    PLSrounding(z[, -2], "freq", hierarchies = SmallCountData("eDimList"),
                formula = ~geo * year)

    # Define publishable cells differently by making use of formula interface
    PLSrounding(z, "freq", formula = ~eu * year + geo)

    # Define publishable cells differently by making use of hierarchy interface
    eHrc2 <- list(geo = c("EU", "@Portugal", "@Spain", "Iceland"),
                  year = c("2018", "2019"))
    PLSrounding(z, "freq", hierarchies = eHrc2)

    # Also possible to combine hierarchies and formula
    PLSrounding(z, "freq", hierarchies = SmallCountData("eDimList"),
                formula = ~geo + year)

    # Single data frame output
    PLSroundingInner(z, "freq", roundBase = 5, formula = ~geo + eu + year)
    PLSroundingPublish(z, roundBase = 5, formula = ~geo + eu + year)

    # Microdata input
    PLSroundingInner(rbind(z, z), roundBase = 5, formula = ~geo + eu + year)

    # Parameter avoidHierarchical (see RoundViaDummy and ModelMatrix)
    PLSroundingPublish(z, roundBase = 5, formula = ~geo + eu + year,
                       avoidHierarchical = TRUE)

    # Package sdcHierarchies can be used to create hierarchies.
    # The small example code below works if this package is available.
    if (require(sdcHierarchies)) {
      z2 <- cbind(geo = c("11", "21", "22"), z[, 3:4], stringsAsFactors = FALSE)
      h2 <- list(
        geo = hier_compute(inp = unique(z2$geo), dim_spec = c(1, 1),
                           root = "Tot", as = "df"),
        year = hier_convert(hier_create(root = "Total", nodes = c("2018", "2019")),
                            as = "df"))
      PLSrounding(z2, "freq", hierarchies = h2)
    }

    # Use PLS2way to produce tables as in Langsrud and Heldal (2018) and to demonstrate
    # parameters maxRound, zeroCandidates and identifyNew (see RoundViaDummy).
    # Parameter rndSeed used to ensure same output as in reference.
    exPSD <- SmallCountData("exPSD")
    a <- PLSrounding(exPSD, "freq", 5, formula = ~rows + cols, rndSeed = 124)
    PLS2way(a, "original")  # Table 1
    PLS2way(a)              # Table 2
    a <- PLSrounding(exPSD, "freq", 5, formula = ~rows + cols,
                     identifyNew = FALSE, rndSeed = 124)
    PLS2way(a)              # Table 3
    a <- PLSrounding(exPSD, "freq", 5, formula = ~rows + cols, maxRound = 7)
    PLS2way(a)              # Values in col1 rounded
    a <- PLSrounding(exPSD, "freq", 5, formula = ~rows + cols, zeroCandidates = TRUE)
    PLS2way(a)              # (row3, col4): original is 0 and rounded is 5

