RoundViaDummy: Small Count Rounding of Tabular Data

View source: R/RoundViaDummy.R

Description

Small count rounding via a dummy matrix and by an algorithm inspired by PLS

Usage

RoundViaDummy(data, freqVar, formula = NULL, roundBase = 3,
  singleRandom = FALSE, crossTable = TRUE, total = "Total",
  maxIterRows = 1000, maxIter = 1e+07, x = NULL,
  hierarchies = NULL, ...)

Arguments

data

Input data as a data frame (inner cells)

freqVar

Variable holding counts (name or number)

formula

Model formula defining publishable cells. Will be used to calculate x (via ModelMatrix). When NULL, x must be supplied.

roundBase

Rounding base

singleRandom

Single random draw when TRUE (instead of algorithm)

crossTable

When TRUE, a cross table is included in the output and calculations are done via FormulaSums()

total

String used to name totals

maxIterRows

Maximum number of cells rounded in each iteration (see Note)

maxIter

Maximum number of iterations

x

Dummy matrix defining publishable cells

hierarchies

List of hierarchies, which can be converted by AutoHierarchies. Thus, a single string as hierarchy input is assumed to be a total code. Exceptions are "rowFactor" or "", which correspond to only using the categories in the data.

...

Further parameters sent to Hierarchies2ModelMatrix

Details

Small count rounding of the necessary inner cells is performed so that all small frequencies among the cross-classifications to be published (publishable cells) are rounded. This is equivalent to changing the microdata, since the frequencies of unique combinations are changed. Additivity and consistency are therefore guaranteed. Publishable cells are computed from inner cells by the matrix multiplication yPublish = t(x) %*% yInner, where x is the dummy matrix.
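The relation yPublish = t(x) %*% yInner can be illustrated with base R alone. The sketch below uses made-up toy data, a hand-built dummy matrix, and a hand-picked rounding of the inner cells. It does not run the package's algorithm; it only shows why changing inner cells keeps the published cells additive.

```r
# Toy inner cells: 2 regions x 2 years (hypothetical data, not from the package)
inner <- data.frame(
  region = c("A", "A", "B", "B"),
  year   = c("2018", "2019", "2018", "2019"),
  freq   = c(1, 4, 2, 6)
)
yInner <- matrix(inner$freq, ncol = 1)

# Dummy matrix x: one row per inner cell, one column per publishable cell.
# Entry is 1 when the inner cell contributes to that publishable cell.
x <- cbind(
  Total = 1,
  A     = as.numeric(inner$region == "A"),
  B     = as.numeric(inner$region == "B"),
  y2018 = as.numeric(inner$year == "2018"),
  y2019 = as.numeric(inner$year == "2019")
)

# Publishable cells from the original inner cells
yPublish <- t(x) %*% yInner

# Hand-picked rounding of the small inner counts (1 and 2) to multiples of 3.
# This is an illustration only, not output from RoundViaDummy.
yInnerRounded   <- matrix(c(0, 4, 3, 6), ncol = 1)
yPublishRounded <- t(x) %*% yInnerRounded

print(cbind(original = drop(yPublish), rounded = drop(yPublishRounded)))
```

Because the rounded published cells are sums of the same rounded inner cells, A plus B still equals Total, and so do the two year margins.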

Value

A list whose first two elements are two-column matrices. The first matrix holds the inner cells and the second the cells to be published. In each matrix, the first column contains the original values and the second the rounded values. By default, the cross table is the third element of the output list.
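A short sketch of how a result with this layout might be inspected. The helper summarise_rounding is hypothetical (it is not part of SmallCountRounding), columns are indexed by position because no column names are stated here, and the mock list contains made-up numbers. The call on real output runs only if the package is installed.

```r
# Hypothetical helper: summarise a result shaped as described above.
summarise_rounding <- function(out) {
  inner   <- out[[1]]  # inner cells: column 1 original, column 2 rounded
  publish <- out[[2]]  # publishable cells: same column layout
  c(
    innerChanged  = sum(inner[, 1] != inner[, 2]),            # inner cells altered
    maxAbsDiff    = max(abs(publish[, 1] - publish[, 2])),    # largest published change
    totalOriginal = sum(inner[, 1]),
    totalRounded  = sum(inner[, 2])
  )
}

# Mock result with made-up numbers, standing in for real output
mock <- list(
  cbind(c(1, 4, 2, 6), c(0, 4, 3, 6)),
  cbind(c(13, 5, 8), c(13, 4, 9))
)
res <- summarise_rounding(mock)
print(res)

# With the package installed, the same helper should apply to real output
if (requireNamespace("SmallCountRounding", quietly = TRUE)) {
  out <- SmallCountRounding::RoundViaDummy(
    SmallCountRounding::SmallCountData("e6"), "freq")
  print(summarise_rounding(out))
}
```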

Note

Iterations are needed because, after the initial rounding of identified cells, new cells are identified. When the number of identified cells is high, the algorithm can consume too much memory (unless singleRandom = TRUE). To avoid such problems, no more than maxIterRows cells are rounded in each iteration. The iteration limit (maxIter) is set high by default, since a low value of maxIterRows may require a large number of iterations.

See Also

See the user-friendly wrapper PLSrounding, and see Round2 for rounding by another algorithm.

Examples

# See similar and related examples in PLSrounding documentation
RoundViaDummy(SmallCountData("e6"), "freq")
RoundViaDummy(SmallCountData("e6"), "freq", formula = ~eu * year + geo)
RoundViaDummy(SmallCountData("e6"), "freq", hierarchies = 
   list(geo = c("EU", "@Portugal", "@Spain", "Iceland"), year = c("2018", "2019")))

RoundViaDummy(SmallCountData('z2'), 
              'ant', ~region + hovedint + fylke*hovedint + kostragr*hovedint, 10)
mf <- ~region*mnd + hovedint*mnd + fylke*hovedint*mnd + kostragr*hovedint*mnd
a <- RoundViaDummy(SmallCountData('z3'), 'ant', mf, 5)
b <- RoundViaDummy(SmallCountData('sosialFiktiv'), 'ant', mf, 4)
print(cor(b[[2]]), digits = 12)  # Correlation between original and rounded

SmallCountRounding documentation built on May 2, 2019, 8:33 a.m.