localSuppression: Local Suppression to obtain k-anonymity

View source: R/localSuppression.R

localSuppression    R Documentation

Local Suppression to obtain k-anonymity

Description

Algorithm to achieve k-anonymity by performing local suppression.
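As a point of reference, a data set is k-anonymous when every combination of key-variable values is shared by at least k records. The following base R sketch checks this on a small made-up data frame; the data and the variable names (toy, fk, violations) are illustrative assumptions and are not part of sdcMicro.

```r
# Toy data with two categorical key variables (hypothetical values).
toy <- data.frame(
  region = c("north", "north", "south", "south", "south"),
  sex    = c("f", "f", "m", "m", "f"),
  stringsAsFactors = FALSE
)

# Build one key per record and count how many records share that key.
key <- paste(toy$region, toy$sex, sep = "|")
fk <- as.integer(table(key)[key])

# Records whose key occurs fewer than k times violate k-anonymity.
k <- 2
violations <- which(fk < k)
print(fk)
print(violations)
```

Here only the last record has a unique key, so it is the one a local suppression step would need to address.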

Usage

localSuppression(obj, k = 2, importance = NULL, combs = NULL, ...)

kAnon(obj, k = 2, importance = NULL, combs = NULL, ...)

Arguments

obj

a sdcMicroObj-class object or a data.frame

k

Threshold for k-anonymity

importance

Numeric vector of values between 1 and n (n = length(keyVars)). This vector defines the "importance" of variables for local suppression. Variables with importance = 1 will, if possible, not be suppressed; variables with importance = n will be prioritized for suppression.

combs

Numeric vector. If specified, the algorithm provides k-anonymity for every combination of combs[i] key variables, handling the elements of combs in turn. For example, combs = c(4, 3) means that k-anonymity is provided first for all combinations of 4 key variables and then for all combinations of 3 key variables. A different k can be used for each element of combs by supplying k as a vector. If k has only one value, it is used for all subsets.

...

Additional arguments:

  • keyVars: Names or indices of categorical key variables (for data.frame method)

  • strataVars: Name or index of the variable used for stratification. k-anonymity is ensured within each category of this variable.

  • alpha: Numeric value between 0 and 1 specifying how much keys with missing values (NAs) contribute to the calculation of fk and Fk. Default is 1. Used only in the data.frame method.

  • nc: Maximum number of cores used for stratified computations. Default is 1. Parallelization is ignored on Windows.
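To make the combs argument described above more concrete, the following base R sketch enumerates the subsets of key variables implied by combs = c(4, 3) for seven key variables (the names are taken from the examples on this page). It only lists the subsets and pairs each subset size with a k value; it is not how sdcMicro processes them internally, and the helper names (subsets, n_subsets, k_per_size) are assumptions made for illustration.

```r
# Seven key variables, as used in the examples for testdata2.
key_vars <- c("urbrur", "roof", "walls", "water", "electcon", "relat", "sex")

combs <- c(4, 3)
k <- 2

# A single k is reused for every subset size.
k_per_size <- rep(k, length.out = length(combs))

# All subsets of key variables for each requested subset size.
subsets <- lapply(combs, function(m) combn(key_vars, m, simplify = FALSE))
n_subsets <- lengths(subsets)

print(data.frame(size = combs, k = k_per_size, n_subsets = n_subsets))
```

With seven key variables there are 35 subsets of size 4 and 35 of size 3, so the number of combinations to protect grows quickly with the number of key variables.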

Details

The algorithm produces a k-anonymized data set by suppressing values in the key variables. It tries to find a solution that suppresses as few values as possible while taking the specified importance vector into account. If importance is not specified, the vector is constructed so that key variables with many distinct categories are considered less important (and are therefore suppressed first) than key variables with few categories.
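A default ordering of this kind can be sketched in base R by ranking the key variables by their number of distinct values. This is an illustration under assumptions (made-up data, ties broken by column order), not necessarily the rule sdcMicro applies internally.

```r
# Toy key variables with 2, 3 and 6 distinct values (hypothetical data).
toy <- data.frame(
  sex        = c("f", "m", "f", "m", "f", "m"),
  region     = c("a", "b", "c", "a", "b", "c"),
  occupation = c("o1", "o2", "o3", "o4", "o5", "o6"),
  stringsAsFactors = FALSE
)

# Number of distinct categories per key variable.
n_categories <- vapply(toy, function(col) length(unique(col)), integer(1))

# Rank 1 = fewest categories = most important (suppressed last);
# the highest rank marks the variable suppressed first.
importance <- rank(n_categories, ties.method = "first")
print(importance)
```

In this toy case occupation receives the highest value, so it would be the first candidate for suppression.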

The implementation provides k-anonymity within each stratum if the slot strataVar has been set in the sdcMicroObj-class object, or if the argument strataVars is used when applying the data.frame method. For details, see the examples provided.

For the parameter alpha:

  • alpha = 1 counts all wildcard matches (i.e. NAs match everything).

  • alpha = 0 assumes missing values form their own categories.

These are the two extremes. With alpha = 0, frequencies are likely to be underestimated when NAs are present. If combs is used with alpha = 0, the heuristic nature of kAnon() may lead to frequency counts that are technically correct but not always intuitive.
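The difference between the two extremes can be illustrated with a simplified base R sketch that counts, for each record, how many records have a compatible key. It covers only the cases alpha = 1 (NA treated as a wildcard) and alpha = 0 (NA treated as its own category); intermediate values of alpha and the exact counting rules used by sdcMicro are not reproduced here, and all function names below are hypothetical.

```r
# Toy data with missing values in the key variables (hypothetical values).
toy <- data.frame(
  region = c("north", "north", NA, "south"),
  sex    = c("f", NA, "f", "f"),
  stringsAsFactors = FALSE
)

# alpha = 1 style: values are compatible if equal or if either one is NA.
match_wildcard <- function(a, b) is.na(a) | is.na(b) | a == b

# alpha = 0 style: NA is a category of its own and only matches NA.
match_own_category <- function(a, b) {
  (is.na(a) & is.na(b)) | (!is.na(a) & !is.na(b) & a == b)
}

# For each record, count the records compatible with it on all key variables.
count_matches <- function(dat, match_fun) {
  vapply(seq_len(nrow(dat)), function(i) {
    ok <- rep(TRUE, nrow(dat))
    for (v in names(dat)) ok <- ok & match_fun(dat[[v]], dat[[v]][i])
    sum(ok)
  }, integer(1))
}

fk_wildcard <- count_matches(toy, match_wildcard)
fk_own <- count_matches(toy, match_own_category)
print(rbind(fk_wildcard, fk_own))
```

The wildcard counts are never smaller than the own-category counts, which is why treating NAs as separate categories tends to understate frequencies.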

Value

A modified dataset with suppressions that meets k-anonymity based on the specified key variables, or the modified sdcMicroObj-class object.

Note

The deprecated methods localSupp2 and localSupp2Wrapper are no longer available in sdcMicro versions later than 4.5.0. kAnon() is a more intuitive name for local suppression, since the goal is to achieve k-anonymity.

Author(s)

Bernhard Meindl, Matthias Templ

References

Templ, M. Statistical Disclosure Control for Microdata: Methods and Applications in R. Springer International Publishing, 287 pages, 2017. ISBN: 978-3-319-50272-4. doi:10.1007/978-3-319-50272-4

Templ, M., Kowarik, A., Meindl, B. Statistical Disclosure Control for Micro-Data Using the R Package sdcMicro. Journal of Statistical Software, 67(4), 1–36, 2015. doi:10.18637/jss.v067.i04

Examples


data(francdat)

## Local Suppression
localS <- localSuppression(francdat, keyVars = c(4, 5, 6))
localS
plot(localS)

## for objects of class sdcMicroObj, no stratification
data(testdata2)
kv <- c("urbrur", "roof", "walls", "water", "electcon", "relat", "sex")
sdc <- createSdcObj(testdata2, keyVars = kv, w = "sampling_weight")
sdc <- localSuppression(sdc)

## for objects of class sdcMicroObj, with stratification
testdata2$ageG <- cut(testdata2$age, 5, labels = paste0("AG", 1:5))
sdc <- createSdcObj(
  dat = testdata2,
  keyVars = kv,
  w = "sampling_weight",
  strataVar = "ageG"
)
sdc <- localSuppression(sdc, nc = 1)

## it is also possible to provide k-anonymity for subsets of key-variables
## with different parameter k!
## in this case we want to provide 10-anonymity for all combinations
## of 5 key variables, 20-anonymity for all combinations with 4 key variables
## and 30-anonymity for all combinations of 3 key variables.
sdc <- createSdcObj(testdata2, keyVars = kv, w = "sampling_weight")
combs <- 5:3
k <- c(10, 20, 30)
sdc <- localSuppression(sdc, k = k, combs = combs)

## data.frame method (no stratification)
inp <- testdata2[, c(kv, "ageG")]
ls <- localSuppression(inp, keyVars = 1:7)
print(ls)
plot(ls)

## data.frame method (with stratification)
ls <- kAnon(inp, keyVars = 1:7, strataVars = 8)
print(ls)
plot(ls)


sdcMicro documentation built on Aug. 22, 2025, 5:13 p.m.