addNoise: Adding noise to perturb data

View source: R/addNoise.r

addNoiseR Documentation

Adding noise to perturb data

Description

Various methods for adding noise to perturb continuous scaled variables.

Usage

addNoise(obj, variables = NULL, noise = 150, method = "additive", ...)

Arguments

obj

either a data.frame or a sdcMicroObj-class that should be perturbed

variables

vector with names of variables that should be perturbed

noise

amount of noise (in percentages)

method

choose between ‘additive’, ‘correlated’, ‘correlated2’, ‘restr’, ‘ROMM’, ‘outdect’

...

see possible arguments below

Details

If ‘obj’ is of class sdcMicroObj-class, all continuous key variables are selected per default. If ‘obj’ is of class “data.frame” or “matrix”, the continuous variables have to be specified.

Method ‘additive’ adds noise completely at random to each variable depending on its size and standard deviation. ‘correlated’ and method ‘correlated2’ adds noise and preserves the covariances as described in R. Brand (2001) or in the reference given below. Method ‘restr’ takes the sample size into account when adding noise. Method ‘ROMM’ is an implementation of the algorithm ROMM (Random Orthogonalized Matrix Masking) (Fienberg, 2004). Method ‘outdect’ adds noise only to outliers. The outliers are identified with univariate and robust multivariate procedures based on a robust mahalanobis distances calculated by the MCD estimator.

Value

If ‘obj’ was of class sdcMicroObj-class the corresponding slots are filled, like manipNumVars, risk and utility.

If ‘obj’ was of class “data.frame” or “matrix” an object of class “micro” with following entities is returned:

x

the original data

xm

the modified (perturbed) data

method

method used for perturbation

noise

amount of noise

Author(s)

Matthias Templ and Bernhard Meindl

References

Domingo-Ferrer, J. and Sebe, F. and Castella, J., “On the security of noise addition for privacy in statistical databases”, Lecture Notes in Computer Science, vol. 3050, pp. 149-161, 2004. ISSN 0302-9743. Vol. Privacy in Statistical Databases, eds. J. Domingo-Ferrer and V. Torra, Berlin: Springer-Verlag.

Ting, D. Fienberg, S.E. and Trottini, M. “ROMM Methodology for Microdata Release” Joint UNECE/Eurostat work session on statistical data confidentiality, Geneva, Switzerland, 2005, https://www.unece.org/fileadmin/DAM/stats/documents/ece/ces/ge.46/2005/wp.11.e.pdf

Ting, D., Fienberg, S.E., Trottini, M. “Random orthogonal matrix masking methodology for microdata release”, International Journal of Information and Computer Security, vol. 2, pp. 86-105, 2008.

Templ, M. and Meindl, B., Robustification of Microdata Masking Methods and the Comparison with Existing Methods, Lecture Notes in Computer Science, Privacy in Statistical Databases, vol. 5262, pp. 177-189, 2008.

Templ, M. New Developments in Statistical Disclosure Control and Imputation: Robust Statistics Applied to Official Statistics, Suedwestdeutscher Verlag fuer Hochschulschriften, 2009, ISBN: 3838108280, 264 pages.

Templ, M. and Meindl, B. and Kowarik, A.: Statistical Disclosure Control for Micro-Data Using the R Package sdcMicro, Journal of Statistical Software, 67 (4), 1–36, 2015. \Sexpr[results=rd]{tools:::Rd_expr_doi("10.18637/jss.v067.i04")}

Templ, M. Statistical Disclosure Control for Microdata: Methods and Applications in R. Springer International Publishing, 287 pages, 2017. ISBN 978-3-319-50272-4. \Sexpr[results=rd]{tools:::Rd_expr_doi("10.1007/978-3-319-50272-4")}

See Also

sdcMicroObj-class, summary.micro

Examples


data(Tarragona)

a1 <- addNoise(Tarragona)
a1


data(testdata)

# donttest because Examples with CPU time > 2.5 times elapsed time
testdata[, c('expend','income','savings')] <-
addNoise(testdata[,c('expend','income','savings')])$xm

## for objects of class sdcMicroObj:
data(testdata2)
sdc <- createSdcObj(testdata2,
  keyVars=c('urbrur','roof','walls','water','electcon','relat','sex'),
  numVars=c('expend','income','savings'), w='sampling_weight')
sdc <- addNoise(sdc)


sdcMicro documentation built on Sept. 27, 2023, 5:07 p.m.