microaggrGower: Microaggregation for numerical and categorical key variables...
In sdcTools/sdcMicro: Statistical Disclosure Control Methods for Anonymization of Data and Risk Estimation

microaggrGower

R Documentation

Microaggregation for numerical and categorical key variables based on a distance similar to the Gower Distance

Description

The microaggregation is based on distances computed in a way similar to the Gower distance. The distance function distinguishes between the variable types factor, ordered, numerical and mixed (semi-continuous variables with a fixed probability mass at a constant value, e.g. 0).
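To make the idea of a Gower-type distance concrete, here is a minimal base-R sketch. It is not the code used by microaggrGower: the helper name gower_like, the range scaling and the weighted average are assumptions chosen for illustration, and the mixed (semi-continuous) type is left out.

```r
# Hypothetical helper (not part of sdcMicro): Gower-style distance between
# two one-row data.frames x and y taken from `data`.
gower_like <- function(x, y, data, weights = rep(1, ncol(data))) {
  contrib <- numeric(ncol(data))
  for (j in seq_along(data)) {
    col <- data[[j]]
    if (is.ordered(col)) {
      # ordered factor: difference of level ranks, scaled to [0, 1]
      contrib[j] <- abs(as.integer(x[[j]]) - as.integer(y[[j]])) /
        max(nlevels(col) - 1, 1)
    } else if (is.factor(col)) {
      # unordered factor: 0 if the levels match, 1 otherwise
      contrib[j] <- as.numeric(as.character(x[[j]]) != as.character(y[[j]]))
    } else {
      # numerical: absolute difference scaled by the observed range
      rng <- diff(range(col))
      contrib[j] <- if (rng > 0) abs(x[[j]] - y[[j]]) / rng else 0
    }
  }
  # weighted average of the per-variable contributions
  sum(weights * contrib) / sum(weights)
}

# Toy data, invented for this example
d <- data.frame(
  age = c(20, 30, 60),
  sex = factor(c("m", "f", "m")),
  edu = factor(c("low", "mid", "high"),
               levels = c("low", "mid", "high"), ordered = TRUE)
)
d12 <- gower_like(d[1, ], d[2, ], d)
d13 <- gower_like(d[1, ], d[3, ], d)
```

Each variable contributes a value between 0 and 1, so variables on different scales and of different types can be combined into one distance.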

Usage

microaggrGower(
  obj,
  variables = NULL,
  aggr = 3,
  dist_var = NULL,
  by = NULL,
  mixed = NULL,
  mixed.constant = NULL,
  trace = FALSE,
  weights = NULL,
  numFun = mean,
  catFun = VIM::sampleCat,
  addRandom = FALSE
)

Arguments

`obj`	`sdcMicroObj-class`-object or a `data.frame`
`variables`	character vector with the names of the variables to be aggregated (default for an sdcMicroObj: all keyVariables and all numeric key variables)
`aggr`	aggregation level (default=3)
`dist_var`	character vector with variable names for distance computation
`by`	character vector with variable names to split the dataset before performing microaggregation (Default for sdcMicroObj is strataVar)
`mixed`	character vector with names of mixed variables
`mixed.constant`	numeric vector of the same length as mixed, giving the constant value at which each mixed variable has its probability mass
`trace`	TRUE/FALSE for some console output
`weights`	numerical vector with length equal to the number of variables used for the distance computation
`numFun`	function used to aggregate numerical variables
`catFun`	function used to aggregate categorical variables
`addRandom`	TRUE/FALSE if a random value should be added for the distance computation.

Details

The function sampleCat samples a level with probabilities corresponding to the frequency of each level among the nearest neighbours. The function maxCat chooses the level with the most occurrences and picks at random among the tied levels if the maximum is not unique.
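The two aggregation rules described above can be mimicked in base R as follows. This is only a sketch of the described behaviour, not the VIM implementation: the names sample_cat_sketch and max_cat_sketch and the toy vector nn are invented for illustration.

```r
# Hypothetical stand-in for the sampleCat behaviour: drawing one element
# uniformly picks each level with probability equal to its relative frequency.
sample_cat_sketch <- function(x) {
  x <- as.character(x)
  x[sample.int(length(x), 1)]
}

# Hypothetical stand-in for the maxCat behaviour: most frequent level,
# with a random choice among tied levels.
max_cat_sketch <- function(x) {
  tab <- table(as.character(x))
  top <- names(tab)[as.vector(tab) == max(tab)]
  top[sample.int(length(top), 1)]
}

set.seed(1)
nn <- c("a", "a", "b")               # levels observed among 3 nearest neighbours
picked_max <- max_cat_sketch(nn)     # "a" is the unique mode
picked_sample <- sample_cat_sketch(nn)  # "a" with probability 2/3, "b" with 1/3
```

The mode-based rule gives a more stable result, while the sampling rule preserves more of the variation among the neighbours.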

Value

The function returns the updated sdcMicroObj or simply an altered data frame.

Note

In each by-group all distances are computed, so splitting the data into more by-groups significantly decreases computation time and memory consumption.
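A rough count shows why. The sketch below assumes that the cost is driven by the number of unordered pairs of records within a group and that the groups are of equal size; both are simplifying assumptions, and n_pairs is a helper defined only for this example.

```r
# Number of unordered record pairs in a group of n records
n_pairs <- function(n) n * (n - 1) / 2

# 200 records handled as one group
all_in_one <- n_pairs(200)

# The same 200 records split into four by-groups of 50 records each
group_sizes <- c(50, 50, 50, 50)
split_total <- sum(n_pairs(group_sizes))

# Fraction of pairwise distances that remain after splitting
ratio <- split_total / all_in_one
```

With these numbers the pair count drops from 19900 to 4900, roughly a quarter of the original work.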

Author(s)

Alexander Kowarik

Examples


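# load the example data shipped with sdcMicro and keep the first 200 rows
data(testdata,package="sdcMicro")
testdata <- testdata[1:200,]

# convert columns 1-7 and 9 to factors
for(i in c(1:7,9)) testdata[,i] <- as.factor(testdata[,i])

# data.frame interface: aggregate relat, age and expend within each
# urbrur/roof group, using age, sex, income and savings for the distances
test <- microaggrGower(testdata,variables=c("relat","age","expend"),
  dist_var=c("age","sex","income","savings"),by=c("urbrur","roof"))

# sdcMicroObj interface: create the object, then rely on the defaults
sdc <- createSdcObj(testdata,
  keyVars=c('urbrur','roof','walls','water','electcon','relat','sex'),
  numVars=c('expend','income','savings'), w='sampling_weight')

sdc <- microaggrGower(sdc)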

sdcTools/sdcMicro documentation built on Feb. 22, 2025, 4:35 a.m.
