multi.disclosure: Disclosure measures for multiple target variables.
In synthpop: Generating Synthetic Versions of Sensitive Microdata for Statistical Disclosure Control

multi.disclosure

R Documentation

Disclosure measures for multiple target variables.

Description

Calculates, prints and plots tables of disclosure measures for a set of target variables, using a fixed set of keys that combine to form a quasi-identifier. The disclosure measures for each target are calculated by the function disclosure.
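Because each target is handled by its own call to disclosure, the per-target step can be sketched as a loop. This is a minimal sketch, assuming synthpop is installed and that disclosure() accepts a single target variable through an argument named target (an assumption; that signature is not given on this page). It illustrates the idea only and is not a copy of what multi.disclosure does internally.

```r
library(synthpop)

# Original data: a subset of the SD2011 survey data shipped with synthpop
ods <- SD2011[, c("sex", "age", "edu", "marital", "region", "income")]
s1 <- syn(ods, seed = 2024, print.flag = FALSE)

keys <- c("sex", "age", "edu")
# Default targets: all variables that are not keys (marital, region, income)
targets <- setdiff(names(ods), keys)

# Assumption: disclosure() takes one target via an argument named 'target'.
# One call per target, with the same fixed keys each time.
per_target <- lapply(targets, function(tg)
  disclosure(s1, ods, keys = keys, target = tg))
names(per_target) <- targets
```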

This function can also be used with synthetic data NOT created by syn(), or even with data made anonymous by other methods, such as sampling. More details of the measures calculated can be found in the package vignette "Disclosure measures for Synthetic Data".
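As an illustration of data anonymised by sampling rather than by syn(), the sketch below draws a 50% simple random sample of the original rows and passes it as a plain data frame, so the 'data.frame' method is used. It assumes synthpop is installed; the object names sampled and md_sample and the 50% sampling fraction are hypothetical choices made for this example.

```r
library(synthpop)

# Original data: a subset of the SD2011 survey data shipped with synthpop
ods <- SD2011[, c("sex", "age", "edu", "marital", "region", "income")]

# Hypothetical "anonymised" release: a 50% simple random sample of rows,
# not created by syn()
set.seed(123)
sampled <- ods[sample(nrow(ods), floor(nrow(ods) / 2)), ]

# Supplied as a data.frame, so compare.synorig defaults to TRUE and the two
# data sets are checked for comparability before the measures are computed
md_sample <- multi.disclosure(sampled, ods,
                              keys = c("sex", "age", "edu"),
                              print.flag = FALSE)
```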

Usage

## S3 method for class 'synds'
multi.disclosure(object, data,
           keys, targets = NULL, print.flag = TRUE,
           denom_lim = 5, exclude_ov_denom_lim = FALSE,
           not.targetslev = NULL,
           usetargetsNA = TRUE, usekeysNA = TRUE,
           exclude.keys = NULL, exclude.keylevs = NULL, exclude.targetlevs = NULL,
           ngroups_targets = NULL, ngroups_keys = NULL,
           ident.meas = "repU", attrib.meas = "DiSCO",
           thresh_1way = c(50, 90), thresh_2way = c(4, 80),
           digits = 2, plot = TRUE, ...)

## S3 method for class 'data.frame'
multi.disclosure(object, data, cont.na = NULL,
           keys, targets = NULL, print.flag = TRUE,
           denom_lim = 5, exclude_ov_denom_lim = FALSE,
           not.targetslev = NULL,
           usetargetsNA = TRUE, usekeysNA = TRUE,
           exclude.keys = NULL, exclude.keylevs = NULL, exclude.targetlevs = NULL,
           ngroups_targets = NULL, ngroups_keys = NULL,
           ident.meas = "repU", attrib.meas = "DiSCO",
           thresh_1way = c(50, 90), thresh_2way = c(4, 80),
           digits = 2, plot = TRUE, compare.synorig = TRUE, ...)

## S3 method for class 'list'
multi.disclosure(object, data, cont.na = NULL,
           keys, targets = NULL, print.flag = TRUE,
           denom_lim = 5, exclude_ov_denom_lim = FALSE,
           not.targetslev = NULL,
           usetargetsNA = TRUE, usekeysNA = TRUE,
           exclude.keys = NULL, exclude.keylevs = NULL, exclude.targetlevs = NULL,
           ngroups_targets = NULL, ngroups_keys = NULL,
           ident.meas = "repU", attrib.meas = "DiSCO",
           thresh_1way = c(50, 90), thresh_2way = c(4, 80),
           digits = 2, plot = TRUE, compare.synorig = TRUE, ...)

## S3 method for class 'multi.disclosure'
print(x, digits = NULL, plot = NULL, to.print = c("ident", "attrib"),
      ...)

Arguments

`object`	an object of class `synds`, which stands for 'synthesised data set'. It is typically created by function `syn()` and it includes `object$m` synthesised data set(s) as `object$syn`. This is a single data set when `object$m = 1` or a list of length `object$m` when `object$m > 1`. Alternatively, when data are synthesised not using `syn()`, it can be a data frame with a synthetic data set or a list of data frames with synthetic data sets, all created from the same original data with the same variables and the same method.
`data`	the original (observed) data set.
`cont.na`	For data NOT supplied as a synthetic data object created by `synthpop`, this gives special values for continuous variables as described in the documentation for the function `syn`.
`keys`	a vector of strings with the names of the variables to be combined to form a quasi-identifier.
`targets`	a vector of strings with the names of variables to be used as targets for the disclosure measures. Defaults to all variables in both original and synthetic data that are not in `keys`.
`denom_lim`	an integer giving the denominator size above which a warning is issued to check the two-way relationships for potential prior disclosure information.
`exclude_ov_denom_lim`	TRUE/FALSE according to whether disclosive groups with denominators > `denom_lim` should be excluded from the disclosure measures.
`not.targetslev`	a vector of the same length as `targets` giving, for each target, a level to be excluded when calculating the disclosure measures. Set the elements for unaffected targets to blanks.
`print.flag`	TRUE/FALSE according to whether a line is printed as the disclosure measures for each member of `targets` are calculated.
`usetargetsNA`	A logical vector of the same length as `targets` that determines if `NA` values of each target are to be considered disclosive. Defaults to `TRUE` for all targets.
`usekeysNA`	A logical vector of the same length as `keys` that determines if `NA` values of each key are to be considered disclosive. Defaults to `TRUE` for all keys.
`exclude.keys`	A list of same length as `targets` giving the keys for two way exclusions for the ith target. For details see documentation in `disclosure`
`exclude.keylevs`	A list of same length as `targets` giving the levels of keys for two way exclusions for the ith target. For details see documentation in `disclosure`
`exclude.targetlevs`	A list of same length as `targets` giving the levels of target for two way exclusions for the ith target. For details see documentation in `disclosure`
`ngroups_targets`	Unless set to NULL (the default), numeric target variables will be grouped into `ngroups_targets` categories. If `ngroups_targets` is of length 1, all numeric targets will have the same number of groups. Otherwise `ngroups_targets` needs to be a vector of the same length as `targets`, giving the number of groups for each target. If an element of `ngroups_targets` is zero, no grouping will be done for that target.
`ngroups_keys`	Unless set to NULL (the default), any numeric key will be grouped into categories. If `ngroups_keys` is of length 1, all numeric keys will have the same number of groups. Otherwise `ngroups_keys` needs to be the same length as `keys`, giving the number of groups for each key. If an element of `ngroups_keys` is zero, no grouping will be done for that key.
`ident.meas`	Choice of statistic to use as a measure of identity disclosure. Must be one of `"repU"` or `"UiSiO"`. See `disclosure` for explanations of the measures.
`attrib.meas`	Choice of statistic to use as a measure of attribute disclosure. Must be one of `"DiSCO"` or `"DiSDiO"`. See `disclosure` for explanations of the measures.
`thresh_1way`	A vector of two numeric values, both of which need to be exceeded for a warning that a level of the target may be dominating the results. The first is the count of disclosive records with this level of the target, and the second is the percentage of all disclosive records that have this level of the target. Default is c(50, 90), meaning a warning for a level of the target with more than 50 disclosive records that make up over 90% of all disclosive records.
`thresh_2way`	A vector of two numeric values, both of which need to be exceeded for a warning that a key-target combination may be dominating the results. The first is the count of all disclosive records for this key-target combination and the second is the percentage of all disclosive records for this combination. Default is c(4, 80), meaning a group of more than 4 records where over 80% of all the original values with this key have this level of the target.
`digits`	number of digits to print for the disclosure measures.
`plot`	determines if plot will be produced when the result is printed.
`print`	logical value that determines if a summary of results is to be printed.
`compare.synorig`	a logical value to determine if the function `synorig.compare()` should be used to check that the data sets can be compared. Defaults to `FALSE`, except when the synthetic data are supplied as a data frame or a list, when it defaults to `TRUE`.
`to.print`	Vector of items to be printed including "ident", "attrib", both or NULL
`...`	additional parameters
`x`	an object of class `multi.disclosure`.
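The arguments above allow the default measures and groupings to be changed. The sketch below, assuming synthpop is installed, requests "UiSiO" for identity disclosure and "DiSDiO" for attribute disclosure, groups the numeric key age into 5 categories (a single value of ngroups_keys applies to every numeric key) and restricts the assessment to two targets. The seed, the number of groups and the object name md_alt are arbitrary choices for illustration.

```r
library(synthpop)

ods <- SD2011[, c("sex", "age", "edu", "marital", "region", "income")]
s1 <- syn(ods, seed = 2024, print.flag = FALSE)

# Non-default measures, numeric keys grouped into 5 categories,
# and only two of the non-key variables used as targets
md_alt <- multi.disclosure(s1, ods,
                           keys = c("sex", "age", "edu"),
                           targets = c("marital", "region"),
                           ident.meas = "UiSiO", attrib.meas = "DiSDiO",
                           ngroups_keys = 5, print.flag = FALSE)
```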

Details

Calculates measures of identity and attribute disclosure from the keys specified in keys, using the function disclosure. For attribute disclosure, a table with one line for each target can be printed or plotted. Details are in the help file for disclosure.
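The print method controls which parts of the results are shown. This sketch, assuming synthpop is installed, prints only the attribute disclosure table without the plot, and then both summaries with three digits, using the to.print, digits and plot arguments of print shown in Usage. The seed and object name md are arbitrary.

```r
library(synthpop)

ods <- SD2011[, c("sex", "age", "edu", "marital", "region", "income")]
s1 <- syn(ods, seed = 2024, print.flag = FALSE)
md <- multi.disclosure(s1, ods, keys = c("sex", "age", "edu"),
                       print.flag = FALSE)

# Attribute disclosure table only, no plot
print(md, to.print = "attrib", plot = FALSE)

# Identity and attribute summaries, 3 digits, no plot
print(md, digits = 3, to.print = c("ident", "attrib"), plot = FALSE)
```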

Value

An object of class multi.disclosure which is a list with the following components:

`attrib.table`	a table with the selected attribute disclosure measure (`attrib.meas`) for the synthetic data and the corresponding measure for the original data: "CAPd" if `attrib.meas` is "DCAP", and "DiO" otherwise.
`attrib.plot`	plot of attrib.table with labels indicating where large denominators suggest checking.
`keys`	see above.
`ident.orig`	value of identity disclosure `UiO` from the original data, see help file for `disclosure`.
`ident.syn`	value of identity disclosure `ident.meas` from the synthetic data, see help file for `disclosure`.
`Norig`	Number of records in the original data.
`denom_lim`	see above.
`exclude_ov_denom_lim`	see above.
`digits`	see above.
`usetargetsNA`	see above.
`usekeysNA`	see above.
`ident.meas`	see above.
`attrib.meas`	see above.
`m`	the number of synthetic data sets (see `object` above).
`plot`	see above.
`output.list`	A named list with one component for each target, where each component is the output from the function `disclosure` for that target. This allows the `check_1way` and `check_2way` results to be examined for each target.
`call`	R call used to create the object
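The components listed above can be extracted directly from the returned list. A minimal sketch, assuming synthpop is installed; which components are inspected here, and the object name md, are illustrative choices.

```r
library(synthpop)

ods <- SD2011[, c("sex", "age", "edu", "marital", "region", "income")]
s1 <- syn(ods, seed = 2024, print.flag = FALSE)
md <- multi.disclosure(s1, ods, keys = c("sex", "age", "edu"),
                       print.flag = FALSE)

# Identity disclosure in the original and the synthetic data
md$ident.orig
md$ident.syn

# Attribute disclosure table, one line per target
md$attrib.table

# Full disclosure() output for a single target, e.g. to inspect the
# check_1way and check_2way results for income
names(md$output.list)
str(md$output.list[["income"]], max.level = 1)
```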

References

A link to the vignette is to follow.

Examples

library(synthpop)

ods <- SD2011[, c("sex", "age", "edu", "marital", "region", "income")]
s1 <- syn(ods)   # 'synds' object holding m = 1 synthetic data set

### synthetic data provided as a 'data.frame' object
t1 <- multi.disclosure(s1$syn, ods,
                       keys = c("sex", "age", "edu"))

### synthetic data provided as a 'synds' object
t1 <- multi.disclosure(s1, ods,
                       keys = c("sex", "age", "edu"))

synthpop documentation built on June 8, 2025, 1:31 p.m.