disclosure: Disclosure measures
In synthpop: Generating Synthetic Versions of Sensitive Microdata for Statistical Disclosure Control

Description

Calculates identity and attribute disclosure measures for synthetic data. NOTE: The function that calculates disclosure results for multiple targets has been renamed from `disclosure.summary` to `multi.disclosure`.

Usage

## S3 method for class 'synds'
disclosure(object, data, keys, target, print.flag = TRUE,
           denom_lim = 5, exclude_ov_denom_lim = FALSE, not.targetlev = NULL,
           usetargetNA = TRUE, usekeysNA = TRUE,
           exclude.keys = NULL, exclude.keylevs = NULL, exclude.targetlevs = NULL,
           ngroups_target = NULL, ngroups_keys = NULL,
           thresh_1way = c(50, 90), thresh_2way = c(4, 80),
           digits = 2, to.print = c("short"), ...)

## S3 method for class 'data.frame'
disclosure(object, data, cont.na = NULL, keys, target, print.flag = TRUE,
           denom_lim = 5, exclude_ov_denom_lim = FALSE,
           not.targetlev = NULL,
           usetargetNA = TRUE, usekeysNA = TRUE,
           exclude.keys = NULL, exclude.keylevs = NULL, exclude.targetlevs = NULL,
           ngroups_target = NULL, ngroups_keys = NULL,
           thresh_1way = c(50, 90), thresh_2way = c(4, 80),
           digits = 2, to.print = c("short"), compare.synorig = TRUE, ...)

## S3 method for class 'list'
disclosure(object, data, cont.na = NULL, keys, target, print.flag = TRUE,
           denom_lim = 5, exclude_ov_denom_lim = FALSE,
           not.targetlev = NULL,
           usetargetNA = TRUE, usekeysNA = TRUE,
           exclude.keys = NULL, exclude.keylevs = NULL, exclude.targetlevs = NULL,
           ngroups_target = NULL, ngroups_keys = NULL,
           thresh_1way = c(50, 90), thresh_2way = c(4, 80),
           digits = 2, to.print = c("short"), compare.synorig = TRUE, ...)

## S3 method for class 'disclosure'
print(x, to.print = NULL, digits = NULL, ...)

Arguments

`object`	an object of class `synds`, which stands for 'synthesised data set'. It is typically created by function `syn()` and it includes `object$m` synthesised data set(s) as `object$syn`. This is a single data set when `object$m = 1` or a list of length `object$m` when `object$m > 1`. Alternatively, when data are synthesised without using `syn()`, it can be a data frame with a synthetic data set or a list of data frames with synthetic data sets, all created from the same original data with the same variables.
`data`	the original (observed) data set.
`cont.na`	For data NOT supplied as a synthetic data object created by `synthpop`, this gives special values for continuous variables as described in the documentation for the function `syn`.
`keys`	vector of variable names or column numbers in data that are also present in the synthetic data to act as quasi-identifiers for identity or attribute disclosure.
`target`	name of target variable for attribute disclosure.
`denom_lim`	Limit used to exclude large key-target groups; see next item.
`exclude_ov_denom_lim`	logical to exclude key-target combinations that contribute more than `denom_lim` disclosive records. These are often flagged from `thresh_2way`, where the first element corresponds to `denom_lim`.
`print.flag`	logical value as to whether a line is printed as disclosure is calculated for each synthetic data set.
`digits`	number of digits to print for disclosure measures.
`usetargetNA`	determines whether NA values in target are to be used in checking for disclosure.
`usekeysNA`	determines whether NA values in keys are to be used in checking for disclosure.
`not.targetlev`	Character variable giving level of target to be excluded from disclosure measures. Usually identified by checklev_1way.
`exclude.keys`	vector of names of keys that, with the next two items, will define the target and key combinations to be excluded from the calculation of disclosure measures. Often identified by checklev_2way.
`exclude.keylevs`	vector of the same length as `exclude.keys` that gives the levels to be excluded for the corresponding key.
`exclude.targetlevs`	vector of the same length as `exclude.keys` that gives the levels of target that will be excluded for each key and key level.
`ngroups_target`	Unless set to NULL (the default) a numeric target variable will be grouped into `ngroups_target` categories.
`ngroups_keys`	Unless set to NULL (the default), any numeric key will be grouped into categories. If `ngroups_keys` is of length 1, all numeric keys will have the same number of groups. Otherwise `ngroups_keys` needs to be the same length as `keys` and will give the number of groups for each key. If an element of `ngroups_keys` is zero, no grouping will be done for that key.
`thresh_1way`	A vector of two numeric values, both of which need to be exceeded for a warning about a level of the target that may be dominating the results. The first is the count of all disclosive records for this level of the target, and the second is the percentage of all disclosive records that have this level of the target. Default is c(50, 90), meaning more than 50 disclosive records for this level of the target, making up over 90% of all disclosive records.
`thresh_2way`	A vector of two numeric values, both of which need to be exceeded for a warning about a key-target combination that may be dominating the results. The first is the count of disclosive records for a quasi-identifier, used to identify possible pairs that are searched for the most disclosive key-target combination. The second is the percentage of all original records for each combination examined that must be exceeded to trigger a warning. Default is c(4, 80), meaning pairs found from key-target groups of more than 4 records where over 80% of all the original records with this level of the key have this level of the target.
`to.print`	Vector to determine what aspects of an object of class `disclosure` will be printed. Must consist of one or more of the following: "short", "ident", "attrib", "allCAPs", "all", "check_1way", "check_2way", "exclusions". Default is "short", giving a brief summary.
`compare.synorig`	a logical value to determine if the function `synorig.compare()` should be used to check that data sets can be compared. Used when the synthetic data are supplied as a data frame or a list; the default is TRUE.
`...`	additional parameters
`x`	an object of class `disclosure`.
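
The `thresh_1way` rule can be illustrated in base R. This is a minimal sketch and not the package's code: the counts are made up, the helper `flag_1way()` is hypothetical, and it assumes that a level is flagged when its count of disclosive records exceeds the first threshold and its share of all disclosive records exceeds the second.

```r
# Hypothetical helper (not part of synthpop): return the target level that
# dominates the disclosive records, or "" if no level exceeds both thresholds.
# `dis_counts` is a named vector of disclosive-record counts per target level.
flag_1way <- function(dis_counts, thresh_1way = c(50, 90)) {
  pct <- 100 * dis_counts / sum(dis_counts)
  hit <- dis_counts > thresh_1way[1] & pct > thresh_1way[2]
  if (any(hit)) names(dis_counts)[hit][1] else ""
}

# Made-up counts: level "0" accounts for 190 of 200 disclosive records (95%).
dis_counts <- c("0" = 190, "1-500" = 6, "501+" = 4)
flagged <- flag_1way(dis_counts)
print(flagged)
```

With these counts the level "0" is returned, since 190 exceeds 50 and 95% exceeds 90%. A level flagged in this way is the kind of value that might then be passed as `not.targetlev`.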

Details

Calculates identity disclosure measures for a set of keys (quasi-identifiers) and attribute disclosure measures for one variable, considered as a target, from the same set of keys. The function `multi.disclosure` calls this function and summarises the attribute disclosure measures for multiple targets. See the vignette disclosure.pdf.
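
The idea of attribute disclosure from a set of keys can be sketched on toy data in base R. This is a simplified reading of the "DiSCO" measure described under Value, not the package's implementation: the data are made up, the helper `key_id()` is hypothetical, and missing values, exclusions and denominators are ignored. A synthetic key combination is treated as disclosive when every synthetic record with those keys has the same target value, and the attribution counts as correct when an original record with those keys has that value.

```r
# Toy original and synthetic data (made up for illustration).
orig <- data.frame(
  sex    = c("F", "F", "M", "M", "M"),
  region = c("N", "N", "N", "S", "S"),
  income = c("low", "low", "high", "low", "high"),
  stringsAsFactors = FALSE
)
synth <- data.frame(
  sex    = c("F", "F", "M", "M", "M"),
  region = c("N", "N", "S", "S", "N"),
  income = c("low", "low", "high", "low", "high"),
  stringsAsFactors = FALSE
)
keys <- c("sex", "region")
target <- "income"

# Hypothetical helper: one string per record identifying its key combination.
key_id <- function(d) do.call(paste, c(d[keys], sep = "|"))

# For each synthetic key combination, keep the target value if it is the
# same for all synthetic records with those keys, otherwise NA.
syn_by_key <- split(synth[[target]], key_id(synth))
disclosed <- vapply(
  syn_by_key,
  function(v) if (length(unique(v)) == 1) v[1] else NA_character_,
  character(1)
)

# An original record counts when its keys appear in the synthetic data, the
# synthetic target for those keys is constant, and it matches the original.
guess <- disclosed[key_id(orig)]
correct <- !is.na(guess) & guess == orig[[target]]
disco_pct <- 100 * mean(correct)
print(disco_pct)
```

Here three of the five original records are correctly attributed, giving 60%. The two original records with keys M and S are not counted because the synthetic records with those keys disagree on the target.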

Value

An object of class `disclosure`, which is a list with the following components.

`call`	the call that created the object.
`ident`	Table of measures of identity disclosure, one for each synthesis. Measures are "UiO", "UiS", "UiSiO" and "repU". See vignette disclosure.pdf for an explanation of these and the following measures.
`attrib`	Table of measures of attribute disclosure, one for each synthesis. These include "DiO", "DiS", "iSO", "DiSCO" and "DiSDiO". The measures "DiO" and "DiS" are the percentages of the target that are disclosed from the original and synthetic data with these keys. The next measure, "iSO", gives the percentage of the key combinations in the synthetic data that are present in the original - one way in which the disclosure. "DiSCO" gives the percentage of original records where the attribution to the target is correct as judged from the original. "DiSDiO" gives the percentage of original records in "DiSCO" that are unique in the original data. The table also gives the maximum and mean of the denominators for the "DiSCO" measure, i.e. the distribution, over every record that leads to a correct disclosure, of the number of observations with the same keys and the same correct target in the synthetic data. Large denominators are often an indication that the disclosure is something that might be expected from prior knowledge of relations.
`allCAPs`	Table of the following measures of correct attribution probability: "baseCAPd", "CAPd", "CAPs", "DCAP" and "TCAP".
`check_1way`	A data frame with one record per synthesis identifying the level of the target with numbers of disclosive records that are above thresholds defined by `thresh_1way`, with default value c(50,90). This means that there must be more than 50 disclosive records with this level of the target, and that 90% or more of all disclosive records must have this target. The value of most_dis_lev will be blank if no level exceeds these thresholds. Note this level will be identified for data without excluded or missing values of keys if there are any excluded records.
`check1`	The level of the target identified by `check_1way`, or blank if none.
`check_2way`	A list with one element per synthesis giving details for each of the two-way combinations of target and keys where the numbers of disclosive records are above thresholds defined by `thresh_2way`. The default value for this is c(4, 80), meaning that there must be more than 4 (that is, at least 5) records with this combination of target and key and that 80% or more of records in the original data with this level of the key will have this level of the target. If no combinations exceed `thresh_2way` for one of the syntheses then the list element is NULL. Such disclosive combinations are often associated with a high prior probability of the target from just this level of one of the keys in the original data. Note that if there are any excluded combinations, or if either `usekeysNA` or `usetargetNA` is FALSE, these combinations are identified from the data without the excluded or missing values of keys or target.
`Nexclusions`	A list with one element per synthesis giving the number of records excluded from attribute measures for different reasons.
`keys`	as input
`digits`	as input
`Norig`	Number of records in data
`to.print`	as input

Note

See package vignette disclosure.pdf for additional information including formal definitions of all quantities and worked examples.

References

See the references in the package vignette disclosure.pdf.

Examples

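library(synthpop)
# Keep five variables from the SD2011 data set shipped with synthpop
ods <- SD2011[, c("sex", "age", "edu", "marital", "income")]
# Group income into 7 categories, with -8 as a special value (see cont.na)
odsF <- numtocat.syn(ods, numtocat = "income", catgroups = 7,
                     cont.na = list(income = -8))
# Create three synthetic data sets of 1000 records each with method "ctree"
s1 <- syn(odsF$data, method = "ctree", seed = 75, m = 3, k = 1000)
# Disclosure measures for income, using the other four variables as keys
disc1 <- disclosure(s1, odsF$data, target = "income",
                    keys = c("sex", "age", "edu", "marital"))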
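
The following sketch shows how the result might be inspected. It repeats the set-up above so that it runs on its own, with a single synthesis (`m = 1`) to keep it quick. It uses only the arguments and components documented on this page (`print.flag`, `to.print`, `ident`, `attrib`); the output will depend on the installed version of synthpop.

```r
library(synthpop)

# Same set-up as the example above, but with one synthesis.
ods <- SD2011[, c("sex", "age", "edu", "marital", "income")]
odsF <- numtocat.syn(ods, numtocat = "income", catgroups = 7,
                     cont.na = list(income = -8))
s1 <- syn(odsF$data, method = "ctree", seed = 75, m = 1, k = 1000)
disc1 <- disclosure(s1, odsF$data, target = "income",
                    keys = c("sex", "age", "edu", "marital"),
                    print.flag = FALSE)

# Print the identity and attribute tables instead of the short summary.
print(disc1, to.print = c("ident", "attrib"))

# The same tables are stored as components of the returned list.
disc1$ident
disc1$attrib
```

Passing `to.print = "all"` to `print()` is another documented option when the check and exclusion details are also of interest.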

synthpop documentation built on June 8, 2025, 1:31 p.m.