disclosure | R Documentation |
Calculates disclosure measures for synthetic data. NOTE: The related function that calculates disclosure results for multiple targets has been renamed from disclosure.summary to multi.disclosure.
## S3 method for class 'synds'
disclosure(object, data, keys, target, print.flag = TRUE,
  denom_lim = 5, exclude_ov_denom_lim = FALSE, not.targetlev = NULL,
  usetargetNA = TRUE, usekeysNA = TRUE,
  exclude.keys = NULL, exclude.keylevs = NULL, exclude.targetlevs = NULL,
  ngroups_target = NULL, ngroups_keys = NULL,
  thresh_1way = c(50, 90), thresh_2way = c(4, 80),
  digits = 2, to.print = c("short"), ...)
## S3 method for class 'data.frame'
disclosure(object, data, cont.na = NULL, keys, target, print.flag = TRUE,
  denom_lim = 5, exclude_ov_denom_lim = FALSE,
  not.targetlev = NULL,
  usetargetNA = TRUE, usekeysNA = TRUE,
  exclude.keys = NULL, exclude.keylevs = NULL, exclude.targetlevs = NULL,
  ngroups_target = NULL, ngroups_keys = NULL,
  thresh_1way = c(50, 90), thresh_2way = c(4, 80),
  digits = 2, to.print = c("short"), compare.synorig = TRUE, ...)
## S3 method for class 'list'
disclosure(object, data, cont.na = NULL, keys, target, print.flag = TRUE,
  denom_lim = 5, exclude_ov_denom_lim = FALSE,
  not.targetlev = NULL,
  usetargetNA = TRUE, usekeysNA = TRUE,
  exclude.keys = NULL, exclude.keylevs = NULL, exclude.targetlevs = NULL,
  ngroups_target = NULL, ngroups_keys = NULL,
  thresh_1way = c(50, 90), thresh_2way = c(4, 80),
  digits = 2, to.print = c("short"), compare.synorig = TRUE, ...)
## S3 method for class 'disclosure'
print(x, to.print = NULL, digits = NULL, ...)
object |
an object of class |
data |
the original (observed) data set. |
cont.na |
For data NOT supplied as a synthetic data object created by
|
keys |
vector of variable names or column numbers in data that are also present in the synthetic data to act as quasi-identifiers for identity or attribute disclosure. |
target |
name of target variable for attribute disclosure. |
denom_lim |
Limit used to exclude large key-target groups, see next item. |
exclude_ov_denom_lim |
logical to exclude key-target combinations
that contribute more than |
print.flag |
logical value as to whether a line is printed as disclosure is calculated for each synthetic data set. |
digits |
number of digits to print for disclosure measures. |
usetargetNA |
determines whether NA values in target are to be used in checking for disclosure. |
usekeysNA |
determines whether NA values in keys are to be used in checking for disclosure. |
not.targetlev |
Character variable giving level of target to be excluded from disclosure measures. Usually identified by checklev_1way. |
exclude.keys |
vector of names of keys that, with the next two items, will define the target and key combinations to be excluded from the calculation of disclosure measures. Often identified by checklev_2way. |
exclude.keylevs |
vector of the same length as exclude.keys that gives the levels to be excluded for the corresponding key. |
exclude.targetlevs |
vector of the same length as exclude.keys that gives the levels of target that will be excluded for each key and key level. |
ngroups_target |
Unless set to NULL (the default) a numeric target variable
will be grouped into |
ngroups_keys |
Unless set to NULL (the default) any numeric variable
will be grouped into categories. If |
thresh_1way |
A vector of two numeric values, both of which need to be exceeded for warnings about a level of the target that may be dominating the results. The first is the count of all disclosive records for this level of the target, and the second is the % of all original records for this level of the target. Default is c(50, 90), meaning a group of more than 50 disclosive records for this level of the target where they make up over 90% of all disclosive records. |
thresh_2way |
A vector of two numeric values, both of which need to be exceeded for warnings about a level of the target that may be dominating the results. The first is the count of disclosive records for a quasi-identifier, used to identify possible pairs that are searched for the most disclosive key-target combination. The second is the percentage of all original records for each combination examined that must be exceeded to trigger a warning. The usage above gives the default as c(4, 80), meaning pairs found from key-target groups of more than 4 records where over 80% of all the original values with these key-target pairs have this level of the target. |
to.print |
Vector to determine what aspects of an object of class disclosure will be printed. Must consist of one or more of the following: "short", "ident", "attrib", "allCAPs", "all", "check_1way", "check_2way", "exclusions". Default is "short", giving a brief summary. |
compare.synorig |
a logical value to determine if the functions
|
... |
additional parameters |
x |
an object of class |
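The two-threshold rule described for thresh_1way can be sketched in base R. This is only an illustration of the "both values must be exceeded" logic, not the package's internal code: the counts are made up, and it assumes the percentage is taken over all disclosive records, as in the wording of the default's description.

```r
# Made-up counts of disclosive records by level of the target (hypothetical data).
disclosive_by_level <- c(low = 4, medium = 3, high = 120)

# Same form as the documented default: c(count, percentage).
thresh_1way <- c(50, 90)

# Share of all disclosive records contributed by each level (an assumption
# about which percentage is meant; see the vignette for the exact definition).
pct_of_disclosive <- 100 * disclosive_by_level / sum(disclosive_by_level)

# A level is flagged only when BOTH thresholds are exceeded.
flagged <- names(disclosive_by_level)[
  disclosive_by_level > thresh_1way[1] & pct_of_disclosive > thresh_1way[2]
]
flagged
```

A level flagged in this way is the kind of value that would then be supplied as not.targetlev to exclude it from the disclosure measures.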
Calculates identity disclosure measures for a set of keys
(quasi-identifiers) and attribute disclosure measures for one
variable, considered as a target, from the same set of keys. The
function multi.disclosure calls this function and
summarises the attribute disclosure measures for multiple targets.
See the vignette
An object of class disclosure, which is a list with the following components.
call |
the call that created the object. |
ident |
Table of measures of identity disclosure, one for each synthesis. Measures are "UiO", "UiS", "UiSiO" and "repU". See vignette disclosure.pdf for an explanation of these and the following measures. |
attrib |
Table of measures of attribute disclosure, one for each synthesis. These include "DiO", "DiS", "iSO", "DiSCO" and "DiSDiO". The measures "DiO" and "DiS" are the percentages of the target that are disclosed from the original and synthetic data with these keys. The next measure "iSO" gives the percentage of the key combinations in the synthetic data that are present in the original - one was in which the disclosure. "DiSCO" gives the percentage of original records where the attribution to the target is correct as judged from the original. "DiSDiO" gives the % of original records in "DiSCO" that are unique in the original data. The table also gives the maximum and mean of the denominators for the "DiSCO" measure, i.e. the distribution, for every record that leads to a correct disclosure, of the number of observations with the same keys and the same correct target in the synthetic data. Large denominators are often an indication that the disclosure is something that might be expected from prior knowledge of relations. |
allCAPs |
Table of the following measures of correct attribution probability: "baseCAPd", "CAPd", "CAPs", "DCAP" and "TCAP". |
check_1way |
A data frame with one record per synthesis
identifying the level of the target with numbers of disclosive records
that are above thresholds defined by |
check1 |
The level of the target identified by check_1way, or blank if none. |
check_2way |
A list, with one element per synthesis, giving details
for each of the two-way combinations of target and keys where
the numbers of disclosive records are above thresholds defined by
|
Nexclusions |
A list, with one element per synthesis, giving the number of records excluded from attribute measures for different reasons. |
keys |
as input |
digits |
as input |
Norig |
Number of records in data |
to.print |
as input |
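As a minimal sketch of working with the returned object, the components listed above can be read with `$`, and the print method can be asked for specific tables via to.print. The data preparation repeats the package example; print.flag = FALSE is used only to keep the output short. The exact shape of each table is not asserted here.

```r
library(synthpop)

# Data preparation as in the package example.
ods <- SD2011[, c("sex", "age", "edu", "marital", "income")]
odsF <- numtocat.syn(ods, numtocat = "income", catgroups = 7,
                     cont.na = list(income = -8))
s1 <- syn(odsF$data, method = "ctree", seed = 75, m = 3, k = 1000)

# Disclosure measures, without the per-synthesis progress line.
disc1 <- disclosure(s1, odsF$data, target = "income",
                    keys = c("sex", "age", "edu", "marital"),
                    print.flag = FALSE)

names(disc1)   # component names of the returned list
disc1$ident    # identity disclosure measures
disc1$attrib   # attribute disclosure measures
disc1$Norig    # number of records in the original data

# Print only the identity and attribute tables, with one decimal place.
print(disc1, to.print = c("ident", "attrib"), digits = 1)
```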
See package vignette disclosure.pdf for additional information including formal definitions of all quantities and worked examples.
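To give a feel for what an identity measure such as "UiO" summarises, the base R sketch below computes the percentage of records whose key combination is unique within a toy data set. It assumes "UiO" is read as "unique in the original"; the data frame and the variable names are invented for illustration, and the vignette remains the reference for the formal definitions.

```r
# Toy "original" data with two keys (made-up values).
orig <- data.frame(
  sex = c("F", "F", "M", "M", "M"),
  edu = c("primary", "primary", "secondary", "tertiary", "tertiary"),
  stringsAsFactors = FALSE
)
keys <- c("sex", "edu")

# One string per record identifying its combination of key values.
key_combo <- do.call(paste, c(orig[keys], sep = " | "))

# Number of records sharing each record's key combination.
combo_size <- as.vector(table(key_combo)[key_combo])

# Percentage of records that are unique on the keys.
pct_unique <- 100 * mean(combo_size == 1)
pct_unique
```

Here only the record with keys ("M", "secondary") is unique, so one record in five is identified by its keys alone.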
See references in package vignette
syn
multi.disclosure
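library(synthpop)
# Select a few variables from the SD2011 data set shipped with synthpop
ods <- SD2011[, c("sex", "age", "edu", "marital", "income")]
# Group income into 7 categories, treating -8 as a missing-value code
odsF <- numtocat.syn(ods, numtocat = "income", catgroups = 7,
                     cont.na = list(income = -8))
# Create 3 synthetic data sets of 1000 records each
s1 <- syn(odsF$data, method = "ctree", seed = 75, m = 3, k = 1000)
# Disclosure measures with income as the target
disc1 <- disclosure(s1, odsF$data, target = "income",
                    keys = c("sex", "age", "edu", "marital"))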