sdc.trial: Perform statistical disclosure control process on demographic...


View source: R/sdc-anonymisation.R

Description

sdc.trial is the function called by anonymisation(). It is also useful on its own for tuning parameters such as k and l when working with new data. It returns the anonymised demographic data and an SDC object that describes the entire anonymisation process, including individual risks and information loss. The user should balance security against usefulness according to the data anonymisation SOP.

Usage

sdc.trial(ccd, conf, remove.alive = T, verbose = F, k.anon = 5,
  l.div = NULL)

Arguments

ccd

the identifiable ccRecord object

conf

the YAML file location or a list equivalent to the YAML file.

remove.alive

logical; whether to remove all non-dead (alive) episodes.

verbose

logical; whether to show more detailed information.

k.anon

minimum k-anonymity threshold.

l.div

minimum l-diversity threshold.
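
As a rough illustration of what the two thresholds measure, the base R sketch below counts, for each record, how many records share its combination of key variables (k-anonymity) and how many distinct sensitive values occur within that group (distinct l-diversity). The toy data, the column names and the choice of key and sensitive variables are assumptions made for this example only; it does not reproduce how sdc.trial or sdcMicro compute these quantities internally.

```r
# Hypothetical toy data; column names are illustrative, not taken from ccanonym.
demo <- data.frame(
  age_band  = c("60-69", "60-69", "60-69", "70-79", "70-79"),
  sex       = c("F", "F", "F", "M", "M"),
  diagnosis = c("sepsis", "sepsis", "trauma", "sepsis", "sepsis"),
  stringsAsFactors = FALSE
)

keys <- c("age_band", "sex")   # assumed key (quasi-identifying) variables
k <- 3                         # assumed k-anonymity threshold
l <- 2                         # assumed l-diversity threshold

# One group per observed combination of key values.
class_id <- interaction(demo[keys], drop = TRUE)

# k-anonymity: number of records sharing each record's key combination.
class_size <- ave(seq_len(nrow(demo)), class_id, FUN = length)

# Distinct l-diversity: number of distinct sensitive values in each group.
n_distinct <- tapply(demo$diagnosis, class_id,
                     function(x) length(unique(x)))
distinct_sens <- as.integer(n_distinct[as.character(class_id)])

# Records that satisfy each threshold; the rest would be candidates for suppression.
demo$k_ok <- class_size >= k
demo$l_ok <- distinct_sens >= l
print(demo)
```

Raising either threshold in this sketch flags more records, which mirrors the trade-off between security and usefulness that k.anon and l.div control.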

Details

1. Remove the alive episodes if remove.alive is TRUE.

2. Calculate the age from the date of birth (DOB) and the date of ICU admission (DAICU); the date of admission is removed subsequently. The removal of DAICU should be specified in the configuration file.

3. Convert all demographic time stamps to their offset from the date of admission, with the date of admission itself mapped to the arbitrary date 1970-01-01, e.g. admission date: 2014-01-01 -> 1970-01-01; discharge date: 2014-01-03 -> 1970-01-03. This hides the actual dates from the users while preserving intervals such as the length of stay.

4. Remove episodes whose stay is longer than a given period. The limit can be specified in the configuration file, e.g. maxStay: 30.

5. Micro-aggregate the numerical/date variables specified in the configuration file.

6. Apply special aggregation, e.g. suppress the post code so that NW1 1AA -> NW1. The aggregation function can be written in the configuration file.

7. Suppress the key variables where the k-anonymity is violated.

8. Suppress the sensitive variables where the l-diversity is violated.

9. Add noise to the selected data.
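
Steps 2, 3, 4 and 6 can be sketched on a toy data frame in base R, as below. This is only an illustration of the transformations described above, not the package's implementation: the column names are hypothetical, the age calculation uses an approximate 365.25-day year, and maxStay is assumed here to be expressed in days.

```r
# Hypothetical toy episodes; column names are illustrative only.
episodes <- data.frame(
  dob       = as.Date(c("1950-06-15", "1980-02-01")),
  admit     = as.Date(c("2014-01-01", "2014-03-10")),
  discharge = as.Date(c("2014-01-03", "2014-04-20")),
  postcode  = c("NW1 1AA", "SW1A 2AA"),
  stringsAsFactors = FALSE
)

# Step 2: approximate age in completed years at admission, then drop the DOB.
episodes$age <- floor(as.numeric(episodes$admit - episodes$dob) / 365.25)
episodes$dob <- NULL

# Step 3: keep only offsets from admission, re-based on 1970-01-01.
origin <- as.Date("1970-01-01")
los_days <- as.numeric(episodes$discharge - episodes$admit)
episodes$discharge <- origin + los_days
episodes$admit <- rep(origin, nrow(episodes))

# Step 4: drop episodes longer than the limit (assumed to be in days).
max_stay <- 30
kept <- episodes[los_days <= max_stay, ]

# Step 6: keep only the outward part of the post code (NW1 1AA -> NW1).
kept$postcode <- sub(" .*$", "", kept$postcode)

print(kept)
```

The first episode keeps its two-day length of stay after the shift, while the second is dropped because its stay exceeds the assumed 30-day limit.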

Value

sdc: a list containing the parsed data and the sdcMicro object, in which the detailed individual risks can be checked.

Examples

## Not run: 
# We assume the original dataset is called `ccd`

# Create a template configuration file, modify it if necessary. 
template.conf("test.yaml")

# Trial: adjust K-anonymity and L-diversity. 
sdc <- sdc.trial(ccd, "test.yaml", k.anon=10, l.div=2)

# To access SDC object
sdc$sdc

# Create the data extract once k.anon and l.div have been decided.
ccd.anon <- anonymisation(ccd, "test.yaml", k.anon=10)

## End(Not run)

CC-HIC/ccanonym documentation built on May 6, 2019, 9:23 a.m.