anonymisation: Anonymise the critical care dataset

View source: R/sdc-anonymisation.R

Description

This is the main function of the ccanonym package. It runs the anonymisation process on the identifiable dataset and returns the anonymised data extract.

Usage

anonymisation(ccd, conf, remove.alive = T, verbose = F, k.anon = 5,
  l.div = NULL, ...)

Arguments

ccd

identifiable dataset in ccRecord format (see the cleanEHR R package)

conf

YAML configuration, given either as the path to a YAML file or as a configuration list equivalent to the YAML configuration.

remove.alive

logical; determines whether to remove all alive episodes.

verbose

logical

k.anon

integer; the k-anonymity threshold.

l.div

integer; the minimum l-diversity of the data extract. Entries whose l-diversity is below this threshold will be suppressed.
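To make the two thresholds concrete, the following is a minimal base R sketch of what k-anonymity and l-diversity measure on a small table. It is not the package's implementation; the data frame `toy`, its column names, and the chosen thresholds are all made up for illustration.

```r
# Toy data: `age_band` and `sex` act as key variables, `diagnosis` as the
# sensitive variable. All names and values are hypothetical.
toy <- data.frame(
  age_band  = c("60-69", "60-69", "60-69", "70-79", "70-79", "80-89"),
  sex       = c("F", "F", "F", "M", "M", "F"),
  diagnosis = c("sepsis", "stroke", "sepsis", "trauma", "trauma", "sepsis"),
  stringsAsFactors = FALSE
)
keys <- c("age_band", "sex")

# One group label per combination of key-variable values.
group <- interaction(toy[keys], drop = TRUE)

# k of a row: number of rows that share its key combination.
k <- as.integer(ave(seq_len(nrow(toy)), group, FUN = length))

# l of a row: number of distinct sensitive values within its group.
l <- as.integer(ave(seq_len(nrow(toy)), group,
                    FUN = function(i) length(unique(toy$diagnosis[i]))))

# Illustrative thresholds, playing the role of k.anon and l.div.
k.anon <- 2
l.div  <- 2

# Suppress key variables where k is too small and the sensitive
# variable where l is too small.
anon <- toy
anon[k < k.anon, keys] <- NA
anon$diagnosis[l < l.div] <- NA

print(cbind(toy, k = k, l = l))
print(anon)
```

In this toy table the single 80-89/F row fails k-anonymity for `k.anon = 2`, and the rows in groups containing only one distinct diagnosis fail l-diversity for `l.div = 2`.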

Details

1. Remove the alive episodes if remove.alive is TRUE.

2. Calculate the age from the date of birth (DOB) and the date of ICU admission (DAICU); the date of admission is removed afterwards. The removal of DAICU should be specified in the configuration file.

3. All demographic time stamps are converted according to their offset from the date of admission, and the date of admission itself is set to an arbitrary date, 1970-01-01. For example, admission date: 2014-01-01 -> 1970-01-01; discharge date: 2014-01-03 -> 1970-01-03. This hides the actual dates from users while preserving intervals such as the length of stay.

4. Remove the VIPs listed in a file of VIP identifiers. An identifier can be an NHS number, a PAS number, or a site and episode ID combination (e.g. Q70:000001).

5. Remove episodes whose stay is longer than a given period. The limit can be specified in the configuration file, e.g. maxStay: 30.

6. Micro-aggregate the numerical/date variables specified in the configuration file.

7. Apply special aggregation, for example truncating the post code so that NW1 1AA -> NW1. The aggregation function can be written in the configuration file.

8. Suppress the key variables where the k-anonymity is violated.

9. Suppress the sensitive variables where the l-diversity is violated.

10. Add noise to the selected data.

11. Combine the results into a new ccRecord object and convert all 2-D date-time stamps to the difference in hours from the admission time.
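The date and post code transformations in steps 2, 3, 7 and 11 can be sketched in base R as follows. This only illustrates the arithmetic under stated assumptions and is not the package's own code: the variable names, the helper `shift_date`, the 365.25-day year used for the age, and the example time stamps are all hypothetical.

```r
# Hypothetical values for one episode (not actual ccRecord fields).
dob       <- as.Date("1950-06-15")
admission <- as.Date("2014-01-01")
discharge <- as.Date("2014-01-03")

# Step 2: age in whole years at admission (approximated with 365.25-day years).
age <- as.integer(floor(as.numeric(admission - dob) / 365.25))

# Step 3: shift dates so that the admission date becomes 1970-01-01,
# keeping each date's offset in days from admission.
origin <- as.Date("1970-01-01")
shift_date <- function(x, admission) origin + as.integer(x - admission)
shifted_admission <- shift_date(admission, admission)
shifted_discharge <- shift_date(discharge, admission)

# The length of stay is unchanged by the shift.
stay_before <- as.integer(discharge - admission)
stay_after  <- as.integer(shifted_discharge - shifted_admission)

# Step 7: keep only the part of the post code before the space.
outward <- sub(" .*$", "", "NW1 1AA")

# Step 11: express time stamps as hours since the admission time.
admit_time <- as.POSIXct("2014-01-01 08:00:00", tz = "UTC")
obs_time   <- as.POSIXct(c("2014-01-01 09:30:00", "2014-01-02 08:00:00"),
                         tz = "UTC")
hours <- as.numeric(difftime(obs_time, admit_time, units = "hours"))

print(list(age = age, admission = shifted_admission,
           discharge = shifted_discharge, postcode = outward, hours = hours))
```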

Value

ccRecord

Examples

## Not run: 
# We assume the original dataset is called `ccd`
# Create a template configuration file, modify it if necessary. 
template.conf("test.yaml")

# Trial: adjust K-anonymity and L-diversity. 
sdc.trial(ccd, "test.yaml", k.anon=10, l.div=2)

# Create the data extract once k.anon and l.div have been decided.
ccd.anon <- anonymisation(ccd, "test.yaml", k.anon=10)

## End(Not run)

CC-HIC/ccanonym documentation built on May 6, 2019, 9:23 a.m.