``` {r, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)
```

``` {r setup}
library(deident)
```
Out of the box, `deident` features a set of transformations to aid in the de-identification of data sets. Each transformation is implemented via `R6Class` and extends `BaseDeident`. User-defined transformations can be implemented in a similar manner.
To demonstrate the different transformations we supply a toy data set, `df`, comprising 26 observations of three variables:

- `A`: the 26 lowercase letters
- `B`: the integers 1 to 26
- `C`: `X` if `B <= 13`, `Y` if `B > 13`
``` {r, include=F}
df <- data.frame(
  A = letters,
  B = 1:26,
  C = sort(rep(c("X", "Y"), 13))
)

df
```
## Pseudonymizer

Apply a cached random replacement cipher. Every re-occurrence of the same key receives the same replacement value.

Implemented `deident` options:

``` {r, eval=F}
deident(df, "psudonymize", A)
deident(df, "Pseudonymizer", A)
deident(df, Pseudonymizer, A)
deident(df, Pseudonymizer$new(), A)

psu <- Pseudonymizer$new()
deident(df, psu, A)
```
### Options

By default `Pseudonymizer` replaces values in variables with a random alphanumeric string of 5 characters. This can be changed by calling `set_method` on an instantiated `Pseudonymizer` with the desired function:
``` {r}
psu <- Pseudonymizer$new()

# Replacement generator: ignores the key and returns 12 random lowercase letters
new_method <- function(key, ...){
  paste(sample(letters, 12, T), collapse="")
}

psu$set_method(new_method)

deident(df, psu, A)
```
The first argument to the method receives the key to be transformed.
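To make the idea of a cached replacement cipher concrete, here is a minimal base R sketch. It is not the `deident` implementation: `make_pseudonymizer` and its `n_chars` parameter are hypothetical names introduced for illustration, and the cache is assumed to be a simple named character vector.

```r
# Illustrative sketch only; make_pseudonymizer() is a hypothetical helper,
# not part of deident.
make_pseudonymizer <- function(n_chars = 5) {
  cache <- character(0)  # named vector: original key -> replacement
  function(keys) {
    keys <- as.character(keys)
    for (k in unique(keys)) {
      if (!k %in% names(cache)) {
        # Draw random alphanumeric strings until one is not already in use
        repeat {
          repl <- paste(
            sample(c(letters, LETTERS, 0:9), n_chars, replace = TRUE),
            collapse = ""
          )
          if (!repl %in% cache) break
        }
        cache[[k]] <<- repl
      }
    }
    unname(cache[keys])
  }
}

pseudo <- make_pseudonymizer()
first  <- pseudo(c("a", "b", "a"))  # "a" maps to the same string both times
second <- pseudo(c("b", "a"))       # later calls reuse the cached values
```

Because the cache lives in the closure, the same key maps to the same replacement across calls, which is the property the section describes.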
## Shuffler

Implemented `deident` options:

``` {r, eval=F}
deident(df, "shuffle", A)
deident(df, "Shuffler", A)
deident(df, Shuffler, A)
deident(df, Shuffler$new(), A)

shuffle <- Shuffler$new()
deident(df, shuffle, A)
```
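As a rough illustration of what shuffling a single variable means, the base R sketch below permutes one column while leaving the others in place. It assumes that `Shuffler` performs a random permutation of the selected column; the `toy` data frame is made up for the example.

```r
# Illustrative sketch only, assuming a shuffle is a random permutation
# of one column; this does not call deident.
set.seed(1)  # only to make the sketch reproducible

toy <- data.frame(A = letters[1:6], B = 1:6)

shuffled <- toy
shuffled$A <- sample(toy$A)  # permute A, leave B untouched
```

The set of values in `A` is unchanged, but the link between each value of `A` and its row is broken.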
## Encrypter

Apply cryptographic hashing to a variable.

Implemented `deident` options:

``` {r, eval=F}
deident(df, "encrypt", A)
deident(df, "Encrypter", A)
deident(df, Encrypter, A)
deident(df, Encrypter$new(), A)

encrypt <- Encrypter$new()
deident(df, encrypt, A)
```
### Options

At initialization, `Encrypter` can be given `hash_key` and `seed` values to control the hashing. It is recommended that users set these values and do not disclose them.
``` {r}
encrypt <- Encrypter$new(hash_key="deident_hash_key_123", seed=202)
deident(df, encrypt, A)
```
## Perturber

Apply Gaussian white noise to a numeric variable.

Implemented `deident` options:

``` {r, eval=F}
deident(df, "perturb", B)
deident(df, "Perturber", B)
deident(df, Perturber, B)
deident(df, Perturber$new(), B)

perturb <- Perturber$new()
deident(df, perturb, B)
```
### Options

At initialization, `Perturber` can be given a scale for the white noise. The (commented-out) example below passes it through the `noise` argument as `adaptive_noise(0.2)`.

``` {r}
# perturb <- Perturber$new(noise=adaptive_noise(0.2))
# deident(df, perturb, B)
```
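The effect of Gaussian white noise on a numeric variable can be sketched in base R as follows. This is not the `deident` code: `add_gaussian_noise` and `sd_ratio` are hypothetical names, and scaling the noise to a fraction of the variable's standard deviation is an assumption about what an "adaptive" scale might mean.

```r
# Illustrative sketch only; add_gaussian_noise() is a hypothetical helper.
# Assumption: the noise standard deviation is a fraction (sd_ratio) of the
# standard deviation of the data.
add_gaussian_noise <- function(x, sd_ratio = 0.2) {
  stopifnot(is.numeric(x))
  x + rnorm(length(x), mean = 0, sd = sd_ratio * sd(x))
}

set.seed(42)  # only to make the sketch reproducible
b <- 1:26
noisy <- add_gaussian_noise(b, sd_ratio = 0.2)
```

A larger `sd_ratio` hides individual values more thoroughly but distorts summary statistics more.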
## Blurer

Aggregate categorical values according to a user-supplied list. The list must be supplied to `Blurer` at initialization.

Implemented `deident` options:

``` {r, eval=F}
# Named lookup: each letter maps to "Early" (a-m) or "Late" (n-z)
letter_blur <- c(rep("Early", 13), rep("Late", 13))
names(letter_blur) <- letters

blur <- Blurer$new(blur = letter_blur)
deident(df, blur, A)
```
## NumericBlurer

Aggregate numeric values according to a user-supplied vector of breaks (cuts). If no vector is supplied, `NumericBlurer` defaults to a binary classification about 0.

Implemented `deident` options:

``` {r, eval=F}
deident(df, "numeric_blur", B)
deident(df, "NumericBlurer", B)
deident(df, NumericBlurer, B)
deident(df, NumericBlurer$new(), B)

numeric_blur <- NumericBlurer$new()
deident(df, numeric_blur, B)
```
### Options

At initialization, `NumericBlurer` takes an argument `cuts` to define the limits of each interval.
``` {r}
numeric_blur <- NumericBlurer$new(cuts=c(5, 10, 15, 20))
deident(df, numeric_blur, B)
```
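Binning a numeric variable on a vector of cuts can be sketched with base R's `cut()`. This only illustrates the general technique: whether `NumericBlurer` closes intervals on the right, or labels them this way, is an assumption.

```r
# Illustrative sketch only, using base::cut(); interval closure and labels
# in deident may differ.
b <- 1:26
cuts <- c(5, 10, 15, 20)

# Extend the cuts with -Inf and Inf so that every value falls in an interval
blurred <- cut(b, breaks = c(-Inf, cuts, Inf))

table(blurred)
```

With these cuts, the 26 values collapse into five intervals, so an individual value of `B` can no longer be read off directly.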
## GroupedShuffler

Apply `Shuffler` to a data set after first grouping the data on one or more columns. The grouping must be defined at initialization.

Implemented `deident` options:

``` {r, eval=F}
grouped_shuffle <- GroupedShuffler$new(C)
deident(df, grouped_shuffle, B)
```
### Options

At initialization, `GroupedShuffler` takes an argument `limit`: if any aggregated subgroup has fewer than `limit` observations, all of its values are dropped.

``` {r}
grouped_shuffle <- GroupedShuffler$new(C, limit=1)
deident(df, grouped_shuffle, B)
```
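The combination of within-group shuffling and a minimum group size can be sketched in base R as below. This is a sketch under stated assumptions, not the `deident` implementation: `grouped_shuffle_sketch` is a hypothetical helper, and "dropped" is interpreted here as removing the rows of any group with fewer than `limit` observations.

```r
# Illustrative sketch only; grouped_shuffle_sketch() is a hypothetical helper.
# Assumption: groups smaller than `limit` are removed from the output.
grouped_shuffle_sketch <- function(data, value, group, limit = 0) {
  out <- data
  # Size of the group each row belongs to
  sizes <- ave(seq_len(nrow(data)), data[[group]], FUN = length)
  # Permute values within each group; indexing with sample.int() avoids the
  # sample(x) pitfall when a group holds a single number
  out[[value]] <- ave(
    data[[value]], data[[group]],
    FUN = function(v) v[sample.int(length(v))]
  )
  out[sizes >= limit, , drop = FALSE]
}

set.seed(7)  # only to make the sketch reproducible
toy <- data.frame(B = 1:7, C = c(rep("X", 3), rep("Y", 3), "Z"))
res <- grouped_shuffle_sketch(toy, "B", "C", limit = 2)
```

Here group `Z` has a single observation, so it is removed, while the values of `B` are permuted only among rows that share the same `C`.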
## Drop

Define a column to be removed from the pipeline.

Implemented `deident` options:

``` {r, eval=F}
deident(df, Drop, B)

drop <- deident:::Drop$new()
deident(df, drop, B)
```