
``` {r, include=F}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)
library(deident)
```

# Transformations

Out of the box, `deident` provides a set of transformations to aid in the de-identification of data sets. Each transformation is implemented as an `R6Class` and extends `BaseDeident`. User-defined transformations can be implemented in the same way.

To demonstrate the different transformations we supply a toy data set, `df`, comprising 26 observations of three variables:

``` {r, include=F}
df <- data.frame(
  A = letters,
  B = 1:26,
  C = sort(rep(c("X", "Y"), 13))
)
df
```

## Pseudonymizer

Apply a cached random replacement cipher. Repeated occurrences of the same key receive the same replacement value.

Implemented `deident` options:

``` {r, eval=F}
deident(df, "psudonymize", A)
deident(df, "Pseudonymizer", A)
deident(df, Pseudonymizer, A)
deident(df, Pseudonymizer$new(), A)

psu <- Pseudonymizer$new()
deident(df, psu, A)
```

### Options

By default, `Pseudonymizer` replaces values with a random alphanumeric string of 5 characters. This default can be overridden by calling `set_method` on an instantiated `Pseudonymizer` with the desired function:

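``` {r, eval=F}
psu <- Pseudonymizer$new()

# Replacement method: returns 12 random lowercase letters for a key
new_method <- function(key, ...){
  paste(sample(letters, 12, replace = TRUE), collapse = "")
}

psu$set_method(new_method)

deident(df, psu, A)
```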

The first argument to the method receives the key to be transformed.
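
The caching behaviour described in this section can be illustrated without the package. The following is a minimal base-R sketch, not `deident`'s implementation: the helper `make_pseudonymizer()` and its `n_chars` argument are hypothetical names introduced here, and the sketch assumes keys contain no missing values.

```r
# Illustrative sketch only: a cached random replacement in base R.
# make_pseudonymizer() is a hypothetical helper, not part of deident.
make_pseudonymizer <- function(n_chars = 5) {
  cache <- character(0)                 # named vector: original key -> replacement
  alphabet <- c(letters, LETTERS, 0:9)  # alphanumeric characters to draw from
  function(keys) {
    keys <- as.character(keys)
    for (k in unique(keys)) {
      # Only draw a new replacement the first time a key is seen
      if (!k %in% names(cache)) {
        cache[[k]] <<- paste(sample(alphabet, n_chars, replace = TRUE), collapse = "")
      }
    }
    unname(cache[keys])
  }
}

pseudonymize <- make_pseudonymizer()
out <- pseudonymize(c("a", "b", "a"))
out
```

Because the cache lives in the closure, a later call such as `pseudonymize("a")` returns the same replacement as before. Note that this simple sketch does not guard against two different keys drawing the same random string.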

## Shuffler

Implemented `deident` options:

``` {r, eval=F}
deident(df, "shuffle", A)
deident(df, "Shuffler", A)
deident(df, Shuffler, A)
deident(df, Shuffler$new(), A)

shuffle <- Shuffler$new()
deident(df, shuffle, A)
```

## Encrypter

Apply cryptographic hashing to a variable.

Implemented `deident` options:

``` {r, eval=F}
deident(df, "encrypt", A)
deident(df, "Encrypter", A)
deident(df, Encrypter, A)
deident(df, Encrypter$new(), A)

encrypt <- Encrypter$new()
deident(df, encrypt, A)
```

### Options

At initialization, `Encrypter` can be given `hash_key` and `seed` values to control the hashing. Users are advised to set these values and not to disclose them.

``` {r, eval=F}
encrypt <- Encrypter$new(hash_key="deident_hash_key_123", seed=202)
deident(df, encrypt, A)
```

## Perturber

Apply Gaussian white noise to a numeric variable.

Implemented `deident` options:

``` {r, eval=F}
# B is the numeric column of df; A holds letters and cannot be perturbed
deident(df, "perturb", B)
deident(df, "Perturber", B)
deident(df, Perturber, B)
deident(df, Perturber$new(), B)

perturb <- Perturber$new()
deident(df, perturb, B)
```

### Options

At initialization, `Perturber` can be given a scale for the white noise via the `sd` argument.  

``` {r}
# perturb <- Perturber$new(noise=adaptive_noise(0.2))
# deident(df, perturb, B)
```
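
To make the idea of additive Gaussian white noise concrete, here is a small base-R sketch that does not use the package. The choice of scaling the noise to 20% of the standard deviation of the data is an assumption made for illustration and is not claimed to match how `Perturber` derives its noise.

```r
# Illustrative sketch only: additive Gaussian white noise in base R.
set.seed(42)              # fixed seed so the sketch is reproducible
x <- 1:26                 # same values as column B of the toy data set
noise_sd <- 0.2 * sd(x)   # hypothetical scale: 20% of the spread of x
x_noisy <- x + rnorm(length(x), mean = 0, sd = noise_sd)
head(round(x_noisy, 2))
```

A larger `noise_sd` hides individual values more effectively but distorts summary statistics more.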

## Blurer

Aggregate categorical values according to a user-supplied list. The list must be supplied to `Blurer` at initialization.

Implemented `deident` options:

``` {r, eval=F}
letter_blur <- c(rep("Early", 13), rep("Late", 13))
names(letter_blur) <- letters

blur <- Blurer$new(blur = letter_blur)
deident(df, blur, A)
```
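
The mapping itself is an ordinary named lookup. As a rough base-R illustration of what the aggregation amounts to (a sketch of the idea, not the internals of `Blurer`), each value is replaced by the entry stored under its name:

```r
# Illustrative sketch only: aggregating categories via a named lookup vector.
letter_blur <- c(rep("Early", 13), rep("Late", 13))
names(letter_blur) <- letters

x <- c("a", "m", "n", "z")
blurred <- unname(letter_blur[x])
blurred
#> [1] "Early" "Early" "Late"  "Late"
```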

## NumericBlurer

Aggregate numeric values according to a user-supplied vector of breaks (cuts). If no vector is supplied, `NumericBlurer` defaults to a binary classification about 0.

Implemented `deident` options:

``` {r, eval=F}
deident(df, "numeric_blur", B)
deident(df, "NumericBlurer", B)
deident(df, NumericBlurer, B)
deident(df, NumericBlurer$new(), B)

numeric_blur <- NumericBlurer$new()
deident(df, numeric_blur, B)
```

### Options

At initialization, `NumericBlurer` takes an argument `cuts` to define the limits of each interval.

``` {r, eval=F}
numeric_blur <- NumericBlurer$new(cuts=c(5, 10, 15, 20))
deident(df, numeric_blur, B)
```
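
For intuition, binning on a vector of cuts can be reproduced in base R with `cut()`. This sketch assumes open-ended outer intervals and right-closed bins, which is `cut()`'s default; whether `NumericBlurer` closes its intervals on the same side is not stated here, so treat the boundaries as an assumption.

```r
# Illustrative sketch only: interval binning with base R's cut().
x <- 1:26                       # same values as column B of the toy data set
cuts <- c(5, 10, 15, 20)
blurred <- cut(x, breaks = c(-Inf, cuts, Inf))  # right-closed bins by default
table(blurred)
```

With these cuts the 26 values fall into five intervals, so individual values are no longer recoverable, only their range.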

## GroupedShuffler

Apply `Shuffler` to a data set having first grouped the data on column(s). The grouping needs to be defined at initialization.

Implemented `deident` options:

``` {r, eval=F}
grouped_shuffle <- GroupedShuffler$new(C)
deident(df, grouped_shuffle, B)
```

### Options

At initialization, `GroupedShuffler` takes an argument `limit` such that if any aggregated subgroup has fewer than `limit` observations, all values are dropped.

``` {r}
grouped_shuffle <- GroupedShuffler$new(C, limit=1)
deident(df, grouped_shuffle, B)
```
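
Shuffling within groups can be sketched in base R with `ave()`. This is only an illustration of the idea, not the package's implementation, and it does not model the `limit` argument.

```r
# Illustrative sketch only: shuffle column B separately within each level of C.
set.seed(1)  # fixed seed so the sketch is reproducible
df <- data.frame(
  A = letters,
  B = 1:26,
  C = sort(rep(c("X", "Y"), 13))
)

shuffled <- df
# sample.int() permutes positions, which avoids sample()'s special case
# for vectors of length one
shuffled$B <- ave(df$B, df$C, FUN = function(v) v[sample.int(length(v))])
head(shuffled)
```

Each value of `B` stays inside its original group, so per-group summaries are preserved while the link between `A` and `B` is broken.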

## Drop

Define a column to be removed from the pipeline.

Implemented `deident` options:

``` {r, eval=F}
deident(df, Drop, B)

drop <- deident:::Drop$new()
deident(df, drop, B)
```




deident documentation built on April 3, 2025, 6:14 p.m.