Blur Example
In deident: Persistent Data Anonymization Pipeline

knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)

One way to reduce the identifiability of a data set is to convert a categorical variable to a more aggregated taxonomy (i.e. a many-to-one mapping). We refer to this method as a 'blur' because it merges categories in a way that hides the underlying information.

As an example, consider the ShiftsWorked data:

library(deident)
head(ShiftsWorked)

A simple 'blur' might change the taxonomy of 'Shift', e.g. combine 'Day' and 'Night' into a new group 'Working' while leaving 'Rest' out of the mapping. To do this we define the values we wish to change as a named vector, build a pipeline and apply it to the data:

shift_blur <- c("Day" = "Working", "Night" = "Working")
blur_pipe <- ShiftsWorked |>
  add_blur(Shift, blur = shift_blur)

apply_deident(ShiftsWorked, blur_pipe)
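The many-to-one idea behind a blur can be sketched in base R with a plain named-vector lookup. This is only an illustration of the concept, not deident's implementation: the `shifts` vector is toy data invented here, and keeping unmapped values unchanged is a choice made for this sketch rather than a statement about how `add_blur` treats them.

```r
# Toy data, invented for illustration (not the ShiftsWorked data set).
shifts <- c("Day", "Night", "Rest", "Day")

# Many-to-one lookup: names are the original levels,
# values are the aggregated levels.
shift_blur <- c("Day" = "Working", "Night" = "Working")

# Look up each value by name; names missing from the lookup give NA.
blurred <- unname(shift_blur[shifts])

# Choice made in this sketch only: keep unmapped values as they are.
unmapped <- is.na(blurred)
blurred[unmapped] <- shifts[unmapped]

blurred
```

Because both 'Day' and 'Night' map to the same value, the original shift can no longer be recovered from the blurred column, which is the point of the transformation.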

The `category_blur` utility

Applying the blur is relatively simple, but constructing it can be complex. Consider the starwars data set supplied by dplyr:

starwars <- dplyr::starwars
head(starwars)

Note in particular the species variable:

table(starwars$species)

Imagine we wanted to reduce identifiability by aggregating the data into Human vs Non-Human. We could code the vector by hand, but this is error-prone when there are many categories. To help design complex blurs we supply the `category_blur` utility, which uses regular expressions to define the groups.

human_blur <- category_blur(
  starwars$species,
  "NotHuman" = "^(?!Human)" # Doesn't start with "Human"
)

The returned vector can then be passed into a new pipeline as before.

species_pipe <- starwars |>
  add_blur(species, blur = human_blur)

new_starwars <- apply_deident(starwars, species_pipe)

table(new_starwars$species)
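To see what the pattern `"^(?!Human)"` selects, the matching can be reproduced in base R. The sketch below is not how `category_blur` is implemented and makes no claim about the exact shape of its return value. It only shows, on a hand-typed `species` vector assumed for illustration, how a negative lookahead can turn a set of labels into a named lookup vector.

```r
# A few species labels, typed by hand for illustration.
species <- c("Human", "Droid", "Wookiee", "Human", "Gungan")

# Work on the distinct labels only.
levels_seen <- unique(species)

# Negative lookahead: matches at the start of the string only when
# "Human" does not follow. Base R needs perl = TRUE for lookaheads.
is_not_human <- grepl("^(?!Human)", levels_seen, perl = TRUE)

# Named lookup: names are the original labels,
# values are the aggregated labels.
lookup <- ifelse(is_not_human, "NotHuman", levels_seen)
names(lookup) <- levels_seen

lookup
table(unname(lookup[species]))
```

Checking the pattern on the distinct labels before building a pipeline is a cheap way to confirm that each category lands in the intended group.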


deident documentation built on April 3, 2025, 6:14 p.m.
