README.md

anonymizer

CRAN_Status_Badge Downloads from the RStudio CRAN mirror Build Status Build status codecov.io Project Status: Active - The project has reached a stable, usable state and is being actively developed.

anonymizer anonymizes data containing Personally Identifiable Information (PII) using a combination of salting and hashing. You can find quality examples of data anonymization in R here, here, and here.

Installation

You can install the latest development version from CRAN:

install.packages("anonymizer")

Or from GitHub with:

if (packageVersion("devtools") < 1.6) {
  install.packages("devtools")
}
devtools::install_github("paulhendricks/anonymizer")

If you encounter a clear bug, please file a minimal reproducible example on GitHub.

API

anonymizer employs four convenience functions: salt, unsalt, hash, and anonymize.

library(dplyr, warn.conflicts = FALSE)
library(anonymizer)
letters %>% head
#> [1] "a" "b" "c" "d" "e" "f"
letters %>% head %>% salt(.seed = 1)
#> [1] "gjoxfagjoxf" "gjoxfbgjoxf" "gjoxfcgjoxf" "gjoxfdgjoxf" "gjoxfegjoxf"
#> [6] "gjoxffgjoxf"
letters %>% head %>% salt(.seed = 1) %>% unsalt(.seed = 1)
#> [1] "a" "b" "c" "d" "e" "f"
letters %>% head %>% hash(.algo = "crc32")
#> [1] "c0749952" "597dc8e8" "2e7af87e" "b01e6ddd" "c7195d4b" "5e100cf1"
letters %>% head %>% salt(.seed = 1) %>% hash(.algo = "crc32")
#> [1] "b0891ad8" "361d6876" "fd41bbd3" "e0448b6b" "2b1858ce" "ad8c2a60"
letters %>% head %>% anonymize(.algo = "crc32", .seed = 1)
#> [1] "b0891ad8" "361d6876" "fd41bbd3" "e0448b6b" "2b1858ce" "ad8c2a60"

Generate data containing fake PII

library(generator)
n <- 6
set.seed(1)
ashley_madison <- 
  data.frame(name = r_full_names(n), 
             snn = r_national_identification_numbers(n), 
             dob = r_date_of_births(n), 
             email = r_email_addresses(n), 
             ip = r_ipv4_addresses(n), 
             phone = r_phone_numbers(n), 
             credit_card = r_credit_card_numbers(n), 
             lat = r_latitudes(n), 
             lon = r_longitudes(n), 
             stringsAsFactors = FALSE)
knitr::kable(ashley_madison, format = "markdown")
name snn dob email ip phone credit_card lat lon Eldridge Pfannerstill 442-34-5338 1993-04-28 ntakqojv@lgbcyk.rkv 45.84.71.225 6794976958 4125-7204-9193-5140 -2.7018575 8.634988 Augustine Homenick 799-44-6396 1912-09-08 iqg@mtcuh.viy 191.116.55.106 3275827694 2182-5994-2283-9486 -70.4148630 -65.827918 Jennie Runte 941-11-5441 1985-01-12 wjszy@sjhreocvt.gbp 27.128.73.17 7419351735 4370-4866-4735-7857 -45.4091701 -79.932229 Araceli Kunde 290-44-2675 1948-04-28 uljsnvhfr@qfdkumtn.jkd 221.47.229.86 3243246285 6682-5074-2898-9396 -0.2673845 103.514583 Josue Rau 686-88-8446 1996-06-14 c@lqxzkdpi.nfy 157.136.114.185 9169736873 4510-3757-4858-5236 -22.8839925 72.886505 Elnora Zemlak 212-40-7016 1976-01-09 capvnl@nympzf.gsk 143.20.199.87 3295843196 7206-6205-2194-6432 78.2444466 -120.590050

Detect data containing PII

library(detector)
ashley_madison %>% 
  detect %>% 
  knitr::kable(format = "markdown")
column_name has_email_addresses has_phone_numbers has_national_identification_numbers name FALSE FALSE FALSE snn FALSE FALSE TRUE dob FALSE FALSE FALSE email TRUE FALSE FALSE ip FALSE FALSE FALSE phone FALSE TRUE FALSE credit_card FALSE FALSE FALSE lat FALSE TRUE FALSE lon FALSE TRUE FALSE

Anonymize data containing PII

ashley_madison[] <- lapply(ashley_madison, anonymize, .algo = "crc32")
ashley_madison %>% 
  knitr::kable(format = "markdown")

| name | snn | dob | email | ip | phone | credit_card | lat | lon | |:---------|:---------|:---------|:---------|:---------|:---------|:-------------|:---------|:---------| | c83b4030 | 393d73d7 | 18fe3e97 | aa5dead | e4b6e2c6 | d3af086b | cb7b5ba | 80064d9e | 7dc18006 | | 98a6974d | 70ac65b0 | bf8857eb | a75947f0 | 5e0e7cef | 5c562036 | 7cd11025 | fdf9526d | 5828b961 | | 77dcbc4d | 391740d7 | f0b13e46 | 6cefaee2 | fbaaa8f1 | 9a66f57d | 299a42fe | 734886e3 | 9ea0e9a5 | | a48e2b0b | 6704117d | 2e40fae7 | e1598468 | b7a422ba | 1f0a0373 | f420590f | 53155b41 | 81018fc | | 4fecaeb2 | 9d6bf732 | e881bbe7 | 4b412ff9 | d1f2740c | ac553e93 | e3716031 | f3d9a005 | ef3bdb8d | | abc3b85c | 90866189 | 6cefc2f4 | f26e84b1 | 52596e0e | b14fa5df | 9189fc4f | 85c69f65 | f0db3bb0 |

Citation

To cite package ‘anonymizer’ in publications use:

Paul Hendricks (2015). anonymizer: Anonymize Data Containing Personally Identifiable Information. R package version 0.2.0. https://github.com/paulhendricks/anonymizer

A BibTeX entry for LaTeX users is

@Manual{,
  title = {anonymizer: Anonymize Data Containing Personally Identifiable Information},
  author = {Paul Hendricks},
  year= {2015}, 
  note = {R package version 0.2.0},
  url = {https://github.com/paulhendricks/anonymizer},
}


paulhendricks/anonymizer documentation built on May 24, 2019, 8:27 p.m.