pinr

The goal of pinr is to simplify working with data containing Finnish personal identity codes (PINs).

Installation

Currently you can install the development version from GitHub with:

# install.packages("devtools")
devtools::install_github("fbc-studies/pinr")

Usage

The primary (pipe-friendly) utility function, pseudonymize(), automates pseudonymizing columns containing PINs in your data:

library(pinr)

df <- data.frame(pin = c("311280-888Y", "311280-888Y", "131052-308T"))
key <- data.frame(pin = c("311280-888Y", "131052-308T"), pid = c(1, 2))

pseudonymize(df, key, pid = pin)
#>   pid
#> 1   1
#> 2   1
#> 3   2

The result is equivalent to looking up the pid from the key data frame by hand, as in the following call, but pinr also takes care of managing the columns and naming in your data:

key$pid[match(df$pin, key$pin)]
#> [1] 1 1 2
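
To make the column management concrete, here is a rough base R sketch of the same replacement done by hand. It only mirrors the output shown above and is not how pinr implements pseudonymize() internally; the `manual` data frame is an illustrative name.

```r
df <- data.frame(pin = c("311280-888Y", "311280-888Y", "131052-308T"),
                 stringsAsFactors = FALSE)
key <- data.frame(pin = c("311280-888Y", "131052-308T"), pid = c(1, 2),
                  stringsAsFactors = FALSE)

# Illustrative manual equivalent, not pinr code
manual <- df
manual$pid <- key$pid[match(manual$pin, key$pin)]  # look up the pseudonym
manual$pin <- NULL                                 # drop the original PIN column
manual
```

With several PIN columns, these steps have to be repeated and named consistently for each one, which is the bookkeeping pseudonymize() is meant to handle.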

Rather than manually specifying columns containing PINs, you can also use a heuristic implemented in the is_probably_pin() function to guess which columns need to be pseudonymized:

pseudonymize(df, key, guess = TRUE, replace = FALSE)
#>           pin pin_pid
#> 1 311280-888Y       1
#> 2 311280-888Y       1
#> 3 131052-308T       2
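
For intuition about what such a heuristic can check, the sketch below validates the PIN format and its check character in base R. It is an assumption-laden illustration, not the actual is_probably_pin() implementation: the helper name `looks_like_pin()` is hypothetical, it only accepts the century markers "+", "-" and "A", and it does not verify that the day and month form a real date.

```r
# Hypothetical helper, not part of pinr: TRUE where a string is a well-formed
# PIN whose check character matches the preceding digits.
looks_like_pin <- function(x) {
  x <- as.character(x)
  pattern <- "^[0-9]{6}[-+A][0-9]{3}[0-9A-FHJ-NPR-Y]$"
  ok <- !is.na(x) & grepl(pattern, x)

  # The check character is found by taking the nine digits (DDMMYY + NNN)
  # modulo 31 and indexing into this 31-character alphabet.
  check_chars <- strsplit("0123456789ABCDEFHJKLMNPRSTUVWXY", "")[[1]]
  digits <- as.numeric(paste0(substr(x[ok], 1, 6), substr(x[ok], 8, 10)))
  ok[ok] <- check_chars[digits %% 31 + 1] == substr(x[ok], 11, 11)
  ok
}

candidates <- c("311280-888Y", "131052-308T", "311280-888X", "hello", NA)
result <- looks_like_pin(candidates)
result
```

A column-level guess could then be something like `all(looks_like_pin(column))`, although the threshold pinr actually uses may differ.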

pinr also includes helpers for extracting data contained in the Finnish PINs, such as the date of birth and sex:

pins <- c("311280-888Y", "131052-308T")

pin_dob(pins)
#> [1] "1980-12-31" "1952-10-13"

pin_sex(pins)
#> [1] Female Female
#> Levels: Male Female
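
These values come straight from the structure of the code: the first six characters are the birth date as DDMMYY, the seventh character marks the century, and the three-digit individual number that follows is odd for males and even for females. The base R sketch below decodes the example PINs by hand. It is an illustration rather than pinr's implementation, the variable names are made up, and it assumes only the century markers "+" (1800s), "-" (1900s) and "A" (2000s).

```r
pins <- c("311280-888Y", "131052-308T")

# Century marker at position 7 (assumed to be one of "+", "-", "A")
century <- unname(c("+" = 1800L, "-" = 1900L, "A" = 2000L)[substr(pins, 7, 7)])
year <- century + as.integer(substr(pins, 5, 6))

# Reassemble DDMMYY as an ISO date string and parse it
dob <- as.Date(paste(year, substr(pins, 3, 4), substr(pins, 1, 2), sep = "-"))

# Individual number at positions 8-10: odd means male, even means female
individual <- as.integer(substr(pins, 8, 10))
sex <- factor(ifelse(individual %% 2 == 1, "Male", "Female"),
              levels = c("Male", "Female"))

dob
sex
```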

There is also a pin_extract() wrapper for these extraction functions that makes it easy to extract these data into new columns in a data frame context:

pin_extract(df, pin)
#>           pin        dob    sex
#> 1 311280-888Y 1980-12-31 Female
#> 2 311280-888Y 1980-12-31 Female
#> 3 131052-308T 1952-10-13 Female

All of the pinr functions that work with data frames are pipe-friendly, lending themselves to readable workflows such as this:

library(magrittr) # for the pipe operator

df %>% 
  pin_extract(pin) %>% 
  pseudonymize(key, pid = pin)
#>   pid        dob    sex
#> 1   1 1980-12-31 Female
#> 2   1 1980-12-31 Female
#> 3   2 1952-10-13 Female


fbc-studies/pinr documentation built on May 17, 2019, 7:35 p.m.