deid_dua: Convert identifying variable to unique hash

View source: R/deidentify.R

Description

Convert a column of unique but restricted IDs into a set of new IDs using a secure (SHA-2) hashing algorithm. Users have the option of saving a crosswalk between the old and new IDs in case observations need to be reidentified at a later date.
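
The general idea can be sketched in a few lines of R. This is a minimal illustration, not the package's implementation: it assumes the digest package is installed, and the helper hash_ids() and the sample IDs are hypothetical. It hashes each ID with SHA-256 (a member of the SHA-2 family) and keeps the first id_length characters of the hex digest.

```r
library(digest)

## hypothetical helper: hash each ID with SHA-256 and truncate the hex digest
hash_ids <- function(ids, id_length = 64) {
  ## a SHA-256 hex digest is 64 characters long
  stopifnot(id_length >= 12, id_length <= 64)
  hashed <- vapply(
    as.character(ids),
    function(x) digest(x, algo = "sha256", serialize = FALSE),
    character(1),
    USE.NAMES = FALSE
  )
  substr(hashed, 1, id_length)
}

## made-up restricted IDs
old_ids <- c("A001", "A002", "A003")
new_ids <- hash_ids(old_ids, id_length = 12)

## crosswalk linking old and new IDs
crosswalk <- data.frame(sid = old_ids, id = new_ids, stringsAsFactors = FALSE)
```

Shorter IDs are easier to work with, but truncating the digest raises the chance that two different IDs share the same hash.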

Usage

deid_dua(
  df,
  id_col = NULL,
  new_id_name = "id",
  id_length = 64,
  existing_crosswalk = NULL,
  write_crosswalk = FALSE,
  crosswalk_filename = NULL
)

Arguments

df

Data frame

id_col

Column name with IDs to be replaced. Defaults to NULL, in which case the value set by the id_column argument in the set_dua_level() function is used.

new_id_name

New hashed ID column name, which must be different from old name.

id_length

Length of new hashed ID; cannot be fewer than 12 characters (default is 64 characters).

existing_crosswalk

File name of existing crosswalk. If an existing crosswalk is used, then new_id_name, id_length, and crosswalk_filename will be determined by the existing crosswalk. Arguments given for these values will be ignored.

write_crosswalk

Write crosswalk between old ID and new hashed ID to console (unless crosswalk_filename is given a value).

crosswalk_filename

Name of crosswalk file with path; defaults to generic name with current date (YYYYMMDD) appended.
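
The crosswalk arguments can be pictured with a short base R sketch. It is an illustration under assumptions rather than the package's own code: the file name stem "crosswalk", the column names sid and id, and the sample values are all hypothetical. It builds a date-stamped file name in the YYYYMMDD form described above, writes a crosswalk, then reads it back to attach the same hashed IDs to a later data set.

```r
## hypothetical crosswalk between old IDs (sid) and hashed IDs (id)
crosswalk <- data.frame(
  sid = c("A001", "A002"),
  id = c("3f2a9c0b11de", "9b7e44a0c5f1"),
  stringsAsFactors = FALSE
)

## date-stamped file name of the form crosswalk_YYYYMMDD.csv
cw_file <- file.path(
  tempdir(),
  paste0("crosswalk_", format(Sys.Date(), "%Y%m%d"), ".csv")
)
write.csv(crosswalk, cw_file, row.names = FALSE)

## later wave of a panel data set that uses the same restricted IDs
wave2 <- data.frame(
  sid = c("A002", "A001"),
  score = c(88, 92),
  stringsAsFactors = FALSE
)

## reuse the saved crosswalk so each observation keeps the same hashed ID
saved_cw <- read.csv(cw_file, colClasses = "character")
wave2_deid <- merge(wave2, saved_cw, by = "sid")

## drop the restricted ID once the hashed ID is attached
wave2_deid$sid <- NULL
```

Reading the crosswalk back with every column as character keeps hashed IDs from being coerced to numbers when they happen to look numeric.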

Examples

## --------------
## Setup
## --------------
## set DUA crosswalk
dua_cw <- system.file('extdata', 'dua_cw.csv', package = 'duawranglr')
set_dua_cw(dua_cw)
## read in data
admin <- system.file('extdata', 'admin_data.csv', package = 'duawranglr')
df <- read_dua_file(admin)
## --------------

## show identified data
df

## deidentify
df <- deid_dua(df, id_col = 'sid', new_id_name = 'id', id_length = 12)

## show deidentified data
df

## Not run: 
## save crosswalk between old and new ids for future
deid_dua(df, write_crosswalk = TRUE)

## use existing crosswalk (good for panel datasets that need link)
deid_dua(df, existing_crosswalk = './crosswalk/master_crosswalk.csv')

## End(Not run)

duawranglr documentation built on April 15, 2021, 5:06 p.m.