deidentify: Deidentify a dataset.


Description

'deidentify()' generates a unique ID for each row of a dataset from its personally identifying information. Because the IDs are generated with the SHA-256 hashing algorithm, a) people with different identifying information are very unlikely to receive the same ID, and b) the identifying information cannot feasibly be recovered from an ID by reversing the hash.
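
As a rough illustration of the idea, and not the package's actual implementation, the sketch below joins each person's identifying fields into one string and hashes it with SHA-256. It assumes the 'digest' package is installed; the helper name 'sha256_id()' and the "|" separator are hypothetical choices made only for this example.

```r
library(digest)  # assumed to be installed; provides digest()

# Hypothetical helper (not part of deidentifyr): join the identifying
# fields of each person into one string and hash it with SHA-256.
# The "|" separator is an arbitrary choice for this sketch.
sha256_id <- function(...) {
  combined <- paste(..., sep = "|")
  # serialize = FALSE hashes the string itself, not R's serialized form
  vapply(combined, digest, character(1),
         algo = "sha256", serialize = FALSE, USE.NAMES = FALSE)
}

ids <- sha256_id(c("Jane Doe", "John Roe"), c("1980-01-01", "1975-06-30"))
ids
```

Each ID is a 64-character hexadecimal string. Identical inputs always produce identical IDs, which is what allows the same person to be matched across datasets without storing their identifying details.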

Usage

deidentify(data, ..., salt = NULL, key = "id", drop = TRUE,
  warn_duplicates = TRUE)

Arguments

data

A data frame (or tibble).

...

One or more columns in 'data' that contain personally identifying information, from which the unique IDs will be generated.

salt

An optional salt (see Details).

key

The name of the new column that will hold the unique IDs; "id" by default.

drop

A logical value, TRUE by default, indicating whether to remove the personally identifying columns after the IDs are created.

warn_duplicates

A logical value, TRUE by default, indicating whether to emit a warning if the input contains duplicate rows or duplicate IDs are produced.
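
A minimal sketch of the kind of check that 'warn_duplicates' describes, written with base R's 'anyDuplicated()'. The function name 'check_duplicates()' and its warning messages are hypothetical; this is not the package's own code.

```r
# Hypothetical helper: warn if the identifying rows or the generated IDs repeat.
# `pii` is a data frame of identifying columns; `ids` is a character vector.
check_duplicates <- function(pii, ids) {
  # anyDuplicated() returns the index of the first duplicate, or 0 if none
  if (anyDuplicated(pii) > 0) {
    warning("Duplicate rows found in the identifying columns")
  }
  if (anyDuplicated(ids) > 0) {
    warning("Duplicate IDs were produced")
  }
  invisible(NULL)
}

check_duplicates(data.frame(name = c("Ann", "Bob")), c("id1", "id2"))  # silent
```

Duplicate input rows necessarily yield duplicate IDs, whereas distinct inputs yielding the same ID would indicate a hash collision, which is extremely unlikely with SHA-256.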

Details

This function uses non-standard evaluation for column names in 'data', so there's no need to surround them with quotation marks.
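
To show what accepting bare column names through '...' involves, the sketch below captures the unevaluated arguments with 'substitute()' and turns them into column names. 'pick_columns()' and the 'patients' data frame are hypothetical and not part of deidentifyr.

```r
# Hypothetical helper: select columns given as bare (unquoted) names.
pick_columns <- function(data, ...) {
  # substitute() returns the unevaluated call list(name, dob, ...);
  # dropping the first element removes the `list` function itself
  exprs <- as.list(substitute(list(...)))[-1]
  cols <- vapply(exprs, deparse, character(1))
  data[, cols, drop = FALSE]
}

patients <- data.frame(name  = c("Ann", "Bob"),
                       dob   = c("1990-05-01", "1985-11-23"),
                       score = c(7, 9))
pick_columns(patients, name, dob)
```

Following the same convention, the identifying columns are passed to the documented function as 'deidentify(patients, name, dob)', with no quotation marks around 'name' or 'dob'.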

Optionally, a salt can be added to the personally identifying information. A salt is an extra piece of text, usually kept secret, that changes the resulting IDs. This makes it harder for somebody to re-identify people in the dataset by hashing a list of candidate identities and matching the results against the IDs. However, you will need to use the same salt every time you deidentify datasets from the same cohort if you want to be able to cross-reference people by ID.
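
The effect of a salt can be sketched by prepending it to the input before hashing. This assumes the 'digest' package is installed; 'salted_id()' is a hypothetical helper, and how 'deidentify()' actually combines the salt with the data may differ.

```r
library(digest)  # assumed to be installed; provides digest()

# Hypothetical helper: prepend a salt (NULL means no salt) and hash with SHA-256.
salted_id <- function(value, salt = NULL) {
  # paste0() drops a NULL salt, so the unsalted case hashes `value` alone
  vapply(paste0(salt, value), digest, character(1),
         algo = "sha256", serialize = FALSE, USE.NAMES = FALSE)
}

person   <- "Jane Doe|1980-01-01"
unsalted <- salted_id(person)
salted_1 <- salted_id(person, salt = "s3cret")
salted_2 <- salted_id(person, salt = "s3cret")
other    <- salted_id(person, salt = "different")
```

Someone who hashes a list of candidate names without knowing the salt will only ever reproduce 'unsalted', never 'salted_1'. Reusing the same salt ('salted_1' and 'salted_2') gives matching IDs, while a different salt ('other') breaks the link between datasets.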


wilkox/deidentifyr documentation built on May 28, 2019, 4:42 p.m.