deidentify: Deidentify a dataset.
In wilkox/deidentifyr: Deidentify Datasets

'deidentify()' generates a unique ID from personally identifying information. Because the IDs are generated with the SHA-256 algorithm, a) people with different identifying information are very unlikely to receive the same ID, and b) it is nearly impossible to recover the identifying information from an ID.

deidentify(data, ..., salt = NULL, key = "id", drop = TRUE, warn_duplicates = TRUE)

`data`	A data frame (or tibble).
`...`	The columns in 'data' that contain personally identifying information, given as bare (unquoted) column names, from which the unique IDs will be generated.
`salt`	An optional salt (see Details).
`key`	The name of the column to create containing unique IDs, "id" by default.
`drop`	A logical value, TRUE by default, indicating whether to remove the personally identifying columns after the IDs are created.
`warn_duplicates`	A logical value, TRUE by default, indicating whether to emit a warning if there are duplicate input rows or duplicate generated IDs.

This function uses non-standard evaluation for column names in 'data', so there's no need to surround them with quotation marks.
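As a rough illustration of the usage above, the following sketch passes bare column names to 'deidentify()'. The data frame 'patients' and its contents are invented for this example and are not part of the package. The call is guarded so that the snippet still runs when deidentifyr is not installed.

```r
# Hypothetical data, invented for illustration only.
patients <- data.frame(
  first_name = c("Alice", "Bob"),
  last_name  = c("Smith", "Jones"),
  dob        = c("1980-01-01", "1975-06-30"),
  weight_kg  = c(61, 82),
  stringsAsFactors = FALSE
)

# Only call deidentify() if the package is available.
if (requireNamespace("deidentifyr", quietly = TRUE)) {
  # Column names are given without quotation marks.
  deidentified <- deidentifyr::deidentify(patients, first_name, last_name, dob)
  print(names(deidentified))
}
```

Going by the argument descriptions, with the defaults 'key = "id"' and 'drop = TRUE' the result should gain an "id" column and lose the three identifying columns, while 'weight_kg' is kept.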

Optionally, a salt can be added to the personally identifying information. A salt is an extra piece of text, usually kept secret, that will change the resulting IDs. This makes it harder for somebody to re-identify people in the dataset by generating IDs from a list of potential inputs. However, you will need to use the same salt every time you deidentify datasets from the same cohort if you want to be able to cross-reference people by ID.
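The effect of a salt can be sketched with the digest package. This is an illustration of salted SHA-256 hashing in general, not the internals of 'deidentify()'. The helper 'hash_id()', the "|" separator, and the way the salt is prepended are all assumptions made for this example, so the hashes it produces should not be expected to match IDs from the package.

```r
library(digest)

# Hypothetical helper, not part of deidentifyr: join the identifying
# fields (and the salt, if any) with "|" and hash the resulting string.
hash_id <- function(..., salt = NULL) {
  input <- paste0(c(salt, ...), collapse = "|")
  # serialize = FALSE hashes the string itself, not its R serialization.
  digest(input, algo = "sha256", serialize = FALSE)
}

unsalted   <- hash_id("Alice", "Smith", "1980-01-01")
salted_a   <- hash_id("Alice", "Smith", "1980-01-01", salt = "cohort-secret")
salted_b   <- hash_id("Alice", "Smith", "1980-01-01", salt = "cohort-secret")
other_salt <- hash_id("Alice", "Smith", "1980-01-01", salt = "different-secret")

identical(salted_a, salted_b)    # same input and salt give the same ID
identical(unsalted, salted_a)    # adding a salt changes the ID
identical(salted_a, other_salt)  # a different salt gives a different ID
```

Someone who holds a list of candidate names and birth dates but not the salt cannot reproduce the salted IDs, while two datasets hashed with the same salt still yield matching IDs for the same person.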

wilkox/deidentifyr documentation built on May 28, 2019, 4:42 p.m.