deidentify_data: De-identify data through encrypting text columns, grouping...


View source: R/deidentify.R

Description

De-identify data by encrypting text columns, grouping rare values, and aggregating numeric or date columns.

Usage

deidentify_data(
  data,
  date_cols = NULL,
  date_aggregation = c("week", "month", "bimonth", "quarter", "halfyear", "year"),
  cols_to_encrypt = NULL,
  group_rare_values_cols = NULL,
  group_rare_values_limit = 5,
  group_rare_values_text = NULL,
  quiet = FALSE
)

Arguments

data

A data.frame with the data you want to de-identify.

date_cols

A vector of strings with the names of the date columns that you want aggregated to the unit set in the date_aggregation parameter. If NULL, all date columns in the data are used.

date_aggregation

A string with the time unit to aggregate all Date variables to. Can take one of the following: 'week', 'month', 'bimonth', 'quarter', 'halfyear', 'year'.
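To make the idea of aggregating dates to a coarser unit concrete, here is a minimal base R sketch. It is not the package's implementation: floor_dates() is a hypothetical helper written for this illustration, and it assumes that "aggregating" means replacing each date with the first day of the period that contains it. The 'week' unit is left out because it would need an additional assumption about which day a week starts on.

```r
# Illustrative only: floor_dates() is a hypothetical helper,
# not a function from the deidentify package.
floor_dates <- function(x, unit = c("month", "bimonth", "quarter",
                                    "halfyear", "year")) {
  unit <- match.arg(unit)
  year <- as.integer(format(x, "%Y"))
  month <- as.integer(format(x, "%m"))
  # Number of months covered by each unit
  width <- c(month = 1L, bimonth = 2L, quarter = 3L,
             halfyear = 6L, year = 12L)[[unit]]
  # First month of the period that contains each date
  first_month <- ((month - 1L) %/% width) * width + 1L
  # An explicit format makes unparseable entries (e.g. from NA input) come back as NA
  as.Date(sprintf("%04d-%02d-01", year, first_month), format = "%Y-%m-%d")
}

visits <- as.Date(c("2020-01-15", "2020-02-29", "2020-08-03"))
floor_dates(visits, "month")    # 2020-01-01, 2020-02-01, 2020-08-01
floor_dates(visits, "quarter")  # 2020-01-01, 2020-01-01, 2020-07-01
```

Coarser units hide more detail: at "month" the three visits stay distinct, while at "quarter" the first two become indistinguishable.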

cols_to_encrypt

A string or vector of strings with the columns that you want to encrypt.

group_rare_values_cols

A string or vector of strings with the columns in which rare values (those below the percentage of all values set in group_rare_values_limit) are converted to NA or to the string set in group_rare_values_text.

group_rare_values_limit

A number or vector of numbers (one for each column in group_rare_values_cols) giving the threshold, in percent of all non-NA values, below which a value is considered rare enough to change to NA (or to the string set in group_rare_values_text).

group_rare_values_text

A string or vector of strings (one for each column in group_rare_values_cols) giving the replacement for values that are determined to be rare (based on the threshold set in group_rare_values_limit). If NULL (default) and the column contains strings, rare values are replaced with a string that concatenates all of the rare values together (separated by a comma). If NA, they are replaced with NA.
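The interaction between the threshold and the replacement text can be sketched in base R. This is an illustration of the described behavior rather than the package's own code: group_rare() is a hypothetical helper, it assumes a character column, it treats "below" the limit as a strict comparison, and the exact separator (", ") is a guess at what "separated by a comma" means.

```r
# Illustrative only: group_rare() is a hypothetical helper,
# not a function from the deidentify package.
group_rare <- function(x, limit = 5, text = NULL) {
  # Share of each value among the non-NA values, in percent
  # (table() drops NA by default)
  pct <- 100 * table(x) / sum(!is.na(x))
  rare <- names(pct)[as.vector(pct < limit)]
  if (length(rare) == 0) {
    return(x)
  }
  # NULL: label rare values with the concatenation of all rare values
  if (is.null(text)) {
    text <- paste(rare, collapse = ", ")
  }
  # A text of NA turns the rare values into missing values
  x[x %in% rare] <- text
  x
}

# 100 values: "C" (2%) and "D" (1%) fall below the 5% limit
x <- rep(c("A", "B", "C", "D"), times = c(40, 57, 2, 1))
table(group_rare(x))                        # rare values become "C, D"
table(group_rare(x, text = "Other"))        # rare values become "Other"
sum(is.na(group_rare(x, text = NA)))        # rare values become NA
```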

quiet

A Boolean controlling whether the function outputs a message telling you which columns are being encrypted and the seed set for each column to do the encryption. If you don't set the seed yourself, you need these seeds to decrypt.

Value

A data.frame with the selected columns de-identified based on the user's parameters.
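A call might look like the following sketch, which uses only the arguments listed under Usage. The patients data frame and its column names are made up for illustration, and the call is guarded so the snippet does nothing if the deidentify package is not installed. No output is shown because the exact result depends on the package's behavior and the encryption seed.

```r
# Toy data; every column name and value here is invented for illustration.
patients <- data.frame(
  name = c("Ann", "Bob", "Cy", "Dee"),
  visit_date = as.Date(c("2020-01-15", "2020-01-20",
                         "2020-03-02", "2020-03-09")),
  stringsAsFactors = FALSE
)

# Only runs when the deidentify package is available.
if (requireNamespace("deidentify", quietly = TRUE)) {
  deidentified <- deidentify::deidentify_data(
    patients,
    date_cols = "visit_date",      # aggregate this date column
    date_aggregation = "month",    # to the month level
    cols_to_encrypt = "name"       # and encrypt this text column
  )
  str(deidentified)
}
```

Leaving quiet at its default of FALSE is a reasonable choice for a first run, since the seeds are needed later for decryption.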


phillydao/deidentify documentation built on Feb. 4, 2021, 2:31 p.m.