add_shuffle: De-identification via random sampling
In deident: Persistent Data Anonymization Pipeline

Description

add_shuffle() adds a shuffling step to a transformation pipeline. When run as a transformation, the values of each specified variable are randomly sampled without replacement (that is, permuted). Summary statistics of any single variable are therefore unchanged, but statistics that relate variables to one another, such as correlations, become spurious.
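The effect can be illustrated with a minimal base R sketch. The helper `shuffle_column()` and the toy data frame are hypothetical and not part of deident. They only show why sampling a column without replacement leaves its distribution intact while breaking its association with other columns. The helper indexes with `sample.int()` rather than calling `sample(x)` directly, because `sample()` on a single number `n >= 1` draws from `1:n` instead of returning `n`.

```r
# Hypothetical sketch, not deident's implementation: shuffle one column by
# permuting its values (sampling without replacement).
shuffle_column <- function(x) {
  x[sample.int(length(x))]
}

set.seed(42)
df <- data.frame(a = 1:100)
df$b <- df$a * 2 + rnorm(100)  # b is strongly correlated with a

shuffled <- df
shuffled$b <- shuffle_column(shuffled$b)

# Single-variable summaries are unchanged: same values, same mean.
identical(sort(shuffled$b), sort(df$b))
isTRUE(all.equal(mean(shuffled$b), mean(df$b)))

# The relationship with `a` is broken.
cor(df$a, df$b)              # near 1
cor(shuffled$a, shuffled$b)  # usually near 0
```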

Usage

add_shuffle(object, ..., limit = 0)

Arguments

`object`	A `data.frame`, `tibble`, or an existing `DeidentList` pipeline.
`...`	Variables to be transformed.
`limit`	Integer: the minimum number of observations a variable must have for shuffling to be performed. If the variable has fewer than `limit` observations, its values are replaced with `NA`. With the default `limit = 0`, shuffling is always performed.
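The `limit` rule described above can be sketched in base R as follows. `shuffle_with_limit()` is a hypothetical helper written only to mirror the documented behaviour; it is not deident's internal code. Assigning `x[] <- NA` keeps the vector's original type (numeric, character, and so on) rather than turning it into a logical vector.

```r
# Hypothetical sketch of the documented `limit` behaviour; not deident's code.
shuffle_with_limit <- function(x, limit = 0) {
  if (length(x) < limit) {
    # Too few observations: replace every value with NA instead of shuffling.
    x[] <- NA
    return(x)
  }
  # Enough observations: return a random permutation of the values.
  x[sample.int(length(x))]
}

set.seed(1)
shuffle_with_limit(c(10, 20, 30), limit = 5)  # NA NA NA, since 3 < 5
shuffle_with_limit(c(10, 20, 30), limit = 2)  # some ordering of 10, 20, 30
```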

Value

A `DeidentList` representing the untrained transformation pipeline. The object contains the field:

`deident_methods`	a list of the steps in the pipeline (each consisting of variables and a method)

and the methods:

`mutate`	apply the pipeline to a new data set
`to_yaml`	serialize the pipeline to a `.yml` file

Examples


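library(deident)

# Basic usage: shuffle the Employee column of the ShiftsWorked example data
pipe.shuffle <- add_shuffle(ShiftsWorked, Employee)
pipe.shuffle$mutate(ShiftsWorked)

# Same, but with a minimum-observation limit
pipe.shuffle.limit <- add_shuffle(ShiftsWorked, Employee, limit = 1)
pipe.shuffle.limit$mutate(ShiftsWorked)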

deident documentation built on April 3, 2025, 6:14 p.m.