add_shuffle: De-identification via random sampling

View source: R/api.R

add_shuffle    R Documentation

De-identification via random sampling

Description

add_shuffle() adds a shuffling step to a transformation pipeline. When run as a transformation, each specified variable is randomly sampled without replacement (that is, its values are permuted), so that summary metrics on a single variable are unchanged but inter-variable metrics are rendered spurious.
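The effect described above can be sketched in base R. This is only an illustration of the idea, not the deident implementation: the toy data frame and its column names are invented for the example, and base sample() stands in for whatever the package does internally.

```r
# Base-R sketch of the shuffling idea; this is not the deident implementation.
set.seed(42)  # fixed seed so the sketch is reproducible

# Hypothetical toy data, invented for this illustration.
toy <- data.frame(
  employee = c("Ann", "Bob", "Cal", "Dee", "Eve", "Fay"),
  hours    = c(10, 20, 30, 40, 50, 60),
  pay      = c(100, 200, 300, 400, 500, 600)
)

shuffled <- toy
# sample(x) with no size argument returns a random permutation of x,
# i.e. a sample of all values without replacement.
shuffled$hours <- sample(toy$hours)

# Single-variable summaries are unchanged ...
same_mean   <- mean(shuffled$hours) == mean(toy$hours)
same_values <- identical(sort(shuffled$hours), sort(toy$hours))

# ... but the row-wise link between hours and pay is no longer preserved.
cor_before <- cor(toy$hours, toy$pay)
cor_after  <- cor(shuffled$hours, shuffled$pay)
```

Comparing cor_before with cor_after shows why inter-variable metrics computed on shuffled data should not be trusted, while the mean and the set of values of the shuffled column stay the same.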

Usage

add_shuffle(object, ..., limit = 0)

Arguments

object

A data.frame, a tibble, or an existing DeidentList pipeline.

...

Variables to be transformed.

limit

integer - the minimum number of observations a variable needs to have for shuffling to be performed. If the variable has fewer than limit observations, its values are replaced with NAs.
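The limit rule can be sketched as a small base-R helper. The function name shuffle_with_limit is hypothetical and is not part of deident; it only mirrors the rule as documented here. Under that rule, the default limit = 0 means shuffling always proceeds, since a length can never be below zero.

```r
# Hypothetical helper illustrating the documented `limit` rule;
# the name shuffle_with_limit is invented and is not part of deident.
shuffle_with_limit <- function(x, limit = 0) {
  if (length(x) < limit) {
    # Too few observations: replace every value with NA, keeping the length.
    return(rep(NA, length(x)))
  }
  # sample.int() permutes positions, which avoids the special case where
  # sample(x) treats a single number x as the range 1:x.
  x[sample.int(length(x))]
}

short <- shuffle_with_limit(c("a", "b"), limit = 3)       # below the limit
long  <- shuffle_with_limit(c("a", "b", "c"), limit = 3)  # meets the limit
```

Here short consists entirely of NA values, while long is a permutation of its input.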

Value

A 'DeidentList' representing the untrained transformation pipeline. The object contains fields:

  • deident_methods: a list of each step in the pipeline (consisting of variables and method)

and methods:

  • mutate: apply the pipeline to a new data set

  • to_yaml: serialize the pipeline to a '.yml' file

See Also

add_group() for usage under aggregation

Examples


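# Basic usage: shuffle the Employee column of ShiftsWorked
library(deident)
pipe.shuffle <- add_shuffle(ShiftsWorked, Employee)
pipe.shuffle$mutate(ShiftsWorked)

# With a limit: Employee is shuffled only if it has at least 1 observation
pipe.shuffle.limit <- add_shuffle(ShiftsWorked, Employee, limit = 1)
pipe.shuffle.limit$mutate(ShiftsWorked)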


deident documentation built on April 3, 2025, 6:14 p.m.