randomize | R Documentation |
randomize() draws n samples from the unique values in each column of a data frame and returns the randomized data. This destroys all statistical information in the dataset, both univariate and multivariate. However, since the set of possible output values is the same as the set of input values, the minimum and maximum of numeric columns will be the same, as will the unique values of all columns (if n is greater than or equal to the number of observations).
randomize(.data, n = NULL, .groups = NULL)
.data |
A data frame or data frame extension (e.g. a
|
n |
The desired number of observations in the returned dataset; the default is the number of observations in the input |
.groups |
How to handle grouping variables; see the |
randomize() can perform up- and down-sampling of the input data. Downsampling is simple random sampling without replacement. Upsampling samples without replacement up to the size of the input data, then samples with replacement for the remaining observations. This approach ensures that all values from the input data appear at least once if n is greater than or equal to the number of observations in the data.
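The sampling scheme described above can be sketched in base R. This is only an illustration, not the package's implementation: resample_column() and randomize_sketch() are hypothetical helpers, and the sketch assumes that draws are taken over the observed entries of each column, independently per column. The package itself may draw from the set of unique values instead.

```r
# Hypothetical helper (not part of the package): resample one vector to length n.
resample_column <- function(x, n = length(x)) {
  size <- length(x)
  if (n <= size) {
    # Downsampling: simple random sampling without replacement.
    idx <- sample.int(size, n)
  } else {
    # Upsampling: one full permutation, then the remainder with replacement.
    idx <- c(sample.int(size, size), sample.int(size, n - size, replace = TRUE))
  }
  x[idx]
}

# Hypothetical helper: resample every column independently of the others,
# which breaks the relationships between columns.
randomize_sketch <- function(df, n = nrow(df)) {
  as.data.frame(lapply(df, resample_column, n = n), stringsAsFactors = FALSE)
}

set.seed(1)
df <- data.frame(id = 1:5, score = c(2.5, 3.1, 4.8, 1.2, 3.9))
up <- randomize_sketch(df, n = 8)    # every input value appears at least once
down <- randomize_sketch(df, n = 3)  # a subset of the input values, no repeats
```

Sampling by index with sample.int() avoids the surprise that sample(x) treats a single number x as the range 1:x.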
A stratified version that restricts randomization to occur within strata can be obtained by grouping the data using group_by() prior to calling randomize(). In this case, the relative proportions of the groups within the dataset remain the same; this allows the user to retain portions of the data's structure while destroying the remaining information.
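The idea of randomizing within strata can be illustrated with a short base R sketch. It is an assumption-laden illustration, not the package's code: randomize_within() is a hypothetical helper that only permutes values inside each group (no up- or down-sampling) and leaves the grouping column untouched.

```r
# Hypothetical helper (not part of the package): permute every non-grouping
# column independently within each stratum defined by the `group` column.
randomize_within <- function(df, group) {
  shuffle <- function(x) x[sample.int(length(x))]
  pieces <- lapply(split(df, df[[group]]), function(part) {
    cols <- setdiff(names(part), group)
    part[cols] <- lapply(part[cols], shuffle)
    part
  })
  out <- do.call(rbind, pieces)
  rownames(out) <- NULL
  out
}

set.seed(42)
trial <- data.frame(
  arm = rep(c("control", "treated"), times = c(4, 2)),
  age = c(31, 45, 52, 28, 60, 39),
  score = c(7, 5, 9, 6, 3, 8)
)
shuffled <- randomize_within(trial, "arm")

# Group sizes are unchanged, so the relative proportions are preserved.
table(shuffled$arm)
```

Because values never cross group boundaries, each group keeps its own set of values, while the pairing of age and score within a group is broken.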
Note that the above only provides anonymity when the number of unique values for quasi-identifiers (within each group) is large and unique identifiers are handled separately. Also note that when groups are defined, information both within and between grouping variables will not be modified.
A tibble containing the randomized test data.