randomize: Create Column-Wise Randomized Test Data for Non-Statistical...
In jesse-smith/coviData: COVID-19 Data Munging Tools for the Shelby County Health Department

randomize

R Documentation

Create Column-Wise Randomized Test Data for Non-Statistical Validation

Description

randomize() draws n samples from the unique values in each column of a data frame and returns the randomized data. This destroys all statistical information in the dataset, both univariate and multivariate. However, because every output value is drawn from the input values, the output never contains a value that is absent from the input. When n is at least the number of observations, the minimum and maximum of numeric columns are unchanged, as are the unique values of all columns.

Usage

randomize(.data, n = NULL, .groups = NULL)

Arguments

`.data`	A data frame or data frame extension (e.g. a `tibble`)
`n`	The desired number of observations in the returned dataset; the default is the number of observations in the input
`.groups`	How to handle grouping variables; see the `.groups` parameter documentation in `dplyr::summarize()` for more information

Details

randomize() can perform up- and down-sampling of the input data. Downsampling is simple random sampling without replacement. Upsampling samples without replacement up to the size of the input data, then samples with replacement for the remaining observations. This approach ensures that all values from the input data appear at least once if n is greater than or equal to the number of observations in the data.
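The sampling scheme described above can be sketched for a single vector in base R. This is only an illustration of the description, not the package's implementation; `sample_up_down()` is a hypothetical helper name, and the sketch assumes a non-empty input vector.

```r
# Hypothetical helper (not part of coviData): up-/down-sample one vector
sample_up_down <- function(x, n = length(x)) {
  size <- length(x)
  if (n <= size) {
    # Downsampling: simple random sampling without replacement
    return(x[sample.int(size, n)])
  }
  # Upsampling, step 1: sample without replacement up to the input size,
  # i.e. a full permutation, so every input value appears at least once
  first <- x[sample.int(size, size)]
  # Upsampling, step 2: sample with replacement for the remaining observations
  extra <- x[sample.int(size, n - size, replace = TRUE)]
  c(first, extra)
}

set.seed(1)
x <- c(3, 1, 4, 1, 5, 9, 2, 6)
down <- sample_up_down(x, 3)   # 3 values taken from x
up <- sample_up_down(x, 20)    # 20 values; every value of x is present
```

Indexing with `sample.int()` rather than calling `sample(x, ...)` directly avoids base R's special case in which a single number `x` is treated as the range `1:x`.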

A stratified version that restricts randomization to occur within strata can be obtained by grouping the data using group_by() prior to calling randomize(). In this case, the relative proportions of the groups within the dataset remain the same; this allows the user to retain portions of the data's structure while destroying the remaining information.
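With the package installed, the stratified form is obtained by piping grouped data into the function, for example `dplyr::group_by(df, site) |> randomize()`. The base R sketch below imitates the idea on toy data so that the effect is visible without the package: each non-grouping column is shuffled independently within its stratum. `shuffle_within()`, the `site` column, and the data are hypothetical, and the sketch covers only the case where n equals the number of observations.

```r
# Hypothetical helper (not part of coviData): shuffle every non-grouping
# column independently within each stratum defined by `group_col`
shuffle_within <- function(df, group_col) {
  strata <- split(df, df[[group_col]], drop = TRUE)
  shuffled <- lapply(strata, function(part) {
    for (col in setdiff(names(part), group_col)) {
      # Independent permutation per column breaks links between columns
      part[[col]] <- part[[col]][sample.int(nrow(part))]
    }
    part
  })
  out <- do.call(rbind, shuffled)
  rownames(out) <- NULL
  out
}

set.seed(42)
toy <- data.frame(
  site = c("A", "A", "A", "B", "B"),
  age = c(31, 45, 52, 28, 67),
  result = c("pos", "neg", "neg", "pos", "neg"),
  stringsAsFactors = FALSE
)
shuffled <- shuffle_within(toy, "site")
table(shuffled$site)  # group sizes match those of `toy`
```

Values never cross strata in this sketch, so the group sizes and the set of values within each group are retained while the pairing of values across columns is not.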

Note that stratified randomization only provides anonymity when the number of unique values for quasi-identifiers (within each group) is large and unique identifiers are handled separately. Also note that when groups are defined, the grouping variables themselves are not modified, so information both within and between grouping variables is preserved.
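One way to check the first condition before sharing randomized data is to count the distinct values of each quasi-identifier within each group. The sketch below is an assumption-laden illustration in base R: `n_unique_by_group()`, the toy columns, and the threshold of 3 are all hypothetical, and a suitable threshold depends on the data.

```r
# Hypothetical helper (not part of coviData): number of distinct values
# of one column within each group
n_unique_by_group <- function(df, col, group_col) {
  tapply(df[[col]], df[[group_col]], function(v) length(unique(v)))
}

# Made-up toy data for illustration only
toy_ids <- data.frame(
  site = c("A", "A", "A", "B", "B"),
  zip = c("38103", "38104", "38105", "38111", "38111"),
  stringsAsFactors = FALSE
)
counts <- n_unique_by_group(toy_ids, "zip", "site")

# Flag groups whose quasi-identifier has too few distinct values;
# the threshold is arbitrary and chosen only for this example
min_unique <- 3
flagged <- names(counts)[counts < min_unique]
```

Groups that end up in `flagged` have so few distinct values that shuffling within the group leaves individual records easy to guess.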

Value

A tibble containing the randomized test data

jesse-smith/coviData documentation built on Jan. 14, 2023, 11:08 a.m.
