synthesize: Create synthetic version of a dataset

View source: R/synthesizer.R

synthesize R Documentation

Create synthetic version of a dataset

Description

Create n values or records based on the empirical (multivariate) distribution of x. For data frames it is possible to decorrelate the synthetic variables from the original variables by lowering the value of the rankcor parameter.

Usage

synthesize(x, n = NROW(x), rankcor = 1)

Arguments

x

[vector|data.frame] data to synthesize.

n

[integer] Number of values or records to synthesize.

rankcor

[numeric] in [0,1]. Either a single rank correlation value that is applied to all variables, or a named vector of the form c(variable1=utility1,...). Variables not explicitly mentioned will have rankcor=1. See also the note below.
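
As an illustration of the two accepted forms, the following base R sketch expands a rankcor specification into one value per variable. The helper expand_rankcor() is hypothetical and not part of the package; it only mirrors the rule stated above that unmentioned variables default to 1.

```r
# Hypothetical helper (an assumption for illustration, not package code):
# turn a rankcor specification into a named vector with one entry per variable.
expand_rankcor <- function(rankcor, vars) {
  stopifnot(is.numeric(rankcor), all(rankcor >= 0), all(rankcor <= 1))
  if (is.null(names(rankcor))) {
    # A single unnamed value applies to every variable.
    stopifnot(length(rankcor) == 1L)
    return(setNames(rep(rankcor, length(vars)), vars))
  }
  # Named values apply to the mentioned variables; the rest default to 1.
  stopifnot(all(names(rankcor) %in% vars))
  out <- setNames(rep(1, length(vars)), vars)
  out[names(rankcor)] <- rankcor
  out
}

expand_rankcor(c(Species = 0.5), names(iris))
expand_rankcor(0.5, names(cars))
```

The first call leaves the four measurement columns of iris at 1 and sets only Species to 0.5; the second applies 0.5 to both columns of cars.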

Value

A data object of the same type and structure as x.

Note

The utility of a synthetic variable is lowered by reducing the rank correlation between the real and synthetic data. If rankcor=1, the synthetic data will be ordered such that it has the same rank order as the original data. If rankcor=0, no such reordering will take place. For values between 0 and 1, blocks of data are randomly selected and randomly permuted iteratively until the rank correlation between original and synthetic data drops below the value of rankcor.
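
The procedure described in this note can be sketched in base R for a single numeric variable. This is a rough illustration under stated assumptions, not the package's implementation: the functions draw(), match_ranks() and decorrelate(), the block size of 5, the use of interpolated quantiles for sampling, and the use of Spearman's rho as the rank correlation are all choices made here for the example.

```r
# Illustrative sketch only; function names and details are assumptions.
set.seed(1)
x <- cars$speed

# Draw n values from the empirical distribution of x by evaluating
# interpolated sample quantiles at uniform random probabilities.
draw <- function(x, n) unname(quantile(x, probs = runif(n), type = 7))

# rankcor = 1: reorder the draws so that they follow the rank order of x.
match_ranks <- function(x, y) sort(y)[rank(x, ties.method = "first")]

# 0 < rankcor < 1: repeatedly select a random block of positions and
# permute the values within it until Spearman's rho drops below target.
decorrelate <- function(x, y, target, block = 5L, max_iter = 1000L) {
  n <- length(y)
  for (i in seq_len(max_iter)) {
    if (cor(x, y, method = "spearman") < target) break
    idx <- sample.int(n, block)
    y[idx] <- y[idx[sample.int(block)]]
  }
  y
}

y1 <- match_ranks(x, draw(x, length(x)))  # same rank order as x
y2 <- decorrelate(x, y1, target = 0.5)    # lowered rank correlation

cor(x, y1, method = "spearman")
cor(x, y2, method = "spearman")
```

Permuting values within blocks changes only their order, so y1 and y2 contain exactly the same values and the marginal distribution is unaffected; only the association with x is weakened.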

See Also

Other synthesis: make_synthesizer()

Examples

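# Synthesize 10 values from a vector, then whole data frames
synthesize(cars$speed, 10)
synthesize(cars)
synthesize(cars, 25)

# Full utility, lowered utility for all variables, lowered for Species only
s1 <- synthesize(iris, rankcor=1)
s2 <- synthesize(iris, rankcor=0.5)
s3 <- synthesize(iris, rankcor=c("Species"=0.5))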

oldpar <- par(mfrow=c(2,2), pch=16, las=1)
plot(Sepal.Length ~ Sepal.Width, data=iris, col=iris$Species, main="Iris")
plot(Sepal.Length ~ Sepal.Width, data=s1, col=s1$Species, main="Synthetic Iris")
plot(Sepal.Length ~ Sepal.Width, data=s2, col=s2$Species, main="Low utility Iris")
plot(Sepal.Length ~ Sepal.Width, data=s3, col=s3$Species, main="Low utility Species")
par(oldpar)


synthesizer documentation built on April 12, 2025, 2:27 a.m.