compare_sdg: Compare the performance of generators.


View source: R/compare_sdg.R

Description

compare_sdg compares the predictive performance of models trained on synthetic data with that of models trained on real data.

Usage

compare_sdg(
  learner,
  measurement,
  target_var,
  real_dataset,
  generated_data1,
  generated_data2 = NA,
  generated_data3 = NA,
  generated_data4 = NA,
  generated_data5 = NA,
  generated_data6 = NA
)

Arguments

learner

A learner object (or list of learners) created with mlr's makeLearners.

measurement

A list of performance measures (for example acc and ber) used in the benchmark.

target_var

The name of the response (target) variable, as a character string.

real_dataset

A named list of two data frames, training_set and testing_set, such as the list returned by split_data.

generated_data1

A data frame of synthetic data (the first generated dataset).

generated_data2

An optional data frame of synthetic data (the second generated dataset). Defaults to NA, i.e. unused.

generated_data3

An optional data frame of synthetic data (the third generated dataset). Defaults to NA, i.e. unused.

generated_data4

An optional data frame of synthetic data (the fourth generated dataset). Defaults to NA, i.e. unused.

generated_data5

An optional data frame of synthetic data (the fifth generated dataset). Defaults to NA, i.e. unused.

generated_data6

An optional data frame of synthetic data (the sixth generated dataset). Defaults to NA, i.e. unused.
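The real_dataset argument expects a named list holding a training set and a testing set, and each generated_data argument is a data frame with the same columns. The sketch below builds these inputs by hand in base R, purely to show their shape. The toy columns, the fixed 70/30 split, and the bootstrap resample used as a placeholder "synthetic" dataset are all assumptions for illustration; in practice split_data and a real generator (such as gen_bn_learn in the example) would produce them.

```r
# Hypothetical toy data standing in for real records; column names are illustrative only.
set.seed(1)
real <- data.frame(
  age    = sample(18:80, 100, replace = TRUE),
  income = factor(sample(c("<=50K", ">50K"), 100, replace = TRUE))
)

# Hand-built list with the two components real_dataset must contain.
# Assumption: a manual 70/30 split; split_data() is the documented way to build this list.
train_idx <- sample(nrow(real), 70)
real_dataset <- list(
  training_set = real[train_idx, ],
  testing_set  = real[-train_idx, ]
)

# Placeholder "synthetic" data: a bootstrap resample of the training rows,
# used here only so the data frame has the same columns as the real data.
generated_data1 <- real_dataset$training_set[sample(70, 70, replace = TRUE), ]

str(real_dataset)
```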

Details

This function returns the measured performance of predictive models trained on the synthetic data. We assume that good-quality synthetic data should allow us to draw the same analytic conclusions as we would from real data. We therefore compare the predictive performance of several machine learning algorithms trained on the synthetic data and tested on real data with that of the same algorithms trained and tested on real data.
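The comparison described above can be illustrated with a minimal base R sketch: fit the same model once on real training data and once on synthetic data, then score both on the same real testing data. This is not how compare_sdg is implemented (it works with mlr learners and returns a benchmark object); here a single logistic regression via glm and plain accuracy are swapped in, and the helper names make_data and accuracy_on_real_test, the simulated data, and the noise-perturbed "synthetic" set are all hypothetical.

```r
set.seed(42)
n <- 300

# Hypothetical data generator: one numeric predictor x and a binary outcome y.
make_data <- function(n) {
  x <- rnorm(n)
  y <- factor(ifelse(x + rnorm(n, sd = 0.5) > 0, "yes", "no"), levels = c("no", "yes"))
  data.frame(x = x, y = y)
}

train_real <- make_data(n)
test_real  <- make_data(n)

# Placeholder "synthetic" data: real training rows with extra noise on x.
# Assumption: a real generator (e.g. gen_bn_learn) would be used instead.
synthetic <- train_real
synthetic$x <- synthetic$x + rnorm(n, sd = 0.3)

# Train a logistic regression on `train`, then compute accuracy on the real test set.
accuracy_on_real_test <- function(train, test) {
  fit  <- glm(y ~ x, data = train, family = binomial)
  prob <- predict(fit, newdata = test, type = "response")  # P(y == "yes")
  pred <- ifelse(prob > 0.5, "yes", "no")
  mean(pred == test$y)
}

results <- c(
  real_dataset = accuracy_on_real_test(train_real, test_real),
  synthetic    = accuracy_on_real_test(synthetic, test_real)
)
print(results)
```

If the synthetic accuracy is close to the real-data accuracy, the synthetic data supports roughly the same predictive conclusions; compare_sdg applies this idea across several learners and measures at once.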

Value

The output is a benchmark object. It compares the predictive performance of the selected models trained on the real training data with that of the same models trained on the generated data, with both evaluated on the real testing data.

Examples

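library(mlr)
library(sdglinkage)  # provides adult, split_data, gen_bn_learn and compare_sdg

# Keep a few columns of the adult data and split the first 100 rows into training and testing sets
adult_data <- adult[c('age', 'race', 'sex', 'capital_gain', 'capital_loss', 'hours_per_week',
                      'income')]
adult_data <- split_data(adult_data[1:100, ], 70)

# Generate synthetic data from a Bayesian network learned with hill climbing ("hc")
bn_learn <- gen_bn_learn(adult_data$training_set, "hc")

# Two classifiers (decision tree and logistic regression) and two measures
# (accuracy and balanced error rate)
lrns <- makeLearners(c("rpart", "logreg"), type = "classif", predict.type = "prob")
measurements <- list(acc, ber)

bmr <- compare_sdg(lrns,
    measurement = measurements,
    target_var = "income",
    real_dataset = adult_data,
    generated_data1 = bn_learn$gen_data)
names(bmr$results) <- c("real_dataset", "bn_learn")
bmr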

sdglinkage documentation built on April 27, 2020, 5:09 p.m.