gen_cart: Generate synthetic data using CART.


View source: R/gen_CART.R

Description

gen_cart uses Classification and Regression Trees (CART) to generate synthetic data by sequentially predicting the value of each variable from the values of other variables. Details can be found in the documentation for syn.
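As a rough illustration of the sequential idea, the sketch below uses rpart directly on a toy two-column data frame. It is not the package's implementation: the toy data, the column names sex and income, and the choice to draw each synthetic value from the tree's predicted class probabilities are all assumptions made for this example.

```r
library(rpart)

set.seed(1)

# Toy training data (made up for illustration): income depends on sex.
train <- data.frame(sex = factor(sample(c("F", "M"), 200, replace = TRUE)))
p_high <- ifelse(train$sex == "M", 0.6, 0.3)
train$income <- factor(ifelse(runif(200) < p_high, ">50K", "<=50K"))

n <- nrow(train)

# Step 1: the first variable has no predictors, so resample it from its
# observed marginal distribution.
synthetic <- data.frame(sex = sample(train$sex, n, replace = TRUE))

# Step 2: fit a classification tree for the next variable given the
# previous one, then draw each synthetic value from the predicted
# class probabilities for that synthetic row.
fit <- rpart(income ~ sex, data = train, method = "class")
probs <- predict(fit, newdata = synthetic, type = "prob")
synthetic$income <- factor(
  apply(probs, 1, function(p) sample(colnames(probs), 1, prob = p)),
  levels = levels(train$income)
)

head(synthetic)
```

With more columns, the same second step would be repeated for each remaining variable, using whichever variables the structure names as its predictors.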

Usage

gen_cart(training_set, structure = NA)

Arguments

training_set

A data frame containing the training data. The generated data will have the same size as training_set.

structure

A string describing the relationships between variables, in the format produced by modelstring. If structure is NA, the default structure is the order of the variables in the training_set data frame.
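The string format used in the Examples section lists each variable in square brackets, with its parents after a "|" and separated by ":". A minimal sketch of assembling such a string from a named list of parents is shown here; build_modelstring is a hypothetical helper written for this illustration and is not part of the package.

```r
# Hypothetical helper (not part of sdglinkage): turn a named list of
# parent variables into a string such as "[a][b|a]".
build_modelstring <- function(parents) {
  nodes <- vapply(names(parents), function(node) {
    p <- parents[[node]]
    if (length(p) == 0) {
      # A variable with no parents is written as "[node]".
      paste0("[", node, "]")
    } else {
      # A variable with parents is written as "[node|parent1:parent2]".
      paste0("[", node, "|", paste(p, collapse = ":"), "]")
    }
  }, character(1))
  paste(nodes, collapse = "")
}

parents <- list(
  sex = character(0),
  race = character(0),
  marital_status = c("race", "sex"),
  education = c("sex", "race")
)

structure_string <- build_modelstring(parents)
print(structure_string)
```

Building the string from a list keeps each variable's parents in one place, which can be easier to review than a long chain of paste0 calls.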

Value

The output is a list of three objects: i) structure: the dependency/relationship between the variables (a bn-class object); ii) fit_model: the fitted CART model (a syn object); and iii) gen_data: the generated synthetic data.

Examples

# Default structure: the variable order of the training set is used
adult_data <- split_data(adult[1:100, ], 70)
cart <- gen_cart(adult_data$training_set)

# User-specified structure, built up piece by piece as a single string
bn_structure <- "[native_country][income][age|marital_status:education]"
bn_structure <- paste0(bn_structure, "[sex][race|native_country][marital_status|race:sex]")
bn_structure <- paste0(bn_structure, "[relationship|marital_status][education|sex:race]")
bn_structure <- paste0(bn_structure, "[occupation|education][workclass|occupation]")
bn_structure <- paste0(bn_structure, "[hours_per_week|occupation:workclass]")
bn_structure <- paste0(bn_structure, "[capital_gain|occupation:workclass:income]")
bn_structure <- paste0(bn_structure, "[capital_loss|occupation:workclass:income]")
cart_elicit <- gen_cart(adult_data$training_set, bn_structure)

sdglinkage documentation built on April 27, 2020, 5:09 p.m.