synthetic_forest: Grow a tree ensemble on synthetic data


View source: R/synthetic_forest.R

Description

Builds a random forest model to classify actual vs synthetic data, where the synthetic data is created by sampling each covariate as suggested in Understanding random forests by Breiman.

Usage

synthetic_forest(dataset, prop = TRUE, seed = 1L,
  implementation = "ranger", ...)

Arguments

dataset

A data frame.

prop

(flag) When prop = TRUE, each covariate is sampled at random according to its observed proportions to generate synthetic data. Otherwise, uniform sampling is used.

seed

(a positive integer) Seed for sampling.

implementation

(string) Implementation to use to build the model. The following are supported: 'ranger', 'randomForest'.

...

Arguments to be passed to implementation.

Details

The approach in Understanding random forests by Breiman creates synthetic data by sampling at random from the univariate distribution of each covariate (feature). Two methods are supported. In the first, the observed proportions (the empirical distribution) of each covariate are taken into account when sampling at random. In the second, each covariate is sampled assuming a uniform distribution. The former corresponds to "Addcl1" in Horvath's paper and the latter to "Addcl2". A random forest model is then built using ranger or randomForest to learn what separates the actual data from the synthetic data. By default, 1000 trees are grown and the minimum node size to split is set to 5.
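The two sampling schemes described above can be sketched in base R. This is only an illustration and not the package's actual implementation: the helper name make_synthetic and the label column .origin are hypothetical. It also assumes that "uniform" means uniform over the observed range for numeric columns, and equal probability for each distinct value otherwise.

```r
# Hypothetical helper (not part of forager): builds an actual-vs-synthetic
# table by sampling every column on its own, using base R only.
make_synthetic <- function(dataset, prop = TRUE, seed = 1L) {
  set.seed(seed)
  n <- nrow(dataset)

  sample_column <- function(x) {
    if (prop) {
      # Resample the observed values with replacement, so the empirical
      # marginal distribution of the column is preserved.
      x[sample.int(length(x), n, replace = TRUE)]
    } else if (is.numeric(x)) {
      # Assumption: uniform over the observed range of the column.
      runif(n, min = min(x, na.rm = TRUE), max = max(x, na.rm = TRUE))
    } else {
      # Assumption: every distinct value is equally likely.
      ux <- unique(x)
      ux[sample.int(length(ux), n, replace = TRUE)]
    }
  }

  # Keep the data frame structure and replace each column by its sample.
  synthetic <- dataset
  synthetic[] <- lapply(dataset, sample_column)

  # Stack actual and synthetic rows and label where each row came from.
  combined <- rbind(dataset, synthetic)
  combined$.origin <- factor(rep(c("actual", "synthetic"), each = n))
  combined
}

combined <- make_synthetic(iris, prop = TRUE, seed = 1L)
table(combined$.origin)
```

A table like this could then be passed to ranger or randomForest with .origin as the response, which is the classification problem the paragraph above describes.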

Value

A tree ensemble with one of these classes: 'ranger', 'randomForest'

References

Examples

# ranger
model_ranger <- synthetic_forest(iris, implementation = "ranger")
oob_error(model_ranger)

# randomForest
model_rf <- synthetic_forest(iris, implementation = "randomForest")
oob_error(model_rf)

# extratrees
model_et <- synthetic_forest(iris, implementation = "ranger", splitrule = "extratrees")
oob_error(model_et)

talegari/forager documentation built on May 3, 2019, 4:01 p.m.