View source: R/synthetic_forest.R
Description

Builds a random forest model to classify actual versus synthetic data, where the synthetic data is created by sampling each covariate independently, as suggested in Understanding random forests by Breiman.
Usage

synthetic_forest(dataset, prop = TRUE, seed = 1L,
                 implementation = "ranger", ...)
Arguments

dataset: A data frame.
prop: (flag) If TRUE, each covariate of the synthetic data is sampled at random in proportion to its observed distribution. If FALSE, values are sampled assuming a uniform distribution.
seed: (positive integer) Random seed used when sampling the synthetic data.
implementation: (string) Implementation used to build the model. Supported values: 'ranger', 'randomForest'.
...: Further arguments passed to the chosen implementation.
Details

Breiman's approach, described in Understanding random forests, creates synthetic data by sampling at random from the univariate distribution of each covariate (feature). Two sampling schemes are supported: when prop = TRUE, values are sampled in proportion to their observed distribution; when prop = FALSE, they are sampled assuming a uniform distribution. These correspond to "Addcl1" and "Addcl2", respectively, in Shi and Horvath's paper. A random forest model is then built with ranger or randomForest to learn what separates the actual data from the synthetic data. By default, 1000 trees are grown and the minimum node size is set to 5.
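The sampling step can be sketched in base R. The helper make_synthetic() below is hypothetical and not part of this package. It assumes that prop = TRUE means resampling each column with replacement from its observed values (Addcl1), and that prop = FALSE means drawing numeric columns uniformly between their observed minimum and maximum and categorical columns uniformly over their levels (Addcl2). The actual rows and the synthetic copy are stacked under a hypothetical .class label column.

```r
# Hypothetical helper (not part of this package): sketches how synthetic
# data could be generated by sampling each column independently.
make_synthetic <- function(dataset, prop = TRUE, seed = 1L) {
  set.seed(seed)  # note: this changes the global RNG state
  n <- nrow(dataset)

  draw <- function(x) {
    if (prop) {
      # Addcl1-style: resample observed values, preserving the marginal
      # distribution; indexing keeps factor levels intact
      x[sample.int(length(x), n, replace = TRUE)]
    } else if (is.numeric(x)) {
      # Addcl2-style: uniform over the observed range
      runif(n, min(x, na.rm = TRUE), max(x, na.rm = TRUE))
    } else if (is.factor(x)) {
      # assumption: every level is equally likely
      factor(levels(x)[sample.int(nlevels(x), n, replace = TRUE)],
             levels = levels(x))
    } else {
      # other types (character, logical): uniform over the unique values
      u <- unique(x)
      u[sample.int(length(u), n, replace = TRUE)]
    }
  }

  synthetic <- dataset
  synthetic[] <- lapply(dataset, draw)  # each column sampled independently

  # Stack actual and synthetic rows and label them
  combined <- rbind(dataset, synthetic)
  combined$.class <- factor(rep(c("actual", "synthetic"), each = n))
  combined
}

combined <- make_synthetic(iris, prop = TRUE)
table(combined$.class)  # 150 rows of each class
```

Because every column is drawn independently, the synthetic rows keep the marginal distributions (under prop = TRUE) but lose the dependence between covariates; that lost structure is what a classifier can pick up. A forest could then be fit to the .class column, for example with ranger::ranger(.class ~ ., data = combined, num.trees = 1000, min.node.size = 5); synthetic_forest() performs this modelling step itself, so the sketch only illustrates the idea.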
Value

A tree ensemble of class 'ranger' or 'randomForest', depending on implementation.
References

Shi, T. and Horvath, S. Unsupervised Learning With Random Forest Predictors.
Breiman, L. Understanding random forests.
Examples

# ranger
model_ranger <- synthetic_forest(iris, implementation = "ranger")
oob_error(model_ranger)
# randomForest
model_rf <- synthetic_forest(iris, implementation = "randomForest")
oob_error(model_rf)
# extratrees (splitrule is passed through ... to ranger)
model_et <- synthetic_forest(iris, implementation = "ranger", splitrule = "extratrees")
oob_error(model_et)