View source: R/generate_tree.R
generate_tree | R Documentation |
generate_tree
Uses pair-wise dissimilarity of trees in a random forest trained with ranger to generate an artificial most representative tree, which is not part of the original ensemble.
generate_tree(
  rf,
  metric = "weighted splitting variables",
  train_data,
  test_data = NULL,
  dependent_varname,
  importance.mode = FALSE,
  imp.num.var = NULL,
  probs_quantiles = NULL,
  epsilon = 0,
  min.bucket = 0,
  num.splits = NULL,
  ...
)
rf |
Random forest (rf), object of class |
metric |
Specification of the distance (dissimilarity) metric. Available options are "splitting variables", "weighted splitting variables" and "prediction". |
train_data |
Data set for training the artificial representative tree |
test_data |
Additional data set comparable to the data set |
dependent_varname |
Name of the dependent variable used to create the |
importance.mode |
If TRUE, variable importance measures are used to prioritize the next split during tree generation, which improves speed. Variable importance values have to be included in the ranger object. |
imp.num.var |
Number of variables to be preselected based on importance values. If "automatic", the Boruta variable selection algorithm from Kursa et al. (2010) is used (this can be time-consuming). Supply a numeric value to set the number yourself. |
probs_quantiles |
Vector with values from 0 to 1, or NULL. Restricts the split points considered for continuous variables to the given quantiles (e.g. c(0.25, 0.5, 0.75)); without it, tree generation can be very time-consuming. |
epsilon |
Tree construction continues even if the similarity stays the same, provided the percentage of the prediction improves by 1 - epsilon. |
min.bucket |
Minimal terminal node size. No node with fewer observations than this value can occur. Improves speed. |
num.splits |
The generated tree consists of a maximum of num.splits splits. Improves speed. |
... |
Further parameters passed on to Boruta (e.g. pValue) |
rep.trees |
|
Lea Louisa Kronziel, M.Sc.
require(ranger)
require(timbR)
# Train random forest with ranger
rf.iris <- ranger(Species ~ .,
                  data = iris,
                  write.forest = TRUE,
                  num.trees = 10,
                  importance = "permutation"
                  )
# Generate an artificial most representative tree
rep_tree <- generate_tree(
  rf = rf.iris,
  metric = "splitting variables",
  train_data = iris,
  dependent_varname = "Species",
  importance.mode = TRUE,
  imp.num.var = 2,
  min.bucket = 25
)
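
The probs_quantiles argument limits the split points considered for continuous variables. The following base R sketch shows how much smaller the candidate set becomes with three quantile levels. It assumes that the candidates are the empirical quantiles of each predictor as computed by quantile(); the package's internal computation may differ, and the names probs, continuous_vars and candidate_splits are illustrative only.

```r
# Quantile levels as they might be passed to probs_quantiles
probs <- c(0.25, 0.5, 0.75)

# Assumption: candidate split points are the empirical quantiles of each
# continuous predictor; this is not taken from the package source.
continuous_vars <- names(iris)[sapply(iris, is.numeric)]
candidate_splits <- lapply(iris[continuous_vars], quantile,
                           probs = probs, names = FALSE)

# Compare the number of unique values with the number of quantile candidates
data.frame(
  variable     = continuous_vars,
  n_unique     = sapply(iris[continuous_vars], function(x) length(unique(x))),
  n_candidates = lengths(candidate_splits)
)
```

The same vector could then be supplied as probs_quantiles = probs in the generate_tree() call above.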