generate_tree: Generate artificial representative tree (art) for a random...
In imbs-hl/timbR: Tree interpretation methods based on ranger

generate_tree

R Documentation

Generate artificial representative tree (art) for a random forest

Description

generate_tree uses the pairwise dissimilarity of trees in a random forest trained with ranger to generate an artificial most representative tree (ART) that is not part of the original ensemble.

Usage

generate_tree(
  rf,
  metric = "weighted splitting variables",
  train_data,
  test_data = NULL,
  dependent_varname,
  importance.mode = FALSE,
  imp.num.var = NULL,
  probs_quantiles = NULL,
  epsilon = 0,
  min.bucket = 0,
  num.splits = NULL,
  ...
)

Arguments

`rf`	Random forest (rf) for which the tree is generated: an object of class `ranger` trained with `write.forest = TRUE`.
`metric`	Specification of the distance (dissimilarity) metric. Available options are "splitting variables", "weighted splitting variables" and "prediction".
`train_data`	Data set used to train the artificial representative tree.
`test_data`	Additional data set comparable to the data set `rf` was built on. Only required for `metric = "prediction"`.
`dependent_varname`	Name of the dependent variable used to create the `rf`.
`importance.mode`	If TRUE, variable importance measures are used to prioritize the next split during tree generation. Improves speed. The variable importance values have to be included in the ranger object.
`imp.num.var`	Number of variables to be preselected based on importance values. If "automatic", the Boruta variable selection algorithm of Kursa et al. (2010) is used, which can be time-consuming. Supply a numeric value to set the number of variables yourself.
`probs_quantiles`	Vector of values between 0 and 1, or NULL. If given, the corresponding quantiles (e.g. c(0.25, 0.5, 0.75)) are used as split points for continuous variables; otherwise tree generation can be very time-consuming.
`epsilon`	Tree creation continues even when the similarity stays the same, as long as the percentage of the prediction improves by 1 - epsilon.
`min.bucket`	Minimal terminal node size. No node with fewer observations than this value can occur. Improves speed.
`num.splits`	The generated tree contains at most num.splits splits. Improves speed.
`...`	Further parameters passed on to Boruta (e.g. pValue).

Value

rep.trees

ranger object containing the artificial most representative tree

Author(s)

Lea Louisa Kronziel, M.Sc.

Examples

require(ranger)
require(timbR)

# Train random forest with ranger
rf.iris <- ranger(Species ~ .,
                  data = iris,
                  write.forest = TRUE,
                  num.trees = 10,
                  importance = "permutation")

# Generate the artificial representative tree (ART)
rep_tree <- generate_tree(
  rf = rf.iris,
  metric = "splitting variables",
  train_data = iris,
  dependent_varname = "Species",
  importance.mode = TRUE,
  imp.num.var = 2,
  min.bucket = 25
)
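
The prerequisites listed under Arguments can be checked before calling generate_tree. The following is a minimal sketch that relies only on ranger and base R. `check_rf` is a hypothetical helper written for this illustration and is not part of timbR or ranger. How generate_tree ranks variables for `imp.num.var` and how it applies `probs_quantiles` internally are assumptions here; the code only shows what those inputs look like for the iris data.

```r
library(ranger)

# Train a small forest the way the arguments require:
# write.forest = TRUE keeps the trees, importance = "permutation" stores
# variable importance values in the ranger object.
set.seed(42)
rf <- ranger(Species ~ ., data = iris, num.trees = 10,
             write.forest = TRUE, importance = "permutation")

# Hypothetical helper (not part of timbR): basic checks on the forest.
check_rf <- function(rf, importance.mode = FALSE) {
  stopifnot(inherits(rf, "ranger"))
  if (is.null(rf$forest)) {
    stop("rf was trained without write.forest = TRUE")
  }
  if (importance.mode && rf$importance.mode == "none") {
    stop("importance.mode = TRUE needs a forest trained with importance values")
  }
  invisible(TRUE)
}

check_rf(rf, importance.mode = TRUE)

# Variables ranked by stored importance. Taking the top two mirrors
# imp.num.var = 2 (assumption: preselection follows this ranking).
top_vars <- names(sort(rf$variable.importance, decreasing = TRUE))[1:2]

# Quantile values of one continuous predictor for a given probs_quantiles.
probs_quantiles <- c(0.25, 0.5, 0.75)
split_candidates <- quantile(iris$Petal.Length, probs = probs_quantiles)
```

Restricting continuous variables to a few quantiles reduces the number of split points to examine, which is why `probs_quantiles` is described as a way to save time.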

imbs-hl/timbR documentation built on April 17, 2025, 2:08 p.m.