generate_tree: Generate artificial representative tree (art) for a random...

View source: R/generate_tree.R

generate_tree R Documentation

Generate artificial representative tree (art) for a random forest

Description

generate_tree uses the pair-wise dissimilarity of the trees in a random forest trained with ranger to generate an artificial most representative tree that is not part of the original ensemble.

Usage

generate_tree(
  rf,
  metric = "weighted splitting variables",
  train_data,
  test_data = NULL,
  dependent_varname,
  importance.mode = FALSE,
  imp.num.var = NULL,
  probs_quantiles = NULL,
  epsilon = 0,
  min.bucket = 0,
  num.splits = NULL,
  ...
)

Arguments

rf

Random forest (rf) for which the tree is generated: an object of class ranger trained with write.forest = TRUE.

metric

Specification of the distance (dissimilarity) metric. Available are "splitting variables", "weighted splitting variables" and "prediction".

train_data

Data set used to train the artificial representative tree.

test_data

Additional data set comparable to the data set rf was built on. Only required for metric = "prediction".

dependent_varname

Name of the dependent variable used to create the rf.

importance.mode

If TRUE, variable importance measures are used to prioritize the next split during tree generation. Improves speed. The ranger object must include variable importance values.

imp.num.var

Number of variables to be preselected based on importance values. If "automatic", the Boruta variable selection algorithm from Kursa et al. (2010) is used (can be time-consuming). Supply a numeric value to set the number yourself.
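To illustrate what preselecting variables by importance means, the following sketch ranks the variables of a ranger forest by permutation importance and keeps the top two. It relies only on ranger's variable.importance field. The names rf_imp, imp_sorted and top_vars are hypothetical, and generate_tree may perform the selection differently internally.

```r
library(ranger)

set.seed(1)
# Importance values are only stored if importance is requested at training time
rf_imp <- ranger(Species ~ ., data = iris, num.trees = 10,
                 write.forest = TRUE, importance = "permutation")

# Rank variables by importance, largest first, and keep the top two
# (comparable in spirit to imp.num.var = 2; an assumption, not the package code)
imp_sorted <- sort(rf_imp$variable.importance, decreasing = TRUE)
top_vars <- names(imp_sorted)[1:2]
top_vars
```

If the forest was trained without an importance argument, no importance values are stored in the ranger object, which is why importance.mode = TRUE needs them to be requested at training time.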

probs_quantiles

Vector of values between 0 and 1, or NULL. Allows quantiles to be chosen as split points for continuous variables (e.g. c(0.25, 0.5, 0.75)); without them, tree generation can be very time-consuming.
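As a rough illustration of why quantiles reduce the work, the sketch below computes quantile-based candidate split points for one continuous variable with base R and compares their number to the number of distinct observed values. The name candidate_splits is hypothetical, and this is not necessarily how generate_tree derives its split points.

```r
probs_quantiles <- c(0.25, 0.5, 0.75)

# Candidate split points at the requested quantiles of a continuous variable
candidate_splits <- quantile(iris$Petal.Length,
                             probs = probs_quantiles,
                             names = FALSE)
candidate_splits

# For comparison: number of distinct observed values of the same variable
length(unique(iris$Petal.Length))
```

With three quantiles only three split points per continuous variable need to be evaluated, at the cost of a coarser search.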

epsilon

Tree creation continues even if the similarity stays the same, provided that the percentage of the prediction improves by 1 - epsilon.

min.bucket

Minimal terminal node size. No node with fewer observations than this value can occur. Improves speed.
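The constraint can be pictured with a small base R check. The helper split_allowed is hypothetical and only illustrates the rule that both child nodes must keep at least min.bucket observations; it is not taken from the package.

```r
# Hypothetical helper: TRUE if splitting x at `split` leaves at least
# `min_bucket` observations in both child nodes
split_allowed <- function(x, split, min_bucket) {
  n_left <- sum(x <= split)
  n_right <- sum(x > split)
  n_left >= min_bucket && n_right >= min_bucket
}

# 50 observations fall to the left and 100 to the right: allowed
split_allowed(iris$Petal.Length, split = 2.5, min_bucket = 25)

# Only a handful of observations fall to the left: not allowed
split_allowed(iris$Petal.Length, split = 1.2, min_bucket = 25)
```

A larger min.bucket rules out more candidate splits, which is consistent with the speed-up noted above.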

num.splits

The generated tree consists of a maximum of num.splits splits. Improves speed.

...

Further parameters passed on to Boruta (e.g. pValue).

Value

rep.trees

ranger object containing the artificial most representative tree

Author(s)

Lea Louisa Kronziel, M.Sc.

Examples

library(ranger)
library(timbR)

# Train random forest with ranger
rf.iris <- ranger(Species ~ .,
                  data = iris,
                  write.forest=TRUE,
                  num.trees = 10,
                  importance = "permutation"
                  )

# Generate the artificial representative tree
rep_tree <- generate_tree(rf = rf.iris,
                          metric = "splitting variables",
                          train_data = iris,
                          dependent_varname = "Species",
                          importance.mode = TRUE,
                          imp.num.var = 2,
                          min.bucket = 25)
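A further sketch shows how the "prediction" metric might be used, since it is the only metric that requires test_data. The train/test split and the names idx, train_iris, test_iris, rf.pred and rep_tree_pred are assumptions made for illustration; only arguments listed in the Usage section are passed to generate_tree.

```r
library(ranger)
library(timbR)

set.seed(12345)
# Hypothetical split of iris into training and test data
idx <- sample(nrow(iris), 100)
train_iris <- iris[idx, ]
test_iris <- iris[-idx, ]

# Train random forest with ranger on the training part only
rf.pred <- ranger(Species ~ .,
                  data = train_iris,
                  write.forest = TRUE,
                  num.trees = 10,
                  importance = "permutation")

# Generate the artificial representative tree based on prediction distances;
# quantile split points are used to keep the run time short
rep_tree_pred <- generate_tree(rf = rf.pred,
                               metric = "prediction",
                               train_data = train_iris,
                               test_data = test_iris,
                               dependent_varname = "Species",
                               importance.mode = TRUE,
                               imp.num.var = 2,
                               probs_quantiles = c(0.25, 0.5, 0.75),
                               min.bucket = 25)

# According to the Value section, the result is a ranger object
class(rep_tree_pred)
```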


imbs-hl/timbR documentation built on April 17, 2025, 2:08 p.m.