trace_plot: Create a trace plot of trees from a random forest

Description

Trace plots are useful tools for visually comparing trees from a random forest. This function creates a trace plot from a set of trees in a random forest fit with the randomForest package. For more information on trace plots, see Urbanek (2008).

Usage

trace_plot(
  rf,
  train,
  tree_ids,
  width = 0.8,
  alpha = 0.5,
  tree_color = "black",
  color_by_id = FALSE,
  facet_by_id = FALSE,
  id_order = NULL,
  split_var_order = "rf_vi",
  cont_var = NULL,
  nrow = NULL,
  max_depth = NULL,
  rep_tree = NULL,
  rep_tree_size = 1,
  rep_tree_color = "blue",
  rep_tree_alpha = 1
)

Arguments

rf

random forest model fit using the randomForest package

train

data frame of the features used to train the random forest that the trees come from

tree_ids

numeric vector of tree indices specifying the trees to include in the trace plot

width

specifies the width of the horizontal feature lines in a trace plot (a number between 0 and 1; default is 0.8)

alpha

alpha value for the lines in the trace plot (a number between 0 and 1; default is 0.5)

tree_color

color of the traces (default is "black")

color_by_id

should the trace lines be colored by tree ID? (default is FALSE)

facet_by_id

should the traces be faceted by tree ID? (default is FALSE)

id_order

order in which the trees are arranged when facet_by_id is TRUE (optional)

split_var_order

order of the split variables on the x-axis (left to right), specified either manually as a vector of variable names or as "rf_vi" to order the variables by random forest variable importance (default is "rf_vi")

cont_var

continuous variable associated with the trees that can be used to color them; it must be in the same order as tree_ids (optional)

nrow

number of rows if facet_by_id is TRUE (otherwise ignored)

max_depth

maximum tree depth to include in the trace plot (NULL by default)

rep_tree

option to add a "representative tree" on top of the trace plot by providing a data frame with the structure returned by the get_tree_data function (NULL by default)

rep_tree_size

line size of "representative tree" (1 by default)

rep_tree_color

line color of "representative tree" ("blue" by default)

rep_tree_alpha

line alpha of "representative tree" (1 by default)
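The split_var_order and cont_var arguments accept user-supplied values. The following is a minimal sketch of one way to build them, assuming the penguin forest from the Examples section. The manual ordering sorts features by the Gini-based importance reported by randomForest::importance; the built-in "rf_vi" option may compute its ordering differently. The per-tree variable (the number of terminal nodes from randomForest::treesize) is a hypothetical choice made only for illustration; any numeric vector aligned with tree_ids should work.

```r
library(palmerpenguins)
library(randomForest)
library(TreeTracer)

# Drop rows with missing values and name the four numeric features
penguins <- na.omit(penguins)
features <- c("bill_length_mm", "bill_depth_mm", "flipper_length_mm", "body_mass_g")

set.seed(71)
penguin_rf <- randomForest(
  species ~ bill_length_mm + bill_depth_mm + flipper_length_mm + body_mass_g,
  data = penguins
)

ids <- 1:10

# Manual split_var_order: features sorted by mean decrease in Gini, largest first
vi <- importance(penguin_rf)[, "MeanDecreaseGini"]
vi_order <- names(sort(vi, decreasing = TRUE))

# Hypothetical cont_var: number of terminal nodes per tree,
# subset with the same ids (and in the same order) as tree_ids
tree_size <- treesize(penguin_rf, terminal = TRUE)[ids]

p <- trace_plot(
  rf = penguin_rf,
  train = penguins[, features],
  tree_ids = ids,
  split_var_order = vi_order,
  cont_var = tree_size
)
p
```

Indexing treesize() with the same ids vector keeps cont_var in the same order as tree_ids, which the argument description requires.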

References

Urbanek (2008) (entry urbanek:2008 in the TreeTracer package bibliography)

Examples


# Load packages
library(dplyr)
library(palmerpenguins)

# Load the Palmer penguins data and drop rows with missing values
penguins <- na.omit(penguins)

# Fit a random forest
set.seed(71)
penguin_rf <-
  randomForest::randomForest(
    species ~ bill_length_mm + bill_depth_mm + flipper_length_mm + body_mass_g,
    data = penguins
  )

# Generate a trace plot of the first 10 trees in the forest
trace_plot(
  rf = penguin_rf,
  train = penguins %>% select(bill_length_mm, bill_depth_mm, flipper_length_mm, body_mass_g),
  tree_ids = 1:10
)
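The faceting and depth options can be combined in a single call. This is a sketch using only the documented arguments. It assumes that id_order takes the tree IDs in the desired panel order. The help text does not say whether max_depth counts the root split as depth 0 or 1, so the exact number of levels drawn is also an assumption.

```r
library(palmerpenguins)
library(randomForest)
library(TreeTracer)

penguins <- na.omit(penguins)
features <- c("bill_length_mm", "bill_depth_mm", "flipper_length_mm", "body_mass_g")

set.seed(71)
penguin_rf <- randomForest(
  species ~ bill_length_mm + bill_depth_mm + flipper_length_mm + body_mass_g,
  data = penguins
)

# One panel per tree for trees 1 to 6, laid out in two rows and
# (assuming id_order takes tree IDs) shown in reverse ID order;
# splits deeper than max_depth are left out of the plot
panel_ids <- 1:6
p <- trace_plot(
  rf = penguin_rf,
  train = penguins[, features],
  tree_ids = panel_ids,
  facet_by_id = TRUE,
  id_order = rev(panel_ids),
  nrow = 2,
  max_depth = 3
)
p
```

Faceting places each tree in its own panel, which helps when overlaid traces become too dense to compare. nrow is ignored unless facet_by_id is TRUE.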

goodekat/TreeTracer documentation built on April 19, 2023, 7:44 p.m.