In goodekat/TreeTracer: Trace Plots Using ggplot2

knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)

Let:

$T_1$ and $T_2$ be two trees trained using $(y_i,\textbf{x}_i)$ for $i=1,...,n$
$\textbf{x}i=(x{i1},...,x_{ik})$ a vector of $k$ covariates for observation $i$
$b_1$ and $b_2$ sets of terminal nodes corresponding to $T_1$ and $T_2$, respectively

Chipman, George, and McCulloch (1998)

Fit Metric

The fit metric is defined as

$$d\left(T_1,T_2\right)=\frac{1}{n}\sum_{i=1}^n m\left(\hat{y}{i1},\hat{y}{i2}\right)$$

where:

$\hat{y}_{ij}$ is a fitted value for tree $j$ (e.g. mean or class label)
$m$ is a metric - for example...
for a regression tree

$$m\left(y_1,y_2\right)=\left(y_1-y_2\right)^2$$ - for a classification tree

$$m\left(y_1,y_2\right)=\begin{cases} 1 & \mbox{if} \ \ y_1=y_2 \ 0 & \mbox{o.w.} \end{cases}$$

Partition Metric

The partition metric is defined as

$$d\left(T_1, T_2\right)=\frac{\sum_{i>k}\left|I_1(i,k)-I_2(i,k)\right|}{n\choose2}$$

where:

$$I_1(i,k)=\begin{cases} 1 & \mbox{if } T_1 \mbox{ places observations } i \mbox{ an } k \mbox{ in the same terminal node} \ 0 & \mbox{o.w.} \end{cases}$$ Note: The metric is scaled to the range of (0,1) by $n\choose2$.

Tree Metric

A metric from Shannon and Banks (1998): Define the tree metric as

$$d(T_1,T_2)=\sum_{r \ \in\ \mbox{nodes}(T_1,T_2)}\alpha_rm\left(\mbox{rule}(T_1,r),\mbox{rule}(T_2,r)\right)$$

where

$r$ is the node position (can be a terminal node in at least one position)
$m$ is a measure that compares the rules at two nodes
$rule$ is the "rule" at a node in the tree such as the feature used for splitting at the node
$\alpha_r$ is a weight chosen by the user

Shannon and Banks (1998) let

$$m=\begin{cases} 1 & \mbox{if the variables at node } r \mbox{ are the same in both trees} \ 0 & \mbox{o.w.} \end{cases}$$