get_decision_path: Determine decision path down tree

View source: R/get_decision_path.R

Description

This function calls choose_split_r, which does the work of finding the decision path for the given observation. get_decision_path then reformats the output into a single data.frame and adds the contribution (contrib column) for each node.
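
The idea of sending one observation down a tree can be sketched in base R as follows. This is only an illustration under stated assumptions: the toy_tree layout, its column names, and the walk_tree helper are hypothetical and are not the pretty.gbm.tree format or the package's choose_split_r implementation. The sketch handles numeric splits only and ignores missing values and categorical variables.

```r
# Toy tree in a simplified, hypothetical layout (1-indexed nodes).
# var is NA for terminal nodes; numeric splits only.
toy_tree <- data.frame(
  node       = 1:5,
  var        = c("X1", "X2", NA, NA, NA),
  split      = c(0.5, 1.0, NA, NA, NA),
  left       = c(2L, 4L, NA, NA, NA),
  right      = c(3L, 5L, NA, NA, NA),
  prediction = c(0.05, -0.2, 0.3, -0.5, 0.1),
  stringsAsFactors = FALSE
)

# Hypothetical helper: follow the splits from the root until a terminal node.
walk_tree <- function(tree, row) {
  visited <- integer(0)
  node <- 1L
  repeat {
    visited <- c(visited, node)
    v <- tree$var[node]
    if (is.na(v)) break  # terminal node reached
    # go left when the value is below the split point, otherwise right
    node <- if (row[[v]] < tree$split[node]) tree$left[node] else tree$right[node]
  }
  tree[visited, c("node", "var", "prediction")]
}

obs <- data.frame(X1 = 0.3, X2 = 1.4)
path <- walk_tree(toy_tree, obs)
print(path)
```

With these toy values the observation goes left at the root (0.3 < 0.5) and right at node 2 (1.4 >= 1.0), so it visits nodes 1, 2 and 5.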

Usage

get_decision_path(pretty_tree, model, pred_row, verbose = FALSE)

Arguments

model

gbm.object

pred_row

single data.frame row (containing explanatory columns) to send down the tree to a terminal node

verbose

should split decisions be printed to console? Default value is FALSE.

pretty_tree

data.frame output from pretty.gbm.tree giving tree structure.

Value

data.frame showing the split decisions, and the contribution to the predicted value, for each node visited by the given observation en route to a terminal node. Contains the columns:

node_index

index of the node the observation has passed through

variable

name of the splitting variable (NA for terminal nodes)

type

type of the splitting variable: if type > 0 the variable is categorical, otherwise it is ordered or continuous (NA for terminal nodes)

direction

child node to travel down from current node

prediction

prediction at current node

contrib

contribution of the current node to the terminal node prediction for the given observation in the given tree. Note that for terminal nodes the contribution is the prediction for the first node in the tree, which is counted as part of the bias for the overall model.
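
One way to read the contrib column is sketched below. It assumes that each non-terminal node's contribution is the change in prediction when moving on to the next node on the path, and that the terminal node carries the first node's prediction, as described above. The prediction values are made up, and the function's actual calculation may differ.

```r
# Hypothetical predictions at the nodes visited: root first, terminal node last.
prediction <- c(0.05, -0.2, 0.1)

# Assumption: non-terminal contrib = prediction of next node - prediction of
# current node; terminal contrib = prediction of the first (root) node.
contrib <- c(diff(prediction), prediction[1])

path <- data.frame(prediction = prediction, contrib = contrib)
print(path)

# Under this assumption the contributions add up to the terminal node prediction.
isTRUE(all.equal(sum(contrib), prediction[length(prediction)]))
```

Under this reading the per-node contributions decompose the terminal node prediction exactly, with the root prediction acting as the tree's share of the bias.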

Examples

library(gbm)
library(GbmExplainR)

N <- 1000
X1 <- runif(N)
X2 <- 2*runif(N)
X3 <- ordered(sample(letters[1:4],N,replace=TRUE),levels=letters[4:1])
X4 <- factor(sample(letters[1:6],N,replace=TRUE))
X5 <- factor(sample(letters[1:3],N,replace=TRUE))
X6 <- 3*runif(N) 
mu <- c(-1,0,1,2)[as.numeric(X3)]

SNR <- 10 # signal-to-noise ratio
Y <- X1**1.5 + 2 * (X2**.5) + mu
sigma <- sqrt(var(Y)/SNR)
Y <- Y + rnorm(N,0,sigma)

# introduce some missing values
X1[sample(1:N,size=500)] <- NA
X4[sample(1:N,size=300)] <- NA

data <- data.frame(Y=Y,X1=X1,X2=X2,X3=X3,X4=X4,X5=X5,X6=X6)

# fit initial model
gbm1 <- gbm(Y ~ X1 + X2 + X3 + X4 + X5 + X6,
            data = data,
            var.monotone = c(0, 0, 0, 0, 0, 0),
            distribution = "gaussian",
            n.trees = 1000,
            shrinkage = 0.05,
            interaction.depth = 3,
            bag.fraction = 0.5,
            train.fraction = 0.5)

# send the first observation down the first tree in the model
get_decision_path(pretty_tree = pretty.gbm.tree(gbm1, 1),
                  model = gbm1,
                  pred_row = data[1, ])

richardangell/GbmExplainR documentation built on May 22, 2019, 12:54 p.m.