decompose_gbm_prediction: Decompose gbm prediction into feature contributions + bias

Description Usage Arguments Details Value Examples

View source: R/decompose_gbm_prediction.R

Description

For a single observation, decompose the prediction from a gbm into feature contributions plus a bias term. Within a single tree, the contribution for a given node is calculated by subtracting the prediction at the current node from the prediction at the next node the observation visits in the tree. The predicted value at the first (root) node of each tree is folded into the bias term, which also includes the model intercept (initF). Node contributions are summed by split variable across all trees in the model, giving the observation's prediction expressed as bias + one contribution for each feature used in the model.
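As an illustration of the per-node calculation, consider a toy, hand-built path through a single tree (hypothetical values, not GbmExplainR internals):

```r
# toy path through one tree for a single observation (hypothetical values)
# each row: a node visited and the tree's predicted value at that node
path <- data.frame(
  split_var  = c("X2", "X1", NA),    # variable split on (NA at the terminal node)
  prediction = c(0.125, 0.25, 0.5)   # node predictions along the path
)

# contribution of each split = next node's prediction - current node's prediction
contributions <- diff(path$prediction)
names(contributions) <- path$split_var[-nrow(path)]
contributions                        # X2 = 0.125, X1 = 0.25

# the root node's prediction folds into the bias (with the model intercept, initF)
bias <- path$prediction[1]

# bias + contributions recovers the tree's prediction at the terminal node
bias + sum(contributions) == path$prediction[nrow(path)]  # TRUE
```

Summing such contributions by split variable over every tree in the ensemble yields the decomposition returned by this function.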

Usage

decompose_gbm_prediction(gbm, prediction_row, type = "link",
  verbose = FALSE, aggregate_contributions = TRUE, n_trees = NULL)

Arguments

gbm

gbm.object to predict with. Note that gbms fit with the multinomial distribution are not currently supported.

prediction_row

single row data.frame to predict on and then decompose into feature contributions

type

either "link" or "response". Default is "link". If "response" and the gbm distribution is "poisson", contributions are converted to the response scale (i.e. counts). For all other distributions both options give the same result.

verbose

should split decisions be printed to the console? Default is FALSE.

aggregate_contributions

should feature contributions be aggregated to variable level before being returned? Default is TRUE. Setting this to FALSE allows inspecting the contributions at tree x node level, which is mainly used with the validate_decomposition function. Note, if contributions are not aggregated then the model intercept is not accounted for.

n_trees

the number of trees to use in generating the prediction for the given row. Default NULL uses all trees in the model.
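The options above can be sketched on a small Poisson model (the data and model here are illustrative, not from the package's own examples):

```r
library(gbm)
library(GbmExplainR)

set.seed(1)
d <- data.frame(x1 = runif(200), x2 = runif(200))
d$y <- rpois(200, exp(0.5 + d$x1))

gbm_pois <- gbm(y ~ x1 + x2, data = d, distribution = "poisson",
                n.trees = 50, interaction.depth = 2, shrinkage = 0.1)

# link scale (default): bias + contributions sum to the log of the predicted count
decompose_gbm_prediction(gbm_pois, d[1, ], type = "link")

# response scale: contributions converted to the count scale
decompose_gbm_prediction(gbm_pois, d[1, ], type = "response")

# unaggregated tree x node contributions (model intercept not accounted for)
decompose_gbm_prediction(gbm_pois, d[1, ], aggregate_contributions = FALSE)

# decompose a prediction built from only the first 25 trees
decompose_gbm_prediction(gbm_pois, d[1, ], n_trees = 25)
```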

Details

Based on the treeinterpreter Python package for random forests: https://github.com/andosa/treeinterpreter.

Value

data.frame containing variable contributions to the predicted value.
If aggregate_contributions = TRUE, the contributions are at variable level:

variable

variable name

contribution

variable contribution to prediction

variable_value

value of variable for input row

variable_class

class of variable

If aggregate_contributions = FALSE, the contributions are at node x tree level; see the output from get_decision_path.
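As a sanity check on the aggregated output, the contributions should sum to the model's link-scale prediction (a sketch assuming gbm1 and data from the Examples below, and assuming the bias appears as a row of the returned data.frame):

```r
decomp <- decompose_gbm_prediction(gbm1, data[1, ])

# total of the bias term plus all variable contributions
sum(decomp$contribution)

# should match the link-scale prediction from gbm itself
predict(gbm1, data[1, ], n.trees = gbm1$n.trees)
```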

Examples

library(gbm)
library(GbmExplainR)

set.seed(123)

# simulate data with continuous, ordered and unordered factor predictors
N <- 1000
X1 <- runif(N)
X2 <- 2 * runif(N)
X3 <- ordered(sample(letters[1:4], N, replace = TRUE), levels = letters[4:1])
X4 <- factor(sample(letters[1:6], N, replace = TRUE))
X5 <- factor(sample(letters[1:3], N, replace = TRUE))
X6 <- 3 * runif(N)
mu <- c(-1, 0, 1, 2)[as.numeric(X3)]

SNR <- 10 # signal-to-noise ratio
Y <- X1 ** 1.5 + 2 * (X2 ** 0.5) + mu
sigma <- sqrt(var(Y) / SNR)
Y <- Y + rnorm(N, 0, sigma)

# introduce some missing values
X1[sample(1:N, size = 500)] <- NA
X4[sample(1:N, size = 300)] <- NA

data <- data.frame(Y = Y, X1 = X1, X2 = X2, X3 = X3, X4 = X4, X5 = X5, X6 = X6)

# fit initial model
gbm1 <- gbm(Y ~ X1 + X2 + X3 + X4 + X5 + X6,
            data = data,
            var.monotone = c(0, 0, 0, 0, 0, 0),
            distribution = "gaussian",
            n.trees = 1000,
            shrinkage = 0.05,
            interaction.depth = 3,
            bag.fraction = 0.5,
            train.fraction = 0.5)

# decompose the prediction for the first row into bias + feature contributions
decompose_gbm_prediction(gbm1, data[1, ])

richardangell/GbmExplainR documentation built on May 22, 2019, 12:54 p.m.