View source: R/decompose_gbm_prediction.R
Description

For a single observation, decompose the prediction from a gbm into feature
contributions plus a bias term. Within a single tree, the contribution for a
given node is calculated by subtracting the prediction for the current node
from the prediction of the next node the observation would visit in the tree.
The predicted value for the first node in the tree is folded into the bias
term (which also includes the intercept, initF, from the model).

Node contributions are summed by the split variable for the node, across all
trees in the model, giving the observation's prediction represented as
bias + contribution for each feature used in the model.
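The per-tree decomposition described above can be sketched as follows. This is an illustrative re-implementation, not the package's code: the dictionary tree structure, field names, and numbers are all invented for the example. Each node carries its own prediction; walking the observation down the tree, each split's contribution is the child's prediction minus the current node's, and the root prediction becomes the bias.

```python
# Hypothetical sketch of the per-tree decomposition (not the package's
# implementation). A node is a dict with a "prediction" and, for internal
# nodes, a "split_var", "threshold", "left" and "right".

def decompose(tree, row):
    """Walk `row` down `tree`; return (bias, {variable: contribution})."""
    bias = tree["prediction"]  # the root prediction folds into the bias term
    contributions = {}
    node = tree
    while "split_var" in node:
        var = node["split_var"]
        child = node["left"] if row[var] < node["threshold"] else node["right"]
        # contribution of this split = next node's prediction - current node's
        contributions[var] = contributions.get(var, 0.0) + (
            child["prediction"] - node["prediction"]
        )
        node = child
    return bias, contributions

# Toy two-level tree: the root predicts 10, splitting on x1 then x2.
tree = {
    "prediction": 10.0, "split_var": "x1", "threshold": 0.5,
    "left": {"prediction": 8.0, "split_var": "x2", "threshold": 1.0,
             "left": {"prediction": 7.0}, "right": {"prediction": 9.0}},
    "right": {"prediction": 13.0},
}

bias, contribs = decompose(tree, {"x1": 0.2, "x2": 2.0})
# The observation reaches the leaf predicting 9.0, and by construction
# bias + sum of contributions telescopes to exactly that leaf prediction.
assert abs(bias + sum(contribs.values()) - 9.0) < 1e-12
```

The telescoping sum is the key property: because each contribution is a difference of successive node predictions, the bias plus all contributions always reproduces the leaf prediction exactly.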
Usage

decompose_gbm_prediction(gbm, prediction_row, type = "link",
  verbose = FALSE, aggregate_contributions = TRUE, n_trees = NULL)
Arguments

gbm
    a fitted gbm model object.

prediction_row
    a single row of data for which to decompose the prediction.

type
    either "link" or "response". Default is "link". If "response" and the
    gbm distribution is "poisson", contributions are converted to the
    response scale (i.e. counts). For all distributions except "poisson"
    both options give the same result.

verbose
    should split decisions be printed to the console? Default is FALSE.

aggregate_contributions
    should feature contributions aggregated to variable level be returned?
    Default is TRUE.

n_trees
    the number of trees to use in generating the prediction for the given
    row. Default is NULL.
Details

Based on the treeinterpreter Python package for random forests:
https://github.com/andosa/treeinterpreter.
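As with treeinterpreter, combining the per-tree decompositions is just summation: contributions from every tree are accumulated by variable, and the model intercept (initF) forms the bias. A minimal sketch with made-up numbers (the initF value and per-tree contribution dicts are invented for illustration):

```python
# Hypothetical numbers showing how per-tree contributions combine across
# trees. `init_f` stands in for the model intercept (initF in gbm).
init_f = 2.0

# One (variable -> contribution) dict per tree, for a single observation.
per_tree = [
    {"x1": -0.5, "x2": 0.3},
    {"x1": 0.2},
    {"x2": -0.1, "x3": 0.4},
]

# Sum contributions by split variable across all trees.
totals = {}
for contribs in per_tree:
    for var, c in contribs.items():
        totals[var] = totals.get(var, 0.0) + c

# The model's link-scale prediction is the bias plus all contributions.
prediction = init_f + sum(totals.values())
```

On the link scale these contributions are strictly additive; for a Poisson gbm with type = "response" the function reports them after conversion to the count scale instead.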
Value

A data.frame containing variable contributions to the predicted value.

If aggregate_contributions = TRUE, the contributions are at the variable
level; the columns give the variable name, the variable's contribution to
the prediction, the value of the variable for the input row, and the class
of the variable.

If aggregate_contributions = FALSE, the contributions are at the node x tree
level; see the output from get_decision_path.
Examples

library(gbm)

N <- 1000
X1 <- runif(N)
X2 <- 2 * runif(N)
X3 <- ordered(sample(letters[1:4], N, replace = TRUE), levels = letters[4:1])
X4 <- factor(sample(letters[1:6], N, replace = TRUE))
X5 <- factor(sample(letters[1:3], N, replace = TRUE))
X6 <- 3 * runif(N)
mu <- c(-1, 0, 1, 2)[as.numeric(X3)]

SNR <- 10  # signal-to-noise ratio
Y <- X1^1.5 + 2 * (X2^0.5) + mu
sigma <- sqrt(var(Y) / SNR)
Y <- Y + rnorm(N, 0, sigma)

# introduce some missing values
X1[sample(1:N, size = 500)] <- NA
X4[sample(1:N, size = 300)] <- NA

data <- data.frame(Y = Y, X1 = X1, X2 = X2, X3 = X3, X4 = X4, X5 = X5, X6 = X6)

# fit initial model
gbm1 <- gbm(Y ~ X1 + X2 + X3 + X4 + X5 + X6,
            data = data,
            var.monotone = c(0, 0, 0, 0, 0, 0),
            distribution = "gaussian",
            n.trees = 1000,
            shrinkage = 0.05,
            interaction.depth = 3,
            bag.fraction = 0.5,
            train.fraction = 0.5)

decompose_gbm_prediction(gbm1, data[1, ])