knitr::opts_chunk$set( collapse = TRUE, comment = "#>" )
Here we will use the HR churn data (https://www.kaggle.com/) to present the breakDown package for ranger models.
The data is in the breakDown package.
library(breakDown)
head(HR_data, 3)
Now let's create a ranger classification forest for churn, the left variable.
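library(ranger)
# ranger needs a factor response to grow a classification forest
HR_data$left <- factor(HR_data$left)
# probability = TRUE grows a probability forest, so predictions are class probabilities
model <- ranger(left ~ ., data = HR_data, importance = 'impurity', probability = TRUE, min.node.size = 2000)
# return the probability of the second level of left for each new observation
predict.function <- function(model, new_observation) predict(model, new_observation, type = "response")$predictions[,2]
predict.function(model, HR_data[11,])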
But how to understand which factors drive predictions for a single observation?
With the breakDown package!
Explanations for the trees' votes.
library(ggplot2)
explain_1 <- broken(model, HR_data[11,-7], data = HR_data[,-7], predict.function = predict.function, direction = "down")
explain_1
plot(explain_1) + ggtitle("breakDown plot (direction=down) for ranger model")

explain_2 <- broken(model, HR_data[11,-7], data = HR_data[,-7], predict.function = predict.function, direction = "up")
explain_2
plot(explain_2) + ggtitle("breakDown plot (direction=up) for ranger model")
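To get an intuition for what a per-observation breakdown reports, here is a minimal base R sketch of sequential attribution. It is a simplified illustration and not the algorithm implemented in breakDown: the variables are fixed in a hand-picked order rather than chosen by a search, and a logistic regression on synthetic data stands in for the ranger forest. The data frame `toy`, its columns, and the helper `sequential_contributions()` are all hypothetical names introduced for this example.

```r
# Simplified sketch of sequential attribution (NOT the breakDown implementation).
# All data and names below are synthetic and hypothetical.
set.seed(1)
n <- 500
toy <- data.frame(
  satisfaction = runif(n),
  projects     = sample(2:7, n, replace = TRUE),
  hours        = rnorm(n, mean = 200, sd = 40)
)
# Synthetic churn outcome driven by the three variables
lp <- 1.5 - 4 * toy$satisfaction + 0.3 * toy$projects + 0.005 * (toy$hours - 200)
toy$left <- rbinom(n, 1, plogis(lp))

# Stand-in model: logistic regression instead of a ranger forest
fit <- glm(left ~ ., data = toy, family = binomial())
predict_fun <- function(model, newdata) predict(model, newdata, type = "response")

sequential_contributions <- function(model, new_obs, data, predict_fun, vars) {
  baseline <- mean(predict_fun(model, data))  # average prediction over the data
  current  <- data
  previous <- baseline
  contrib  <- setNames(numeric(length(vars)), vars)
  for (v in vars) {
    # Fix variable v at the observation's value in every row,
    # then record how much the average prediction moves.
    current[[v]] <- new_obs[[v]]
    now <- mean(predict_fun(model, current))
    contrib[v] <- now - previous
    previous <- now
  }
  list(baseline = baseline, contributions = contrib, final = previous)
}

vars <- c("satisfaction", "projects", "hours")
res <- sequential_contributions(fit, toy[11, ], toy[, vars], predict_fun, vars)
res
```

In this sketch the baseline plus the sum of the contributions equals the model's prediction for the chosen observation, because every row matches that observation once all variables are fixed. The individual contributions depend on the order in which the variables are fixed.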