Basic Workflow"

knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>",
  warning = FALSE,
  message = FALSE,
  fig.width = 5,
  fig.height = 4
)

Introduction

This vignette shows the basic workflow of using SHAPforxgboost to interpret models trained with XGBoost, a highly efficient gradient boosting implementation [@chen2016].

library("ggplot2")
library("SHAPforxgboost")
library("xgboost")

set.seed(9375)

Training the model

Let's train a small model to predict the first column in the iris data set, namely Sepal.Length.

head(iris)

X <- data.matrix(iris[, -1])
dtrain <- xgb.DMatrix(X, label = iris[[1]])

fit <- xgb.train(
  params = list(
    objective = "reg:squarederror",
    learning_rate = 0.1
  ), 
  data = dtrain,
  nrounds = 50
)
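Before interpreting the model, a quick sanity check of the fit can be helpful. The snippet below is a minimal sketch, not part of the SHAPforxgboost API: it compares the in-sample predictions with the observed response. Keep in mind that in-sample error is optimistic.

# Rough in-sample check: RMSE of predictions vs. observed Sepal.Length
pred <- predict(fit, dtrain)
sqrt(mean((iris[[1]] - pred)^2))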

SHAP analysis

Now we can compute the SHAP values and analyze the results, all in just a few lines of code.

# Crunch SHAP values
shap <- shap.prep(fit, X_train = X)

# SHAP importance plot
shap.plot.summary(shap)

# Alternatively, mean absolute SHAP values
shap.plot.summary(shap, kind = "bar")

# Dependence plots in decreasing order of importance
# (colored by strongest interacting variable)
for (x in shap.importance(shap, names_only = TRUE)) {
  p <- shap.plot.dependence(
    shap,
    x = x,
    color_feature = "auto",
    smooth = FALSE,
    jitter_width = 0.01,
    alpha = 0.4
  ) +
    ggtitle(x)
  print(p)
}

Note: print() is required here only because ggplot objects created inside a for loop in an R Markdown document are not rendered automatically.

This is just a teaser: SHAPforxgboost can do much more! Check out the README for further information.
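For example, if you prefer the raw SHAP matrix over the long-format data returned by shap.prep(), shap.values() provides it directly. The sketch below follows the documented return values shap_score and mean_shap_score:

# Extract raw SHAP contributions and mean absolute SHAP values
shap_raw <- shap.values(xgb_model = fit, X_train = X)
head(shap_raw$shap_score)  # per-observation SHAP contributions
shap_raw$mean_shap_score   # mean |SHAP| per feature, in decreasing order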

References


