explain_xgboost: Create explainer from your xgboost model


View source: R/explain_xgboost.R

Description

DALEX is designed to work with various black-box models, such as tree ensembles, linear models and neural networks. Unfortunately, the R packages that create such models are very inconsistent: different tools use different interfaces to train, validate and use models. One of the tools we would like to make more accessible is xgboost.

Usage

explain_xgboost(
  model,
  data = NULL,
  y = NULL,
  weights = NULL,
  predict_function = NULL,
  predict_function_target_column = NULL,
  residual_function = NULL,
  ...,
  label = NULL,
  verbose = TRUE,
  precalculate = TRUE,
  colorize = TRUE,
  model_info = NULL,
  type = NULL,
  encode_function = NULL,
  true_labels = NULL
)

Arguments

model

object - a model to be explained

data

data.frame or matrix - data that was used for fitting. If not provided, it will be extracted from the model. Data should be passed without the target column (the target shall be provided as the y argument). NOTE: If the target variable is present in the data, some functionalities may not work properly.

y

numeric vector with outputs / scores. If provided, it shall have as many elements as data has rows. For classification tasks it has to be numeric, with values in the range [0, nclasses).
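
A minimal base-R sketch of preparing such a y for a multiclass task. The built-in iris data is used purely as an illustration (it is not part of this page): a factor response is mapped to the 0-based integer codes that the range [0, nclasses) requires, and the level names are kept so that codes can be translated back.

```r
# Factor response with three classes (iris is only an illustration here)
species <- iris$Species

# as.integer() returns 1-based level codes; subtracting 1 gives 0-based codes
y <- as.integer(species) - 1L

# Keep the level names so that a code k maps back to class_names[k + 1]
class_names <- levels(species)

table(y)
```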

weights

numeric vector with sampling weights. By default it's NULL. If provided, it shall have as many elements as data has rows.

predict_function

function that takes two arguments, model and new data, and returns a numeric vector with predictions.
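
A sketch of a custom predict_function, assuming the DALEX, DALEXtra and xgboost packages are installed in versions contemporary with this page. The function name predict_survival is hypothetical, and xgb.train() with base R model.matrix() encoding stands in for the xgboost()/mlr set-up used in the Examples; the only contract illustrated is "model and new data in, numeric vector out".

```r
library("xgboost")
library("DALEX")
library("DALEXtra")

# Predictors (the 8th column, "survived", is the target) encoded as a numeric
# dummy matrix without an intercept column
X <- model.matrix(~ . - 1, data = titanic_imputed[, -8])
y <- titanic_imputed$survived

# Small binary classifier trained on the encoded matrix
model <- xgb.train(params = list(objective = "binary:logistic"),
                   data = xgb.DMatrix(X, label = y),
                   nrounds = 10, verbose = 0)

# Hypothetical custom predict function: takes the model and new data and
# returns a plain numeric vector of predicted probabilities
predict_survival <- function(model, newdata) {
  as.numeric(predict(model, as.matrix(newdata)))
}

explainer <- explain_xgboost(model, data = X, y = y,
                             predict_function = predict_survival,
                             label = "xgboost (custom predict)",
                             verbose = FALSE)
```

Wrapping the result in as.numeric() guarantees a plain vector even if the underlying predict method attaches extra attributes.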

predict_function_target_column

Character or numeric containing either the column name or the column number, in the model prediction object, of the class that should be considered positive (i.e. the class associated with probability 1). If NULL, the second column of the output is taken for binary classification. In a multiclass classification setting, this parameter switches the explainer to binary classification mode with one-vs-others probabilities.

residual_function

function that takes three arguments: model, data and response vector y. It should return a numeric vector with model residuals for given data. If not provided, response residuals (y-\hat{y}) are calculated.
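
As an illustration, the sketch below replaces the default response residuals with absolute residuals |y - ŷ|. It assumes the DALEX, DALEXtra and xgboost packages are installed in versions contemporary with this page; abs_residuals is a hypothetical name, not part of DALEX, and model.matrix() replaces the mlr encoding from the Examples.

```r
library("xgboost")
library("DALEX")
library("DALEXtra")

# Predictors (the 8th column, "survived", is the target) encoded as a numeric
# dummy matrix without an intercept column
X <- model.matrix(~ . - 1, data = titanic_imputed[, -8])
y <- titanic_imputed$survived

model <- xgb.train(params = list(objective = "binary:logistic"),
                   data = xgb.DMatrix(X, label = y),
                   nrounds = 10, verbose = 0)

# Hypothetical residual function with the (model, data, y) signature:
# absolute residuals |y - y_hat| instead of the default y - y_hat
abs_residuals <- function(model, data, y) {
  abs(y - as.numeric(predict(model, as.matrix(data))))
}

explainer <- explain_xgboost(model, data = X, y = y,
                             residual_function = abs_residuals,
                             label = "xgboost (absolute residuals)",
                             verbose = FALSE)
summary(explainer$residuals)
```

With precalculate = TRUE (the default) these values should be stored in explainer$residuals, so residual-based DALEX tools would then work on absolute errors.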

...

other parameters

label

character - the name of the model. By default it's extracted from the 'class' attribute of the model

verbose

if TRUE (default) then diagnostic messages will be printed

precalculate

if TRUE (default) then 'predicted_values' and 'residuals' are calculated when the explainer is created.

colorize

if TRUE (default) then WARNINGS, ERRORS and NOTES are colorized. This works only in the R console.

model_info

a named list (package, version, type) containing information about the model. If NULL, DALEX will look for this information on its own.

type

type of a model, either classification or regression. If not specified then type will be extracted from model_info.

encode_function

function(data, ...) that, when called on the data, returns the encoded data that was used to fit the model. xgboost does not handle factors on its own, so such a function is needed to obtain better explanations.
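
A minimal sketch of an encode_function, assuming the DALEX, DALEXtra and xgboost packages are installed in versions contemporary with this page. The encoder encode_titanic is hypothetical and uses base R model.matrix() instead of mlr::createDummyFeatures(); what matters is that it produces exactly the columns the booster was trained on, so the explainer can keep the raw data frame with its factors.

```r
library("xgboost")
library("DALEX")
library("DALEXtra")

# Hypothetical encoder: turns raw predictors (factors included) into the
# numeric dummy matrix the booster is trained on; factor levels are kept
# when rows are subset, so every call yields the same columns
encode_titanic <- function(data, ...) {
  model.matrix(~ . - 1, data = data)
}

raw <- titanic_imputed[, -8]   # predictors only, target ("survived") dropped
y <- titanic_imputed$survived

model <- xgb.train(params = list(objective = "binary:logistic"),
                   data = xgb.DMatrix(encode_titanic(raw), label = y),
                   nrounds = 10, verbose = 0)

# The explainer holds the raw data; encoding happens inside prediction
explainer <- explain_xgboost(model, data = raw, y = y,
                             encode_function = encode_titanic,
                             label = "xgboost (encoded)",
                             verbose = FALSE)
```

Because the explainer works on the raw data, explanations refer to original variables such as class or gender rather than to individual dummy columns, and predict_parts() can be called on a raw row such as titanic_imputed[1, -8].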

true_labels

vector of y before encoding.

Value

explainer object (explain) ready to work with DALEX

Examples

library("xgboost")
library("DALEXtra")
library("mlr")
# The 8th column ("survived") is the target and has to be omitted from the X data
data <- as.matrix(createDummyFeatures(titanic_imputed[, -8]))
model <- xgboost(data, titanic_imputed$survived, nrounds = 10,
                 params = list(objective = "binary:logistic"))

# explainer with encode function: raw data is passed and encoded on the fly
explainer_1 <- explain_xgboost(model, data = titanic_imputed[, -8],
                               y = titanic_imputed$survived,
                               encode_function = function(data) {
                                 as.matrix(createDummyFeatures(data))
                               })
plot(predict_parts(explainer_1, titanic_imputed[1, -8]))

# explainer without encode function: data is the already encoded matrix
explainer_2 <- explain_xgboost(model, data = data, y = titanic_imputed$survived)
plot(predict_parts(explainer_2, data[1, , drop = FALSE]))

DALEXtra documentation built on May 9, 2021, 9:07 a.m.