The modelStudio() function computes various instance-level and dataset-level model explanations
and produces a customisable dashboard, which consists of multiple panels for plots with their
short descriptions. The dashboard is easy to save and share with others. Tools for
Explanatory Model Analysis unite with tools for Exploratory Data Analysis
to give a broad overview of the model behavior.
Let's use the HR dataset to explore the modelStudio parameters:
train <- DALEX::HR
train$fired <- as.factor(ifelse(train$status == "fired", 1, 0))
train$status <- NULL
head(train)
knitr::kable(head(train), digits = 2, caption = "DALEX::HR dataset")
Prepare the HR_test data and a ranger model for the explainer:
# fit a ranger model
library("ranger")
model <- ranger(fired ~., data = train, probability = TRUE)

# prepare validation dataset
test <- DALEX::HR_test[1:1000,]
test$fired <- ifelse(test$status == "fired", 1, 0)
test$status <- NULL

# create an explainer for the model
explainer <- DALEX::explain(model, data = test, y = test$fired)

# start modelStudio
library("modelStudio")
Pass data points to the new_observation parameter for instance explanations
such as Break Down, Shapley Values and Ceteris Paribus Profiles.
Use new_observation_y to show their true labels.
new_observation <- test[1:3,]
rownames(new_observation) <- c("John Snow", "Arya Stark", "Samwell Tarly")
true_labels <- test[1:3,]$fired

modelStudio(explainer,
            new_observation = new_observation,
            new_observation_y = true_labels)
If new_observation = NULL, then new_observation_n observations are chosen, evenly spread by the order of y_hat.
The selection always includes the observations whose ids are which.min(y_hat) and which.max(y_hat).
modelStudio(explainer, new_observation_n = 5) # default is 3
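The selection rule described above can be illustrated with a minimal base R sketch. This is not modelStudio's internal code: the simulated y_hat vector and the rounding of evenly spaced positions are assumptions made only for illustration.

```r
# illustrative sketch only: simulated predictions stand in for y_hat
set.seed(123)
y_hat <- runif(100)
new_observation_n <- 5

# order observations by prediction, then take evenly spaced positions
ord <- order(y_hat)
positions <- round(seq(1, length(ord), length.out = new_observation_n))
ids <- ord[positions]

# the first and last chosen ids are the extremes of y_hat
ids[1] == which.min(y_hat)
ids[new_observation_n] == which.max(y_hat)
```

Because the positions always start at 1 and end at length(ord), the observations with the lowest and highest prediction are part of the selection for any new_observation_n of at least 2.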
Achieve a bigger or smaller modelStudio grid with the facet_dim parameter.
# small dashboard with 2 panels
modelStudio(explainer, facet_dim = c(1,2))

# large dashboard with 9 panels
modelStudio(explainer, facet_dim = c(3,3))
Manipulate the time parameter to set the animation length. A value of 0 makes
the animations invisible.
# slow down animations
modelStudio(explainer, time = 1000)

# turn off animations
modelStudio(explainer, time = 0)
N is the number of observations used for calculation of Partial Dependence and Accumulated Dependence Profiles (default is 300).
N_fi is the number of observations used for calculation of Feature Importance (default is N*10).
N_sv is the number of observations used for calculation of Shapley Values (default is N*3).
B is the number of permutation rounds used for calculation of Shapley Values (default is 10).
B_fi is the number of permutation rounds used for calculation of Feature Importance (default is B).

Decrease the N and B parameters to lower the computation time, or increase
them to get more accurate empirical results.
# faster, less precise
modelStudio(explainer, N = 200, B = 5)

# slower, more precise
modelStudio(explainer, N = 500, B = 15)
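To see how the defaults listed above relate to each other, here is a small base R sketch. The helper ms_sample_sizes() is hypothetical and not part of modelStudio; it only reproduces the arithmetic of the stated defaults.

```r
# hypothetical helper, not a modelStudio function
ms_sample_sizes <- function(N = 300, B = 10) {
  list(
    N    = N,       # Partial and Accumulated Dependence Profiles
    N_fi = N * 10,  # Feature Importance
    N_sv = N * 3,   # Shapley Values
    B    = B,       # permutation rounds for Shapley Values
    B_fi = B        # permutation rounds for Feature Importance
  )
}

str(ms_sample_sizes())                # the defaults
str(ms_sample_sizes(N = 200, B = 5))  # the "faster, less precise" setting
```

Assuming the derived defaults behave as described, changing N alone also changes the number of observations used for Feature Importance and Shapley Values unless N_fi and N_sv are set explicitly.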
Don't compute the EDA plots if they are not needed. Set the eda parameter to FALSE.
modelStudio(explainer, eda = FALSE)
Hide computation progress bar messages by setting the show_info parameter to FALSE.
modelStudio(explainer, show_info = FALSE)
Change the viewer parameter to set where to display modelStudio.
It is best described in the r2d3 documentation.
modelStudio(explainer, viewer = "browser")
Speed up modelStudio computation by setting the parallel parameter to TRUE.
It uses the parallelMap package to calculate local explainers faster. This is really useful when using modelStudio with
complicated models, vast datasets, or many observations.
All options can be set outside of the function call. For details, see How to use parallelMap.
# set up the cluster
options(
  parallelMap.default.mode        = "socket",
  parallelMap.default.cpus        = 4,
  parallelMap.default.show.info   = FALSE
)

# calculations of local explanations will be distributed across 4 cores
modelStudio(explainer,
            new_observation = test[1:16,],
            parallel = TRUE)
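Rather than hard-coding the number of CPUs, one option is to derive it from the machine. The sketch below uses parallel::detectCores() from base R. Leaving one core free and falling back to a single CPU are assumptions of this example, not recommendations taken from the parallelMap documentation.

```r
# detectCores() may return NA on some platforms, so fall back to 1
n_cores <- parallel::detectCores()
n_cpus <- if (is.na(n_cores)) 1L else max(1L, n_cores - 1L)

# same option names as in the cluster setup above
options(
  parallelMap.default.mode      = "socket",
  parallelMap.default.cpus      = n_cpus,
  parallelMap.default.show.info = FALSE
)

getOption("parallelMap.default.cpus")
```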
Customize some of the modelStudio looks by overwriting the default options returned
by the ms_options() function.
See the full list of options.
# set additional graphical parameters
new_options <- ms_options(
  show_subtitle = TRUE,
  bd_subtitle = "Hello World",
  line_size = 5,
  point_size = 9,
  line_color = "pink",
  point_color = "purple",
  bd_positive_color = "yellow",
  bd_negative_color = "orange"
)

modelStudio(explainer, options = new_options)
All visual options can be changed after the calculations using ms_update_options().
old_ms <- modelStudio(explainer)
old_ms

# update the options
new_ms <- ms_update_options(old_ms,
                            time = 0,
                            facet_dim = c(1,2),
                            margin_left = 150)
new_ms
Use ms_update_observations() to add more observations with their local explanations to the modelStudio.
old_ms <- modelStudio(explainer)
old_ms

# add new observations
plus_ms <- ms_update_observations(old_ms,
                                  explainer,
                                  new_observation = test[101:102,])
plus_ms

# overwrite old observations
new_ms <- ms_update_observations(old_ms,
                                 explainer,
                                 new_observation = test[103:104,],
                                 overwrite = TRUE)
new_ms
Use the widget_id argument and the r2d3 package to render the modelStudio output in Shiny.
See Using r2d3 with Shiny and consider the following example:
library(shiny)
library(r2d3)

ui <- fluidPage(
  textInput("text", h3("Text input"),
            value = "Enter text..."),
  uiOutput('dashboard')
)

server <- function(input, output) {
  #:# id of div where modelStudio will appear
  WIDGET_ID = 'MODELSTUDIO'

  #:# create modelStudio
  library(modelStudio)
  library(DALEX)
  model <- glm(survived ~., data = titanic_imputed, family = "binomial")
  explainer <- DALEX::explain(model,
                              data = titanic_imputed,
                              y = titanic_imputed$survived,
                              label = "Titanic GLM",
                              verbose = FALSE)
  ms <- modelStudio(explainer,
                    widget_id = WIDGET_ID,  #:# use the widget_id
                    show_info = FALSE)
  ms$elementId <- NULL  #:# remove elementId to stop the warning

  #:# basic render d3 output
  output[[WIDGET_ID]] <- renderD3({
    ms
  })

  #:# use render ui to set proper width and height
  output$dashboard <- renderUI({
    d3Output(WIDGET_ID, width=ms$width, height=ms$height)
  })
}

shinyApp(ui = ui, server = server)
Use the explain_*() functions from the DALEXtra package to explain various models.
Below is a basic example of making modelStudio for an mlr model using explain_mlr().
library(DALEXtra)
library(mlr)

# fit a model
task <- makeClassifTask(id = "task", data = train, target = "fired")
learner <- makeLearner("classif.ranger", predict.type = "prob")
model <- train(learner, task)

# create an explainer for the model
explainer_mlr <- explain_mlr(model,
                             data = test,
                             y = test$fired,
                             label = "mlr")

# make a studio for the model
modelStudio(explainer_mlr)