modelStudio: Interactive Studio for Explanatory Model Analysis

Description

This function computes various (instance and dataset level) model explanations and produces a customisable dashboard, which consists of multiple panels for plots with their short descriptions. Easily save the dashboard and share it with others. Tools for Explanatory Model Analysis unite with tools for Exploratory Data Analysis to give a broad overview of the model behavior.

The extensive documentation covers:

The displayed variable can be changed by clicking on the bars of the plots or with the first dropdown list, and the observation can be changed with the second dropdown list. The dashboard gathers useful, but not sensitive, information about how it is being used (e.g. computation length, package version, dashboard dimensions). This is for development purposes only and can be blocked by setting telemetry to FALSE.
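The following is a minimal sketch of building a dashboard with telemetry turned off and saving it as an HTML file for sharing. It assumes that DALEX, modelStudio and r2d3 are installed, and it uses r2d3::save_d3_html() as one possible way to write the file; the output path and object names are illustrative only.

```r
# Sketch: build a dashboard without telemetry and save it to HTML.
# Assumption: DALEX, modelStudio and r2d3 are installed; otherwise nothing runs.
if (requireNamespace("DALEX", quietly = TRUE) &&
    requireNamespace("modelStudio", quietly = TRUE) &&
    requireNamespace("r2d3", quietly = TRUE)) {
  library("DALEX")
  library("modelStudio")

  # fit a simple model on the 'titanic_imputed' data shipped with DALEX
  model <- glm(survived ~ ., data = titanic_imputed, family = "binomial")
  explainer <- explain(model,
                       data = titanic_imputed,
                       y = titanic_imputed$survived,
                       label = "Titanic GLM",
                       verbose = FALSE)

  # telemetry = FALSE blocks the usage statistics described above
  ms <- modelStudio(explainer,
                    N = 200, B = 5,        # smaller budget, faster run
                    telemetry = FALSE,
                    show_info = FALSE)

  # write the dashboard to a temporary file (illustrative path)
  out_file <- file.path(tempdir(), "modelStudio.html")
  r2d3::save_d3_html(ms, file = out_file)
}
```

Writing a self-contained file is assumed to require pandoc, as is usual for htmlwidgets-based output.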

Usage

modelStudio(explainer, ...)

## S3 method for class 'explainer'
modelStudio(
  explainer,
  new_observation = NULL,
  new_observation_y = NULL,
  new_observation_n = 3,
  facet_dim = c(2, 2),
  time = 500,
  max_features = 10,
  N = 300,
  N_fi = N * 10,
  B = 10,
  B_fi = B,
  eda = TRUE,
  show_info = TRUE,
  parallel = FALSE,
  options = ms_options(),
  viewer = "external",
  widget_id = NULL,
  license = NULL,
  telemetry = TRUE,
  max_vars = NULL,
  ...
)

Arguments

explainer

An explainer created with DALEX::explain().

...

Other parameters.

new_observation

New observations with columns that correspond to variables used in the model.

new_observation_y

True label for new_observation (optional).

new_observation_n

Number of observations to take from explainer$data when new_observation = NULL. Default is 3. See the vignette.

facet_dim

Dimensions of the grid. Default is c(2,2).

time

Animation length in milliseconds. Default is 500.

max_features

Maximum number of features to be included in BD and SV plots. Default is 10.

N

Number of observations used for the calculation of PD and AD. Default is 300. See the vignette.

N_fi

Number of observations used for the calculation of FI. Default is 10*N.

B

Number of permutation rounds used for the calculation of SV. Default is 10. See the vignette.

B_fi

Number of permutation rounds used for the calculation of FI. Default is B.

eda

Compute EDA plots and Residuals vs Feature plot, which adds the data to the dashboard. Default is TRUE.

show_info

Print progress messages to the console. Default is TRUE.

parallel

Speed up the computation using parallelMap::parallelMap(). Default is FALSE. This might interfere with showing progress using show_info. See the vignette.

options

Customize modelStudio. See ms_options and vignette.

viewer

Default is "external", which displays the dashboard in an external RStudio window. Use "browser" to display it in an external browser, or "internal" to use the RStudio internal viewer pane.

widget_id

Use an explicit element ID for the widget (rather than an automatically generated one). Useful e.g. when using modelStudio with Shiny. See vignette.

license

Path to the file containing the license (con parameter passed to readLines()). It can be used e.g. to include the license for explainer$data as a comment in the source of .html output file.

telemetry

The dashboard gathers useful, but not sensitive, information about how it is being used (e.g. computation length, package version, dashboard dimensions). This is for development purposes only and can be blocked by setting telemetry to FALSE. Default is TRUE.

max_vars

An alias for max_features. If provided, it overrides the value of max_features.
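As a rough illustration of how the computation-budget arguments and options fit together, the sketch below sets N, N_fi, B and B_fi explicitly (mirroring the defaults N_fi = N * 10 and B_fi = B) and passes a customised ms_options() object. The option names show_subtitle, bd_subtitle and ms_title are assumed from the package vignette; the variable names n_obs, n_rounds and custom_options are hypothetical.

```r
# Sketch: a reduced computation budget combined with custom dashboard options.
# Assumption: DALEX and modelStudio are installed; otherwise nothing runs.
if (requireNamespace("DALEX", quietly = TRUE) &&
    requireNamespace("modelStudio", quietly = TRUE)) {
  library("DALEX")
  library("modelStudio")

  model_budget <- glm(survived ~ ., data = titanic_imputed, family = "binomial")
  explainer_budget <- explain(model_budget,
                              data = titanic_imputed,
                              y = titanic_imputed$survived,
                              label = "Titanic GLM",
                              verbose = FALSE)

  # illustrative budget: fewer observations and permutation rounds than default
  n_obs <- 100
  n_rounds <- 5

  # option names assumed from the vignette; see ms_options for the full list
  custom_options <- ms_options(
    show_subtitle = TRUE,
    bd_subtitle = "Break Down for the selected passenger",
    ms_title = "Titanic GLM studio"
  )

  ms_small <- modelStudio(
    explainer_budget,
    new_observation_n = 2,           # sample 2 rows from explainer$data
    N = n_obs, N_fi = n_obs * 10,    # same ratio as the default N_fi = N * 10
    B = n_rounds, B_fi = n_rounds,   # same as the default B_fi = B
    max_features = 5,                # at most 5 features in BD and SV plots
    options = custom_options,
    show_info = FALSE,
    telemetry = FALSE
  )
}
```

Lower values of N and B shorten the computation at the cost of noisier explanations, so they are mainly useful for quick previews.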

Value

An object of the r2d3, htmlwidget, modelStudio class.
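Because the returned object carries the r2d3 and htmlwidget classes, it can be rendered through the Shiny bindings of r2d3. The sketch below follows the pattern described in the package vignette as an assumption rather than a definitive recipe; the element ID "MODELSTUDIO" and all object names are hypothetical.

```r
# Sketch: embedding a modelStudio dashboard in a Shiny app via widget_id.
# Assumption: shiny, r2d3, DALEX and modelStudio are installed.
if (requireNamespace("shiny", quietly = TRUE) &&
    requireNamespace("r2d3", quietly = TRUE) &&
    requireNamespace("DALEX", quietly = TRUE) &&
    requireNamespace("modelStudio", quietly = TRUE)) {
  library("DALEX")
  library("modelStudio")

  widget_id <- "MODELSTUDIO"  # hypothetical ID of the element holding the dashboard

  model_shiny <- glm(survived ~ ., data = titanic_imputed, family = "binomial")
  explainer_shiny <- explain(model_shiny,
                             data = titanic_imputed,
                             y = titanic_imputed$survived,
                             label = "Titanic GLM",
                             verbose = FALSE)

  ms_shiny <- modelStudio(explainer_shiny,
                          widget_id = widget_id,
                          N = 100, B = 5,
                          show_info = FALSE,
                          telemetry = FALSE)
  # let the Shiny output binding supply the element ID
  # (step assumed from the vignette pattern)
  ms_shiny$elementId <- NULL

  ui <- shiny::fluidPage(shiny::uiOutput("dashboard"))

  server <- function(input, output) {
    output$dashboard <- shiny::renderUI({
      r2d3::d3Output(widget_id,
                     width = ms_shiny$width,
                     height = ms_shiny$height)
    })
    output[[widget_id]] <- r2d3::renderD3({
      ms_shiny
    })
  }

  # the app object is only created here, not launched
  app <- shiny::shinyApp(ui = ui, server = server)
  # shiny::runApp(app)  # uncomment to launch in an interactive session
}
```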

See Also

Vignettes: modelStudio - R & Python examples and modelStudio - perks and features

Examples

library("DALEX")
library("modelStudio")

#:# ex1 classification on 'titanic' data

# fit a model
model_titanic <- glm(survived ~., data = titanic_imputed, family = "binomial")

# create an explainer for the model
explainer_titanic <- explain(model_titanic,
                             data = titanic_imputed,
                             y = titanic_imputed$survived,
                             label = "Titanic GLM")

# pick observations
new_observations <- titanic_imputed[1:2,]
rownames(new_observations) <- c("Lucas","James")

# make a studio for the model
modelStudio(explainer_titanic,
            new_observations,
            N = 200,  B = 5) # faster example



#:# ex2 regression on 'apartments' data
if (requireNamespace("ranger", quietly=TRUE)) {
  library("ranger")
  model_apartments <- ranger(m2.price ~ ., data = apartments)

  explainer_apartments <- explain(model_apartments,
                                  data = apartments,
                                  y = apartments$m2.price)

  new_apartments <- apartments[1:2,]
  rownames(new_apartments) <- c("ap1","ap2")

  # change dashboard dimensions and animation length
  modelStudio(explainer_apartments,
              new_apartments,
              facet_dim = c(2, 3),
              time = 800)

  # add information about true labels
  modelStudio(explainer_apartments,
              new_apartments,
              new_observation_y = new_apartments$m2.price)

  # don't compute EDA plots
  modelStudio(explainer_apartments,
              eda = FALSE)
}

#:# ex3 xgboost model on 'HR' dataset
if (requireNamespace("xgboost", quietly=TRUE)) {
  library("xgboost")
  HR_matrix <- model.matrix(status == "fired" ~ . -1, HR)

  # fit a model
  xgb_matrix <- xgb.DMatrix(HR_matrix, label = HR$status == "fired")
  params <- list(max_depth = 3, objective = "binary:logistic", eval_metric = "auc")
  model_HR <- xgb.train(params, xgb_matrix, nrounds = 300)

  # create an explainer for the model
  explainer_HR <- explain(model_HR,
                          data = HR_matrix,
                          y = HR$status == "fired",
                          label = "xgboost")

  # pick observations
  new_observation <- HR_matrix[1:2, , drop=FALSE]
  rownames(new_observation) <- c("id1", "id2")

  # make a studio for the model
  modelStudio(explainer_HR,
              new_observation)
}
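
A further sketch, in the spirit of the examples above, shows one way to use parallel = TRUE. It assumes that parallelMap is installed and that its defaults are configured through options() before the call, as described in the package vignette; the mode and the number of CPUs are illustrative values.

```r
#:# ex4 (sketch) parallel computation with 'parallelMap'
# Assumption: DALEX, modelStudio and parallelMap are installed.
if (requireNamespace("DALEX", quietly = TRUE) &&
    requireNamespace("modelStudio", quietly = TRUE) &&
    requireNamespace("parallelMap", quietly = TRUE)) {
  library("DALEX")
  library("modelStudio")

  # configure parallelMap defaults outside of the modelStudio() call
  options(
    parallelMap.default.mode      = "socket",
    parallelMap.default.cpus      = 2,      # illustrative value
    parallelMap.default.show.info = FALSE
  )

  model_par <- glm(survived ~ ., data = titanic_imputed, family = "binomial")
  explainer_par <- explain(model_par,
                           data = titanic_imputed,
                           y = titanic_imputed$survived,
                           label = "Titanic GLM",
                           verbose = FALSE)

  # show_info is switched off because parallel computation
  # might interfere with progress reporting
  ms_parallel <- modelStudio(explainer_par,
                             new_observation_n = 4,
                             N = 100, B = 5,
                             parallel = TRUE,
                             show_info = FALSE,
                             telemetry = FALSE)
}
```

Parallel computation tends to pay off with slow models or many observations; for a small GLM like this one the start-up overhead may outweigh the gain.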

modelStudio documentation built on Jan. 13, 2021, 8:12 p.m.