funnel_measure: Calculate difference in performance in models across...

View source: R/funnel_measure.R

Description

Function funnel_measure allows users to compare two models based on their explainers. It partitions the dataset on which the models were built and creates categories according to quantiles of the columns in the partition data. The nbins parameter determines the number of quantiles. For each category, the difference in the provided measure is calculated. A positive value of that difference means that the Champion model performs better in the specified category, while a negative value means that one of the Challengers was better. The function allows comparing multiple Challengers at once.
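
The idea described above can be sketched in base R. This is a minimal illustration, not the DALEXtra implementation: the simulated data, the two prediction vectors, the helper rmse and the mapping of nbins to nbins + 1 quantile probabilities are all assumptions made for the example. The sign convention (Challenger score minus Champion score) is inferred from the statement that lower scores are better and positive differences favour the Champion.

```r
# Illustrative sketch only; all names and data below are hypothetical.
set.seed(1)
n <- 200
x <- runif(n)                        # column used for partitioning
y <- 2 * x + rnorm(n, sd = 0.1)      # true response
pred_champion   <- 2 * x             # stand-in for Champion predictions
pred_challenger <- rep(mean(y), n)   # stand-in for Challenger predictions

# Measure with the (y, y_hat) argument order; lower is better.
rmse <- function(y, y_hat) sqrt(mean((y - y_hat)^2))

# Build quantile-based categories for the numeric column.
nbins  <- 5
breaks <- unique(quantile(x, probs = seq(0, 1, length.out = nbins + 1)))
bins   <- cut(x, breaks = breaks, include.lowest = TRUE)

# Per-category difference: positive where the Champion scores lower (better).
diff_by_bin <- sapply(levels(bins), function(b) {
  idx <- bins == b
  rmse(y[idx], pred_challenger[idx]) - rmse(y[idx], pred_champion[idx])
})
print(diff_by_bin)
```

Each element of diff_by_bin corresponds to one quantile category of x, which mirrors the per-category comparison that funnel_measure reports for every partitioning column.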

Usage

funnel_measure(
  champion,
  challengers,
  measure_function = NULL,
  nbins = 5,
  partition_data = champion$data,
  cutoff = 0.01,
  cutoff_name = "Other",
  factor_conversion_threshold = 7,
  show_info = TRUE,
  categories = NULL
)

Arguments

champion

- explainer of champion model.

challengers

- explainer of challenger model or list of explainers.

measure_function

- Measure function that calculates model performance from true observations and predictions. The order of parameters matters and should be (y, y_hat). The measure should have the property that a lower score indicates a better model. If NULL, RMSE is used for regression, one minus AUC for classification, and cross-entropy for multiclass classification.
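
As a hedged illustration of the contract described above, the following base R sketch defines two custom measures with the (y, y_hat) argument order. The function names mae and one_minus_r2 and the toy vectors are hypothetical; they are not part of DALEXtra. The second function shows one way to turn a "higher is better" score into a "lower is better" one.

```r
# Hypothetical custom measure: mean absolute error, lower is better.
mae <- function(y, y_hat) mean(abs(y - y_hat))

# R squared is "higher is better", so use one minus R squared instead,
# which equals residual sum of squares over total sum of squares.
one_minus_r2 <- function(y, y_hat) {
  sum((y - y_hat)^2) / sum((y - mean(y))^2)
}

# Toy data to check that both functions behave as expected.
y     <- c(3, 5, 7, 9)
y_hat <- c(2, 5, 8, 9)
mae_value <- mae(y, y_hat)
r2_value  <- one_minus_r2(y, y_hat)
print(c(mae = mae_value, one_minus_r2 = r2_value))
```

A function of this shape could then be supplied as measure_function = mae in a call to funnel_measure.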

nbins

- Number of quantiles (partition points) for numeric columns. When more than one quantile has the same value, there will be fewer partition points.
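
The effect of tied quantiles can be seen in a small base R sketch. It assumes, for illustration only, that nbins maps to nbins + 1 evenly spaced probabilities; the package may compute its partition points differently. The vectors x_spread and x_tied are made-up data.

```r
nbins <- 5
probs <- seq(0, 1, length.out = nbins + 1)

x_spread <- 1:100                  # values spread evenly
x_tied   <- c(rep(0, 80), 1:20)    # 80% of values are identical

# Duplicate quantile values collapse into a single partition point.
q_spread <- unique(quantile(x_spread, probs = probs))
q_tied   <- unique(quantile(x_tied, probs = probs))

print(length(q_spread))
print(length(q_tied))
```

The heavily tied column yields fewer distinct partition points, and therefore fewer categories, than the evenly spread one.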

partition_data

- Data by which the test dataset will be partitioned for computation. Can be either a data.frame or a character vector. When a character vector is passed, it has to indicate the names of columns that will be extracted from the test data. By default, the full test data is used. If a data.frame is passed, its number of rows has to equal the number of rows in the test data.

cutoff

- Threshold for categorical data. Entries less frequent than the specified value will be merged into one category.

cutoff_name

- Name for the new category that arises from merging entries less frequent than cutoff.
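
The interplay of cutoff and cutoff_name can be sketched in base R as follows. The example assumes that cutoff is interpreted as a relative frequency (a share of rows), which the default of 0.01 suggests but the documentation does not state explicitly; the district vector is made-up data and the code is not the package's implementation.

```r
cutoff      <- 0.01
cutoff_name <- "Other"

# Hypothetical categorical column: level "C" appears in 0.5% of rows.
district <- c(rep("A", 120), rep("B", 79), "C")

# Relative frequency of each level.
freq <- prop.table(table(district))

# Levels rarer than the cutoff are merged into a single new category.
rare   <- names(freq)[freq < cutoff]
merged <- ifelse(district %in% rare, cutoff_name, district)

print(table(merged))
```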

factor_conversion_threshold

- Numeric columns with fewer unique values than the value of this parameter will be treated as factors.
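
A minimal base R sketch of this rule, using a made-up data frame; the column names and the conversion code are assumptions for illustration and may differ from what the package does internally.

```r
factor_conversion_threshold <- 7

# Hypothetical data: one numeric column with few distinct values, one with many.
df <- data.frame(
  rooms   = c(1, 2, 3, 2, 1, 3, 2, 1),
  surface = c(25, 31, 48, 52, 60, 77, 85, 93)
)

# Flag numeric columns with fewer unique values than the threshold.
few_unique <- sapply(df, function(col) {
  is.numeric(col) && length(unique(col)) < factor_conversion_threshold
})

# Convert only the flagged columns to factors.
df[few_unique] <- lapply(df[few_unique], factor)
str(df)
```

Here rooms has three distinct values and becomes a factor, while surface has eight and stays numeric, so it would be partitioned by quantiles instead.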

show_info

- Logical value indicating if progress bar should be shown.

categories

- A named list of variable names that will be plotted in a different colour. By default it is partitioned into Explanatory, External and Target.

Value

An object of the class funnel_measure

It is a named list containing the following fields:

Examples

library("mlr")
library("DALEXtra")

# Regression task on the apartments data; the target is price per square metre
task <- mlr::makeRegrTask(
  id = "R",
  data = apartments,
  target = "m2.price"
)

# Champion: linear model, explained on the apartmentsTest data
learner_lm <- mlr::makeLearner(
  "regr.lm"
)
model_lm <- mlr::train(learner_lm, task)
explainer_lm <- explain_mlr(model_lm, apartmentsTest, apartmentsTest$m2.price, label = "LM")

# Challenger 1: random forest
learner_rf <- mlr::makeLearner(
  "regr.randomForest"
)
model_rf <- mlr::train(learner_rf, task)
explainer_rf <- explain_mlr(model_rf, apartmentsTest, apartmentsTest$m2.price, label = "RF")

# Challenger 2: gradient boosting machine
learner_gbm <- mlr::makeLearner(
  "regr.gbm"
)
model_gbm <- mlr::train(learner_gbm, task)
explainer_gbm <- explain_mlr(model_gbm, apartmentsTest, apartmentsTest$m2.price, label = "GBM")

# Compare the champion against both challengers using RMSE as the measure
plot_data <- funnel_measure(explainer_lm, list(explainer_rf, explainer_gbm),
                            nbins = 5, measure_function = DALEX::loss_root_mean_square)
plot(plot_data)

DALEXtra documentation built on May 9, 2021, 9:07 a.m.