explain_scikitlearn: Wrapper for Python Scikit-Learn Models


View source: R/explain_scikitlearn.R

Description

scikit-learn models can be loaded into the R environment like any other Python object. This function helps inspect the performance of a Python model and compare it with other models using R tools such as DALEX. It creates an object that is an easily accessible R version of a scikit-learn model exported from Python via a pickle file.

Usage

explain_scikitlearn(
  path,
  yml = NULL,
  condaenv = NULL,
  env = NULL,
  data = NULL,
  y = NULL,
  weights = NULL,
  predict_function = NULL,
  residual_function = NULL,
  ...,
  label = NULL,
  verbose = TRUE,
  precalculate = TRUE,
  colorize = TRUE,
  model_info = NULL
)

Arguments

path

a path to the pickle file. Can be used without other arguments if you are sure that the active Python version matches the version used to create the pickle.

yml

a path to the yml file. The conda virtual environment will be recreated from this file. On Windows, conda has to be added to the PATH first.

condaenv

If the yml parameter is provided, a path to the main conda folder. If yml is NULL, the name of an existing conda environment.

env

A path to a Python virtual environment.

data

test data set that will be passed to explain.

y

vector that will be passed to explain.

weights

numeric vector with sampling weights. NULL by default. If provided, it must have the same length as data.

predict_function

predict function that will be passed into explain. If NULL, default will be used.

residual_function

residual function that will be passed into explain. If NULL, default will be used.

...

other parameters

label

label that will be passed into explain. If NULL, default will be used.

verbose

logical value that will be passed into explain. If NULL, default will be used.

precalculate

if TRUE (default) then 'predicted_values' and 'residuals' are calculated when the explainer is created. This will also happen if 'verbose' is TRUE.

colorize

if TRUE (default) then WARNINGS, ERRORS and NOTES are colorized. Will work only in the R console.

model_info

a named list (package, version, type) containing information about the model. If NULL, DALEX will look for this information on its own.

Value

An object of the class 'explainer'. It has an additional field, param_set, where the user can check the parameters of the scikit-learn model.

Example of Python code

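import pickle

import pandas as pd  # used to load the training data
import sklearn.ensemble

# titanic_train_X and titanic_train_Y are assumed to be already-prepared
# training features and labels.
model = sklearn.ensemble.GradientBoostingClassifier()
model = model.fit(titanic_train_X, titanic_train_Y)

# Serialize the fitted model with pickle protocol 2.
with open("gbm.pkl", "wb") as f:
    pickle.dump(model, f, protocol=2)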
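Because loading depends on the active Python being compatible with the pickle, it can help to check which protocol a file was written with before handing it to explain_scikitlearn. The following is a minimal sketch using only the standard library; the helper names pickle_protocol and describe_pickle are hypothetical and are not part of DALEXtra or scikit-learn. It relies on the fact that pickles written with protocol 2 or higher begin with the byte 0x80 followed by the protocol number.

```python
import pickle
import sys


def pickle_protocol(path):
    """Return the protocol number declared in a pickle file's header, or None.

    Pickles written with protocol 2 or higher start with the PROTO opcode
    (byte 0x80) followed by one byte holding the protocol number.
    Protocols 0 and 1 carry no such header, so None is returned for them.
    """
    with open(path, "rb") as f:
        header = f.read(2)
    if len(header) == 2 and header[0] == 0x80:
        return header[1]
    return None


def describe_pickle(path):
    """Summarize whether the running interpreter supports the file's protocol."""
    proto = pickle_protocol(path)
    return {
        "protocol": proto,
        "python": "%d.%d" % sys.version_info[:2],
        # Hypothetical convenience flag: True when the declared protocol is
        # not newer than what this interpreter can read.
        "loadable": proto is None or proto <= pickle.HIGHEST_PROTOCOL,
    }
```

A supported protocol is necessary but not sufficient: the library versions used to build the model still have to match those in the loading environment.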


To export the environment to a .yml file, activate the virtual environment with activate name_of_the_env and then run the following shell command:
conda env export > environment.yml

Errors use case
Below are shortened solutions for specific errors.

An environment with the name specified in the given .yml file already exists
If you provide a .yml file whose header contains a name identical to that of an environment that already exists, the existing environment will be activated without being changed.
There are two ways to solve this issue, both using the Anaconda Prompt. The first is to remove the conda environment with the command:
conda env remove --name myenv
and then execute the function again. The second is to update the environment with:
conda env update -f environment.yml
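The name collision described above can be detected before calling the function. The sketch below is an illustration only: env_name_from_yml and name_collides are hypothetical helpers, and the list of existing environment names is assumed to be supplied by the user (for example, copied from the output of conda env list). It reads the top-level name: entry with plain string handling rather than a YAML parser.

```python
def env_name_from_yml(text):
    """Return the value of the top-level 'name:' key of an environment file, or None."""
    for line in text.splitlines():
        # A top-level key starts at column 0, so indented lines are ignored.
        if line.startswith("name:"):
            value = line[len("name:"):].strip().strip("'\"")
            return value or None
    return None


def name_collides(yml_text, existing_envs):
    """True when the .yml file names an environment that already exists."""
    name = env_name_from_yml(yml_text)
    return name is not None and name in existing_envs
```

If name_collides returns True, either remove or update the existing environment as described above, or change the name: entry in the .yml file.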

Conda cannot find the specified packages in the channels you have provided
This error may be caused by many things. One of them is that the specified version is too old to be available from the official conda repository. Edit your .yml file and add a link to the proper repository in the channels section.

The issue may also be connected with the platform. If the model was created on a platform with a different OS, you may need to remove the platform-specific build string from the .yml file.
- numpy=1.16.4=py36h19fb1c0_0
- numpy-base=1.16.4=py36hc3f5095_0
In the example above you have to remove =py36h19fb1c0_0 and =py36hc3f5095_0.
If some packages are not available for Anaconda at all, list them in the pip section of the .yml file.
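Removing build strings by hand becomes tedious for long environment files. As a minimal sketch, the hypothetical helpers below (not part of conda or DALEXtra) drop the third, build-specific field from conda dependency lines of the form name=version=build, and leave pip-style name==version entries untouched.

```python
def strip_build_string(spec):
    """Drop the build field from a conda spec of the form name=version=build."""
    if "==" in spec:
        # pip-style pin (name==version); there is no build string to remove.
        return spec
    parts = spec.split("=")
    if len(parts) >= 3:
        return "=".join(parts[:2])
    return spec


def strip_builds_from_yml(text):
    """Apply strip_build_string to every '- name=version=build' list item."""
    out = []
    for line in text.splitlines():
        stripped = line.lstrip()
        if stripped.startswith("- ") and "=" in stripped:
            indent = line[: len(line) - len(stripped)]
            out.append(indent + "- " + strip_build_string(stripped[2:].strip()))
        else:
            out.append(line)
    return "\n".join(out)
```

For example, strip_build_string("numpy=1.16.4=py36h19fb1c0_0") yields "numpy=1.16.4", which conda can resolve on any platform where that version is available.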

If the .yml file does not seem to work, the virtual environment can be created manually using the Anaconda Prompt.
conda create -n name_of_env python=3.4
conda install -n name_of_env name_of_package=0.20

Author(s)

Szymon Maksymiuk

Examples

library("DALEXtra")
if (DALEXtra:::is_conda()) {
  # Build the explainer (keep in mind that the 18th column is the target)
  titanic_test <- read.csv(system.file("extdata", "titanic_test.csv", package = "DALEXtra"))
  # Keep in mind that when the pickle is built and loaded,
  # not only the Python version but also the library versions have to match
  explainer <- explain_scikitlearn(
    system.file("extdata", "scikitlearn.pkl", package = "DALEXtra"),
    yml = system.file("extdata", "testing_environment.yml", package = "DALEXtra"),
    data = titanic_test[, 1:17],
    y = titanic_test$survived
  )
  plot(model_performance(explainer))

  # Predictions with new data
  predict(explainer, titanic_test[1:10, 1:17])

} else {
  print("Conda is required.")
}

ModelOriented/DALEXtra documentation built on Nov. 22, 2019, 1:08 p.m.