Responsible Machine Learning

With Great Power Comes Great Responsibility. Voltaire (well, maybe)

How to develop machine learning models in a responsible manner? There are several topics worth considering:

Collection of tools for Visual Exploration, Explanation and Debugging of Predictive Models

It takes a village to raise a child model.

The way we do predictive modeling is very ineffective. We spend far too much time on manual, time-consuming and easy-to-automate activities such as data cleaning and exploration, crisp modeling, and model validation. We should focus more on model understanding, productisation and communication.

Gathered here are tools that can make our work more efficient through the whole model lifecycle. The unified grammar behind the DrWhy.AI universe is described in the book Explanatory Model Analysis: Explore, Explain and Examine Predictive Models.

Lifecycle for Predictive Models

DrWhy is based on a unified Model Development Process inspired by RUP (Rational Unified Process). Find an overview in the diagram below.

images/DALEXverse.png

The DrWhy.AI family

Packages in the DrWhy.AI family of models may be divided into four classes.

images/grammar_of_explanations.png

Here is a more detailed overview.

DALEX

The DALEX package (Descriptive mAchine Learning EXplanations) helps to understand how complex models work. The main function, explain(), creates a wrapper around a predictive model. Wrapped models may then be explored and compared with a collection of local and global explainers that draw on recent developments in the area of Interpretable Machine Learning/eXplainable Artificial Intelligence.

DALEX wraps methods from other packages, e.g. 'pdp' (Greenwell 2017), 'ALEPlot' (Apley 2018), 'factorMerger' (Sitko and Biecek 2017), 'breakDown' (Staniak and Biecek 2018), and (Fisher et al. 2018).
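The wrapper idea can be illustrated outside R. The following is a minimal sketch in plain Python, not the DALEX API: the `Explainer` class, its fields, and the toy `linear_model` are hypothetical names chosen for illustration. It only shows why bundling a model with data, targets and a predict function gives every downstream explainer one uniform interface.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Explainer:
    """Hypothetical wrapper: a model plus the metadata explanations need."""
    model: object
    data: List[List[float]]      # validation data, one row per observation
    y: List[float]               # observed target values for `data`
    predict_function: Callable   # (model, rows) -> list of predictions
    label: str = "model"

    def predict(self, rows):
        # Every explainer calls the model through this single entry point.
        return self.predict_function(self.model, rows)

    def residuals(self):
        # Observed minus predicted, computed on the stored validation data.
        return [obs - pred for obs, pred in zip(self.y, self.predict(self.data))]


def linear_model(row):
    # Toy stand-in for a fitted model.
    return 2.0 * row[0] + 1.0 * row[1]


explainer = Explainer(
    model=linear_model,
    data=[[1.0, 0.0], [0.0, 1.0], [2.0, 2.0]],
    y=[2.5, 1.0, 5.0],
    predict_function=lambda model, rows: [model(r) for r in rows],
    label="toy linear model",
)
print(explainer.label, explainer.residuals())
```

Because the predict function is stored alongside the model, the same wrapper works for any model type, which is what makes model-agnostic explainers possible.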

Vignettes:

DALEXtra

The DALEXtra package is an extension pack for the DALEX package. It provides easy-to-use connectors for models created with scikit-learn, keras, H2O, mljar and mlr.

Vignettes:

survex

The survex package provides model-agnostic explanations for machine learning survival models. It is based on the DALEX package.

Because predictions are functional, in the form of either a survival function or a cumulative hazard function, standard model-agnostic explanations cannot be applied directly to machine learning survival models. The survex package contains implementations of explanation methods specific to survival analysis, as well as extensions of existing methods for classification and regression.

Vignettes:

ingredients

The ingredients package is a collection of tools for assessment of feature importance and feature effects.

Key functions: feature_importance() for assessment of global-level feature importance; ceteris_paribus() for calculation of Ceteris Paribus / What-If Profiles; partial_dependency() for Partial Dependency Plots; conditional_dependency() for Conditional Dependency Plots, also called M Plots; accumulated_dependency() for Accumulated Local Effects Plots; cluster_profiles() for aggregation of Ceteris Paribus Profiles; generic print() and plot() for better usability of selected explainers; generic plotD3() for interactive D3-based explanations; and generic describe() for explanations in natural language.
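To make three of these ideas concrete, here is a small sketch in plain Python of permutation-based feature importance, a Ceteris Paribus profile, and a partial dependence profile computed as the average of Ceteris Paribus profiles. It is not the ingredients implementation; the toy `predict` function, the helper names and the data are assumptions made for illustration.

```python
import random
from math import sqrt
from statistics import mean


def predict(row):
    # Toy model: x0 matters a lot, x1 a little, x2 not at all.
    return 3.0 * row[0] + 0.5 * row[1]


def rmse(rows, y):
    return sqrt(mean((predict(r) - t) ** 2 for r, t in zip(rows, y)))


def with_value(row, feature, value):
    # Copy of `row` with one feature replaced; the other features stay fixed.
    return row[:feature] + [value] + row[feature + 1:]


def permutation_importance(rows, y, feature, repeats=10, seed=0):
    # Average increase in RMSE after shuffling one column.
    rng = random.Random(seed)
    baseline = rmse(rows, y)
    increases = []
    for _ in range(repeats):
        column = [r[feature] for r in rows]
        rng.shuffle(column)
        permuted = [with_value(r, feature, v) for r, v in zip(rows, column)]
        increases.append(rmse(permuted, y) - baseline)
    return mean(increases)


def ceteris_paribus(row, feature, grid):
    # Model response for one observation when a single feature varies.
    return [predict(with_value(row, feature, v)) for v in grid]


def partial_dependence(rows, feature, grid):
    # Pointwise average of Ceteris Paribus profiles over all observations.
    profiles = [ceteris_paribus(r, feature, grid) for r in rows]
    return [mean(column) for column in zip(*profiles)]


rows = [[float(i), float(i % 3), float(i % 2)] for i in range(12)]
y = [predict(r) for r in rows]

importance = [permutation_importance(rows, y, j) for j in range(3)]
cp = ceteris_paribus([1.0, 2.0, 0.0], 0, [0.0, 1.0, 2.0])
pd_profile = partial_dependence(rows, 0, [0.0, 1.0])
print(importance, cp, pd_profile)
```

The feature the model ignores gets zero importance, because shuffling it cannot change any prediction.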

Vignettes:

iBreakDown

The iBreakDown package is a model-agnostic tool for explaining predictions from black-box ML models. The Break Down Table shows the contribution of every variable to the final prediction. The Break Down Plot presents variable contributions in a concise graphical way. SHAP (SHapley Additive exPlanations) values are calculated as the average of Break Down profiles computed for random orderings of variables. This package works for binary classifiers as well as regression models.

iBreakDown is a successor of the breakDown package. It is faster (complexity O(p) instead of O(p^2)). It supports interactions and interactive explainers with D3.js plots.
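The relation between Break Down profiles and Shapley values can be sketched in plain Python. This is an illustration under stated assumptions, not the iBreakDown code: the toy `predict` function with an interaction term, the `background` data and all helper names are hypothetical, and the sketch averages over all orderings (feasible for three variables) where a package would typically sample random ones.

```python
from itertools import permutations
from statistics import mean


def predict(row):
    # Toy model with an interaction between x1 and x2.
    return 2.0 * row[0] + row[1] * row[2]


background = [
    [0.0, 0.0, 0.0],
    [1.0, 1.0, 1.0],
    [2.0, 0.0, 1.0],
    [1.0, 1.0, 0.0],
]


def mean_prediction(fixed):
    # Average prediction over the background data with some features fixed.
    return mean(
        predict([fixed.get(j, row[j]) for j in range(len(row))])
        for row in background
    )


def break_down(instance, order):
    # Fix variables one by one; each step's change is that variable's contribution.
    fixed = {}
    previous = mean_prediction(fixed)
    contributions = {}
    for j in order:
        fixed[j] = instance[j]
        current = mean_prediction(fixed)
        contributions[j] = current - previous
        previous = current
    return contributions


def shapley(instance):
    # Average the Break Down contributions over every possible ordering.
    orders = list(permutations(range(len(instance))))
    profiles = [break_down(instance, order) for order in orders]
    return {j: mean(p[j] for p in profiles) for j in range(len(instance))}


instance = [2.0, 1.0, 1.0]
first = break_down(instance, (1, 2, 0))
second = break_down(instance, (2, 1, 0))
shap_values = shapley(instance)
print(first, second, shap_values)
```

With an interaction in the model, the contribution assigned to a variable depends on the ordering, which is why averaging over orderings is useful. In every case the contributions add up to the difference between the instance prediction and the average prediction.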

Vignettes:

auditor

The auditor package is a tool for model-agnostic validation. Implemented techniques facilitate assessing and comparing the goodness of fit and performance of models. In addition, they may be used for the analysis of the similarity of residuals and for the identification of outliers and influential observations. The examination is carried out by diagnostic scores and visual verification. Thanks to the flexible and consistent grammar, it is simple to validate models of any class.
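As a rough illustration of residual-based validation, the sketch below computes two diagnostic scores and flags observations with unusually large residuals. It is written in plain Python and does not reproduce auditor's functions; the helper names, the data and the threshold of two standard deviations are assumptions chosen for the example.

```python
from math import sqrt
from statistics import mean, pstdev


def residuals(y, y_hat):
    return [obs - pred for obs, pred in zip(y, y_hat)]


def rmse(res):
    return sqrt(mean(r * r for r in res))


def mae(res):
    return mean(abs(r) for r in res)


def flag_outliers(res, threshold=2.0):
    # Indices of residuals farther than `threshold` standard deviations
    # from the mean residual (an illustrative rule, not auditor's own).
    center, spread = mean(res), pstdev(res)
    return [i for i, r in enumerate(res) if abs(r - center) > threshold * spread]


y = [10.0, 12.0, 11.0, 13.0, 30.0, 12.0]
y_hat = [11.0, 12.0, 10.0, 13.0, 14.0, 12.0]

res = residuals(y, y_hat)
print(rmse(res), mae(res), flag_outliers(res))
```

Because the scores depend only on observed and predicted values, the same checks apply to a model of any class.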

Learn more:

fairmodels

A flexible tool for bias detection, visualization, and mitigation. Use models explained with DALEX and calculate fairness classification metrics based on confusion matrices using fairness_check(), or try the newly developed module for regression models using fairness_check_regression(). The fairmodels R package lets you compare and gain information about various machine learning models. Mitigate bias with various pre-processing and post-processing techniques. Make sure your models classify protected groups similarly.
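The idea of a confusion-matrix-based fairness check can be sketched in plain Python. This is not the fairmodels implementation: the helper names and data are hypothetical, only one metric (true positive rate) is shown, and the acceptance band of 0.8 to 1/0.8 is an assumption used here for illustration.

```python
def true_positive_rate(y_true, y_pred):
    # Share of actual positives that the classifier labels as positive.
    predicted_for_positives = [p for t, p in zip(y_true, y_pred) if t == 1]
    return sum(predicted_for_positives) / len(predicted_for_positives)


def metric_by_group(y_true, y_pred, groups, metric):
    # Evaluate the metric separately within each protected subgroup.
    result = {}
    for g in sorted(set(groups)):
        idx = [i for i, value in enumerate(groups) if value == g]
        result[g] = metric([y_true[i] for i in idx], [y_pred[i] for i in idx])
    return result


def parity_ratios(scores, privileged):
    # Ratio of each subgroup's score to the privileged subgroup's score.
    reference = scores[privileged]
    return {g: value / reference for g, value in scores.items()}


def passes_check(ratios, epsilon=0.8):
    # Illustrative rule: every ratio must fall within [epsilon, 1 / epsilon].
    return all(epsilon <= r <= 1.0 / epsilon for r in ratios.values())


groups = ["A"] * 5 + ["B"] * 4
y_true = [1, 1, 1, 1, 0, 1, 1, 0, 0]
y_pred = [1, 1, 1, 0, 0, 1, 0, 0, 0]

scores = metric_by_group(y_true, y_pred, groups, true_positive_rate)
ratios = parity_ratios(scores, privileged="A")
print(scores, ratios, passes_check(ratios))
```

Here the classifier finds a smaller share of the actual positives in group B than in group A, so the ratio falls outside the band and the check fails.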

Learn more:

vivo

The vivo package helps to calculate instance-level variable importance (a measure of local sensitivity). The importance measure is based on Ceteris Paribus profiles and can be calculated in eight variants. Select the variant that suits your needs by setting the parameters absolute_deviation, point and density.
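A measure of this kind can be sketched in plain Python as the average deviation of a Ceteris Paribus profile from a reference value. This is not the vivo code. The two flags below reuse the parameter names mentioned above, but their meaning here is an assumption: `absolute_deviation` switches between mean absolute and root mean squared deviation, and `point` switches the reference between the instance prediction and the profile average. Density weighting is left out, so the grid is treated as uniform.

```python
from math import sqrt
from statistics import mean


def predict(row):
    # Toy model: strongly nonlinear in x0, nearly flat in x1.
    return row[0] ** 2 + 0.1 * row[1]


def cp_profile(row, feature, grid):
    # Predictions for one observation while a single feature varies.
    return [predict(row[:feature] + [v] + row[feature + 1:]) for v in grid]


def local_importance(row, feature, grid, absolute_deviation=True, point=True):
    profile = cp_profile(row, feature, grid)
    # Reference: prediction at the instance, or the average of the profile.
    reference = predict(row) if point else mean(profile)
    deviations = [value - reference for value in profile]
    if absolute_deviation:
        return mean(abs(d) for d in deviations)
    return sqrt(mean(d * d for d in deviations))


instance = [1.0, 1.0]
grid = [-2.0, -1.0, 0.0, 1.0, 2.0]

imp_x0 = local_importance(instance, 0, grid)
imp_x1 = local_importance(instance, 1, grid)
print(imp_x0, imp_x1)
```

A variable whose profile oscillates strongly around the instance gets a high score, while a nearly flat profile gets a score close to zero.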

Learn more:

randomForestExplainer

The randomForestExplainer package helps to understand what is happening inside a Random Forest model. This package helps to explore main effects and pairwise interactions, depth distribution, conditional responses and feature importance.

Learn more:

xspliner

The xspliner package is a collection of tools for training interpretable surrogate ML models. The package helps to build simple, interpretable models that inherit information from more complicated ones. The resulting model may be treated as an explanation of the black box that was supplied to the algorithm. The package offers graphical and statistical evaluation both for the overall model and for its components.

shapper

shapper is an R wrapper of the SHAP Python library. It accesses the Python implementation through the reticulate connector.

drifter

drifter is an R package that identifies concept drift in model structure or in data structure.

Machine learning models are often fitted and validated on historical data under an assumption that data are stationary. The most popular techniques for validation (k-fold cross-validation, repeated cross-validation, and so on) test models on data with the same distribution as training data.

Yet, in many practical applications, deployed models work in a changing environment. After some time, due to changes in the environment, model performance may degrade and the model may become less reliable.

Concept drift refers to a change in the data distribution or in the relationships between variables over time. Think about a model for energy consumption for a school: over time the school may be equipped with a larger number of devices or with more power-efficient devices, which may affect the model performance.
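One simple way to quantify a change in the distribution of a single variable is to bin the old and new data and compare the resulting histograms. The sketch below does this in plain Python with the total variation distance, a measure chosen here for illustration and not taken from drifter. The device counts, bin edges and helper names are made up for the school example.

```python
def histogram(values, edges):
    # Relative frequencies over bins [e0, e1), [e1, e2), ..., [e_{k-1}, e_k].
    # The last bin is closed on the right so that the maximum edge is counted.
    counts = [0] * (len(edges) - 1)
    for v in values:
        for i in range(len(counts)):
            is_last = i == len(counts) - 1
            if edges[i] <= v < edges[i + 1] or (is_last and v == edges[-1]):
                counts[i] += 1
                break
    total = sum(counts)
    return [c / total for c in counts]


def total_variation(p, q):
    # 0 means identical distributions, 1 means no overlap at all.
    return 0.5 * sum(abs(a - b) for a, b in zip(p, q))


# Hypothetical number of devices in use per day, before and after an upgrade.
devices_old = [10, 12, 11, 13, 12, 11, 10, 12]
devices_new = [18, 20, 19, 21, 12, 20, 19, 22]
edges = [10, 14, 18, 22]

hist_old = histogram(devices_old, edges)
hist_new = histogram(devices_new, edges)
drift = total_variation(hist_old, hist_new)
print(hist_old, hist_new, drift)
```

A distance close to 1 signals that the new data barely overlap with the data the model was validated on, so the validation results may no longer apply.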

EIX

The EIX package implements a set of techniques to explore and explain XGBoost and LightGBM models. The main functions of this package cover various variable importance measures as well as functions for identification of interactions between variables.

Learn more:

modelStudio

The modelStudio package automates the explanatory analysis of machine learning predictive models. It generates advanced interactive model explanations in the form of a serverless HTML site with only one line of code. This tool is model-agnostic, therefore compatible with most of the black-box predictive models and frameworks (e.g. mlr/mlr3, xgboost, caret, h2o, parsnip, tidymodels, scikit-learn, lightgbm, keras/tensorflow).

The main modelStudio() function computes various (instance-level and model-level) explanations and produces a customisable dashboard, which consists of multiple panels for plots with their short descriptions. Easily save the dashboard and share it with others. Tools for Explanatory Model Analysis unite with tools for Exploratory Data Analysis to give a broad overview of the model behavior.

Learn more:

arenar

Arena is an interactive tool that allows you to explore and compare any model regardless of its internal structure.

The arenar package can be run in two modes - live (R runs in the background and calculates all necessary explanations) and serverless (all necessary explanations are calculated earlier).

Using the Arena is trivially simple. Examples with different levels of advancement are available:

modelDown

The modelDown package generates a website with HTML summaries for predictive models. It uses DALEX explainers to compute and plot summaries of how given models behave. We can see how well models behave (Model Performance, Auditor), how much each variable contributes to predictions (Variable Response) and which variables are the most important for a given model (Variable Importance). We can also compare Concept Drift for pairs of models (Drifter). Additionally, data available on the website can be easily recreated in the current R session (using the archivist package).

Learn more:

rSAFE

The rSAFE package is a model-agnostic tool for making an interpretable white-box model more accurate with the help of an alternative black-box model, called a surrogate model. New features are extracted from the complicated model, such as a neural network or random forest, and then used in fitting a simpler interpretable model, improving its overall performance.

Learn more:

EloML

The EloML package provides an Elo rating system for machine learning models. The Elo Predictive Power (EPP) score helps to assess model performance based on the Elo ranking system.
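For readers unfamiliar with Elo ratings, the sketch below shows the classic Elo update in plain Python, with each cross-validation split treated as a "match" won by the model with the higher AUC. This is the textbook rating scheme, not the EPP computation in EloML, which may differ. The model names, AUC values, starting rating of 1500 and K-factor of 32 are assumptions for the example.

```python
def expected_score(rating_a, rating_b):
    # Classic Elo expectation that A beats B.
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))


def play(ratings, a, b, score_a, k=32.0):
    # score_a is 1.0 for a win of A, 0.0 for a loss, 0.5 for a draw.
    delta = k * (score_a - expected_score(ratings[a], ratings[b]))
    ratings[a] += delta
    ratings[b] -= delta


# Hypothetical AUC of two models on three cross-validation splits.
auc_per_split = {"gbm": [0.81, 0.83, 0.80], "glm": [0.78, 0.84, 0.77]}
ratings = {name: 1500.0 for name in auc_per_split}

for auc_gbm, auc_glm in zip(auc_per_split["gbm"], auc_per_split["glm"]):
    if auc_gbm > auc_glm:
        score = 1.0
    elif auc_gbm < auc_glm:
        score = 0.0
    else:
        score = 0.5
    play(ratings, "gbm", "glm", score)

print(ratings)
```

The update is zero-sum, so the ratings only express relative strength: a model gains exactly what its opponent loses.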

Learn more:

archivist

The archivist package automates serialization and deserialization of R objects. Objects are stored with additional metadata to facilitate reproducibility and governance of data science projects.

Everything that exists in R is an object. archivist is an R package that stores copies of all objects along with their metadata. It helps to manage and recreate objects with final or partial results from data analysis. Use archivist to record every result, to share these results with future you or with others, and to search through a repository of objects created in the past but needed now.
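The store-with-metadata idea can be sketched in plain Python. This is an illustration of the concept, not the archivist API: the function names, the file layout, the tag scheme and the use of a SHA-256 content hash as the object key are all assumptions made for the example.

```python
import hashlib
import json
import pickle
import tempfile
from pathlib import Path


def save_to_repo(obj, repo, tags):
    # Serialize the object and use a hash of its content as the key.
    payload = pickle.dumps(obj)
    key = hashlib.sha256(payload).hexdigest()
    (repo / f"{key}.pkl").write_bytes(payload)
    (repo / f"{key}.json").write_text(json.dumps({"tags": tags}))
    return key


def load_from_repo(key, repo):
    # Recreate the object from its stored copy.
    return pickle.loads((repo / f"{key}.pkl").read_bytes())


def search(repo, tag):
    # Keys of all stored objects whose metadata contain the given tag.
    return sorted(
        path.stem
        for path in repo.glob("*.json")
        if tag in json.loads(path.read_text())["tags"]
    )


repo = Path(tempfile.mkdtemp())
result = {"model": "toy", "auc": 0.83}
key = save_to_repo(result, repo, tags=["final", "auc"])

print(key[:8], load_from_repo(key, repo), search(repo, "final"))
```

Keying objects by a content hash means that a result can be referenced unambiguously and restored later, and the metadata file makes the repository searchable.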

Learn more:

Tools that are useful during the model lifetime. MI2 stands for our internal tools.

1. Data preparation

2. Data understanding

4. Model assembly

5. Model audit

6. Model delivery

Family of Model Explainers

images/DrWhyAI.png

Architecture of DrWhy

DrWhy works on fully trained predictive models. Models can be created with any tool.

DrWhy uses the DALEX package to wrap a model with the additional metadata required for explanations, such as validation data and a predict function.

Explainers for predictive models can be created with model agnostic or model specific functions implemented in various packages.

Hype_Cycle



DrWhy2/DrWhy documentation built on Feb. 23, 2023, 2:57 p.m.