Below we describe how to reproduce the code and figure examples in the paper "shapr: Explaining Machine Learning Models with Conditional Shapley Values in R and Python".
The instructions assume a Linux-like environment and that the commands are executed from the directory containing this file. Execution from other locations or on different operating systems requires adjustments.
The code has been tested on Ubuntu 20.04.6 LTS with R 4.4.1 and Python 3.12.7 installed.
The folder contains the following files and folders:
README-reproduce.md
: This file.
code_R.R
: The R code used to generate the results and figures in the paper.
code_py.py
: The Python code used to generate the results and figures in the paper.
R_prep_data_and_model.R
: An R script that generates the data and models used in the examples and saves them in the data_and_models directory (it is not necessary to run this script).
code_py_to_html.sh
: A bash script that converts the Python code to HTML using jupytext and nbconvert.
data_and_models/
: A folder containing the data and models used in the examples, as generated by R_prep_data_and_model.R.
code_R.html
: The HTML file generated by code_R.R, containing the R code, its output and basic session information.
code_py.html
: The HTML file generated by the code_py_to_html.sh bash script, containing the Python code, its output and basic session information.
paper_figures/
: A folder containing the figures for the examples used in the paper, generated by code_R.R.
html_figures/
: A folder containing the figures for the examples used in the HTML file, generated by code_R.R.
To reproduce the R examples and figures, make sure you have installed the shapr package and its required packages, as well as the following packages from CRAN: xgboost, ctree, future, progressr and patchwork.
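Since all commands are assumed to run from the directory containing this file, it can help to confirm that the folder layout listed above is intact before starting. The following is a minimal sketch of such a check; the helper `missing_entries` and the constant `EXPECTED_ENTRIES` are hypothetical names introduced here for illustration and are not part of the replication files. Note that code_R.html, code_py.html, paper_figures/ and html_figures/ are outputs, so they will be recreated by the steps below even if missing.

```python
from pathlib import Path

# Entries listed in this README; a trailing "/" marks a folder.
# (Hypothetical helper, not part of the replication files.)
EXPECTED_ENTRIES = [
    "README-reproduce.md",
    "code_R.R",
    "code_py.py",
    "R_prep_data_and_model.R",
    "code_py_to_html.sh",
    "data_and_models/",
    "code_R.html",
    "code_py.html",
    "paper_figures/",
    "html_figures/",
]


def missing_entries(root=".", expected=EXPECTED_ENTRIES):
    """Return the expected files/folders that are not present under root."""
    root = Path(root)
    missing = []
    for entry in expected:
        is_dir = entry.endswith("/")
        path = root / entry.rstrip("/")
        # A folder entry must be a directory, a file entry must be a regular file.
        present = path.is_dir() if is_dir else path.is_file()
        if not present:
            missing.append(entry)
    return missing


if __name__ == "__main__":
    gaps = missing_entries()
    print("All expected files found." if not gaps else f"Missing: {gaps}")
```

Running it from this folder prints any entry that is absent, keeping the original order of the list above.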
Then, from the command line, run
Rscript -e "knitr::spin('code_R.R')"
This will generate the file code_R.html, containing the code from code_R.R accompanied by its output, as well as the figures in the paper_figures and html_figures folders.
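If you prefer to drive the R step from Python, a minimal sketch is to pass the same Rscript command to `subprocess.run` and then check that the outputs named above were created. The function `run_spin` and the constants `SPIN_CMD` and `SPIN_OUTPUTS` are hypothetical and only wrap the command given in this README; the command itself is unchanged.

```python
import shutil
import subprocess
from pathlib import Path

# The command from this README, split into argv form (hypothetical wrapper).
SPIN_CMD = ["Rscript", "-e", "knitr::spin('code_R.R')"]
# Outputs this README says the command produces.
SPIN_OUTPUTS = ["code_R.html", "paper_figures", "html_figures"]


def run_spin(workdir=".", dry_run=False):
    """Run knitr::spin() on code_R.R in workdir and check the outputs.

    With dry_run=True nothing is executed and the command is only returned.
    """
    if dry_run:
        return SPIN_CMD
    if shutil.which("Rscript") is None:
        raise RuntimeError("Rscript not found on PATH")
    # check=True raises CalledProcessError if the R script fails.
    subprocess.run(SPIN_CMD, cwd=workdir, check=True)
    missing = [p for p in SPIN_OUTPUTS if not (Path(workdir) / p).exists()]
    if missing:
        raise RuntimeError(f"Expected outputs not created: {missing}")
    return SPIN_CMD
```

Passing the command as a list (without `shell=True`) hands the R expression to Rscript as a single argument, so no extra shell quoting is needed.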
Note 1: The HTML file displays the code shown in the paper together with its output. Additional code used to mildly customize and save the figures is included in code_R.R and executed by knitr::spin(), but is not shown in the HTML file.
Note 2: The R_prep_data_and_model.R script generates the data and model files used by code_R.R. This has already been done, and the files are included in the data_and_models folder to ensure reproducibility across a broader range of environments. It is therefore not necessary to run this script, but it is included for complete reproducibility.
To reproduce the Python examples, make sure you have installed the shaprpy Python library and its required packages (in addition to the shapr R package).
To simplify reproduction, we have created a simple bash script that executes the Python code in a manner similar to how the knitr::spin() function operates for the R code.
The bash script requires the jupytext, nbconvert and session_info libraries to run. They can be installed with pip as follows:
pip install jupytext nbconvert session_info
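Before running the bash script, a quick way to confirm that the Python side is ready is to look up each required module with `importlib.util.find_spec`. This is a minimal sketch; `missing_modules` and `REQUIRED_MODULES` are hypothetical names, and it assumes the import names match the package names (shaprpy, jupytext, nbconvert, session_info). It does not check the shapr R package that shaprpy relies on.

```python
from importlib.util import find_spec

# Import names of the Python packages mentioned in this README
# (assumed to match the pip package names).
REQUIRED_MODULES = ["shaprpy", "jupytext", "nbconvert", "session_info"]


def missing_modules(names=REQUIRED_MODULES):
    """Return the top-level module names that cannot be found."""
    return [name for name in names if find_spec(name) is None]


if __name__ == "__main__":
    gaps = missing_modules()
    if gaps:
        print("Missing modules:", ", ".join(gaps))
    else:
        print("All required Python modules are available.")
```

Only top-level names are passed, so `find_spec` returns `None` for a missing module instead of raising.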
Then, from the command line, run
bash code_py_to_html.sh
This will generate the file code_py.html, containing the code from code_py.py accompanied by its output and basic session information.
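The conversion that code_py_to_html.sh performs with jupytext and nbconvert can be sketched as two commands: turn the .py script into a notebook, then execute that notebook and export it to HTML. The sketch below only illustrates that idea under this assumption; `py_to_html_commands` and `py_to_html` are hypothetical, and the actual bash script may use different options (for example, for adding the session information).

```python
import subprocess
from pathlib import Path


def py_to_html_commands(script="code_py.py"):
    """Build a jupytext + nbconvert pipeline for a .py script.

    Assumption: a sketch of the general approach, not the exact
    contents of code_py_to_html.sh.
    """
    notebook = str(Path(script).with_suffix(".ipynb"))
    return [
        # Convert the script into a Jupyter notebook (e.g. code_py.ipynb).
        ["jupytext", "--to", "notebook", script],
        # Execute the notebook and export it, with outputs, as HTML.
        ["jupyter", "nbconvert", "--to", "html", "--execute", notebook],
    ]


def py_to_html(script="code_py.py", dry_run=True):
    """Print the commands, and run them in order when dry_run is False."""
    for cmd in py_to_html_commands(script):
        print(" ".join(cmd))
        if not dry_run:
            subprocess.run(cmd, check=True)
```

With `dry_run=True` (the default here) the commands are only printed, which makes it easy to compare them with what the bash script does.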