library(ggplot2)
#library(shapleyr)
#load the packages that we need
devtools::load_all()
knitr::opts_chunk$set(echo = TRUE)
knitr::opts_chunk$set(collapse = T, comment = "#>")

Intro to Shapley value

The Shapley value is a method that gives a solution to the following problem: A coalition of players play a game, for which they get a payout, but it is not clear how to distribute the payout fairly among the players. The Shapley value solves this problem by trying out different coalitions of players to decide how much each player changes the amount of the payout. What does this have to do with machine learning? In machine learning, the features (=players) work together to get the payout (=predicted value). The Shapley value tells us, how much each feature contributed to the prediction. Properties: 1. * Pareto-efficiency. 1. * Symmetry. * Additivity. * Zero player.

Shapley function

The Shapley function gives you as output many informations like task type, feature names, predict type, prediction response, mean of data and the shapley values for every feature. You get the information by using the get functions, which are described below (Getters). row.nr = 3 is just an example. You can also choose a range like row.nr = 1:5

Example 1:

Regression task

#   task = bh.task,      predict.methods = regr.lm
s1 <-shapley(row.nr = 3, task = bh.task, model = train(makeLearner("regr.lm"), bh.task))
# Here you can see the whole output:
s1
# Here we use getShapleyValues(s1) to show only the shapley values. In the following examples we show only the shapley values. 
knitr::kable(getShapleyValues(s1))

Example 2:

Classification task

#  task = iris.task,      predict.methods = classif.lda
s2 <-shapley(row.nr = 3, task = iris.task, model = train(makeLearner("classif.lda"), iris.task))
# shapley Value
knitr::kable(getShapleyValues(s2))

Example 3:

Multilabel task

#  task = yeast.task,      predict.methods = multilabel.rFerns
s3 <-shapley(row.nr = 3, task = yeast.task, model = train(makeLearner("multilabel.rFerns"), yeast.task))
# shapley Value
knitr::kable(getShapleyValues(s3))

Example 4:

Cluster task

#  task = mtcars.task,      predict.methods = cluster.kmeans
s4 <-shapley(row.nr = 5, task = mtcars.task, model = train(makeLearner("cluster.kmeans"), mtcars.task))
# shapley Value
knitr::kable(getShapleyValues(s4))

Example 5:

Calculate exact shapley value

Unsampled version created for calculating the exact shapley value. Every permutation needs to be calculated. If you have 5 features/ players, it takes 2^5=32 permutations. In the test.data you can see that every coalation (represented with True/False, respectively 1/0) must have a target-value for the coalation. With the parameter target you choose the column of the result of the coalationfunction.

test.data = as.data.frame(rbind(c(1,0,0,12),
  c(0,1,0,6),
  c(0,0,1,9),
  c(1,1,0,24),
  c(1,0,1,27),
  c(0,1,1,15),
  c(0,0,0,0),
  c(1,1,1,36)))
names(test.data) = c("Ha", "He", "Da", "value")
s5 <-shapley.unsampled(data.input = test.data, target = "value")
s5

An example how to calculate the Shapley value of feature b:

Permutationen Beitrag vor b vor und mit b Marginaler Beitrag ------------- -------------- -------------- ------------------- a,b,c v({a}) = 12 v({a,b})=24 12 = 24-12 a,c,b v({a,c}) = 27 v({a,b,c}) = 36 9 = 36-27 b,a,c v({}) = 0 v({b}) = 6 6 b,c,a v({}) = 0 v({b}) = 6 6 c,a,b v({a,c}) = 27 v({a,b,c}) = 36 9 c,b,a v({c}) = 9 v({b,c}) = 15 6 ------------- -------------- -------------- -------------------- Result: Sh_b({a,b,c}, v) = 8 Same for Sh_a = 17, Sh_c = 11

Plots

Example 1: Show the influence of one single value

s1 <-shapley(row.nr = 3, task = bh.task, model = train(makeLearner("regr.lm"), bh.task))
plot.shapley.singleValue(s1)

Example 2: Show the multifeatures influence

shap.values = shapley(1:3, task = bh.task, model = train(makeLearner("regr.lm"), bh.task))
plot.shapley.multipleFeatures(shap.values, features = c("crim","rad","tax","nox"))

Example 3: Show shapley value for multiple values

plot.shapley.multipleValues(shap.values = shap.values)

Example 4: Test convergence of shapley function for many iterations (Regression task)

test.convergence(return.value = "plot")

Convergence

Tests that the shapley algorithm converges. Parameters are: 1. row.nr Index for the observation of interest. 1. convergence.iterations Amount of iterations of the shapley function. 1. iterations Amount of the iterations within the shapley function. 1. return.value Choose between plotting results or getting a data frame ("plot", "values") Return shapley value as a data.frame with col.names and their corresponding effects. Compares the result of the shapley function with the prediction (from row number). Shows how many times a value from the shapley function was chosen.

test.convergence(row.nr = 2, convergence.iterations = 20, iterations = 20, task = mtcars.task,
                            model = train(makeLearner("cluster.kmeans"), mtcars.task),
                            return.value = "values")

The class, which is shown like this <>, is the class which was predicted by the model (also for clustering). The predicted class is the class chosen by row number. Below the classname is shown how many times this class was chosen by the shapley function, because it has the biggest shapley value.

plot the convergence test.convergence(return.value = "plot", ...) works only for regression tasks. For other tasks you can get a data frame. get a data frame test.convergence(... , return.value = "values") change amount of iterations in the shapley function test.convergence(iterations = 30, ...) change amount of calls of shapey function test.convergence(convergence.iterations = 100, ...) choose observation as reference, for example an observation that you know is normal. You cannot choose a range like in the shapley function. test.convergence(row.nr = 21, ...)

Getters

s6 <-shapley(row.nr = 3, task = bh.task, model = train(makeLearner("regr.lm"), bh.task))

getShapleyValues(s6)
getShapleyIds(s6)
getShapleyTaskType(s6)
getShapleyPredictionType(s6)
getShapleyPredictionResponse(s6)
getShapleyFeatureNames(s6)
getShapleyDataMean(s6)
getShapleySubsetByResponseClass(s6)


redichh/ShapleyR documentation built on May 28, 2019, 7:49 a.m.