library(ggplot2) #library(shapleyr) #load the packages that we need devtools::load_all() knitr::opts_chunk$set(echo = TRUE) knitr::opts_chunk$set(collapse = T, comment = "#>")
The Shapley value is a method that gives a solution to the following problem: A coalition of players play a game, for which they get a payout, but it is not clear how to distribute the payout fairly among the players. The Shapley value solves this problem by trying out different coalitions of players to decide how much each player changes the amount of the payout. What does this have to do with machine learning? In machine learning, the features (=players) work together to get the payout (=predicted value). The Shapley value tells us, how much each feature contributed to the prediction. Properties: 1. * Pareto-efficiency. 1. * Symmetry. * Additivity. * Zero player.
The Shapley function gives you as output many informations like task type, feature names, predict type, prediction response, mean of data and the shapley values for every feature. You get the information by using the get functions, which are described below (Getters). row.nr = 3 is just an example. You can also choose a range like row.nr = 1:5
# task = bh.task, predict.methods = regr.lm s1 <-shapley(row.nr = 3, task = bh.task, model = train(makeLearner("regr.lm"), bh.task)) # Here you can see the whole output: s1 # Here we use getShapleyValues(s1) to show only the shapley values. In the following examples we show only the shapley values. knitr::kable(getShapleyValues(s1))
# task = iris.task, predict.methods = classif.lda s2 <-shapley(row.nr = 3, task = iris.task, model = train(makeLearner("classif.lda"), iris.task)) # shapley Value knitr::kable(getShapleyValues(s2))
# task = yeast.task, predict.methods = multilabel.rFerns s3 <-shapley(row.nr = 3, task = yeast.task, model = train(makeLearner("multilabel.rFerns"), yeast.task)) # shapley Value knitr::kable(getShapleyValues(s3))
# task = mtcars.task, predict.methods = cluster.kmeans s4 <-shapley(row.nr = 5, task = mtcars.task, model = train(makeLearner("cluster.kmeans"), mtcars.task)) # shapley Value knitr::kable(getShapleyValues(s4))
Unsampled version created for calculating the exact shapley value. Every permutation needs to be calculated. If you have 5 features/ players, it takes 2^5=32 permutations. In the test.data you can see that every coalation (represented with True/False, respectively 1/0) must have a target-value for the coalation. With the parameter target you choose the column of the result of the coalationfunction.
test.data = as.data.frame(rbind(c(1,0,0,12), c(0,1,0,6), c(0,0,1,9), c(1,1,0,24), c(1,0,1,27), c(0,1,1,15), c(0,0,0,0), c(1,1,1,36))) names(test.data) = c("Ha", "He", "Da", "value") s5 <-shapley.unsampled(data.input = test.data, target = "value") s5
An example how to calculate the Shapley value of feature b:
Permutationen Beitrag vor b vor und mit b Marginaler Beitrag ------------- -------------- -------------- ------------------- a,b,c v({a}) = 12 v({a,b})=24 12 = 24-12 a,c,b v({a,c}) = 27 v({a,b,c}) = 36 9 = 36-27 b,a,c v({}) = 0 v({b}) = 6 6 b,c,a v({}) = 0 v({b}) = 6 6 c,a,b v({a,c}) = 27 v({a,b,c}) = 36 9 c,b,a v({c}) = 9 v({b,c}) = 15 6 ------------- -------------- -------------- -------------------- Result: Sh_b({a,b,c}, v) = 8 Same for Sh_a = 17, Sh_c = 11
s1 <-shapley(row.nr = 3, task = bh.task, model = train(makeLearner("regr.lm"), bh.task)) plot.shapley.singleValue(s1)
shap.values = shapley(1:3, task = bh.task, model = train(makeLearner("regr.lm"), bh.task)) plot.shapley.multipleFeatures(shap.values, features = c("crim","rad","tax","nox"))
plot.shapley.multipleValues(shap.values = shap.values)
test.convergence(return.value = "plot")
Tests that the shapley algorithm converges. Parameters are: 1. row.nr Index for the observation of interest. 1. convergence.iterations Amount of iterations of the shapley function. 1. iterations Amount of the iterations within the shapley function. 1. return.value Choose between plotting results or getting a data frame ("plot", "values") Return shapley value as a data.frame with col.names and their corresponding effects. Compares the result of the shapley function with the prediction (from row number). Shows how many times a value from the shapley function was chosen.
test.convergence(row.nr = 2, convergence.iterations = 20, iterations = 20, task = mtcars.task, model = train(makeLearner("cluster.kmeans"), mtcars.task), return.value = "values")
The class, which is shown like this <
plot the convergence test.convergence(return.value = "plot", ...) works only for regression tasks. For other tasks you can get a data frame. get a data frame test.convergence(... , return.value = "values") change amount of iterations in the shapley function test.convergence(iterations = 30, ...) change amount of calls of shapey function test.convergence(convergence.iterations = 100, ...) choose observation as reference, for example an observation that you know is normal. You cannot choose a range like in the shapley function. test.convergence(row.nr = 21, ...)
s6 <-shapley(row.nr = 3, task = bh.task, model = train(makeLearner("regr.lm"), bh.task)) getShapleyValues(s6) getShapleyIds(s6) getShapleyTaskType(s6) getShapleyPredictionType(s6) getShapleyPredictionResponse(s6) getShapleyFeatureNames(s6) getShapleyDataMean(s6) getShapleySubsetByResponseClass(s6)
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.