An R package to visualize suggestions on how to change the variables of an instance in order to obtain a desired prediction from an ensemble tree model such as randomForest.
```r
set.seed(777)
require(tidyverse)
require(randomForest)
require(featureTweakR)

data(spam, package = "kernlab")
dataset <- sample_frac(spam) %>% dataSplit(test.ratio = 0.1)

important.var <- c("charExclamation", "charDollar", "remove", "free",
                   "capitalAve", "capitalLong", "your", "hp")

data.train <- dataset$train %>% select(important.var)
true.y     <- dataset$train[, ncol(dataset$train)]
data.test  <- dataset$test %>% select(important.var) %>% head(50)
```
After data preparation, just call the wrapper functions:

- `learnModel()` to extract rules from the ensemble of trees,
- `predict()` to estimate a suggestion for each instance from the extracted rules,
- `plot()` to visualize the suggestion or population-based importances.

```r
es <- learnModel(X.train = data.train, true.y = true.y, ntree = 25)
```
Based on the learned model, new instances that were predicted as `label.from` are given suggestions on how to be tweaked toward `label.to`:

```r
ft <- predict(es, newdata = data.test, label.from = "spam", label.to = "nonspam")
```
To visualize predicted-population-based feature importance, set `type = "direction"`:

```r
plot(ft, type = "direction")
```
To visualize the suggestion on how to change the variables of the k-th instance to get the desired prediction, set `k = ...`:

```r
plot(ft, k = 4)
```
You can install the featureTweakR package from GitHub.
```r
# if you have not installed the "devtools" package
install.packages("devtools")

# if you have not installed the "pforeach" package
devtools::install_github("hoxo-m/pforeach")

devtools::install_github("katokohaku/featureTweakR")
```
The source code for the featureTweakR package is available on GitHub at https://github.com/katokohaku/featureTweakR.
```r
set.seed(777)
data(spam, package = "kernlab")
dataset <- sample_frac(spam)

n.test <- floor(NROW(dataset) * 0.1)
dataset.train <- chop(dataset, n.test)
dataset.test  <- tail(dataset, n.test)

dim(dataset); dim(dataset.train); dim(dataset.test)
```
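The split above can be sketched in base R. This is only a sketch: it assumes `chop(dataset, n)` keeps all but the last `n` rows, complementing `tail(dataset, n)`; `chop_sketch` below is a hypothetical stand-in, not the package's implementation.

```r
# Sketch: assuming chop(dataset, n) drops the last n rows,
# so chop() and tail() together form a train/test split.
chop_sketch <- function(df, n) head(df, NROW(df) - n)

d <- data.frame(x = 1:10)
train <- chop_sketch(d, 2)  # rows 1..8
test  <- tail(d, 2)         # rows 9..10
nrow(train)  # 8
nrow(test)   # 2
```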
To view variable importance and the number of trees required:
```r
X      <- dataset.train[, 1:(ncol(dataset.train) - 1)]
true.y <- dataset.train[, ncol(dataset.train)]

forest.all <- randomForest(X, true.y, ntree = 500)
forest.all

par(mfrow = c(1, 2))
varImpPlot(forest.all)  # inspect variable importance
plot(forest.all)
par(mfrow = c(1, 1))
```
```r
top.importance <- forest.all$importance %>%
  data.frame %>%
  tibble::rownames_to_column(var = "var") %>%
  arrange(desc(MeanDecreaseGini)) %>%
  head(12)
top.importance

dataset.train.fs <- dataset.train %>% select(top.importance$var)
dataset.test.fs  <- dataset.test  %>% select(top.importance$var)
```
```r
X.train <- scale(dataset.test.fs)
X.test  <- rescale(dataset.test.fs, scaled = X.train)

dataset.test.fs[1:6, 1:6]
descale(X.test, scaled = X.train)[1:6, 1:6]
descale(X.test, scaled = X.test)[1:6, 1:6]
```
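The centering/scaling round-trip above can be sketched in base R. This assumes `rescale()` applies the *training* set's center and scale (stored as attributes by `scale()`) to new data, and `descale()` inverts that transformation; the `_sketch` helpers below are hypothetical illustrations, not the package's code.

```r
# Sketch of rescale()/descale() semantics, assuming they use the
# "scaled:center" / "scaled:scale" attributes stored by scale().
train <- data.frame(a = c(1, 2, 3, 4), b = c(10, 20, 30, 40))
test  <- data.frame(a = c(2.5, 3.5),   b = c(15, 25))

X.train <- scale(train)  # stores center/scale as attributes

# scale new data with the *training* statistics
rescale_sketch <- function(newdata, scaled) {
  scale(newdata,
        center = attr(scaled, "scaled:center"),
        scale  = attr(scaled, "scaled:scale"))
}

# invert the transformation back to the original units
descale_sketch <- function(x, scaled) {
  sweep(sweep(x, 2, attr(scaled, "scaled:scale"), "*"),
        2, attr(scaled, "scaled:center"), "+")
}

X.test <- rescale_sketch(test, X.train)
descale_sketch(X.test, X.train)  # recovers the original test values
```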
```r
forest.rf <- randomForest(X.train, true.y, ntree = 100)
forest.all
forest.rf
plot(forest.rf)
```
After building the forest, the steps to obtain suggestions without the wrapper functions are:
```r
rules <- getRules(forest.rf, ktree = NULL, resample = TRUE)
# rules[[1]]
```
```r
es.rf <- set.eSatisfactory(rules, epsiron = 0.3)
# es.rf[[1]]
```
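The idea behind the ε-satisfactory step (from Tolomei et al., KDD 2017, cited below) can be sketched in a few lines of base R: for each condition on a decision path that leads to the desired label, move the instance's value just past the threshold by a margin ε. The `tweak_to_path()` helper and the path values are made up for illustration; this is not the package's implementation.

```r
# Sketch of an epsilon-satisfactory tweak: for each condition
# "feature <= threshold" (or ">") on a path toward the desired label,
# nudge the violating feature value just inside the region by epsilon.
tweak_to_path <- function(x, path, epsilon = 0.3) {
  for (cond in path) {
    v <- x[[cond$feature]]
    if (cond$op == "<=" && v > cond$threshold) {
      x[[cond$feature]] <- cond$threshold - epsilon
    } else if (cond$op == ">" && v <= cond$threshold) {
      x[[cond$feature]] <- cond$threshold + epsilon
    }
  }
  x
}

# A hypothetical path toward "nonspam" (made-up thresholds):
path <- list(
  list(feature = "charExclamation", op = "<=", threshold = 0.5),
  list(feature = "hp",              op = ">",  threshold = 0.2)
)
x <- c(charExclamation = 2.0, hp = 0.0)
tweak_to_path(x, path, epsilon = 0.1)
# charExclamation becomes 0.4, hp becomes 0.3
```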
```r
tweaked <- tweak(es.rf, forest.rf, newdata = X.test,
                 label.from = "spam", label.to = "nonspam", .dopar = TRUE)
str(tweaked, 1, vec.len = 2)
```
```r
dt <- descale.tweakedFeature(tweaked, X.test)
```
To visualize predicted-population-based feature importance, set `type = "direction"`:

```r
plot(tweaked, type = "direction")
```
To visualize the suggestion on how to change the variables of the k-th instance to get the desired prediction, set `k = ...`:

```r
plotSuggest(tweaked, k = 4)
```
To show only non-zero variables or to sort the values, set `.nonzero.only = TRUE` or `.ordered = TRUE`, respectively:

```r
plotSuggest(tweaked, k = 4, .ordered = TRUE, .nonzero.only = TRUE)
```
Gabriele Tolomei, Fabrizio Silvestri, Andrew Haines, Mounia Lalmas, "Interpretable Predictions of Tree-based Ensembles via Actionable Feature Tweaking", KDD 2017 (also available as an arXiv preprint).