
Overview

An R package that suggests, and visualizes, how to change the variables of an instance to obtain a desired prediction from a tree ensemble model such as randomForest.

Preparation

set.seed(777)
library(tidyverse)
library(randomForest)
library(featureTweakR)

data(spam, package = "kernlab")
dataset <- sample_frac(spam) %>% dataSplit(test.ratio = 0.1)

important.var <- c("charExclamation", "charDollar", "remove", "free", "capitalAve", "capitalLong", "your", "hp")
data.train <- dataset$train %>% select(all_of(important.var))
true.y     <- dataset$train[, ncol(dataset$train)]
data.test  <- dataset$test  %>% select(all_of(important.var)) %>% head(50)

After preparing the data, call the wrapper functions in order:

  1. learnModel() to extract rules from the tree ensemble,
  2. predict() to estimate a suggestion for each instance from the extracted rules,
  3. plot() to visualize either the suggestion for one instance or the population-based importances.

Extract rules

es <- learnModel(X.train = data.train, true.y = true.y, ntree = 25)

Estimate suggestions for new instances

Based on the learnt model, each new instance that is predicted as label.from receives a suggestion for how to tweak its variables so that it is predicted as label.to.

ft <- predict(es, newdata = data.test, 
              label.from = "spam", label.to = "nonspam")

Visualize suggestion

To visualize feature importance aggregated over the whole predicted population, set type = "direction".

plot(ft, type = "direction")

To visualize the suggested changes to the variables of the k-th instance that would yield the desired prediction, set k to the index of that instance.

plot(ft, k=4)

Details

Installation

You can install the featureTweakR package from GitHub.

# if you have not installed the "devtools" package
install.packages("devtools")
# if you have not installed the "pforeach" package
devtools::install_github("hoxo-m/pforeach")

devtools::install_github("katokohaku/featureTweakR")

The source code for the featureTweakR package is available on GitHub at https://github.com/katokohaku/featureTweakR.

data preparation

set.seed(777)

data(spam, package = "kernlab")
dataset <- sample_frac(spam)
n.test <- floor(NROW(dataset) *0.1)

dataset.train <- chop(dataset, n.test)
dataset.test  <- tail(dataset, n.test)

dim(dataset);dim(dataset.train);dim(dataset.test)
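
The same shuffled hold-out split can be sketched in base R alone. This is only an illustration: it assumes that chop(dataset, n) drops the last n rows (not confirmed here), uses head(x, -n) as a stand-in, and replaces the spam data with a small toy data frame.

```r
# Minimal sketch of a shuffled hold-out split in base R.
# Assumption: `toy` is a made-up data frame standing in for the spam data.
set.seed(777)
toy <- data.frame(x = rnorm(50),
                  y = factor(rep(c("spam", "nonspam"), 25)))

shuffled <- toy[sample(nrow(toy)), ]      # shuffle the rows
n.test   <- floor(nrow(shuffled) * 0.1)   # hold out 10% of the rows

toy.train <- head(shuffled, -n.test)      # everything except the last n.test rows
toy.test  <- tail(shuffled,  n.test)      # the last n.test rows

dim(toy.train); dim(toy.test)
```

Because the rows are shuffled before splitting, taking the tail as the test set is equivalent to drawing a random sample without replacement.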

exploring randomForest

build randomForest

First, fit a forest on all features to inspect the variable importances and the number of trees required.

X <- dataset.train[, 1:(ncol(dataset.train)-1)]
true.y <- dataset.train[, ncol(dataset.train)]

forest.all <- randomForest(X, true.y, ntree=500)
forest.all
par(mfrow=c(1,2))
varImpPlot(forest.all) # variable importance; low-importance variables are dropped in the next step
plot(forest.all)
par(mfrow=c(1,1))

model shrinkage (feature selection) based on importance

top.importance <- forest.all$importance %>% data.frame %>%
  tibble::rownames_to_column(var = "var") %>% 
  arrange(desc(MeanDecreaseGini)) %>% 
  head(12)
top.importance

dataset.train.fs <- dataset.train %>% select(all_of(top.importance$var))
dataset.test.fs  <- dataset.test  %>% select(all_of(top.importance$var))

scaling feature-selected data

X.train <- scale( dataset.train.fs )
X.test  <- rescale( dataset.test.fs, scaled = X.train )

dataset.test.fs[1:6, 1:6]
descale(X.test, scaled = X.train)[1:6, 1:6]
descale(X.test, scaled = X.test)[1:6, 1:6]
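
The idea behind scaling the test data with the training parameters, and undoing it later, can be sketched in base R. This does not show how rescale() and descale() are implemented; it only assumes they reuse the "scaled:center" and "scaled:scale" attributes that scale() attaches to its result. The toy matrices are made up for illustration.

```r
# Toy matrices standing in for the feature-selected train/test data (made up).
train <- matrix(c(1, 2, 3, 4, 10, 20, 30, 40), ncol = 2,
                dimnames = list(NULL, c("a", "b")))
test  <- matrix(c(2, 5, 15, 35), ncol = 2,
                dimnames = list(NULL, c("a", "b")))

train.scaled <- scale(train)
ctr <- attr(train.scaled, "scaled:center")  # column means of the training data
sdv <- attr(train.scaled, "scaled:scale")   # column standard deviations

# Scale the test data with the *training* means and standard deviations.
test.scaled <- scale(test, center = ctr, scale = sdv)

# Invert the scaling: multiply by the sd, then add the mean back.
test.restored <- sweep(sweep(test.scaled, 2, sdv, "*"), 2, ctr, "+")

all.equal(test.restored, test, check.attributes = FALSE)
```

Reusing the training parameters keeps the split thresholds learnt by the forest meaningful for new data.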

performance comparison: forest with all features vs. forest with selected features

forest.rf <- randomForest(X.train, true.y, ntree=100)

forest.all
forest.rf
plot(forest.rf)

Step-by-step procedure

After building the forest, the steps to obtain suggestions without the wrapper functions are:

  1. rule extraction
  2. get modified (tweaked) rules
  3. get the best tweaked rule (the suggestion) for each instance
  4. restore suggestion to real scale.
  5. visualize

extract rules

rules <- getRules(forest.rf, ktree = NULL, resample = TRUE)
# rules[[1]]

set modified rules (e-satisfactory instances)

es.rf <- set.eSatisfactory(rules, epsiron = 0.3)
# es.rf[[1]]
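
To make the notion of an ε-satisfactory instance concrete, here is a minimal base R sketch for a single rule, i.e. one root-to-leaf path ending in the desired label. The rule, the helper make_e_satisfactory() and all values are hypothetical and not part of featureTweakR. The sketch moves only the features whose condition is currently violated, which is a simplifying assumption and may differ from the paper and from the package.

```r
# Hypothetical rule: conditions along one path that ends in "nonspam".
rule <- data.frame(
  feature   = c("free", "remove"),
  direction = c("<=", ">"),
  threshold = c(0.2, -0.5),
  stringsAsFactors = FALSE
)

# Hypothetical helper (not from featureTweakR): push each violated feature
# just past its threshold by epsilon, on the side the rule requires.
make_e_satisfactory <- function(x, rule, epsilon) {
  for (i in seq_len(nrow(rule))) {
    f  <- rule$feature[i]
    th <- rule$threshold[i]
    if (rule$direction[i] == "<=") {
      if (x[[f]] > th) x[[f]] <- th - epsilon    # must end up at or below th
    } else {
      if (x[[f]] <= th) x[[f]] <- th + epsilon   # must end up above th
    }
  }
  x
}

# A made-up, already standardized instance; "your" is not on the path.
x <- c(free = 1.0, remove = -1.2, your = 0.3)
x.tweaked <- make_e_satisfactory(x, rule, epsilon = 0.3)
x.tweaked
```

Since the features are standardized, epsilon is expressed in units of standard deviations, so the same tolerance applies to every feature.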

predict individual suggestion for each instance

tweaked <- tweak(es.rf, forest.rf, newdata= X.test, 
                 label.from = "spam", label.to = "nonspam", .dopar = TRUE)

str(tweaked,1,vec.len = 2)
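
Conceptually, picking the suggestion for an instance means choosing, among the candidate instances that the forest would assign to label.to, the one closest to the original. The base R sketch below illustrates that selection with made-up candidates; the flips vector stands in for the forest's predictions, and Euclidean distance is assumed as the cost. It does not reproduce the internals of tweak().

```r
# Made-up standardized instance and three candidate tweaks of it.
x <- c(free = 1.0, remove = -1.2, your = 0.3)
candidates <- rbind(
  c(free = -0.1, remove = -0.2, your =  0.3),
  c(free =  1.0, remove = -1.2, your = -0.4),
  c(free =  0.5, remove =  0.9, your =  0.3)
)

# Hypothetical: whether the forest would predict label.to for each candidate.
flips <- c(TRUE, TRUE, FALSE)

# Euclidean distance of each candidate from the original instance.
cost <- sqrt(rowSums(sweep(candidates, 2, x, "-")^2))
cost[!flips] <- Inf                   # discard candidates that do not flip the label

best       <- which.min(cost)         # index of the cheapest valid candidate
suggestion <- candidates[best, ] - x  # change to apply to each feature
suggestion
```

Here the second candidate wins because it flips the label by changing a single feature by 0.7, whereas the first candidate needs two larger changes.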

restore suggestions from the scaled features to the original scale

dt <- descale.tweakedFeature(tweaked, X.test)

Visualize suggestion

To visualize feature importance aggregated over the whole predicted population, set type = "direction".

plot(tweaked, type = "direction")

To visualize the suggested changes to the variables of the k-th instance that would yield the desired prediction, set k to the index of that instance.

plotSuggest(tweaked, k=4)

To show only the non-zero variables or to sort the values, set .nonzero.only = TRUE or .ordered = TRUE, respectively.

plotSuggest(tweaked, k=4, .ordered = TRUE, .nonzero.only = TRUE)

References

Gabriele Tolomei, Fabrizio Silvestri, Andrew Haines, Mounia Lalmas. "Interpretable Predictions of Tree-based Ensembles via Actionable Feature Tweaking". KDD 2017 (also available on arXiv).



katokohaku/featureTweakR documentation built on May 17, 2019, 11:17 p.m.