katokohaku/featureTweakR: Calculate Actionable Feature Tweaking

knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>",
  message = FALSE
)

Overview

An R package that suggests, and visualizes, how to change the variables of an instance so that a tree-ensemble model such as randomForest gives the desired prediction.

Preparation

set.seed(777)
library(tidyverse)
library(randomForest)
library(featureTweakR)

# Shuffle the spam data and hold out 10% as the test set
data(spam, package = "kernlab")
dataset <- sample_frac(spam) %>% dataSplit(test.ratio = 0.1)

# Work with a handful of important variables only
important.var <- c("charExclamation", "charDollar", "remove", "free",
                   "capitalAve", "capitalLong", "your", "hp")
data.train <- dataset$train %>% select(all_of(important.var))
true.y     <- dataset$train[, ncol(dataset$train)]  # last column holds the label
data.test  <- dataset$test %>% select(all_of(important.var)) %>% head(50)

After data preparation, just call the wrapper functions:

learnModel() to extract rules from the ensemble trees,
predict() to estimate a suggestion for each instance from the extracted rules,
plot() to visualize a suggestion or population-based importances.

Extract rules

es <- learnModel(X.train = data.train, true.y = true.y, ntree = 25)

Estimate suggestions for new instances

Based on the learnt model, each new instance that is predicted as label.from gets a suggestion for how to tweak its variables so that it is predicted as label.to.

ft <- predict(es, newdata = data.test, 
              label.from = "spam", label.to = "nonspam")

Visualize suggestion

To visualize feature importance based on the predicted population, set type = "direction".

plot(ft, type = "direction")

To visualize the suggested changes to the variables of the k-th instance that would give the desired prediction, set k = ...

plot(ft, k=4)

Details

Installation

You can install the featureTweakR package from GitHub.

 # if you have not installed "devtools" package
install.packages("devtools")
 # if you have not installed "pforeach" package
devtools::install_github("hoxo-m/pforeach")

devtools::install_github("katokohaku/featureTweakR")

The source code for the featureTweakR package is available on GitHub at https://github.com/katokohaku/featureTweakR.

data preparation

set.seed(777)

data(spam, package = "kernlab")
dataset <- sample_frac(spam)
n.test <- floor(NROW(dataset) *0.1)

dataset.train <- chop(dataset, n.test)
dataset.test  <- tail(dataset, n.test)

dim(dataset);dim(dataset.train);dim(dataset.test)
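
The split above relies on the package helper chop(). As a rough illustration only, the same kind of hold-out split can be sketched in base R. This assumes that chop(dataset, n) drops the last n rows, which is inferred from how its result is used here rather than from the package documentation. The toy data frame is hypothetical.

```r
# Hypothetical toy data standing in for the shuffled spam data.
set.seed(777)
toy <- data.frame(x = rnorm(20), y = rep(c("spam", "nonspam"), 10))
toy <- toy[sample(nrow(toy)), ]      # shuffle the rows, as sample_frac() does

n.test <- floor(nrow(toy) * 0.1)     # 10% hold-out

# Assumption: this mirrors chop(); note that head(x, -n) needs n > 0.
toy.train <- head(toy, -n.test)      # everything except the last n.test rows
toy.test  <- tail(toy, n.test)       # the last n.test rows

dim(toy); dim(toy.train); dim(toy.test)
```

Because the rows are shuffled first, taking the tail as the test set gives a random hold-out without any further sampling.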

exploring randomForest

build randomForest

Build a forest on all features to view the variable importances and the number of trees required.

X <- dataset.train[, 1:(ncol(dataset.train)-1)]
true.y <- dataset.train[, ncol(dataset.train)]

forest.all <- randomForest(X, true.y, ntree=500)
forest.all
par(mfrow=c(1,2))
varImpPlot(forest.all) # variable importance of all features
plot(forest.all)
par(mfrow=c(1,1))

model shrinkage (feature selection) based on importance

top.importance <- forest.all$importance %>% data.frame %>%
  tibble::rownames_to_column(var = "var") %>% 
  arrange(desc(MeanDecreaseGini)) %>% 
  head(12)
top.importance

dataset.train.fs <- dataset.train %>% select(all_of(top.importance$var))
dataset.test.fs  <- dataset.test %>% select(all_of(top.importance$var))

scaling feature-selected data

X.train <- scale( dataset.train.fs )
X.test  <- rescale( dataset.test.fs, scaled = X.train )

dataset.test.fs[1:6, 1:6]
descale(X.test, scaled = X.train)[1:6, 1:6]
descale(X.test, scaled = X.test)[1:6, 1:6]
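
rescale() and descale() are helpers from the package. The sketch below shows the underlying idea in base R, assuming they reuse the centering and scaling attributes that scale() attaches to its result; the package's actual implementation may differ. The toy matrices and column names are hypothetical.

```r
# Hypothetical toy matrices; column names are illustrative only.
set.seed(1)
train <- matrix(rnorm(40, mean = 5, sd = 2), ncol = 2,
                dimnames = list(NULL, c("a", "b")))
test  <- matrix(rnorm(10, mean = 5, sd = 2), ncol = 2,
                dimnames = list(NULL, c("a", "b")))

train.scaled <- scale(train)                 # stores center/scale as attributes
ctr <- attr(train.scaled, "scaled:center")
scl <- attr(train.scaled, "scaled:scale")

# Put the test data on the training scale (what rescale() is assumed to do).
test.scaled <- scale(test, center = ctr, scale = scl)

# Map scaled values back to original units (what descale() is assumed to do).
test.restored <- sweep(sweep(test.scaled, 2, scl, `*`), 2, ctr, `+`)

all.equal(c(test.restored), c(test))
```

Scaling the test data with the training statistics keeps both sets on the same axis, so thresholds learnt by the forest on X.train remain meaningful for X.test.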

performance comparison: forest with all features vs. selected features

forest.rf <- randomForest(X.train, true.y, ntree=100)

forest.all
forest.rf
plot(forest.rf)

Step-by-step procedure

After building the forest, the steps to obtain suggestions without the wrapper functions are:

extract rules
get modified (tweaked) rules
get the best tweaked rule (the suggestion) for each instance
restore the suggestion to the original scale
visualize

extract rules

rules <- getRules(forest.rf, ktree = NULL, resample = TRUE)
# rules[[1]]

set modified rules (ε-satisfactory instances)

es.rf <- set.eSatisfactory(rules, epsiron = 0.3)
# es.rf[[1]]
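
To make the idea of an ε-satisfactory instance concrete, here is a minimal base-R sketch following the definition in the Tolomei et al. paper listed under References: every condition on a root-to-leaf path is satisfied with a margin of ε. The rule format, the function eps_satisfy() and all values are hypothetical and are not the package's internal representation.

```r
# A hypothetical rule: one root-to-leaf path ending in the target label.
# Column names (feature, op, threshold) are illustrative only.
rule <- data.frame(
  feature   = c("free", "capitalAve"),
  op        = c("<=", ">"),
  threshold = c(0.20, -0.50),
  stringsAsFactors = FALSE
)

# An instance on the standardized scale (hypothetical values).
x <- c(free = 1.10, capitalAve = -0.90, your = -0.40)

# Hypothetical helper: move each feature on the path to just inside
# the side of the threshold that the rule requires.
eps_satisfy <- function(x, rule, eps) {
  for (i in seq_len(nrow(rule))) {
    f <- rule$feature[i]
    shift <- if (rule$op[i] == "<=") -eps else eps
    x[f] <- rule$threshold[i] + shift
  }
  x
}

x.tweaked <- eps_satisfy(x, rule, eps = 0.3)
x.tweaked
```

Features that do not appear on the path (here, your) are left unchanged. Since ε is a single tolerance applied to all features, it is most meaningful when the features share a common scale, which is presumably why the data are standardized first.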

predict individual suggestion for each instance

tweaked <- tweak(es.rf, forest.rf, newdata= X.test, 
                 label.from = "spam", label.to = "nonspam", .dopar = TRUE)

str(tweaked,1,vec.len = 2)
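
Choosing "the best tweaked rule" for one instance can be sketched as follows: among candidate tweaks that flip the prediction to the target label, keep the one closest to the original instance. Euclidean distance is used here as an assumed cost function; whether tweak() uses the same cost is not stated in this document. All values and flags below are hypothetical.

```r
# Original instance and three hypothetical candidate tweaks (standardized scale).
x <- c(free = 1.1, capitalAve = -0.9, your = -0.4)
candidates <- rbind(
  c(free = -0.1, capitalAve = -0.9, your = -0.4),
  c(free =  1.1, capitalAve = -0.2, your = -0.4),
  c(free = -0.1, capitalAve = -0.2, your =  0.5)
)

# Hypothetical flags: does the whole forest predict label.to for each candidate?
flips.label <- c(TRUE, FALSE, TRUE)

# Euclidean distance from the original instance to each candidate.
cost <- sqrt(rowSums(sweep(candidates, 2, x, `-`)^2))

# Exclude candidates that do not reach the target label, then take the cheapest.
cost[!flips.label] <- Inf
best <- which.min(cost)
suggestion <- candidates[best, ]

suggestion - x   # per-feature change being suggested
```

The second candidate is closer to x than the others, but it is discarded because it does not change the predicted label.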

restore suggestion from scaled feature to original scale.

dt <- descale.tweakedFeature(tweaked, X.test)

Visualize suggestion

To visualize feature importance based on the predicted population, set type = "direction".

plot(tweaked, type = "direction")

To visualize the suggested changes to the variables of the k-th instance that would give the desired prediction, set k = ...

plotSuggest(tweaked, k=4)

To show only the non-zero variables or to sort the values, set .nonzero.only = TRUE or .ordered = TRUE, respectively.

plotSuggest(tweaked, k=4, .ordered = TRUE, .nonzero.only = TRUE)
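
The effect of the two options can be illustrated on a plain vector of suggested changes. This is only an analogy in base R: how plotSuggest() actually filters and orders (for example, by signed value or by absolute value) is an assumption here, and the feature values are hypothetical.

```r
# Hypothetical original and suggested values for one instance (original units).
original  <- c(free = 0.64, remove = 0.00, hp = 0.00, your = 1.29)
suggested <- c(free = 0.12, remove = 0.00, hp = 0.35, your = 1.29)

delta <- suggested - original

# Analogue of .nonzero.only = TRUE: drop features that are left unchanged.
delta.nz <- delta[delta != 0]

# Analogue of .ordered = TRUE: sort by size of change, largest first
# (ordering by absolute value is an assumption).
delta.ord <- delta.nz[order(abs(delta.nz), decreasing = TRUE)]
delta.ord
```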

References

Gabriele Tolomei, Fabrizio Silvestri, Andrew Haines, Mounia Lalmas. "Interpretable Predictions of Tree-based Ensembles via Actionable Feature Tweaking". KDD 2017; also available as an arXiv paper.

katokohaku/featureTweakR documentation built on May 17, 2019, 11:17 p.m.
