Description Usage Arguments Details Value Functions References See Also Examples
View source: R/featureContrib.R
Contribution of each feature to the prediction.
featureContribTree(tidy.RF, tree, X)

featureContrib(tidy.RF, X)
tidy.RF
A tidy random forest. The random forest to make predictions with.
tree
An integer. The index of the tree to look at.
X
A data frame. Features of samples to be predicted.
Recall that each node in a decision tree has a prediction associated with it. For regression trees, it's the average response in that node, whereas in classification trees, it's the frequency of each response class, or the most frequent response class in that node.
For a tree in the forest, the contribution of each feature to the prediction of a sample is the sum of differences between the predictions of nodes which split on that feature and those of their children, i.e. the sum of changes in node prediction caused by splitting on the feature. This is calculated by featureContribTree.
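As a concrete sketch of the node-difference idea, consider a hand-rolled depth-1 regression tree (a toy illustration, not the package's implementation):

```r
# Toy regression stump splitting on feature x1 at 0.5
y  <- c(1, 2, 10, 11)        # training responses
x1 <- c(0.1, 0.2, 0.8, 0.9)  # the split feature

root.pred  <- mean(y)             # prediction at the root: 6
left.pred  <- mean(y[x1 <= 0.5])  # left child: 1.5
right.pred <- mean(y[x1 > 0.5])   # right child: 10.5

# A sample falling into the right child receives a contribution from x1
# equal to the change in node prediction caused by the split:
contrib.x1 <- right.pred - root.pred  # 4.5

# The tree's prediction for that sample decomposes as
# root prediction + contribution: 6 + 4.5 = 10.5
root.pred + contrib.x1
```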
For a forest, the contribution of each feature to the prediction of a sample is the average contribution across all trees in the forest. This is because the prediction of a forest is the average of the predictions of its trees. This is calculated by featureContrib.
Together with trainsetBias(Tree), they can decompose the prediction by feature importance:

prediction(MODEL, X) = trainsetBias(MODEL) + featureContrib_1(MODEL, X) + ... + featureContrib_p(MODEL, X),
where MODEL can be either a tree or a forest.
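The identity can be checked numerically along these lines. This is an untested sketch assuming a regression forest fitted with ranger (with keep.inbag = TRUE, which this package requires) and the tidyRF, trainsetBias, and featureContrib functions documented here; the indexing of the bias matrix is an assumption:

```r
library(ranger)
library(tree.interpreter)

set.seed(42)
# Regression forest predicting Sepal.Length from the other measurements
rfobj <- ranger(Sepal.Length ~ ., iris[, 1:4], keep.inbag = TRUE)
tidy.RF <- tidyRF(rfobj, iris[, 2:4], iris[, 1])

contribs <- featureContrib(tidy.RF, iris[, 2:4])  # P-by-1-by-N cube
bias <- trainsetBias(tidy.RF)                     # trainset bias term

# Reconstructed predictions: bias plus the sum over features (rows)
# of each sample's slice of the cube
recon <- bias[1, 1] + apply(contribs, 3, sum)

# These should match the forest's own predictions up to floating error:
# all.equal(unname(recon), predict(rfobj, iris[, 2:4])$predictions)
```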
A cube (3D array). The content depends on the type of the response.

Regression: A P-by-1-by-N array, where P is the number of features in X, and N is the number of samples in X. The pth row of the nth slice stands for the contribution of feature p to the prediction for sample n.

Classification: A P-by-D-by-N array, where P is the number of features in X, D is the number of response classes, and N is the number of samples in X. The pth row of the nth slice stands for the contribution of feature p to the prediction of each response class for sample n.
featureContribTree: Feature contribution to prediction within a single tree

featureContrib: Feature contribution to prediction within the whole forest
Interpreting random forests http://blog.datadive.net/interpreting-random-forests/
Random forest interpretation with scikit-learn http://blog.datadive.net/random-forest-interpretation-with-scikit-learn/
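The example code was lost in extraction; a plausible reconstruction, assuming a ranger classification forest on iris fitted with keep.inbag = TRUE and tidied with this package's tidyRF helper:

```r
library(ranger)
library(tree.interpreter)

set.seed(42)
# Fit a classification forest; keep.inbag = TRUE is required by tidyRF
rfobj <- ranger(Species ~ ., iris, keep.inbag = TRUE)
tidy.RF <- tidyRF(rfobj, iris[, -5], iris[, 5])

# Feature contributions within the first tree, then across the forest
featureContribTree(tidy.RF, 1, iris[, -5])
featureContrib(tidy.RF, iris[, -5])
```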