getLocalIncrements: Get Local Increments of Feature Contributions for a Random...


Description

This method calculates local increments of feature contributions from an existing randomForest model. It is based on the approach of Kuz'min et al. for regression models, extended here to classification models. It does not work for unsupervised models.

The randomForest model must store an in-bag matrix, which records the samples used to build each tree in the forest, and it must be built by sampling without replacement. Hence, every Random Forest model analyzed by getLocalIncrements() and, subsequently, featureContributions() must be generated as follows:

model <- randomForest(..., keep.inbag=TRUE, replace=FALSE)

The reason for this current limitation is that, in the randomForest implementation of Random Forest provided by Liaw and Wiener, the in-bag matrix does not record how many times a sample was used to build a particular tree when sampling with replacement.

The method returns local increments for all nodes in each tree for regression and binary classification models. For multi-class classification problems, it returns the local increments calculated for all classes for every tree node in the forest.
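The requirements above can be checked before calling getLocalIncrements(). The following is a minimal sketch and is not part of rfFC: the helper name check_rf_model is hypothetical, and it relies only on the type, inbag and call components that randomForest stores on a fitted model. The check on replace assumes the argument was passed as a literal FALSE in the original call.

```r
# Hypothetical helper (not part of rfFC): list the reasons, if any, why a
# fitted randomForest model does not meet the requirements described above.
check_rf_model <- function(model) {
  if (!inherits(model, "randomForest")) {
    return("object is not of class randomForest")
  }
  problems <- character(0)
  if (identical(model$type, "unsupervised")) {
    problems <- c(problems, "unsupervised models are not supported")
  }
  if (is.null(model$inbag)) {
    problems <- c(problems, "no in-bag matrix: refit with keep.inbag=TRUE")
  }
  # Assumption: replace was given as a literal FALSE in the stored call.
  if (!identical(model$call$replace, FALSE)) {
    problems <- c(problems, "refit with replace=FALSE")
  }
  problems
}
```

An empty character vector means that none of the checked requirements is violated.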

Usage

getLocalIncrements(object,  dataT, binAsReg=TRUE, mcls=NULL)

Arguments

object

an object of the class randomForest

dataT

a data frame containing the variables in the model for all instances for which feature contributions are desired

binAsReg

this option is only relevant for binary classification. If TRUE (default), the binary classification model is treated like a regression model for the purpose of calculating feature contributions, with the class labels treated as numeric values of 1 or 0. If FALSE, only the local increments in favour of the predicted class (for the forest as a whole) are calculated, as per the treatment of multi-class classifiers.

mcls

the main class to be set to "1" for binary classification. If NULL, the class name from the first record in dataT will be set as "1"; otherwise the provided class will be mapped to "1".
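The 1/0 coding described for binAsReg and mcls can be illustrated with a short sketch. This is not the rfFC implementation: the function name encode_main_class is hypothetical, and it assumes that "the class name from the first record" means the label attached to the first instance.

```r
# Illustrative only (not rfFC code): map a two-class label vector to 1/0,
# with the main class mapped to 1 and the other class mapped to 0.
encode_main_class <- function(labels, mcls = NULL) {
  labels <- as.character(labels)
  if (is.null(mcls)) {
    mcls <- labels[1]  # assumption: default main class is the first label
  }
  stopifnot(mcls %in% labels)
  as.numeric(labels == mcls)
}

encode_main_class(c("active", "inactive", "active"))
# 1 0 1
encode_main_class(c("active", "inactive", "active"), mcls = "inactive")
# 0 1 0
```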

Value

A list with the following components:

type

the type of the method used for calculating local increments of feature contributions

forest

If a multi-class classification model, or a binary classification model analyzed using the binAsReg=FALSE option, has been analyzed, this is a list that contains: a vector lIncrements of local increments for all classes and each node of each tree, and a k x ntree matrix rmv of the mean proportion of instances in each class in the root nodes, where k is the number of classes and ntree is the number of trees in the forest. If a regression model, or a binary classification model analyzed using the binAsReg=TRUE option, has been analyzed, this is a list that contains: a vector lIncrements of local increments for all classes and each node of each tree, and another vector rmv, of length ntree, of the mean activity (with the two classes treated as numeric values of 1 or 0 in the case of binary classification) of instances in the root nodes.
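The two possible shapes of rmv can be told apart programmatically. The sketch below uses only the component names given in this section (type, forest, lIncrements, rmv); the helper name summarise_increments is hypothetical and not part of rfFC, and the values that type can take are not assumed.

```r
# Hypothetical helper (not part of rfFC): summarise the list returned by
# getLocalIncrements(), based on the structure described in this section.
summarise_increments <- function(li) {
  rmv <- li$forest$rmv
  list(
    type = li$type,
    n_increments = length(li$forest$lIncrements),
    # a matrix rmv is k x ntree; a vector rmv has one entry per tree
    ntree = if (is.matrix(rmv)) ncol(rmv) else length(rmv),
    # the number of classes k is only recoverable from the matrix form
    n_classes = if (is.matrix(rmv)) nrow(rmv) else NA_integer_
  )
}
```

Given a result li from getLocalIncrements(), str(summarise_increments(li)) prints the summary.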

Author(s)

Anna Palczewska annawojak@gmail.com and
Richard Marchese Robinson rmarcheserobinson@gmail.com

References

V.E. Kuz'min et al. (2011). Interpretation of QSAR Models Based on Random Forest Methods. Molecular Informatics, 30, 593-603.
A. Palczewska et al. (2013). Interpreting random forest models using a feature contribution method. Proceedings of the 2013 IEEE 14th International Conference on Information Reuse and Integration (IEEE IRI 2013), August 14-16, 2013, San Francisco, California, USA, 112-119.
A. Palczewska et al. (2014). Interpreting random forest classification models using a feature contribution method. In Integration of Reusable Systems, ser. Advances in Intelligent and Soft Computing, T. Bouabana-Tebibel and S. H. Rubin, Eds. Springer International Publishing, 263, 193-218.

See Also

randomForest

Examples

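## Not run: 
# Binary classification
library(randomForest)
library(rfFC)  # provides getLocalIncrements(); ames is assumed to ship with rfFC
data(ames)
# Keep the training rows; drop columns 1 and 3 and the last column
ames_train <- ames[ames$Type == "Train", -c(1, 3, ncol(ames))]
# Column 1 of ames_train holds the class labels; the rest are predictors.
# keep.inbag=TRUE and replace=FALSE are required by getLocalIncrements()
rF_Model <- randomForest(x = ames_train[, -1],
                         y = as.factor(as.character(ames_train[, 1])),
                         ntree = 500, importance = TRUE,
                         keep.inbag = TRUE, replace = FALSE)
li <- getLocalIncrements(rF_Model, ames_train[, -1])

## End(Not run)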

rfFC documentation built on May 2, 2019, 5:18 p.m.