readForest: Pass data through a fitted forest, record node...

Description

Passes a feature matrix (and optionally a label vector) through a fitted random forest object and records the size (and Gini impurity) of each node. Optionally, for every node, returns the features used to define the rule and the data points falling in that node. Uses the mclapply function to distribute computation across available cores.

Usage

  readForest(rfobj, x, y=NULL, return.node.feature=TRUE, 
    wt.pred.accuracy=FALSE, n.core=1)

Arguments

rfobj

a fitted randomForest object with the forest component in it

x

numeric matrix with the same number of predictors used in rfobj fit

y

an optional numeric vector specifying response values (defaults to NULL)

return.node.feature

if TRUE, returns a matrix containing features used to define the decision rule associated with a node

wt.pred.accuracy

Should leaf nodes be sampled proportional to both size and decrease in variability of responses?

n.core

number of cores across which tree reading will be distributed
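
The per-tree parallel pattern behind n.core can be sketched with mclapply from the parallel package. This is only an illustration under stated assumptions, not the package's actual code: read.one.tree is a hypothetical stand-in for the work done on a single tree, and passing n.core as mc.cores is assumed rather than confirmed by this page.

```r
library(parallel)

# Hypothetical per-tree worker: stands in for whatever reading one tree involves
read.one.tree <- function(k) {
  data.frame(tree = k, n.node = 2 * k + 1)
}

n.core <- 1   # values above 1 rely on forking, which is unavailable on Windows
ntree <- 4

# One task per tree, spread across n.core worker processes
per.tree <- mclapply(seq_len(ntree), read.one.tree, mc.cores = n.core)

# Stack the per-tree results into a single data frame
forest.info <- do.call(rbind, per.tree)
```

Because each tree is handled independently, the results can be combined in tree order regardless of how many cores were used.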

Value

A list containing the following items:

tree.info

a data frame with number of rows equal to total number of nodes in the forest, giving node level attributes: prediction (predicted response of leaf node), node.idx (the forest level node index of the leaf), parent (index of parent node), size.node (number of data points falling in a node), tree (index of the tree in the forest in which the node lives), dec.purity (if wt.pred.accuracy=TRUE, the decrease in standard deviation of responses relative to the full data).

node.feature

if return.node.feature = TRUE, returns a sparse matrix with ncol(x) columns, each row corresponding to a leaf node in the forest. The entries indicate which features were used to define the decision rule associated with a node
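
As a rough sketch of how tree.info might be summarised once it has been returned, the base R snippet below builds a small stand-in data frame by hand. The values are made up for illustration; only the column names are taken from the description above, and a real call to readForest is needed to obtain actual results.

```r
# Toy stand-in for the tree.info data frame (values are made up for illustration)
tree.info <- data.frame(
  prediction = c(0, 1, 1, 0, 1),
  node.idx   = 1:5,
  size.node  = c(12, 3, 7, 20, 5),
  tree       = c(1, 1, 1, 2, 2)
)

# Number of nodes recorded for each tree
nodes.per.tree <- table(tree.info$tree)

# Total number of observations covered by the recorded nodes of each tree
obs.per.tree <- tapply(tree.info$size.node, tree.info$tree, sum)

# Nodes holding at least 5 observations
large.nodes <- tree.info[tree.info$size.node >= 5, ]
```

The same subsetting and grouping calls should apply unchanged to the tree.info element of a real readForest result, since it is described as an ordinary data frame.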

Author(s)

Sumanta Basu sumbose@berkeley.edu, Karl Kumbier kkumbier@berkeley.edu

See Also

getTree

Examples

  library(iRF)  # provides readForest and the randomForest used below

  n = 50; p = 10
  X = array(rnorm(n*p), c(n, p))
  Y = (X[,1]>0.35 & X[,2]>0.35)|(X[,5]>0.35 & X[,7]>0.35)
  Y = as.factor(as.numeric(Y>0))

  train.id = 1:(n/2)
  test.id = setdiff(1:n, train.id)
  
  rf <- randomForest(x=X, y=Y, keep.forest=TRUE, track.nodes=TRUE,
    ntree=100)
  rforest <- readForest(rfobj=rf, x=X, n.core=2)
  head(rforest$tree.info)

  # count number of leaf nodes with more than 5 observations
  sum(rforest$tree.info$size.node > 5)

iRF documentation built on May 2, 2019, 11:02 a.m.