Description
Passes a feature matrix (and optionally a label vector) through a fitted random forest
object and records the size (and Gini impurity) of each node. Optionally, for every node,
returns the features used to define its decision rule and the data points falling in that
node. Uses the mclapply function to distribute computation across available cores.
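A minimal sketch of such a call, assuming the iRF package provides readForest and the
node-tracking randomForest shown in the Examples section (rf and X below are placeholders
for a fitted forest and its feature matrix):

# rf: a forest fitted with track.nodes = TRUE; X: the feature matrix it was fit on
rforest <- readForest(rfobj = rf, x = X, n.core = 2)
# tree reading is distributed with mclapply, which relies on forking,
# so n.core > 1 has no effect on Windows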
Usage

readForest(rfobj, x, y=NULL, return.node.feature=TRUE,
           wt.pred.accuracy=FALSE, n.core=1)
Arguments

rfobj
    a fitted random forest object

x
    numeric matrix with the same number of predictors used in fitting rfobj

y
    a numeric vector specifying response values

return.node.feature
    if TRUE, returns a matrix containing the features used to define the
    decision rule associated with each node

wt.pred.accuracy
    should leaf nodes be sampled proportional to both size and the decrease
    in variability of responses?

n.core
    number of cores across which tree reading will be distributed
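As a hedged illustration of the non-default settings above, assuming rf, X, and Y are the
fitted forest, feature matrix, and response from the Examples section:

# weight leaf-node sampling by both node size and the decrease in response
# variability; y is presumably required for wt.pred.accuracy
rforest <- readForest(rfobj = rf, x = X, y = Y,
                      return.node.feature = TRUE,
                      wt.pred.accuracy = TRUE,
                      n.core = 4)  # read trees on 4 cores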
Value

A list containing the following items:

tree.info
    a data frame with number of rows equal to the total number of nodes in
    the forest, giving node-level attributes

node.feature
    if return.node.feature = TRUE, a sparse matrix with ncol(x) columns,
    each row corresponding to a leaf node in the forest. The entries
    indicate which features were used to define the decision rule
    associated with that node
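A short sketch of how the returned list might be inspected; the only tree.info column
assumed here is size.node, which also appears in the Examples section:

rforest <- readForest(rfobj = rf, x = X)

# node-level attributes, one row per node in the forest
head(rforest$tree.info)
summary(rforest$tree.info$size.node)

# features defining the decision rule of the first leaf node:
# the nonzero columns of the corresponding row of node.feature
which(rforest$node.feature[1, ] != 0)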
Author(s)

Sumanta Basu <sumbose@berkeley.edu>, Karl Kumbier <kkumbier@berkeley.edu>
Examples

library(iRF)  # assumed: readForest and the node-tracking randomForest are provided by iRF

n = 50; p = 10
X = array(rnorm(n*p), c(n, p))
Y = (X[,1]>0.35 & X[,2]>0.35)|(X[,5]>0.35 & X[,7]>0.35)
Y = as.factor(as.numeric(Y>0))

train.id = 1:(n/2)
test.id = setdiff(1:n, train.id)

rf <- randomForest(x=X, y=Y, keep.forest=TRUE, track.nodes=TRUE,
                   ntree=100)
rforest <- readForest(rfobj=rf, x=X, n.core=2)
head(rforest$tree.info)

# count the number of leaf nodes with more than 5 observations
sum(rforest$tree.info$size.node > 5)