findOOBErrors: Compute and locate out-of-bag prediction errors

Description Usage Arguments Details Value Author(s) See Also Examples

View source: R/findooberrors.R

Description

Computes each training observation's out-of-bag prediction error using the random forest and, for each tree for which the training observation is out of bag, finds the terminal node of the tree in which the training observation falls.

Usage

1
findOOBErrors(forest, X.train, Y.train = NULL, n.cores = 1)

Arguments

forest

The random forest object being used for prediction.

X.train

A matrix or data.frame with the observations that were used to train forest. Each row should be an observation, and each column should be a predictor variable.

Y.train

A vector of the responses of the observations that were used to train forest. Required if forest was created using ranger, but not if forest was created using randomForest, randomForestSRC, or quantregForest.

n.cores

Number of cores to use (for parallel computation in ranger).

Details

This function accepts classification or regression random forests built using the randomForest, ranger, randomForestSRC, and quantregForest packages. When training the random forest using randomForest, ranger, or quantregForest, keep.inbag must be set to TRUE. When training the random forest using randomForestSRC, membership must be set to TRUE.

Value

A data.table with the following three columns:

tree

The ID of the tree of the random forest

terminal_node

The ID of the terminal node of the tree

node_errs

A vector of the out-of-bag prediction errors that fall within the terminal node of the tree

Author(s)

Benjamin Lu <b.lu@berkeley.edu>; Johanna Hardin <jo.hardin@pomona.edu>

See Also

quantForestError

Examples

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
# load data
data(airquality)

# remove observations with missing predictor variable values
airquality <- airquality[complete.cases(airquality), ]

# get number of observations and the response column index
n <- nrow(airquality)
response.col <- 1

# split data into training and test sets
train.ind <- sample(1:n, n * 0.9, replace = FALSE)
Xtrain <- airquality[train.ind, -response.col]
Ytrain <- airquality[train.ind, response.col]
Xtest <- airquality[-train.ind, -response.col]

# fit random forest to the training data
rf <- randomForest::randomForest(Xtrain, Ytrain, nodesize = 5,
                                 ntree = 500, keep.inbag = TRUE)

# compute out-of-bag prediction errors and locate each
# training observation in the trees for which it is out
# of bag
train_nodes <- findOOBErrors(rf, Xtrain)

# estimate conditional mean squared prediction errors,
# biases, prediction intervals, and error distribution
# functions for the test observations. provide
# train_nodes to avoid recomputing that step.
output <- quantForestError(rf, Xtrain, Xtest,
                           train_nodes = train_nodes)

forestError documentation built on Aug. 11, 2021, 1:06 a.m.