findOOBErrors: Compute and locate out-of-bag prediction errors
In forestError: A Unified Framework for Random Forest Prediction Error Estimation


View source: R/findooberrors.R

Description

Computes each training observation's out-of-bag prediction error using the random forest and, for each tree for which that observation is out of bag, finds the terminal node of the tree in which it falls.
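
For a regression forest, the computation described above can be sketched in base R. This is a minimal illustration, not forestError's implementation. The toy matrices `inbag` (in-bag counts per observation and tree), `preds` (each tree's prediction for each training observation), and `nodes` (terminal node IDs) and the helper `oob_errors_and_nodes()` are hypothetical. The sketch also assumes that an error is defined as the observed response minus the out-of-bag prediction.

```r
# Hypothetical toy inputs for 3 training observations and 3 trees.
# inbag[i, b]: times observation i was drawn into tree b's bootstrap sample
inbag <- matrix(c(0, 1, 2,
                  1, 0, 0,
                  0, 0, 1), nrow = 3, byrow = TRUE)
# preds[i, b]: tree b's prediction for training observation i
preds <- matrix(c(2, 9, 9,
                  9, 4, 6,
                  1, 3, 9), nrow = 3, byrow = TRUE)
# nodes[i, b]: terminal node of tree b in which observation i falls
nodes <- matrix(c(1, 2, 1,
                  2, 1, 2,
                  1, 1, 3), nrow = 3, byrow = TRUE)
y <- c(3, 4, 2.5)

# Hypothetical helper (regression only): out-of-bag errors plus the
# terminal node of each tree for which each observation is out of bag.
oob_errors_and_nodes <- function(y, preds, inbag, nodes) {
  oob <- inbag == 0  # TRUE where observation i is out of bag for tree b
  # average each observation's predictions over its out-of-bag trees only
  # (yields NaN for an observation that is in bag for every tree)
  oob_pred <- rowSums(preds * oob) / rowSums(oob)
  errs <- y - oob_pred  # assumed sign convention: response minus prediction
  idx <- which(oob, arr.ind = TRUE)  # columns: row = observation, col = tree
  located <- data.frame(obs = idx[, "row"], tree = idx[, "col"],
                        terminal_node = nodes[idx],
                        err = errs[idx[, "row"]])
  list(errors = errs, located = located)
}

res <- oob_errors_and_nodes(y, preds, inbag, nodes)
res$errors  # c(1, -1, 0.5)
```

Each row of `res$located` pairs one training observation with one tree for which it is out of bag, which is the information the returned table groups by tree and terminal node.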

Usage

```r
findOOBErrors(forest, X.train, Y.train = NULL, n.cores = 1)
```

Arguments

`forest`	The random forest object being used for prediction.
`X.train`	A `matrix` or `data.frame` with the observations that were used to train `forest`. Each row should be an observation, and each column should be a predictor variable.
`Y.train`	A vector of the responses of the observations that were used to train `forest`. Required if `forest` was created using `ranger`, but not if `forest` was created using `randomForest`, `randomForestSRC`, or `quantregForest`.
`n.cores`	Number of cores to use (for parallel computation in `ranger`).

Details

This function accepts classification or regression random forests built using the `randomForest`, `ranger`, `randomForestSRC`, and `quantregForest` packages. When training the random forest using `randomForest`, `ranger`, or `quantregForest`, set `keep.inbag = TRUE`. When training the random forest using `randomForestSRC`, set `membership = TRUE`.
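
As a sketch of these training requirements for `ranger`, the code below fits a forest with `keep.inbag = TRUE` and passes the training responses explicitly, because `Y.train` is required for `ranger` forests. It assumes the `ranger` and `forestError` packages are installed and does nothing otherwise. The number of trees and the use of the complete-case `airquality` data are illustrative choices.

```r
# Runs only if the optional packages are available.
if (requireNamespace("ranger", quietly = TRUE) &&
    requireNamespace("forestError", quietly = TRUE)) {
  aq <- airquality[complete.cases(airquality), ]
  X <- aq[, -1]  # predictors
  Y <- aq[, 1]   # response (Ozone)
  # keep.inbag = TRUE stores the in-bag counts that findOOBErrors needs
  rf_ranger <- ranger::ranger(x = X, y = Y, num.trees = 200,
                              keep.inbag = TRUE)
  # ranger does not store the training responses, so supply Y.train
  nodes_ranger <- forestError::findOOBErrors(rf_ranger, X, Y.train = Y)
  print(head(nodes_ranger))
}
```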

Value

A `data.table` with the following three columns:

`tree`	The ID of the tree of the random forest.
`terminal_node`	The ID of the terminal node of the tree.
`node_errs`	A vector of the out-of-bag prediction errors of the training observations that are out of bag for the tree and fall within the terminal node.
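
To illustrate the shape of this return value, the sketch below takes hypothetical long-format records and collapses them into one row per tree and terminal node, with a list column of errors. Each record covers one training observation and one tree for which that observation is out of bag. The sketch uses a base R `data.frame` rather than a `data.table`. The helper `group_node_errs()` and the toy values are hypothetical and not taken from forestError.

```r
# Hypothetical long-format input: one row per (training observation, tree)
# pair where the observation is out of bag for that tree.
oob_long <- data.frame(
  tree          = c(1, 1, 2, 2, 3),
  terminal_node = c(1, 1, 1, 1, 2),
  err           = c(1, 0.5, -1, 0.5, -1)
)

# Collapse to one row per (tree, terminal_node) with a list column of the
# errors falling in that node.
group_node_errs <- function(long) {
  keys <- unique(long[, c("tree", "terminal_node")])
  keys$node_errs <- lapply(seq_len(nrow(keys)), function(k) {
    long$err[long$tree == keys$tree[k] &
               long$terminal_node == keys$terminal_node[k]]
  })
  rownames(keys) <- NULL
  keys
}

tab <- group_node_errs(oob_long)
lengths(tab$node_errs)  # 2 2 1
```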

Author(s)

Benjamin Lu <b.lu@berkeley.edu>; Johanna Hardin <jo.hardin@pomona.edu>

See Also

`quantForestError`

Examples

```r
library(forestError)

# make the random train/test split reproducible
set.seed(1)

# load data
data(airquality)

# remove observations with a missing value in any column
# (this also drops rows whose response is missing)
airquality <- airquality[complete.cases(airquality), ]

# get number of observations and the response column index
n <- nrow(airquality)
response.col <- 1

# split data into training (90%) and test sets
train.ind <- sample(1:n, floor(n * 0.9), replace = FALSE)
Xtrain <- airquality[train.ind, -response.col]
Ytrain <- airquality[train.ind, response.col]
Xtest <- airquality[-train.ind, -response.col]

# fit random forest to the training data;
# keep.inbag = TRUE is required by findOOBErrors
rf <- randomForest::randomForest(Xtrain, Ytrain, nodesize = 5,
                                 ntree = 500, keep.inbag = TRUE)

# compute out-of-bag prediction errors and locate each
# training observation in the trees for which it is out
# of bag
train_nodes <- findOOBErrors(rf, Xtrain)

# estimate conditional mean squared prediction errors,
# biases, prediction intervals, and error distribution
# functions for the test observations. Provide
# train_nodes to avoid recomputing that step.
output <- quantForestError(rf, Xtrain, Xtest,
                           train_nodes = train_nodes)
```
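
The `train_nodes` table can be read as a lookup from (tree, terminal node) to out-of-bag errors. The code below is a simplified sketch of how such a table might be used. It does not describe `quantForestError`'s actual estimators. It pools the `node_errs` of the terminal nodes into which a single test observation falls, then summarizes the pooled errors in two ways: a mean squared prediction error and an error-quantile interval around a prediction. The toy tables, the helper `pool_errors()`, the prediction value, and the convention that an error is the response minus the prediction are all assumptions.

```r
# Hypothetical table with the same columns as the documented return value
train_nodes_toy <- data.frame(tree = c(1, 1, 2, 3),
                              terminal_node = c(1, 2, 1, 2))
train_nodes_toy$node_errs <- list(c(1, 0.5), c(2, -2), c(-1, 0.5), -1)

# Hypothetical terminal nodes of one test observation, one per tree
test_nodes <- data.frame(tree = c(1, 2, 3), terminal_node = c(1, 1, 2))

# Pool the out-of-bag errors stored in the test observation's nodes
pool_errors <- function(train_nodes, test_nodes) {
  key_train <- paste(train_nodes$tree, train_nodes$terminal_node)
  key_test <- paste(test_nodes$tree, test_nodes$terminal_node)
  unlist(train_nodes$node_errs[key_train %in% key_test])
}

pooled <- pool_errors(train_nodes_toy, test_nodes)  # c(1, 0.5, -1, 0.5, -1)
mspe <- mean(pooled^2)                               # 0.7
pred <- 3  # hypothetical random forest prediction for the test observation
# with error = response - prediction, response = prediction + error
interval <- pred + quantile(pooled, c(0.05, 0.95), names = FALSE)
```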