View source: R/quantforesterror.R
Estimates the conditional misclassification rates, conditional mean squared prediction errors, conditional biases, conditional prediction intervals, and conditional error distributions of random forest predictions.
forest 
The random forest object being used for prediction. 
X.train 
A 
X.test 
A 
Y.train 
A vector of the responses of the observations that were used
to train 
what 
A vector of characters indicating what estimates are desired.
Possible options are conditional mean squared prediction errors ( 
alpha 
A vector of type-I error rates desired for the conditional prediction
intervals; required if 
train_nodes 
A 
return_train_nodes 
A boolean indicating whether to return the

n.cores 
Number of cores to use (for parallel computation in 
This function accepts classification or regression random forests built using the randomForest, ranger, randomForestSRC, and quantregForest packages. When training the random forest using randomForest, ranger, or quantregForest, keep.inbag must be set to TRUE. When training the random forest using randomForestSRC, membership must be set to TRUE.
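As a rough illustration of the training requirement above, the sketch below fits small forests with the in-bag bookkeeping retained. The formula interface, the tree counts, the seed, and the object names (aq, rf_rf, rf_ranger, rf_src) are assumptions chosen for this example only; the ranger and randomForestSRC fits are skipped when those packages are not installed.

```r
# Sketch only: object names, seed, and tree counts are illustrative assumptions.
data(airquality)
aq <- airquality[complete.cases(airquality), ]

set.seed(1)

# randomForest: keep.inbag = TRUE stores, for every tree, which training
# observations were in bag
rf_rf <- randomForest::randomForest(Ozone ~ ., data = aq,
                                    ntree = 50, keep.inbag = TRUE)
dim(rf_rf$inbag)  # rows: training observations, columns: trees

# ranger: the corresponding argument is also named keep.inbag
if (requireNamespace("ranger", quietly = TRUE)) {
  rf_ranger <- ranger::ranger(Ozone ~ ., data = aq,
                              num.trees = 50, keep.inbag = TRUE)
  length(rf_ranger$inbag.counts)  # one element per tree
}

# randomForestSRC: set membership = TRUE instead
if (requireNamespace("randomForestSRC", quietly = TRUE)) {
  rf_src <- randomForestSRC::rfsrc(Ozone ~ ., data = aq,
                                   ntree = 50, membership = TRUE)
}
```

A forest trained without these flags would need to be refit before being passed to quantForestError.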
The predictions computed by ranger can be parallelized by setting the value of n.cores to be greater than 1.
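A minimal sketch of the parallel option with a ranger forest might look like the following. The train/test split, the seed, the choice of two cores, and the object names are assumptions; Y.train is passed explicitly here on the assumption that it is needed for ranger forests.

```r
# Sketch only: split, seed, core count, and object names are assumptions.
library(forestError)

data(airquality)
aq <- airquality[complete.cases(airquality), ]

set.seed(1)
test_rows <- sample(nrow(aq), 20)
train <- aq[-test_rows, ]
test <- aq[test_rows, ]

# keep.inbag = TRUE is required, as described above
rf <- ranger::ranger(Ozone ~ ., data = train,
                     num.trees = 500, keep.inbag = TRUE)

# request only the conditional MSPE and use two cores for prediction
out <- quantForestError(rf,
                        X.train = train[, -1],
                        X.test = test[, -1],
                        Y.train = train$Ozone,
                        what = "mspe",
                        n.cores = 2)
head(out)
```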
The random forest predictions are always returned as a data.frame. Additional columns are included in the data.frame depending on the user's selections in the argument what. In particular, including "mspe" in what will add an additional column with the conditional mean squared prediction error of each test prediction to the data.frame; including "bias" in what will add an additional column with the conditional bias of each test prediction to the data.frame; including "interval" in what will add to the data.frame additional columns with the lower and upper bounds of conditional prediction intervals for each test prediction; and including "mcr" in what will add an additional column with the conditional misclassification rate of each test prediction to the data.frame. The conditional misclassification rate can be estimated only for classification random forests, while the other parameters can be estimated only for regression random forests.
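The column selection described above can be sketched as follows for a regression forest. The split, the seed, the two alpha values, and the object names are assumptions made for illustration; the interval columns are located by prefix rather than by hard-coded name.

```r
# Sketch only: split, seed, alpha values, and object names are assumptions.
library(forestError)

data(airquality)
aq <- airquality[complete.cases(airquality), ]

set.seed(1)
test_rows <- sample(nrow(aq), 20)
Xtrain <- aq[-test_rows, -1]
Ytrain <- aq[-test_rows, 1]
Xtest <- aq[test_rows, -1]

rf <- randomForest::randomForest(Xtrain, Ytrain, nodesize = 5,
                                 ntree = 500, keep.inbag = TRUE)

# ask for MSPE, bias, and prediction intervals at two type-I error rates
out <- quantForestError(rf, Xtrain, Xtest,
                        what = c("mspe", "bias", "interval"),
                        alpha = c(0.05, 0.2))

names(out)                               # pred plus the requested columns
lower_cols <- grep("^lower_", names(out), value = TRUE)
upper_cols <- grep("^upper_", names(out), value = TRUE)
head(out[, c("pred", lower_cols, upper_cols)])
```

Because neither "p.error" nor "q.error" is requested and return_train_nodes keeps its default, the result here is a plain data.frame rather than a list.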
If "p.error" or "q.error" is included in what, or if return_train_nodes is set to TRUE, then a list will be returned as output. The first element of the list, named "estimates", is the data.frame described in the above paragraph. The other elements of the list are the estimated cumulative distribution functions (perror) of the conditional error distributions, the estimated quantile functions (qerror) of the conditional error distributions, and/or a data.table indicating what out-of-bag prediction errors each terminal node of each tree in the random forest contains.
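A sketch of working with the list output follows. The split, the seed, and the object names are assumptions, and the call form used for perror and qerror (a vector of values first, then a vector of test-observation indices) is an assumption that should be checked against the installed package version.

```r
# Sketch only: split, seed, object names, and the perror/qerror call form
# are assumptions.
library(forestError)

data(airquality)
aq <- airquality[complete.cases(airquality), ]

set.seed(1)
test_rows <- sample(nrow(aq), 20)
Xtrain <- aq[-test_rows, -1]
Ytrain <- aq[-test_rows, 1]
Xtest <- aq[test_rows, -1]

rf <- randomForest::randomForest(Xtrain, Ytrain, nodesize = 5,
                                 ntree = 500, keep.inbag = TRUE)

# requesting the error distribution functions switches the output to a list
out <- quantForestError(rf, Xtrain, Xtest,
                        what = c("mspe", "p.error", "q.error"))

names(out)            # "estimates" plus the requested functions
head(out$estimates)   # the data.frame of predictions and MSPE estimates

# estimated P(error <= -5) and P(error <= 5) for the first three test rows
out$perror(c(-5, 5), 1:3)

# estimated 0.25 and 0.75 error quantiles for the first three test rows
out$qerror(c(0.25, 0.75), 1:3)
```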
A data.frame with one or more of the following columns, as described in the details section:
pred 
The random forest predictions of the test observations 
mspe 
The estimated conditional mean squared prediction errors of the random forest predictions 
bias 
The estimated conditional biases of the random forest predictions 
lower_alpha 
The estimated lower bounds of the conditional alpha-level prediction intervals for the test observations 
upper_alpha 
The estimated upper bounds of the conditional alpha-level prediction intervals for the test observations 
mcr 
The estimated conditional misclassification rate of the random forest predictions 
In addition, one or both of the following functions, as described in the details section:
perror 
The estimated cumulative distribution functions of the conditional error distributions associated with the test predictions 
qerror 
The estimated quantile functions of the conditional error distributions associated with the test predictions 
In addition, if return_train_nodes is TRUE, then a data.table called train_nodes indicating what out-of-bag prediction errors each terminal node of each tree in forest contains.
Benjamin Lu <b.lu@berkeley.edu>; Johanna Hardin <jo.hardin@pomona.edu>
# load the package that provides quantForestError
library(forestError)

# load data
data(airquality)

# remove observations with missing predictor variable values
airquality <- airquality[complete.cases(airquality), ]

# get number of observations and the response column index
n <- nrow(airquality)
response.col <- 1

# split data into training and test sets
train.ind <- sample(c("A", "B", "C"), n,
                    replace = TRUE, prob = c(0.8, 0.1, 0.1))

# predictors exclude the response column; the response is that column alone
Xtrain <- airquality[train.ind == "A", -response.col]
Ytrain <- airquality[train.ind == "A", response.col]
Xtest1 <- airquality[train.ind == "B", -response.col]
Xtest2 <- airquality[train.ind == "C", -response.col]

# fit regression random forest to the training data
rf <- randomForest::randomForest(Xtrain, Ytrain, nodesize = 5,
                                 ntree = 500,
                                 keep.inbag = TRUE)

# estimate conditional mean squared prediction errors,
# biases, prediction intervals, and error distribution
# functions for the observations in Xtest1. return
# train_nodes to avoid recomputation in the next
# line of code.
output1 <- quantForestError(rf, Xtrain, Xtest1,
                            return_train_nodes = TRUE)

# estimate just the conditional mean squared prediction errors
# and prediction intervals for the observations in Xtest2.
# avoid recomputation by providing train_nodes from the
# previous line of code.
output2 <- quantForestError(rf, Xtrain, Xtest2,
                            what = c("mspe", "interval"),
                            train_nodes = output1$train_nodes)

# for illustrative purposes, convert response to categorical
Ytrain <- as.factor(Ytrain > 31.5)

# fit classification random forest to the training data
rf <- randomForest::randomForest(Xtrain, Ytrain, nodesize = 3,
                                 ntree = 500,
                                 keep.inbag = TRUE)

# estimate conditional misclassification rate of the
# predictions of Xtest1
output <- quantForestError(rf, Xtrain, Xtest1)
