pibf | R Documentation |
Constructs prediction intervals with boosted forests.
pibf(
  formula,
  traindata,
  testdata,
  alpha = 0.05,
  calibration = c("cv", "oob", FALSE),
  coverage_range = c(1 - alpha - 0.005, 1 - alpha + 0.005),
  numfolds = 5,
  params_ranger = list(num.trees = 2000, mtry = ceiling(px/3),
    min.node.size = 5, replace = TRUE),
  oob = FALSE
)
formula |
Object of class |
traindata |
Training data of class |
testdata |
Test data of class |
alpha |
Confidence level. (1 - |
calibration |
Calibration method for finding working level of
|
coverage_range |
The allowed target calibration range for coverage level.
α_w is selected such that the |
numfolds |
Number of folds for calibration with cross-validation. The default is 5 folds. |
params_ranger |
List of parameters that should be passed to
|
oob |
Should out-of-bag (OOB) predictions and prediction intervals for the training observations be returned? |
A list with the following components:
pred_interval |
Prediction intervals for test data. A list containing lower and upper bounds. |
test_pred |
Bias-corrected random forest predictions for test data. |
alphaw |
Working level of |
test_response |
If available, test response. |
oob_pred_interval |
Out-of-bag (OOB) prediction intervals for train
data. Prediction intervals are built with |
oob_pred |
Bias-corrected out-of-bag (OOB) predictions for train data.
If |
train_response |
Train response. |
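As a small illustration of working with the returned list, the sketch below computes the mean interval length and, when a test response is present, the empirical coverage. The helper pi_summary() is hypothetical and not part of RFpredInterval, and the mock object only imitates the component names listed above. It also assumes that test_response is NULL when the test response is unavailable.

```r
# Hypothetical helper (not part of RFpredInterval): summarise the
# prediction intervals stored in a pibf()-style result list.
pi_summary <- function(out) {
  lower <- out$pred_interval$lower
  upper <- out$pred_interval$upper
  res <- list(mean_length = mean(upper - lower))
  # Assumption: test_response is NULL when the test data has no response.
  if (!is.null(out$test_response)) {
    res$coverage <- mean(out$test_response >= lower &
                           out$test_response <= upper)
  }
  res
}

# Mock object using the documented component names; it stands in for
# real pibf() output so the example runs without fitting a forest.
mock <- list(
  pred_interval = list(lower = c(1, 2, 3, 4), upper = c(3, 4, 5, 6)),
  test_pred = c(2, 3, 4, 5),
  test_response = c(2.5, 4.5, 3.2, 5.9)
)
s <- pi_summary(mock)
s$mean_length
s$coverage
```

The same helper could be applied to the oob_pred_interval and train_response components when oob = TRUE, after renaming the fields accordingly.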
Calibration process
Let (1-α) be the target coverage level. The goal of calibration is to find α_w, which Roy and Larocque (2020) call the working level of α, such that the coverage level of the PIs for the training observations is closest to the target coverage level. Two calibration procedures are provided: calibration with cross-validation (CV) and out-of-bag (OOB) calibration.
In calibration with CV, we apply k-fold cross-validation to form prediction intervals for the training observations. In each fold, we split the original training data set into a training set and a testing set. On the training set, we train a one-step boosted random forest and compute the OOB residuals. Then, for each observation in the testing set, we build a PI. After completing CV, we compute the coverage level of the constructed PIs. If the coverage is not within the acceptable coverage range (coverage_range), we apply a grid search to find α_w: among the set of α_w values that ensure the target coverage level for the constructed PIs, we choose the one closest to the target α. Once we find α_w, we use this level to build the PIs for new observations.
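The CV calibration loop can be sketched in base R as follows. This is a simplified illustration, not the internals of pibf(). A linear model stands in for the one-step boosted forest, quantiles of the training residuals stand in for the PIBF interval construction, and the helper names (cv_coverage, calibrate_cv), the grid of candidate levels, and the fallback rule used when no candidate lands inside coverage_range are all assumptions made for this example.

```r
# Hypothetical helper: k-fold CV coverage of residual-quantile PIs at a
# given working level alpha_w. lm() is a stand-in for the boosted forest.
cv_coverage <- function(data, folds, alpha_w) {
  covered <- logical(nrow(data))
  for (k in sort(unique(folds))) {
    train <- data[folds != k, ]
    test  <- data[folds == k, ]
    fit   <- lm(y ~ x, data = train)
    # Residual quantiles define the lower and upper offsets of the PI
    q     <- quantile(residuals(fit), c(alpha_w / 2, 1 - alpha_w / 2))
    pred  <- predict(fit, newdata = test)
    covered[folds == k] <- test$y >= pred + q[1] & test$y <= pred + q[2]
  }
  mean(covered)
}

# Hypothetical calibration routine: keep alpha if the CV coverage is
# already inside coverage_range, otherwise grid-search for alpha_w.
calibrate_cv <- function(data, alpha = 0.05, numfolds = 5,
                         coverage_range = c(1 - alpha - 0.005,
                                            1 - alpha + 0.005),
                         grid = seq(0.005, 0.25, by = 0.005)) {
  folds <- sample(rep(seq_len(numfolds), length.out = nrow(data)))
  cov0 <- cv_coverage(data, folds, alpha)
  if (cov0 >= coverage_range[1] && cov0 <= coverage_range[2]) {
    return(alpha)
  }
  cov <- vapply(grid, function(a) cv_coverage(data, folds, a), numeric(1))
  ok <- grid[cov >= coverage_range[1] & cov <= coverage_range[2]]
  if (length(ok) > 0) {
    # Among admissible levels, take the one closest to the target alpha
    return(ok[which.min(abs(ok - alpha))])
  }
  # Assumed fallback: the level whose coverage is closest to 1 - alpha
  grid[which.min(abs(cov - (1 - alpha)))]
}

# Usage on simulated data (columns must be named x and y)
set.seed(1)
d <- data.frame(x = runif(200))
d$y <- 2 * d$x + rnorm(200)
alpha_w <- calibrate_cv(d, alpha = 0.05)
alpha_w
```

The same fold assignment is reused for every candidate level so that differences in coverage come from α_w alone and not from resampling noise.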
The OOB calibration procedure was proposed by Roy and Larocque (2020) and is the default calibration procedure of rfpi(). See the Details section of rfpi() for a detailed explanation of this calibration procedure.
In terms of computational time, OOB calibration is faster than calibration with CV. However, empirical results show that OOB calibration may result in conservative prediction intervals. Therefore, the recommended calibration procedure for the PIBF method is calibration with CV.
Alakus, C., Larocque, D., and Labbe, A. (2021). RFpredInterval: An R Package for Prediction Intervals with Random Forests and Boosted Forests. arXiv preprint arXiv:2106.08217.
Roy, M. H., and Larocque, D. (2020). Prediction intervals with random forests. Statistical Methods in Medical Research, 29(1), 205-229. doi:10.1177/0962280219829885.
piall
rfpi
print.rfpredinterval
## load example data
data(BostonHousing, package = "RFpredInterval")
set.seed(2345)

## define train/test split
testindex <- 1:10
trainindex <- sample(11:nrow(BostonHousing), size = 100, replace = FALSE)
traindata <- BostonHousing[trainindex, ]
testdata <- BostonHousing[testindex, ]
px <- ncol(BostonHousing) - 1

## construct 95% PI with "cv" calibration using 5-folds
out <- pibf(formula = medv ~ ., traindata = traindata,
  testdata = testdata, calibration = "cv", numfolds = 5,
  params_ranger = list(num.trees = 40))

## get the PI for the first observation in the testdata
c(out$pred_interval$lower[1], out$pred_interval$upper[1])

## get the bias-corrected random forest predictions for testdata
out$test_pred

## construct 90% PI with "oob" calibration
out2 <- pibf(formula = medv ~ ., traindata = traindata,
  testdata = testdata, alpha = 0.1, calibration = "oob",
  coverage_range = c(0.89, 0.91), params_ranger = list(num.trees = 40))

## get the PI for the testdata
out2$pred_interval

## get the working level of alpha (alphaw)
out2$alphaw