Implements seven different random forest prediction interval methods.
rfint(
formula = formula,
train_data = NULL,
test_data = NULL,
method = "Zhang",
alpha = 0.1,
symmetry = TRUE,
seed = NULL,
m_try = 2,
num_trees = 500,
min_node_size = 5,
num_threads = parallel::detectCores(),
calibrate = FALSE,
Roy_method = "quantile",
featureBias = FALSE,
predictionBias = TRUE,
Tung_R = 5,
Tung_num_trees = 75,
variant = 1,
Ghosal_num_stages = 2,
prop = 0.618,
concise = TRUE,
interval_type = "two-sided"
)

formula 
Object of class formula or character describing the model to fit. Interaction terms supported only for numerical variables. 
train_data 
Training data of class data.frame. 
test_data 
Test data of class data.frame. Utilizes ranger::predict() to produce prediction intervals for test data. 
method 
Choose what method to generate RF prediction intervals. Options are 
alpha 
Significance level for prediction intervals. Defaults to 
symmetry 
True if constructing symmetric out-of-bag prediction intervals, False otherwise. Used only 
seed 
Seed for random number generation. Currently not utilized. 
m_try 
Number of variables to randomly select from at each split. 
num_trees 
Number of trees used in the random forest. 
min_node_size 
Minimum number of observations before split at a node. 
num_threads 
The number of threads to use in parallel. Default is the current number of cores. 
calibrate 
If 
Roy_method 
Interval method for 
featureBias 
Remove feature bias. Only for 
predictionBias 
Remove prediction bias. Only for 
Tung_R 
Number of repetitions used in bias removal. Only for 
Tung_num_trees 
Number of trees used in bias removal. Only for 
variant 
Choose which variant to use. Options are 
Ghosal_num_stages 
Number of total stages. Only for 
prop 
Proportion of training data to sample for each tree. Only for 
concise 
If concise = TRUE, only predictions output. Defaults to 
interval_type 
Type of prediction interval to generate.
Options are 
The seven methods implemented are cited in the References section.
Additional information can be found within those references.
Each of these methods is implemented using the ranger package.
For method = "Zhang", prediction intervals are generated using out-of-bag residuals.
method = "Romano" utilizes a split-conformal approach.
method = "Roy" uses a bag-of-predictors approach.
method = "Ghosal" performs boosting to reduce bias in the random forest, and estimates variance. The authors provide multiple variants to their methodology.
method = "Tung" debiases feature selection and prediction. Prediction intervals are generated using quantile regression forests.
method = "HDI" delivers prediction intervals through highest-density interval regression forests.
method = "quantile" utilizes quantile regression forests.
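To make the split-conformal idea mentioned above concrete, here is a minimal base R sketch. It is not the piRF implementation: lm() stands in for the random forest, the data are simulated, and all object names (dat, idx_fit, idx_cal, idx_test, q_hat) are hypothetical. It only illustrates the general recipe of fitting on one part of the data, computing absolute residuals on a held-out calibration part, and widening test predictions by a quantile of those residuals.

```r
# Split-conformal prediction intervals, sketched in base R.
# ASSUMPTION: lm() is a stand-in for a random forest; data are simulated.
set.seed(1)
n <- 400
x <- runif(n, 0, 10)
y <- 2 + 3 * x + rnorm(n)
dat <- data.frame(x = x, y = y)

alpha <- 0.1  # target miscoverage, so intervals aim for 90% coverage

# disjoint fitting, calibration and test sets
idx_fit  <- 1:200
idx_cal  <- 201:300
idx_test <- 301:400

# fit the point predictor on the fitting set only
fit <- lm(y ~ x, data = dat[idx_fit, ])

# conformity scores: absolute residuals on the calibration set
scores <- abs(dat$y[idx_cal] - predict(fit, newdata = dat[idx_cal, ]))

# finite-sample corrected empirical quantile of the scores
# (capping k at n_cal is a simplification for very small calibration sets)
n_cal <- length(scores)
k <- ceiling((n_cal + 1) * (1 - alpha))
q_hat <- sort(scores)[min(k, n_cal)]

# symmetric intervals around the test predictions
pred <- predict(fit, newdata = dat[idx_test, ])
int <- cbind(lower = pred - q_hat, upper = pred + q_hat)

# empirical coverage on the test set
coverage <- mean(dat$y[idx_test] >= int[, "lower"] &
                 dat$y[idx_test] <= int[, "upper"])
coverage
```

Because the calibration residuals come from data the model did not see, the interval width reflects out-of-sample error rather than training error.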

Default output. Includes prediction intervals for all methods in 

Predictions for test data for all methods in 
Chancellor Johnstone
Haozhe Zhang
\insertRef{breiman2001random}{piRF}
\insertRef{ghosal2018boosting}{piRF}
\insertRef{meinshausen2006quantile}{piRF}
\insertRef{romano2019conformalized}{piRF}
\insertRef{roy2019prediction}{piRF}
\insertRef{tung2014bias}{piRF}
\insertRef{zhang2019random}{piRF}
\insertRef{zhu2019hdi}{piRF}
ranger
rfinterval
library(piRF)
#functions to get average length and average coverage of output
getPILength <- function(x){
  #average PI length across each set of predictions
  l <- x[,2] - x[,1]
  avg_l <- mean(l)
  return(avg_l)
}
getCoverage <- function(x, response){
  #output coverage for test data
  coverage <- sum((response >= x[,1]) * (response <= x[,2]))/length(response)
  return(coverage)
}
#import airfoil self noise dataset
data(airfoil)
method_vec <- c("quantile", "Zhang", "Tung", "Romano", "Roy", "HDI", "Ghosal")
#generate train and test data
ratio <- .975
nrow <- nrow(airfoil)
n <- floor(nrow*ratio)
samp <- sample(1:nrow, size = n)
train <- airfoil[samp,]
#test set is the rows not sampled for training
test <- airfoil[-samp,]
#generate prediction intervals
res <- rfint(pressure ~ . , train_data = train, test_data = test,
             method = method_vec,
             concise = FALSE,
             num_threads = 1)
#empirical coverage, and average prediction interval length for each method
coverage <- sapply(res$int, FUN = getCoverage, response = test$pressure)
coverage
length <- sapply(res$int, FUN = getPILength)
length
#save current graphical settings and switch to a 2 x 2 layout
opar <- par(mfrow = c(2,2))
#plotting intervals and predictions
for(i in 1:7){
  #col is 1 (black) when the interval covers the response, 2 (red) otherwise
  col <- ((test$pressure >= res$int[[i]][,1]) *
          (test$pressure <= res$int[[i]][,2])-1)*(-1)+1
  plot(x = res$preds[[i]], y = test$pressure, pch = 20,
       col = "black", ylab = "true", xlab = "predicted", main = method_vec[i])
  abline(a = 0, b = 1)
  segments(x0 = res$int[[i]][,1], x1 = res$int[[i]][,2],
           y1 = test$pressure, y0 = test$pressure, lwd = 1, col = col)
}
#restore the original graphical settings
par(opar)
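
The coverage and average-length summaries used in the example can be checked by hand on a small input. The sketch below uses a made-up interval matrix and response vector. The helper names pi_length and pi_coverage are hypothetical and only mirror what the example's helpers are meant to compute. It assumes, as in the example, that column 1 holds lower bounds and column 2 holds upper bounds.

```r
# Toy intervals: one row per observation, columns are lower and upper bounds.
# ASSUMPTION: values are invented purely for illustration.
toy_int <- cbind(lower = c(0, 1, 2, 3), upper = c(2, 3, 4, 5))
toy_response <- c(1, 3.5, 2, 4)

# average interval length (hypothetical helper name)
pi_length <- function(x) mean(x[, 2] - x[, 1])

# share of responses that fall inside their interval (hypothetical helper name)
pi_coverage <- function(x, response) {
  mean(response >= x[, 1] & response <= x[, 2])
}

pi_length(toy_int)                  # every interval has width 2, so the mean is 2
pi_coverage(toy_int, toy_response)  # 3 of 4 responses are covered, so 0.75
```

Only the second observation (3.5 against the interval from 1 to 3) falls outside its interval, which is why the coverage is three quarters.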
