knitr::opts_chunk$set( collapse = TRUE, comment = "#>", fig.path = "man/figures/README-", out.width = "100%" )
library(badger)
cat( "[](https://cran.r-project.org/package=piRF)", "[](https://cran.r-project.org/package=piRF)" #badge_cran_release("piRF", "orange"), #badge_cran_download("piRF", "grand-total", "blue") )
The goal of piRF is to implement multiple state-of-the art random forest prediction interval methodologies in one complete package. Currently, the methods implemented can only be utilized within isolated packages, or the authors have not made a package publicly available. The package itself utilizes the functionality provided by the ranger package. If you utilize this package in any publications, please use the following citation:
Johnstone C, Zhang H (2020). piRF: Prediction Intervals for Random Forests. R package version 0.1.0, https://CRAN.R-project.org/package=piRF.
A BibTeX entry for LaTeX users is
@Manual{, title = {piRF: Prediction Intervals for Random Forests}, author = {Chancellor Johnstone and Haozhe Zhang}, year = {2020}, note = {R package version 0.1.0}, url = {https://CRAN.R-project.org/package=piRF}, }
You can install the released version of piRF from CRAN with:
install.packages("piRF")
And the development version from GitHub with:
# install.packages("devtools") devtools::install_github("chancejohnstone/piRF")
This is a basic example which utilizes the airfoil dataset included with piRF. The dataset comes from UCI Archive. The NASA data set comprises different size NACA 0012 airfoils at various wind tunnel speeds and angles of attack.
The following functions are not exported by piRF but are used for this example.
library(piRF) ## basic example code data(airfoil) head(airfoil) #functions to get average length and average coverage of output getPILength <- function(x){ #average PI length across each set of predictions l <- x[,2] - x[,1] avg_l <- mean(l) return(avg_l) } getCoverage <- function(x, response){ #output coverage for test data coverage <- sum((response >= x[,1]) * (response <= x[,2]))/length(response) return(coverage) }
Prediction intervals are generated for each of the methods implemented using train and test data sets constructed from the airfoil data.
method_vec <- c("quantile", "oob", "bcqrf", "cqrf", "bop", "hdi", "brf") #generate train and test data set.seed(2020) ratio <- .975 nrow <- nrow(airfoil) n <- floor(nrow*ratio) samp <- sample(1:nrow, size = n) train <- airfoil[samp,] test <- airfoil[-samp,] #generate fitted models res <- pirf(pressure ~ . , train_data = train, method = method_vec, concise= FALSE, num_threads = 2, alpha = .1) #generate prediction intervals from fitted models preds <- predict(res, test_data = test)
In this example, the num_threads option identifies the use of two cores for parallel processing. The default is to use all available cores. The concise option allows for the output of predictions for the test observations.
Below are the coverage rates and average prediction interval lengths using the test dataset. Both are important characteristics of prediction intervals.
#empirical coverage, and average prediction interval length for each method coverage <- sapply(preds$int, FUN = getCoverage, response = test$pressure) coverage length <- sapply(preds$int, FUN = getPILength) length
Below are plots of the resulting prediction intervals generated for each method.
#plotting intervals and predictions par(mfrow = c(2,2)) for(i in 1:length(method_vec)){ #color based on empirical coverage col <- ((test$pressure >= preds$int[[i]][,1]) * (test$pressure <= preds$int[[i]][,2])-1)*(-1)+1 plot(x = preds$preds[[i]], y = test$pressure, pch = 20, col = "black", ylab = "true", xlab = "predicted", main = method_vec[i]) abline(a = 0, b = 1) segments(x0 = preds$int[[i]][,1], x1 = preds$int[[i]][,2], y1 = test$pressure, y0 = test$pressure, lwd = 1, col = col) }
If you find any issues with the package, or have suggestions for improvements, please let us know.
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.