Speed

The time it takes to compute the intervals depends on the training set size, search parameters (i.e., convergence criterion, number of iterations), the grid size, and the number of worker processes that are used. For the last item, the computations can be parallelized using the future and furrr packages.

To use parallelism, the [future::plan()] function can be invoked to create a parallel backend. For example, let's make an initial workflow:

library(tidymodels)
library(probably)
tidymodels_prefer()
library(tidymodels)
library(probably)
library(future)

tidymodels_prefer()

## Make a fitted workflow from some simulated data:
set.seed(121)
train_dat <- sim_regression(200)
new_dat   <- sim_regression(  5) %>% select(-outcome)

lm_fit <- 
  workflow() %>% 
  add_model(linear_reg()) %>% 
  add_formula(outcome ~ .) %>% 
  fit(data = train_dat)

# Create the object to be used to make prediction intervals
lm_conform <- int_conformal_full(lm_fit, train_dat)

We'll use a "multisession" parallel processing plan to compute the intervals for the five new samples in parallel:

plan("multisession")

# This is run in parallel:
predict(lm_conform, new_dat)

Using simulations, there are slightly sub-linear speed-ups when using parallel processing to compute the row-wise intervals.

In comparison with parametric intervals:

predict(lm_fit, new_dat, type = "pred_int")


topepo/probably documentation built on April 6, 2024, 7:32 p.m.