cb_cvforecast: Conformal bootstrap prediction intervals through time series...
In smimodel: Sparse Multiple Index Models for Nonparametric Forecasting

cb_cvforecast

R Documentation

Conformal bootstrap prediction intervals through time series cross-validation forecasting

Description

Compute prediction intervals by applying the conformal bootstrap method to subsets of time series data using a rolling forecast origin.
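
As a rough illustration of what a rolling forecast origin means, the base R sketch below splits a toy series into successive training windows and the `h` observations that follow each one. It is a generic sketch, not the package's internal indexing: the names `n_obs`, `step`, `origins` and `splits` are hypothetical, and it stops at the last origin for which all `h` test observations exist, which may differ from how `forward`, `initial` and `roll.length` are actually handled.

```r
# Generic rolling-origin illustration (assumed layout, not package code).
n_obs  <- 20   # length of the toy series
window <- 10   # size of each rolling training window (cf. `window`)
h      <- 3    # forecast horizon (cf. `h`)
step   <- 2    # how far the origin moves each time (cf. `roll.length`)

# Forecast origins: the last index of each training window,
# leaving room for h observed values after it.
origins <- seq(from = window, to = n_obs - h, by = step)

# For every origin, record the training indices and the h test indices.
splits <- lapply(origins, function(o) {
  list(train = (o - window + 1):o,
       test  = (o + 1):(o + h))
})

splits[[1]]$train
splits[[1]]$test
length(splits)
```

Each element of `splits` corresponds to one model fit and one set of `h`-step-ahead forecasts in this simplified picture.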

Usage

cb_cvforecast(
  object,
  data,
  yvar,
  neighbour = 0,
  predictor.vars,
  h = 1,
  ncal = 100,
  num.futures = 1000,
  level = c(80, 95),
  forward = TRUE,
  initial = 1,
  window = NULL,
  roll.length = 1,
  exclude.trunc = NULL,
  recursive = FALSE,
  recursive_colNames = NULL,
  na.rm = TRUE,
  nacheck_frac_numerator = 2,
  nacheck_frac_denominator = 3,
  verbose = list(solver = FALSE, progress = FALSE),
  ...
)

Arguments

`object`	Fitted model object of class `smimodel`, `backward`, `gaimFit` or `pprFit`.
`data`	Data set. Must be a data set of class `tsibble`. (Make sure there are no additional date or time related variables except for the `index` of the `tsibble`.) If multiple models are fitted, the grouping variable should be the `key` of the `tsibble`. If a `key` is not specified, a dummy key with only one level will be created.
`yvar`	Name of the response variable as a character string.
`neighbour`	If multiple models are fitted: Number of neighbours of each key (i.e. grouping variable) to be considered in model fitting to handle smoothing over the key. Should be an `integer`. If `neighbour = x`, `x` number of keys before the key of interest and `x` number of keys after the key of interest are grouped together for model fitting. The default is `neighbour = 0` (i.e. no neighbours are considered for model fitting).
`predictor.vars`	A character vector of names of the predictor variables.
`h`	Forecast horizon.
`ncal`	Length of a calibration window.
`num.futures`	Number of possible future sample paths to be generated in the bootstrap.
`level`	Confidence level(s) for prediction intervals, in percent (default `c(80, 95)`).
`forward`	If `TRUE`, the final forecast origin for forecasting is `y_T`. Otherwise, the final forecast origin is `y_{T-1}`.
`initial`	Initial period of the time series where no cross-validation forecasting is performed.
`window`	Length of the rolling window. If `NULL`, a rolling window will not be used.
`roll.length`	Number of observations by which each rolling/expanding window should be rolled forward.
`exclude.trunc`	The names of the predictor variables that should not be truncated for stable predictions, as a character string. (Since the nonlinear functions are estimated using splines, extrapolation is not desirable. Hence, if any predictor variable is treated non-linearly in the estimated model, it will be truncated to the in-sample range before predictions are obtained. Any variables listed here are excluded from such truncation.)
`recursive`	Whether to obtain recursive forecasts or not (default - `FALSE`).
`recursive_colNames`	If `recursive = TRUE`, a character vector giving the names of the columns in test data to be filled with forecasts. Recursive/autoregressive forecasting is required when lags of the response variable itself are used as predictor variables in the model. Make sure such lagged variables are positioned together in increasing lag order (i.e. `lag_1, lag_2, ..., lag_m`, where `lag_m` is the maximum lag used) in `data`, with no break in the lagged variable sequence even if some of the intermediate lags are not used as predictors.
`na.rm`	Logical; if `TRUE` (default), any `NA` and `NaN` values are removed from the sample before the quantiles are computed.
`nacheck_frac_numerator`	Numerator of the fraction of non-missing values that is required in a test set.
`nacheck_frac_denominator`	Denominator of the fraction of non-missing values that is required in a test set.
`verbose`	A named list controlling verbosity options. Defaults to `list(solver = FALSE, progress = FALSE)`. `solver`: logical; if `TRUE`, prints detailed solver output when the SMI model is used. `progress`: logical; if `TRUE`, prints cross-validation progress messages (all models) and optimisation algorithm progress messages (SMI model only).
`...`	Other arguments not currently used.
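
To make the roles of `ncal`, `num.futures`, `level` and `na.rm` more concrete, here is a minimal base R sketch of the general idea of bootstrapping calibration errors into prediction intervals. It is a one-step simplification under stated assumptions (errors resampled independently with replacement, a single toy point forecast) and is not the package's implementation; the names `cal_errors`, `point_forecast` and `futures` are hypothetical.

```r
set.seed(1)
ncal        <- 100          # calibration window length (cf. `ncal`)
num_futures <- 1000         # number of simulated futures (cf. `num.futures`)
level       <- c(80, 95)    # interval levels in percent (cf. `level`)

# Toy forecast errors from a calibration window (simulated here).
cal_errors <- rnorm(ncal, sd = 0.1)
cal_errors[c(5, 17)] <- NA  # pretend two errors are missing
point_forecast <- 1.5       # toy point forecast for the next period

# Resample the observed errors with replacement and add them to the forecast.
obs_errors <- cal_errors[!is.na(cal_errors)]
futures <- point_forecast + sample(obs_errors, num_futures, replace = TRUE)

# Read the interval bounds off the empirical quantiles of the simulated futures.
alpha <- 1 - level / 100
lower <- quantile(futures, probs = alpha / 2, na.rm = TRUE)
upper <- quantile(futures, probs = 1 - alpha / 2, na.rm = TRUE)
names(lower) <- names(upper) <- paste0(level, "%")
rbind(lower, upper)
```

A larger `num.futures` gives smoother quantile estimates at higher computational cost, while `ncal` controls how many past errors are available to resample from.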

Value

An object of class `cb_cvforecast`, which is a list containing the following elements:

`x`	The original time series.
`method`	A character string "cb_cvforecast".
`fit_times`	The number of times the model is fitted in cross-validation.
`mean`	Point forecasts as a multivariate time series, where the `h^{th}` column holds the point forecasts for forecast horizon `h`. The time index corresponds to the period for which the forecast is produced.
`error`	Forecast errors given by `e_{t+h|t} = y_{t+h} - \hat{y}_{t+h|t}`.
`res`	The matrix of in-sample residuals produced in cross-validation.
`level`	The confidence levels associated with the prediction intervals.
`cal_times`	The number of calibration windows considered in cross-validation.
`num_cal`	The number of non-missing multi-step forecast errors in each calibration window.
`skip_cal`	An indicator vector showing whether a calibration window was skipped without constructing prediction intervals, due to a missing model or missing data in the test set.
`lower`	A list containing lower bounds for prediction intervals for each level. Each element within the list will be a multivariate time series with the same dimensional characteristics as `mean`.
`upper`	A list containing upper bounds for prediction intervals for each level. Each element within the list will be a multivariate time series with the same dimensional characteristics as `mean`.
`possible_futures`	A list of matrices containing future sample paths generated at each calibration step.
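
The relationship between `mean`, `error`, `lower` and `upper` can be illustrated with toy numbers. The sketch below uses made-up vectors standing in for a single forecast horizon; the list names `"80%"` and `"95%"` and the variable names `y`, `point_fc` and `coverage` are assumptions for illustration, not a statement about how the returned object labels its elements.

```r
# Toy stand-ins for elements of the returned list (values are made up).
y        <- c(1.0, 1.2, 0.9, 1.4, 1.1)   # observed values
point_fc <- c(1.1, 1.1, 1.0, 1.2, 1.1)   # point forecasts for one horizon
lower <- list(`80%` = point_fc - 0.15, `95%` = point_fc - 0.25)
upper <- list(`80%` = point_fc + 0.15, `95%` = point_fc + 0.25)

# Forecast errors: observed value minus point forecast.
error <- y - point_fc

# Share of observations that fall inside each interval.
coverage <- mapply(
  function(lo, up) mean(y >= lo & y <= up, na.rm = TRUE),
  lower, upper
)
coverage
```

Comparing such empirical coverage with the nominal levels is one common way to check whether interval forecasts are well calibrated.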

Examples


if(requireNamespace("gurobi", quietly = TRUE)){
  library(dplyr)
  library(ROI)
  library(smimodel)  # provides lag_matrix(), model_smimodel(), cb_cvforecast()
  library(tibble)
  library(tidyr)
  library(tsibble)

  # Simulate data
  n <- 1105
  set.seed(123)
  sim_data <- tibble(x_lag_000 = runif(n)) |>
    mutate(
      # Add x_lags
      x_lag = lag_matrix(x_lag_000, 5)) |>
    unpack(x_lag, names_sep = "_") |>
    mutate(
      # Response variable
      y = (0.9*x_lag_000 + 0.6*x_lag_001 + 0.45*x_lag_003)^3 +
      (0.35*x_lag_002 + 0.7*x_lag_005)^2 + rnorm(n, sd = 0.1),
      # Add an index to the data set
      inddd = seq(1, n)) |>
    drop_na() |>
    select(inddd, y, starts_with("x_lag")) |>
    # Make the data set a `tsibble`
    as_tsibble(index = inddd)

  # Index variables
  index.vars <- colnames(sim_data)[3:8]

  # Training set
  sim_train <- sim_data[1:1000, ]
  # Test set
  sim_test <- sim_data[1001:1100, ]

  # Model fitting
  smimodel_ppr <- model_smimodel(data = sim_train,
                                yvar = "y",
                                index.vars = index.vars,
                                initialise = "ppr")

  # Conformal bootstrap prediction intervals (3-steps-ahead interval forecasts)
  set.seed(12345)
  smimodel_ppr_cb <- cb_cvforecast(object = smimodel_ppr,
                                  data = sim_data,
                                  yvar = "y",
                                  predictor.vars = index.vars,
                                  h = 3,
                                  ncal = 30,
                                  num.futures = 100,
                                  window = 1000)
}

smimodel documentation built on April 8, 2026, 5:06 p.m.