# bootCast: Forecasting Function for ARMA Models via Bootstrap In smoots: Nonparametric Estimation of the Trend and Its Derivatives in TS

## Description

Point forecasts and the respective forecasting intervals for autoregressive-moving-average (ARMA) models can be calculated, the latter via bootstrap, by means of this function.

## Usage

 ``` 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16``` ```bootCast( X, p = NULL, q = NULL, include.mean = FALSE, n.start = 1000, h = 1, it = 10000, msg, pb = TRUE, cores = future::availableCores(), alpha = 0.95, export.error = FALSE, plot = FALSE, ... ) ```

## Arguments

 `X` a numeric vector that contains the time series that is assumed to follow an ARMA model ordered from past to present. `p` an integer value ≥ 0 that defines the AR order p of the underlying ARMA(p,q) model within `X`; is set to `NULL` by default; if no value is passed to `p` but one is passed to `q`, `p` is set to `0`; if both `p` and `q` are `NULL`, optimal orders following the BIC for 0 ≤ p,q ≤ 5 are chosen; is set to `NULL` by default; decimal numbers will be rounded off to integers. `q` an integer value ≥ 0 that defines the MA order q of the underlying ARMA(p,q) model within `X`; is set to `NULL` by default; if no value is passed to `q` but one is passed to `p`, `q` is set to `0`; if both `p` and `q` are `NULL`, optimal orders following the BIC for 0 ≤ p,q ≤ 5 are chosen; is set to `NULL` by default; decimal numbers will be rounded off to integers. `include.mean` a logical value; if set to `TRUE`, the mean of the series is also also estimated; if set to `FALSE`, E(X_t) = 0 is assumed; is set to `FALSE` by default. `n.start` an integer that defines the 'burn-in' number of observations for the simulated ARMA series via bootstrap; is set to `1000` by default; decimal numbers will be rounded off to integers. `h` an integer that represents the forecasting horizon; if n is the number of observations, point forecasts and forecasting intervals will be obtained for the time points n + 1 to n + h; is set to `h = 1` by default; decimal numbers will be rounded off to integers. `it` an integer that represents the total number of iterations, i.e., the number of simulated series; is set to `10000` by default; decimal numbers will be rounded off to integers. `msg` this argument is deprecated; make use of the argument `pb` instead; for `msg = NA`, `pb = TRUE` will be implemented, while any one-element numeric vector will lead to `pb = TRUE`. `pb` a logical value; for `pb = TRUE`, a progress bar will be shown in the console. `cores` an integer value >0 that states the number of (logical) cores to use in the bootstrap (or `NULL`); the default is the maximum number of available cores (via `future::availableCores`); for `cores = NULL`, parallel computation is disabled. `alpha` a numeric vector of length 1 with 0 < `alpha` < 1; the forecasting intervals will be obtained based on the confidence level (100`alpha`)-percent; is set to `alpha = 0.95` by default, i.e., a 95-percent confidence level. `export.error` a single logical value; if the argument is set to `TRUE`, a list is returned instead of a matrix (`FALSE`); the first element of the list is the usual forecasting matrix, whereas the second element is a matrix with `h` columns, where each column represents the calculated forecasting errors for the respective future time point n + 1, n + 2, ..., n + h; is set to `FALSE` by default. `plot` a logical value that controls the graphical output; for `plot = TRUE`, the original series with the obtained point forecasts as well as the forecasting intervals will be plotted; for the default `plot = FALSE`, no plot will be created. `...` additional arguments for the standard plot function, e.g., `xlim`, `type`, ... ; arguments with respect to plotted graphs, e.g., the argument `col`, only affect the original series `X`; please note that in accordance with the argument `x` (lower case) of the standard plot function, an additional numeric vector with time points can be implemented via the argument `x` (lower case). `x` should be valid for the sample observations only, i.e. `length(x) == length(X)` should be `TRUE`, as future time points will be calculated automatically.

## Details

This function is part of the `smoots` package and was implemented under version 1.1.0. For a given time series X_[t], t = 1, 2, ..., n, the point forecasts and the respective forecasting intervals will be calculated. It is assumed that the series follows an ARMA(p,q) model

X_[t] - μ = ε_[t] + β_[1] (X_[t-1] - μ) + ... + β_[p] (X_[t-p] - μ) + α_[1] ε_[t-1] + ... + α_[q] ε_[t-q],

where α_[j] and β_[i] are real numbers (for i = 1, 2, .., p and j = 1, 2, ..., q) and ε_[t] are i.i.d. (identically and independently distributed) random variables with zero mean and constant variance. μ is equal to E(X_[t]).

The point forecasts and forecasting intervals for the future periods n + 1, n + 2, ..., n + h will be obtained. With respect to the point forecasts hat[X]_[n+k], where k = 1, 2, ..., h,

hat[X]_[n+k] = hat[μ] + sum_[i=1]^[p] {hat[β]_[i](X_[n+k-i] - hat[μ])} + sum_[j=1]^[q] {hat[α]_[j]hat[ε]_[n+k-j]}

with X_[n+k-i] = hat[X]_[n+k-i] for n+k-i > n and hat[ε]_[n+k-j] = E(ε_[t]) = 0 for n+k-j > n will be applied.

The forecasting intervals on the other hand are obtained by a forward bootstrap method that was introduced by Pan and Politis (2016) for autoregressive models and extended by Lu and Wang (2020) for applications to autoregressive-moving-average models. For this purpose, let l be the number of the current bootstrap iteration. Based on the demeaned residuals of the initial ARMA estimation, different innovation series ε^[s]_[l,t] will be sampled. The initial coefficient estimates and the sampled innovation series are then used to simulate a variety of series X^[s]_[l,t], from which again coefficient estimates will be obtained. With these newly obtained estimates, proxy residual series hat[ε]^[s]_[l,t] are calculated for the original series X_[t]. Subsequently, point forecasts for the time points n + 1 to n + h are obtained for each iteration l based on the original series X_[t], the newly obtained coefficient forecasts and the proxy residual series hat[ε]^[s]_[l,t]. Simultaneously, "true" forecasts, i.e., true future observations, are simulated. Within each iteration, the difference between the simulated true forecast and the bootstrapped point forecast is calculated and saved for each future time point n + 1 to n + h. The result for these time points are simulated empirical values of the forecasting error. Denote by q_[k](.) the quantile of the empirical distribution for the future time point n + k. Given a predefined confidence level `alpha`, define α_s = (1 - `alpha`)/2. The bootstrapped forecasting interval is then

[hat[X]_[n+k] + q_[k](alpha_[s]), hat[X]_[n+k] + q_[k](1 - alpha_[s])],

i.e., the forecasting intervals are given by the sum of the respective point forecasts and quantiles of the respective bootstrapped forecasting error distributions.

The function `bootCast` allows for different adjustments to the forecasting progress. At first, a vector with the values of the observed time series ordered from past to present has to be passed to the argument `X`. Orders p and q of the underlying ARMA process can be defined via the arguments `p` and `q`. If only one of these orders is inserted by the user, the other order is automatically set to `0`. If none of these arguments are defined, the function will choose orders based on the Bayesian Information Criterion (BIC) for 0 ≤ p,q ≤ 5. Via the logical argument `include.mean` the user can decide, whether to consider the mean of the series within the estimation process. By means of `n.start`, the number of "burn-in" observations for the simulated ARMA processes can be regulated. These observations are usually used for the processes to build up and then omitted. Furthermore, the argument `h` allows for the definition of the maximum future time point n + h. Point forecasts and forecasting intervals will be returned for the time points n + 1 to n + h. `it` corresponds to the number of bootstrap iterations. We recommend a sufficiently high number of repetitions for maximum accuracy of the results. Another argument is `alpha`, which is the equivalent of the confidence level considered within the calculation of the forecasting intervals, i.e., the quantiles (1 - `alpha`)/2 and 1 - (1 - `alpha`)/2 of the bootstrapped forecasting error distribution will be obtained.

Since this bootstrap approach needs a lot of computation time, especially for series with high numbers of observations and when fitting models with many parameters, parallel computation of the bootstrap iterations is enabled. With `cores`, the number of cores can be defined with an integer. Nonetheless, for `cores = NULL`, no cluster is created and therefore the parallel computation is disabled. Note that the bootstrapped results are fully reproducible for all cluster sizes. The progress of the bootstrap can be observed in the R console, where a progress bar and the estimated remaining time are displayed for `pb = TRUE`.

If the argument `export.error` is set to `TRUE`, the output of the function is a list instead of a matrix with additional information on the simulated forecasting errors. For more information see the section Value.

For simplicity, the function also incorporates the possibility to directly create a plot of the output, if the argument `plot` is set to `TRUE`. By the additional and optional arguments `...`, further arguments of the standard plot function can be implemented to shape the returned plot.

NOTE:

Within this function, the `arima` function of the `stats` package with its method `"CSS-ML"` is used throughout for the estimation of ARMA models. Furthermore, to increase the performance, C++ code via the `Rcpp` and `RcppArmadillo` packages was implemented. Also, the `future` and `future.apply` packages are considered for parallel computation of bootstrap iterations. The progress of the bootstrap is shown via the `progressr` package.

## Value

The function returns a 3 by h matrix with its columns representing the future time points and the point forecasts, the lower bounds of the forecasting intervals and the upper bounds of the forecasting intervals as the rows. If the argument `plot` is set to `TRUE`, a plot of the forecasting results is created.

If `export.error = TRUE` is selected, a list with the following elements is returned instead.

fcast

the 3 by h matrix forecasting matrix with point forecasts and bounds of the forecasting intervals.

error

a `it` by h matrix, where each column represents a future time point n + 1, n + 2, ..., n + h; in each column the respective `it` simulated forecasting errors are saved.

## Author(s)

• Dominik Schulz (Research Assistant) (Department of Economics, Paderborn University),
Package Creator and Maintainer

## References

Feng, Y., Gries, T. and Fritz, M. (2020). Data-driven local polynomial for the trend and its derivatives in economic time series. Journal of Nonparametric Statistics, 32:2, 510-533.

Feng, Y., Gries, T., Letmathe, S. and Schulz, D. (2019). The smoots package in R for semiparametric modeling of trend stationary time series. Discussion Paper. Paderborn University. Unpublished.

Feng, Y., Gries, T., Fritz, M., Letmathe, S. and Schulz, D. (2020). Diagnosing the trend and bootstrapping the forecasting intervals using a semiparametric ARMA. Discussion Paper. Paderborn University. Unpublished.

Lu, X., and Wang, L. (2020). Bootstrap prediction interval for ARMA models with unknown orders. REVSTAT–Statistical Journal, 18:3, 375-396.

Pan, L. and Politis, D. N. (2016). Bootstrap prediction intervals for linear, nonlinear and nonparametric autoregressions. In: Journal of Statistical Planning and Inference 177, pp. 1-27.

## Examples

 ``` 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32``` ```### Example 1: Simulated ARMA process ### # Function for drawing from a demeaned chi-squared distribution rchisq0 <- function(n, df, npc = 0) { rchisq(n, df, npc) - df } # Simulation of the underlying process n <- 2000 n.start = 1000 set.seed(23) X <- arima.sim(model = list(ar = c(1.2, -0.7), ma = 0.63), n = n, rand.gen = rchisq0, n.start = n.start, df = 3) + 13.1 # Quick application with low number of iterations # (not recommended in practice) result <- bootCast(X = X, p = 2, q = 1, include.mean = TRUE, n.start = n.start, h = 5, it = 10, cores = 2, plot = TRUE, lty = 3, col = "forestgreen", xlim = c(1950, 2005), type = "b", main = "Exemplary title", pch = "*") result ### Example 2: Application with more iterations ### ## Not run: result2 <- bootCast(X = X, p = 2, q = 1, include.mean = TRUE, n.start = n.start, h = 5, it = 10000, cores = 2, plot = TRUE, lty = 3, col = "forestgreen", xlim = c(1950, 2005), main = "Exemplary title") result2 ## End(Not run) ```

smoots documentation built on Oct. 10, 2021, 1:09 a.m.