vctsfr
In vctsfr: Visualizing Collections of Time Series Forecasts

knitr::opts_chunk$set(
  fig.width = 6,
  fig.heigth = 2,
  collapse = TRUE,
  comment = "#>"
)

In this document we present the vctsfr package, which can be an useful tool for those involved in time series forecasting because it allows you to visually compare the predictions of several forecasting models. The vctsfr package is especially convenient when you want to visually compare the predictions of several forecasting methods across a collection of time series.

The vctsfr package makes it easy the visualization of collections of time series and, optionally, their future values and forecasts for those future values. The forecasts can include prediction intervals. This package is particularly useful when you have forecasts (maybe from different models) for several time series and you want to display them in order to compare their results.

This package arises from a need of her authors. Frequently, we used several forecasting methods to predict the future values of collections of time series (typically belonging to time series competitions). The usual way of comparing the performance of the forecasting methods is to compute a global measure of forecast accuracy for every method based on all its forecasts for all the series of the competition. However, we miss a way of visually compare the performance of different methods over a particular series. This package fills this gap.

This package also facilitates the visualization of just a collection of time series (without forecasts).

Visualizing a single time series

If you only want to display a single time series and, optionally, information about its future values and/or a forecast for its future values you can use the plot_ts() function. Let us see how it works.

library(vctsfr)
plot_ts(USAccDeaths) # plotting a time series

By default, plot_ts() shows the data points in the time series. However, you can omit them with the sdp parameter:

plot_ts(USAccDeaths, sdp = FALSE)

Let us now display the same time series and a forecast for its next 12 months using the exponential smoothing model implemented in the forecast package (ets() function):

library(forecast)
ets_fit <- ets(USAccDeaths)
ets_f <- forecast(ets_fit, h = 12)
plot_ts(USAccDeaths, prediction = ets_f$mean, method = "ets")

If the forecasting method computes prediction intervals, they can be displayed. For example, let's add a 90% prediction interval to the previous forecast:

library(forecast)
ets_fit <- ets(USAccDeaths)
ets_f <- forecast(ets_fit, h = 12, level = 90)
plot_ts(USAccDeaths, 
        prediction = ets_f$mean, 
        method = "ets", 
        lpi = ets_f$lower, 
        upi = ets_f$upper, 
        level = 90
)

Finally, the actual values that are predicted can also be displayed:

timeS <- window(USAccDeaths, end = c(1977, 12))
fut <- window(USAccDeaths, start = c(1978, 1))
ets_fit <- ets(timeS)
ets_f <- forecast(ets_fit, h = length(fut), level = 80)
plot_ts(timeS, 
        future = fut, 
        prediction = ets_f$mean, 
        method = "ets",
        lpi = ets_f$lower, 
        upi = ets_f$upper, 
        level = 80
)

Summarizing, the plot_ts() function is useful to visualize a time series and, optionally, a forecast for its future values.

In all the functions of the vctsfr package the time series parameter, i.e. the historical values of the series, is specified as an object of class ts. The future values, forecasts and prediction intervals can be specified as a numeric vector or as an object of class ts.

Visualizing several forecasts

When you want to compare several forecasts for the future values of a time series you can use the function plot_predictions(). The forecasts are passed to the function as a list, each component of the list is a forecast and the name of the component is the name of the forecasting method. Let us see an example in which, given a time series, the forecasts for its future values, using the ARIMA and exponential smoothing models implemented in the forecast package, are displayed:

timeS <- window(USAccDeaths, end = c(1977, 12)) # historical values
fut <- window(USAccDeaths, start = c(1978, 1))  # "future" values
ets_fit <- ets(timeS)                           # exponential smoothing fit
ets_f <- forecast(ets_fit, h = length(fut))     # exponential smoothing forecast
arima_fit <- auto.arima(timeS)                  # ARIMA fit
arima_f <- forecast(arima_fit, h = length(fut)) # ARIMA forecast
plot_predictions(timeS, future = fut, 
                 predictions = list(ets = ets_f$mean, arima = arima_f$mean)
)

Looking at the plot, it is clear that both models produce similar and fairly accurate predictions.

Creating collections of time series

In the previous sections we have seen how to display a time series and forecasts for its future values. However, the main goal of the vctsfr package is to facilitate the visualization of collections of time series so that you can visually compare forecasts for their future values.

In this section we study how to build these collections. For this purpose, the vctsfr package provides three functions:

ts_info() allows you to create an object with information about a time series.
prediction_info() allows you to create an object with information about a forecast.
pi_info() allows you to create an object with information about the prediction interval associated with a forecast.

A collection of time series is a list of objects created with (returned by) the ts_info() function. Let's first create a collection storing the historical values of two time series:

collection1 <- list(ts_info(USAccDeaths), ts_info(UKDriverDeaths))

In the next section we will use the collections built in this section to visualize their information. Next, we use a dataset included in the Mcomp package to create another collection of series. The Mcomp package contains datasets of forecasting competitions. We are going to create a collection of 18 quarterly time series, with their associated next 12 future values and forecasts for their future values (using the previously applied ets() function of the forecast package).

library(Mcomp)

# select the industry, quarterly series from M1 competition (18 series)
M1_quarterly <- subset(M1, 4, "industry")

# build the collection
collection2 <- vector("list", length = length(M1_quarterly))
for (ind in seq_along(M1_quarterly)) {
  timeS   <- M1_quarterly[[ind]]$x              # time series
  name    <- M1_quarterly[[ind]]$st             # time series's name
  fut     <- M1_quarterly[[ind]]$xx             # future values
  ets_fit <- ets(timeS)                         # ES fit
  ets_for <- forecast(ets_fit, h = length(fut)) # ES forecast
  collection2[[ind]] <- ts_info(timeS,
                                prediction_info("ets", ets_for$mean),
                                future = fut,
                                name = name
  )
}

Finally, we use the same dataset of 18 time series to create a collection of time series, with two forecasts (obtained with two different forecasting models) for each series and some prediction intervals associated with each forecast.

collection3 <- vector("list", length = length(M1_quarterly))
for (ind in seq_along(M1_quarterly)) {
  t <- M1_quarterly[[ind]]$x                            # time series
  name <- M1_quarterly[[ind]]$st                        # time series's name
  f <- M1_quarterly[[ind]]$xx                           # "future" values
  ets_fit <- ets(t)                                     # ES fit
  ets_f <- forecast(ets_fit, h = length(f), level = 90) # ES forecast
  arima_fit <- auto.arima(t)                            # ARIMA fit
  arima_f <- forecast(arima_fit, h = length(f),         # ARIMA forecast
                      level = c(80, 90)
  )
  collection3[[ind]] <- ts_info(t,
                               future = f,
                               prediction_info("ets", 
                                               ets_f$mean, 
                                               pi_info(90, 
                                                       ets_f$lower, 
                                                       ets_f$upper)
                               ),
                               prediction_info("arima", 
                                               arima_f$mean,
                                               pi_info(80, 
                                                       arima_f$lower[, 1], 
                                                       arima_f$upper[, 1]
                                               ),
                                               pi_info(90, 
                                                       arima_f$lower[, 2], 
                                                       arima_f$upper[, 2]
                                               )
                               ),
                               name = name)
}

It can be noted that the exponential smoothing forecasts have a prediction interval, while the ARIMA forecasts have two prediction intervals. Although it does not happen in these examples, different time series in a collection can include forecasts done with different models.

Visualizing a time series from a collection

Once you have created a collection, you can see the information about one of its series with the plot_collection() function. The basic way of using this function is specifying a collection and the number (index) of the series in the collection. Let's see an example using a collection built in the previous section:

plot_collection(collection3, number = 3)

plot_collection() has displayed all available information about the series with index 3, except for the prediction intervals. You can choose to display a subset of the forecasts associated with a time series providing a vector with the names of the forecasting methods you want to select:

plot_collection(collection3, number = 3, methods = "ets")

Finally, if you display the forecasts of just one forecasting method and this method has prediction intervals, you can display one of its prediction intervals providing its level:

plot_collection(collection3, number = 3, methods = "arima", level = 90)

Looking at the plot, all predicted future values fall within the prediction interval.

The web-based GUI for visualizing collections of time series

Although the plot_collection() function is handy, the best way of navigating through the different series of a collection is to use the GUI_collection() function, which launches a Shiny GUI in a web browser.

GUI_collection(collection3)

This is how the GUI looks like:

The vctsfr GUI {width="90%"}

Using the GUI the user can select:

Which time series to display.
If the data points are highlighted.
Which forecasting methods to display.
In the case that only one forecast is displayed and this forecast has prediction intervals, which prediction interval to show.

Apart from the time series, the GUI shows information about the forecast accuracy of the displayed forecasting methods. Currently, for each forecasting method the following forecasting accuracy measures are computed:

RMSE: root mean squared error
MAPE: mean absolute percentage error
MAE: mean absolute error
ME: mean error
MPE: mean percentage error.
sMAPE: symmetric MAPE
MASE: mean absolute scaled error

This way, you can compare the displayed forecasting methods both visually and through forecast accuracy measures.

Next, we describe how the forecasting accuracy measures are computed for a forecasting horizon $h$ ($y_t$ and $\hat{y}_t$ are the actual future value and its forecast for horizon $t$ respectively):

$$ RMSE = \sqrt{\frac{1}{h}\sum_{t=1}^{h} (y_t-\hat{y}t)^2} $$ $$ MAPE = \frac{1}{h}\sum{t=1}^{h} 100\frac{|y_t-\hat{y}t|}{y_t} $$ $$ MAE = \frac{1}{h}\sum{t=1}^{h} |y_t-\hat{y}t| $$ $$ ME = \frac{1}{h}\sum{t=1}^{h} y_t-\hat{y}t $$ $$ MPE = \frac{1}{h}\sum{t=1}^{h} 100\frac{y_t-\hat{y}t}{y_t} $$ $$ sMAPE = \frac{1}{h}\sum{t=1}^{h} 200\frac{\left|y_{t}-\hat{y}{t}\right|}{|y_t|+|\hat{y}_t|} $$ $$ MASE = \frac{\frac{1}{h}\sum{t=1}^{h} |y_t - \hat{y}t|}{\frac{1}{T-f}\sum{t=f+1}^{T} |h_t - h_{t-f}|} $$

In the MASE computation $T$ is the length of the training set (i.e., the length of the time series with historical values), $h_t$ is the t-th historical value and $f$ is the frequency of the time series (1 for annual data, 4 for quarterly data, 12 for monthly data, ...).