A number of definitions are used, which are visualized and described below:
```{r}
knitr::include_graphics(file.path(dirname(dirname(getwd())),"initialization/Rmarkdown/figures/forecast_error.png"))
knitr::include_graphics(file.path(dirname(dirname(getwd())),"initialization/Rmarkdown/figures/sliding_windows.png"))
```
For every group in the data, multiple forecasting models can be trained. The full list of available forecasting models is included in this document under Forecasting models. How each of these forecast models is trained is described under the earlier Definitions tab.
This section describes the process for selecting a 'best' model for each group, to be used as the final forecast model. For definitions of specific terms, see the earlier Definitions tab.
The process is as follows:
This can result in the following number of forecast errors:
number of forecast errors = 30 forecast models x 50 sliding windows x 60 forecast horizons = approximately 90,000 data points
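The arithmetic above can be checked with a short R sketch. The three counts are the illustrative values from the example, not fixed properties of the framework, and the names `n_models`, `n_windows`, `n_horizons`, and `error_grid` are hypothetical.

```r
# Illustrative dimensions taken from the example above; real counts vary per group.
n_models   <- 30
n_windows  <- 50
n_horizons <- 60

# One forecast error per (model, sliding window, horizon) combination.
error_grid <- expand.grid(
  model   = seq_len(n_models),
  window  = seq_len(n_windows),
  horizon = seq_len(n_horizons)
)

n_errors <- nrow(error_grid)
n_errors
```

Each row of `error_grid` stands for one forecast error that would be collected for a single group.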
This collection of data points is then summarized to evaluate the forecast performance:
This information is then used to rank the forecast models for each forecast horizon, using the mean absolute forecast error as the performance metric. This results in a separate ranking of the n forecast models for each available forecast horizon (1 to 60 months ahead).
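A minimal base R sketch of this per-horizon ranking is shown here. It assumes the forecast errors are held in a data frame with one row per model, sliding window, and horizon. The column names (`fc_model`, `window`, `horizon`, `fc_error`), the model names, and the simulated errors are all hypothetical and not taken from the framework itself.

```r
set.seed(42)  # make the simulated errors reproducible

# Hypothetical forecast errors: 3 models x 4 sliding windows x 2 horizons.
errors <- expand.grid(
  fc_model = c("model_a", "model_b", "model_c"),
  window   = 1:4,
  horizon  = c(1, 12),
  stringsAsFactors = FALSE
)
errors$fc_error <- rnorm(nrow(errors))

# Mean absolute forecast error (MAE) per model and horizon.
errors$abs_error <- abs(errors$fc_error)
mae <- aggregate(abs_error ~ fc_model + horizon, data = errors, FUN = mean)
names(mae)[names(mae) == "abs_error"] <- "mae"

# Rank the models within each horizon; rank 1 is the lowest MAE.
mae$rank <- ave(mae$mae, mae$horizon,
                FUN = function(x) rank(x, ties.method = "min"))

mae[order(mae$horizon, mae$rank), ]
```

Ranking within each horizon, rather than on the raw errors, keeps horizons with larger typical errors from dominating the comparison.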
For example, for the 12 months ahead forecast horizon the ranking could be like this:
Calculating a ranking for every forecast horizon gives an indication of how each forecast model performs across all of these horizons. For every forecast model, its ranks over the different forecast horizons are summed to obtain an overall ranking.
For example, the overall ranking for the fc_ets_addiv model could be 89 points, because it was one of the top 5 performers in most of the forecast horizons that have been considered.
The forecast models are then ordered by their overall ranking (where a lower overall rank indicates better forecast performance than a higher overall rank) and a top X of forecast models can be selected.
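The summing and top X selection can be sketched in base R as follows. The per-horizon ranks, the model names, and the cut-off `top_x` are hypothetical values chosen for illustration; only the logic (sum the ranks, sort ascending, keep the first X) follows the description above.

```r
# Hypothetical per-horizon ranks for 4 models over 3 forecast horizons.
ranks <- data.frame(
  fc_model = rep(c("model_a", "model_b", "model_c", "model_d"), times = 3),
  horizon  = rep(c(1, 6, 12), each = 4),
  rank     = c(1, 2, 3, 4,
               2, 1, 4, 3,
               1, 3, 2, 4),
  stringsAsFactors = FALSE
)

# Sum the ranks over all horizons to get one overall score per model.
overall <- aggregate(rank ~ fc_model, data = ranks, FUN = sum)
names(overall)[names(overall) == "rank"] <- "overall_rank"

# A lower overall score indicates better forecast performance.
overall <- overall[order(overall$overall_rank), ]

top_x <- 2  # assumed cut-off for the top X selection
best_models <- head(overall$fc_model, top_x)
best_models
```

In this toy data `model_a` has the lowest summed rank (1 + 2 + 1 = 4), so it comes first in the selection.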
The forecast models are written in the programming language R, for which a time series forecasting framework has been developed using a set of publicly available packages as well as user defined functions.
A limited overview of the most important software tools used to build the forecast models is given below:
| Tool | Description |
|:-----------:|-------------------------------------------|
| R | R is an open source language and environment for statistical computing and graphics. R provides a wide variety of statistical (linear and nonlinear modelling, classical statistical tests, time-series analysis, classification, clustering, etc.) and graphical techniques, and is highly extensible. For more information, check out the R project page. |
| GitLab | GitLab is a web-based Git repository manager with wiki and issue tracking features, using an open source license, developed by GitLab Inc. Git is a system where you can create projects of different sizes with speed and efficiency. It helps you manage code, communicate and collaborate on different software projects. Git will allow you to go back to a previous status on a project or to see its entire evolution since the project was created. For more information, check out this tutorial on Git or this blog post on GitLab. |
A limited overview of the most frequently used R packages in the time series forecasting framework is given below:
| R Package | Description |
|:-----------:|-------------------------------------------|
| forecast | forecast provides methods and tools for displaying and analysing univariate time series forecasts including exponential smoothing via state space models and automatic ARIMA modelling. |
| prophet | prophet provides a procedure for forecasting time series data based on an additive model where non-linear trends are fit with yearly and weekly seasonality, plus holidays. Prophet is open source software released by Facebook's Core Data Science team. |
| ggplot2 | ggplot2 is a system for declaratively creating graphics, based on The Grammar of Graphics. You provide the data, tell ggplot2 how to map variables to aesthetics, what graphical primitives to use, and it takes care of the details. |
| plotly | plotly is an online data analytics and visualization tool to create interactive, D3 and WebGL charts in R. With one line of code, it converts ggplot2 graphs to an interactive, Web embeddable version. |
| dplyr | dplyr provides a grammar of data manipulation, providing a consistent set of verbs that solve the most common data manipulation challenges. |
| tidyr | tidyr provides a set of functions that help you get to tidy data. Tidy data is data with a consistent form: in brief, every variable goes in a column, and every column is a variable. |
| purrr | purrr enhances R's functional programming (FP) toolkit by providing a complete and consistent set of tools for working with functions and vectors. Once you master the basic concepts, purrr allows you to replace many for loops with code that is easier to write and more expressive. |
| tibble | tibble is a modern re-imagining of the data frame, keeping what time has proven to be effective, and throwing out what it has not. Tibbles are data.frames that are lazy and surly: they do less and complain more, forcing you to confront problems earlier, typically leading to cleaner, more expressive code. |