knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)

efor

The goal of EasyFORcasting or short efor is to make it easier if you have to forecast multiple timeseries. The package supports you in creating forecasts with different methods implemented in the forecast, forecastHyrid, smooth and prophet package. Furthermore it provides some functions for evaluating these forecasts.

Following you see a possible workflow:

Installation

You can install the released version of efor from github with:

#devtools::install_github("flostracke/efor")

Example

Setup

First we load some packages for this example. The efor package contains some fictional timeseries data for demonstrating the apporaches:

library(dplyr)
library(tsibble) # for creating nicer representation of monthly data
library(efor)
library(furrr) # for running the forecasting in parallel
library(forecast) #provides forecast mehotds
library(ggplot2)


sales_data <- sales_monthly %>% 
  mutate(date = yearmonth(date))

sales_data

We have some sales data for four articles. We want to create forecasts for all these articles. The efor package makes this quite easy, because it provides functionality to create forecasts for multiple articles with just one function call

The idea is that all the data has to be organised in one dataframe with the following columns:

The dataframe sales_data is already meeting these requirements. You can verify the correct structure with the following function call, otherweise there would be an error:

check_input_data(sales_data)

Before we start forecasts let's quickly create a plot of the 4 different articles we want to forecast:

ggplot(sales_data, aes(x = date, y = y)) +
  geom_line() +
  geom_point() +
  facet_wrap(~iterate) +
  ggtitle("The original series") +
  theme_minimal() 

We split the dataset in a train and test set. All observations from the year 2016 go into the test set. We want to create forecasts for the next 4 months of the testset and evaluate the performance of different methods.

train_data <- sales_data %>% 
  filter(date < "2016-01-01")

test_data <- sales_data %>% 
  filter(date >= "2016-01-01")

Now we can apply the the auto.arima function to the dataset and create the forecasts. All the methods from the forecast package can be run in parallel.

Model Fitting

forecasts_ar <- tf_grouped_forecasts(
  train_data,        # used training dataset
  n_pred = 6,        # number of predictions
  func = auto.arima, # used forecasting method
  parallel = TRUE    # for runiing in parallel
)

forecasts_ar
forecasts_ets <- tf_grouped_forecasts(
  train_data,      # used training dataset
  n_pred = 6,      # number of predictions
  func = ets,  # used forecasting method
  parallel = FALSE #disabling parallel for prohet
)

forecasts_ets

Evaluation

In order to create some plots and evaluate the performance we combine the forecasts into one dataset.

forecasts <- bind_rows(forecasts_ar, forecasts_ets) %>% 
  mutate(date = yearmonth(date)) #reformat the date because of a bug in bind_rows

The package brings also a function which makes it quite easy to access the performance (right now thhe mae, rmse and rsquared is calculated) of all the forecasting methods in the passed prediction dataframe:

tf_calc_metrics(forecasts, test_data) 

Also it is possible to access the performance of each article:

tf_calc_metrics(forecasts, test_data, detailed = TRUE)

Finally we create a quick graph visualising the results of the forecasts.

train_data_plot <- train_data %>% 
  mutate(key = "train")

test_data_plot <- test_data %>% 
  mutate(key = "test")

bind_rows(train_data_plot, test_data_plot) %>% 
  bind_rows(forecasts) %>% 
  filter(key %in% c("auto.arima", "train", "test")) %>% 
  mutate(date = yearmonth(date)) %>% 
  ggplot(aes(x = date, y = y, color = key)) +
  geom_point() +
  geom_line() +
  facet_wrap(~iterate) +
  ggtitle("Forecasted values for each article") +
  ylab("Sales amount") +
  theme_minimal()


flostracke/efor documentation built on June 5, 2019, 5:36 p.m.