How to Run a Synthetic Forecast

knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)

This is a walkthrough of how the package is intended to be used, based on a practical example.

The Dataset

The first thing a forecast needs is data to be forecasted. SynthCast provides an example of what the dataset is expected to look like. The code below loads the package and the example dataset:

library(knitr)
library(SynthCast)
data('df_example')
kable(head(df_example)) 

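The dataset is expected to have 3 types of columns: a column identifying each unit (unit in the example), a column with the time period (time_period), and the numeric series to be used and forecast (such as x1).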
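To make the expected long format concrete, here is a small toy panel built in base R, with one row per unit and time period. This is only a sketch: it assumes, as the example dataset suggests, that time_period counts periods from each unit's start, so older units reach larger values. The column x2 and all values are made up for illustration.

```r
# Toy panel in long format: one row per (unit, time_period).
# x2 is a hypothetical extra series; values are random, for illustration only.
set.seed(42)

n_units <- 5
toy <- do.call(rbind, lapply(seq_len(n_units), function(u) {
  # Older units (smaller numbers) have longer histories:
  # unit u is observed for (n_units - u + 1) * 4 periods.
  n_periods <- (n_units - u + 1) * 4
  data.frame(
    unit        = u,
    time_period = seq_len(n_periods),
    x1          = cumsum(rnorm(n_periods, mean = 1)),
    x2          = runif(n_periods)
  )
}))

str(toy)
```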

The table below shows the maximum time period for a selection of units:

library(dplyr)

df_example %>%
  group_by(unit) %>%
  summarise(max_time_period=max(time_period)) %>%
  filter(unit %in% c(1, 2, 3, 4, 5, 45, 46, 47, 48, 49, 50)) %>% 
  kable()

As the table shows, the older a unit is (the smaller the number, the older the unit), the longer its available time series (larger values in the time_period column). This means that data from older units can be used to forecast younger units. For example, data from units 1 to 18 (that is, up to unit 30 - 12 = 18) could be used to predict the next 12 periods of unit 30. This is exactly what the function run_synthetic_forecast() does. To better understand how it works under the hood, it is recommended to read the paper describing the Synth package for the Synthetic Control method.
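The donor reasoning can be sketched with a hypothetical helper, find_donors(), which is not part of SynthCast: it keeps the units whose history extends at least `horizon` periods beyond the last period observed for the target unit. The toy data assumes that unit u is observed for 60 - u periods, so each unit is exactly one period older than the next; real data will not be this regular.

```r
# Hypothetical helper (not a SynthCast function): units whose history extends
# at least `horizon` periods beyond the last period observed for `target`.
find_donors <- function(df, target, horizon,
                        col_unit = "unit", col_time = "time_period") {
  # Named vector: maximum time period per unit (names are the unit ids).
  max_time <- sapply(split(df[[col_time]], df[[col_unit]]), max)
  target_max <- max_time[[as.character(target)]]
  donors <- names(max_time)[max_time >= target_max + horizon]
  sort(as.numeric(donors))
}

# Toy panel: unit u is observed for 60 - u periods (older units = smaller u).
toy <- do.call(rbind, lapply(1:50, function(u) {
  data.frame(unit = u, time_period = seq_len(60 - u), x1 = rnorm(60 - u))
}))

donors <- find_donors(toy, target = "30", horizon = 12)
donors
```

With this toy layout, unit 30 is observed for 30 periods, so a donor needs at least 42 periods, which holds for units 1 to 18.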

The function call below runs a synthetic forecast of 12 time periods of series x1 for unit 30.

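synthetic_forecast <- run_synthetic_forecast(
  df = df_example,
  col_unit_name = 'unit',          # column identifying each unit
  col_time = 'time_period',        # column with the time index
  periods_to_forecast = 12,        # forecast horizon
  unit_of_interest = '30',         # unit to be forecast
  serie_of_interest = 'x1'         # series to be forecast
)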

The output of the function is a list with 4 tables.

Synthetic Forecast Results

These are the 4 tables that are returned by the function call.

Table 1: synthetic_control_composition

This table summarizes the results related to the unit selection from the Synthetic Control method. The columns are the following:

kable(synthetic_forecast$synthetic_control_composition)
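In the standard Synthetic Control method, the synthetic unit is a weighted average of donor units, with non-negative weights that sum to one. The sketch below illustrates that idea with made-up donor series and weights; it is not SynthCast's internal code. It assumes the donors are already aligned on time_period, and it treats the periods where only donors have data as the forecast.

```r
# Donor series aligned on time_period (rows = periods, columns = donor units).
# All numbers are hypothetical.
donors <- cbind(
  `1` = c(10, 12, 14, 16, 18, 20),
  `2` = c(8, 9, 10, 11, 12, 13),
  `3` = c(5, 7, 9, 11, 13, 15)
)
weights <- c(`1` = 0.5, `2` = 0.3, `3` = 0.2)

# Synthetic control weights are non-negative and sum to one.
stopifnot(all(weights >= 0), isTRUE(all.equal(sum(weights), 1)))

# Weighted combination of donors, matching weights to columns by name.
synthetic <- as.vector(donors %*% weights[colnames(donors)])

# Suppose the target is observed only for periods 1-4: periods 5-6 exist only
# for the donors, so the synthetic values there play the role of the forecast.
synthetic[5:6]
```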

Table 2: variable_importance_and_comparison

This table summarizes the results related to the features/variables selection from the Synthetic Control method. The columns are the following:

kable(head(synthetic_forecast$variable_importance_and_comparison,8))

Table 3: mape_backtest

This table shows the results of a simple MAPE (mean absolute percentage error) backtest over the period used to build the forecast. Note that it is not intended to be a robust method for validating the model: the Synthetic Control method is a mathematical optimization, not a machine learning method, and it minimizes the distance between the unit of interest and its synthetic counterpart without guarding against overfitting the curves. The columns are the following:
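A minimal sketch of both ideas, under simplifying assumptions: donor weights are fitted by minimizing the squared distance to the target over its observed periods, and MAPE is then computed over those same periods. The weights are kept non-negative and summing to one through a softmax reparameterization passed to base R's optim(). This is a stand-in for illustration only; the Synth package uses its own solver and also weights the predictor variables, which is omitted here. All data are hypothetical.

```r
# Donor series over the periods where the target is observed
# (rows = periods, columns = donor units). Hypothetical numbers.
donors <- cbind(
  `1` = c(10, 12, 14, 16),
  `2` = c(8, 9, 10, 11),
  `3` = c(5, 7, 9, 11)
)
target <- c(8.4, 10.1, 11.8, 13.5)

# Softmax: maps unconstrained parameters to weights >= 0 that sum to 1.
to_weights <- function(theta) exp(theta) / sum(exp(theta))

# Squared distance between the target and the weighted donor combination.
loss <- function(theta) sum((target - donors %*% to_weights(theta))^2)

fit <- optim(par = rep(0, ncol(donors)), fn = loss, method = "BFGS")
w <- to_weights(fit$par)
fitted_values <- as.vector(donors %*% w)

# MAPE in percent, computed over the same periods used to fit the weights.
mape <- function(actual, predicted) {
  100 * mean(abs((actual - predicted) / actual))
}
backtest_mape <- mape(target, fitted_values)
round(w, 3)
backtest_mape
```

Because the toy target lies exactly on a combination of the donors, the in-sample MAPE comes out close to zero, which is why a low backtest MAPE on the fitting period says little about out-of-sample accuracy.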

kable(synthetic_forecast$mape_backtest)

Table 4: output_projecao

This table contains the projection itself. The columns are the following:

kable(synthetic_forecast$output_projecao)
proj <- synthetic_forecast$output_projecao
proj %>% glimpse()



SynthCast documentation built on March 18, 2022, 5:48 p.m.