How to Run a Synthetic Forecast


This is a walkthrough of how the package is intended to be used, illustrated with a practical example.

The Dataset

The first thing a forecast needs is data to forecast. SynthCast provides an example of how it expects a dataset to look. The code below loads the package and the example dataset:


The dataset is expected to have 3 types of columns:

The table below shows the maximum time period available for each unit:


library(dplyr)

# Maximum observed time period for a sample of the oldest and youngest units
df_example %>%
  group_by(unit) %>%
  summarise(max_time_period = max(time_period)) %>%
  filter(unit %in% c(1, 2, 3, 4, 5, 45, 46, 47, 48, 49, 50))

As one can see, the older the unit (the smaller the number, the older the unit), the longer the available time series (larger values in the time_period column). This means that data from older units can be used to forecast younger units. For example, the data from units 18 (that is, 30 - 12) to 1 could be used to predict the next 12 periods of unit 30. This is exactly what the function run_synthetic_forecast() does. To better understand how it works under the hood, it is recommended to read the paper describing the Synth package for the Synthetic Control method.
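The eligibility idea described above can be sketched in a few lines of base R. This is only an illustration under stated assumptions, not the way SynthCast selects its donor units internally: the toy data frame, the variable names (`toy`, `horizon`, `donors`) and the rule "a donor must have at least as many periods as the unit of interest plus the forecast horizon" are all made up for this example.

```r
# Toy panel (hypothetical data): unit 1 is the oldest, unit 5 the youngest.
toy <- do.call(rbind, lapply(1:5, function(u) {
  n_periods <- 60 - 10 * (u - 1)  # 60, 50, 40, 30, 20 periods
  data.frame(unit = u, time_period = seq_len(n_periods))
}))

# Last observed period per unit, as a named numeric vector
max_time <- sapply(split(toy$time_period, toy$unit), max)

unit_of_interest <- "4"
horizon <- 12  # number of periods to forecast

# Assumed rule: a donor must already cover the whole forecast window
needed <- max_time[[unit_of_interest]] + horizon
donors <- names(max_time)[max_time >= needed]
print(donors)
```

In this toy panel, unit 4 has 30 periods, so a donor needs at least 42. Only units 1 and 2 qualify, while unit 3 (40 periods) is too short to cover the full horizon.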

The function call below runs a synthetic forecast of 12 time periods of the series x1 for unit 30.

synthetic_forecast <- run_synthetic_forecast(
  df = df_example,
  col_unit_name = 'unit',
  unit_of_interest = '30',
  serie_of_interest = 'x1'
)

The output of the function is a list with 4 tables.

Synthetic Forecast Results

These are the 4 tables that are returned by the function call.

Table 1: synthetic_control_composition

This table summarizes the results related to the unit selection from the Synthetic Control method. The columns are the following:


Table 2: variable_importance_and_comparison

This table summarizes the results related to the feature (variable) selection from the Synthetic Control method. The columns are the following:


Table 3: mape_backtest

This table shows the results of a simple MAPE (mean absolute percentage error) backtest on the period that was used to forecast. It is worth noting that the intention is not to provide a robust method for validating the model. The Synthetic Control Method is a mathematical approach, not a machine learning one: it minimizes the distance without worrying about overfitting the curves. The columns are the following:
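For readers unfamiliar with the metric, here is a minimal sketch of how a MAPE can be computed in base R. The helper `mape()` and the numbers are hypothetical and are not part of SynthCast. Whether the package reports the value as a percentage or as a fraction is not stated here, so the scaling by 100 is an assumption.

```r
# Hypothetical helper, not part of SynthCast:
# mean absolute percentage error between observed and forecast values.
mape <- function(actual, forecast) {
  stopifnot(length(actual) == length(forecast), all(actual != 0))
  mean(abs((actual - forecast) / actual)) * 100
}

# Made-up values for illustration only
actual   <- c(100, 200, 400)
forecast <- c(110, 180, 400)

result <- mape(actual, forecast)
print(result)
```

The absolute percentage errors here are 10%, 10% and 0%, so the mean is about 6.67%. Note that MAPE is undefined when an observed value is zero, which is why the helper rejects such input.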


Table 4: output_projecao

This table contains the projection itself. The columns are the following:

proj <- synthetic_forecast$output_projecao
proj %>% glimpse()


SynthCast documentation built on March 18, 2022, 5:48 p.m.