knitr::opts_chunk$set( collapse = TRUE, comment = "#>" )
This is a walkthrough of how the package is intended to be used, illustrated with a practical example.
The first thing a forecast needs is data to be forecasted. SynthCast provides an example of how it expects a dataset to look. The code below loads the package and the example dataset:
library(knitr)
library(SynthCast)
data('df_example')
kable(head(df_example))
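The dataset is expected to have 3 types of columns. Judging by the arguments of the function call later in this walkthrough, these are a unit identifier (unit), a time index (time_period), and one or more series columns to forecast (such as x1).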
The table below shows the maximum time period for a selection of units:
library(dplyr)
df_example %>%
  group_by(unit) %>%
  summarise(max_time_period = max(time_period)) %>%
  filter(unit %in% c(1, 2, 3, 4, 5, 45, 46, 47, 48, 49, 50)) %>%
  kable()
As the table shows, the older the unit (the smaller the number, the older the unit), the longer the available time series (larger values in the time_period column). This means that data from older units can be used to forecast younger units. For example, the data from units 18 (that is, 30 - 12) down to 1 could be used to predict the next 12 periods of unit 30. This is exactly what the function run_synthetic_forecast() does. To better understand how it works under the hood, it is recommended to check the Synthetic Control Synth package paper.
The function call below runs a synthetic forecast of 12 time periods for the series x1 of unit 30.
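The idea of borrowing from older units can be sketched in base R. This is only an illustration under stated assumptions: the toy data frame, the variable names, and the eligibility rule (a donor must have at least as many periods as the unit of interest plus the forecast horizon) are hypothetical and are not taken from the SynthCast source code.

```r
# Toy data (hypothetical, not df_example): unit u has 12 - 2 * u observed
# periods, so smaller unit numbers have longer series.
toy <- do.call(rbind, lapply(1:5, function(u) {
  data.frame(unit = u, time_period = seq_len(12 - 2 * u))
}))

# Longest observed time period per unit, as a named numeric vector
max_time <- sapply(split(toy$time_period, toy$unit), max)

unit_of_interest <- "4"
periods_to_forecast <- 4

# Assumed rule: a donor unit must cover the whole forecast horizon
horizon <- max_time[[unit_of_interest]] + periods_to_forecast
eligible_units <- names(max_time)[max_time >= horizon]

print(eligible_units)
```

In this toy setup unit 4 has 4 periods, so a 4-period forecast needs donors with at least 8 periods, which leaves units 1 and 2.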
synthetic_forecast <- run_synthetic_forecast(
  df = df_example,
  col_unit_name = 'unit',
  col_time = 'time_period',
  periods_to_forecast = 12,
  unit_of_interest = '30',
  serie_of_interest = 'x1'
)
The output of the function is a list with 4 tables: synthetic_control_composition, variable_importance_and_comparison, mape_backtest, and output_projecao.
synthetic_control_composition
This table summarizes the results of the unit selection performed by the Synthetic Control method. The columns are the following:
kable(synthetic_forecast$synthetic_control_composition)
execution_date: The date on which the forecast was executed, in YYYY-MM-DD format;
projected_unit: The forecasted unit;
projected_serie: The forecasted series;
synthetic_units/w.weights: The units selected (from 18 to 1) and their respective weights.

variable_importance_and_comparison
This table summarizes the results of the feature/variable selection performed by the Synthetic Control method. The columns are the following:
kable(head(synthetic_forecast$variable_importance_and_comparison,8))
execution_date: The date on which the forecast was executed, in YYYY-MM-DD format;
projected_unit: The forecasted unit;
projected_serie: The forecasted series;
variable: The variable selected;
unit_of_interest: The mean value over time of the variable named in the variable column, for the unit in projected_unit;
synthetic: The mean value over time of the variable named in the variable column, for the synthetic unit;
sample: The mean value over time of the variable named in the variable column, for the whole dataset;
v.weights: The weight of the variable named in the variable column.

mape_backtest
This table shows the results of a simple MAPE backtest over the period that was used to build the forecast. It is worth noting that the intention is not to provide a robust method for validating the model. The Synthetic Control method is a mathematical approach, not a machine learning one: it minimizes the distance between the curves without worrying about overfitting them. The columns are the following:
kable(synthetic_forecast$mape_backtest)
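For reference, the mean absolute percentage error can be sketched in a few lines of base R. The vectors below are invented for illustration, the helper name mape() is hypothetical, and whether SynthCast reports the value as a percentage or as a fraction is not stated here, so the scaling by 100 is an assumption.

```r
# Hypothetical observed and synthetic values over the backtest window
observed  <- c(100, 110, 120, 130)
synthetic <- c(90, 110, 126, 130)

# Standard MAPE definition, expressed as a percentage (assumed scaling)
mape <- function(actual, fitted) {
  mean(abs((actual - fitted) / actual)) * 100
}

backtest_mape <- mape(observed, synthetic)
print(backtest_mape)
```

Because the synthetic unit is fitted on the same periods the error is measured on, a low value here says little about out-of-sample accuracy.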
execution_date: The date on which the forecast was executed, in YYYY-MM-DD format;
projected_unit: The forecasted unit;
projected_serie: The forecasted series;
max_time_unit_of_interest: The age of the unit of interest;
periods_to_forecast: The number of periods that were forecasted;
elegible_control_units: The number of eligible units available to build the forecast;
mape: The mean absolute percentage error over time periods 1 to max_time_unit_of_interest.

output_projecao
This table contains the projection itself. The columns are the following:
kable(synthetic_forecast$output_projecao)
execution_date: The date on which the forecast was executed, in YYYY-MM-DD format;
projected_unit: The forecasted unit;
time_period: The time period;
projected_serie: The forecasted series;
projected_serie_value: The value of the series/variable named in the projected_serie column;
is_projected: 1 indicates that the value is projected, 0 indicates that the value is observed.

proj <- synthetic_forecast$output_projecao
proj %>% glimpse()
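A common next step is to separate the forecasted rows from the observed ones using is_projected. The sketch below uses a small stand-in data frame with the columns described above. Its values, including the execution date, are invented, and the column types in the real output may differ.

```r
# Hypothetical stand-in for synthetic_forecast$output_projecao
toy_proj <- data.frame(
  execution_date = "2024-01-01",
  projected_unit = "30",
  time_period = 1:6,
  projected_serie = "x1",
  projected_serie_value = c(10, 12, 13, 15, 16, 18),
  is_projected = c(0, 0, 0, 0, 1, 1)
)

# Keep only the forecasted periods and the columns needed downstream
forecast_only <- toy_proj[
  toy_proj$is_projected == 1,
  c("time_period", "projected_serie_value")
]

print(forecast_only)
```

The same subsetting should apply to the real table as long as is_projected is stored as 0/1, as described above.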