knitr::opts_chunk$set( collapse = TRUE, comment = "#>" )
The ProbReco
package assumes that base probabilistic forecasts are available. This vignette describes how these can be obtained using the fable
package. Note that the fable
package does currently allow for probabilistic forecast reconciliation, but only under Gaussianity and not using score optimisation.
This vignette considers the case of training reconciliation weights using a rolling window of probabilistic forecasts. A simpler method is to simply use predicted values from a single window of data using the function inscoreopt()
.
The data sim_hierarchy
refer to a simulated 7-variable hierarchy. The bottom level series are all simulated from stationary ARMA models. Noise terms are added so that the residual terms on the bottom levels have higher variance than the middle level residuals, which in turn have higher variance than the top level. For details see [@wp].
library(magrittr) library(dplyr) library(tidyr) library(purrr) library(fable) library(ProbReco) set.seed(1983) sim_hierarchy
To ensure the results are stable we have also set a seed.
To set up, first we should break the data into a series of rolling windows. This can be done using the map
function in the purrr
package.
#Length of window N<-500 #Make data windows data_windows<-purrr::map(1:(nrow(sim_hierarchy)-N+1), function(i){return(sim_hierarchy[i:(i+N-1),])})
This creates a list, the first element of which is the data from $t=1$ to $t=500$, the second element is the data from $t=2$ to $t=501$, etc...
data_windows[[1]]%>%head(3) data_windows[[1]]%>%tail(3) data_windows[[2]]%>%head(3) data_windows[[2]]%>%tail(3) data_windows[[500]]%>%head(3) data_windows[[500]]%>%tail(3)
A function for modelling and then obtaining the forecast mean and variance using data from a single window is written below. This can be used with the map family of functions from the purrr
package. This function is given as
forecast_window <- function(data_w){ data_w%>% tidyr::pivot_longer(cols = -Time, names_to = 'var', values_to = 'value')%>% as_tsibble(index = 'Time',key='var')%>% model(arma11=ARIMA(value~1+pdq(1,0,1)+PDQ(0,0,0)))%>% forecast(h=1)%>% dplyr::arrange(match(var,c("Tot","A","B","AA","AB","BA","BB")))->f mean<-map_dbl(f$.distribution,use_series,mean) sd<-map_dbl(f$.distribution,use_series,sd) return(list(mean=mean,sd=sd)) }
Using the fable
package requires some manipulation of the data. So first, the data frame is converted to long format using pivot_longer
. The data must then be converted to a tsibble
. Finally, the model
function can be used from fable. Here an ARMA(1,1) is fit to each data set. Note that this is only for illustrative purposes, and there will be some misspecification. In practice the order of the ARMA can be chosen automatically. The forecast
function is used to obtain the forecast mean and variance for each variable. Only 1 step ahead forecasts are obtained for this example. The variables are arranged in the correct order and then the forecast mean and variance are extracted. Here, dependence is completely ignored, i.e. the base forecasts are indepenedent.
To fit the model and obtain the base forecast mean and variance for each window, the map
function can be used. Here we will only fit models to the first 10 windows meaning that t=501 to t=510 will constitute the training data for learning the reconciliation weights.
#Number of windows for training W<-10 all_fc<-purrr::map(data_windows[1:W],forecast_window)
The hierarchy has the following $\boldsymbol{S}$ matrix
S<-matrix(c(1,1,1,1, 1,1,0,0, 0,0,1,1, 1,0,0,0, 0,1,0,0, 0,0,1,0, 0,0,0,1),7,4,byrow = TRUE)
The following code obtains the realisations in the form required.
obs_i<-function(i){ sim_hierarchy%>% dplyr::filter(Time==i)%>% tidyr::pivot_longer(-Time,names_to = 'var')%>% dplyr::arrange(match(var,c("Tot","A","B","AA","AB","BA","BB")))%>% dplyr::pull(value) } all_y<-purrr::map((N+1):(N+W),obs_i)
This list of length r W
has the vector of true realisations from t=501 as the first element, the vector of true realisations from t=502 as the second element, etc. Note that the arrange
and match
functions are used to preserve the ordering of the variables from top to bottom. Although any ordering is acceptable, the order must agree with the ordering of the rows in the $\boldsymbol{S}$ matrix.
The next step is to create a list of functions where the first element generates from the probabilistic forecast distribution at time $t=501$, the second element generates from the probabilistic forecast distribution at time $t=502$, etc. To do this write a function that returns a function as follows
make_genfunc<-function(input){ f<-function(){ fc_mean<-input$mean fc_sd<-input$sd out<-matrix(rnorm(350,mean=fc_mean,sd=fc_sd),7,50) return(out) } return(f) }
The 'inner' function f
generates fifty, vector-valued, one-step ahead forecasts from a independent Gaussian distributions with mean and standard deviation extracted from input
. The 'outer' function make_genfunc
is required so that the entire list can be created using map
all_prob<-purrr::map(all_fc,make_genfunc)
The object all_prob
is in the form required.
The total score for bottom up can be found using
G_bu<-as.vector(rbind(matrix(0,4,4),diag(rep(1,4)))) es_bu<-total_score(all_y,all_prob,S,G_bu)
The optimal score can now be found using
opt<-scoreopt(all_y,all_prob,S)
Any scripts or data that you put into this service are public.
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.