intradayModel: Modeling and Forecasting Financial Intraday Signals

```r
library(knitr)
opts_chunk$set(
  collapse = TRUE,
  comment = "#>",
  fig.align = "center",
  fig.retina = 2,
  out.width = "100%",
  dpi = 96,
  pngquant = "--speed=1"
)
knit_hooks$set(pngquant = hook_pngquant)  # brew install pngquant
```

```{css, echo = FALSE}
/* ensure cleanrmd is centered */
body {
  margin: 0 auto;
  max-width: 1000px;
  padding: 2rem;
}

/* math is smaller */
.math {
  font-size: small;
}

/* set reference spacing in cleanrmd */
.references>div:first-child {
  margin-bottom: 1.6em;
}
```

------------------------------------------------------------------------

> Welcome to the `intradayModel` package! This vignette provides an overview of the package's features and how to use them. `intradayModel` uses state-space models to model and forecast financial intraday signals, with a focus on intraday trading volume. Our team is currently working on expanding the package to include more support for intraday volatility.

# Quick start

To get started, we load our package and sample data: the 15-minute intraday trading volume of AAPL from 2019-01-02 to 2019-06-28, covering 124 trading days. We use the first 104 trading days for fitting, and the last 20 days for evaluation of forecasting performance.

```r
library(intradayModel)
data(volume_aapl)
volume_aapl[1:5, 1:5] # print the head of data

volume_aapl_training <- volume_aapl[, 1:104]
volume_aapl_testing <- volume_aapl[, 105:124]
```

Next, we fit a univariate state-space model using the `fit_volume()` function.

```r
model_fit <- fit_volume(volume_aapl_training)
```

Once the model is fitted, we can analyze the hidden components of any intraday volume series based on all of its observations. Calling the `decompose_volume()` function with `purpose = "analysis"` returns the smoothed daily, seasonal, and intraday dynamic components. Smoothing uses both past and future observations to refine each state estimate.

```r
analysis_result <- decompose_volume(purpose = "analysis", model_fit, volume_aapl_training)

# visualization
plots <- generate_plots(analysis_result)
plots$log_components
```

To see how well our model performs on new data, we call the `forecast_volume()` function to produce one-bin-ahead forecasts on the testing set.

```r
forecast_result <- forecast_volume(model_fit, volume_aapl_testing)

# visualization
plots <- generate_plots(forecast_result)
plots$original_and_forecast
```

Now that you have seen the basic workflow, let's dive deeper into the package's functionalities and features.

# Usage of the package

## Preliminary theory

Intraday observations of trading volume are divided into days, indexed by $t\in\{1,\dots,T\}$. Each day is further divided into bins, indexed by $i\in\{1,\dots,I\}$. To refer to a specific observation, we use the index $\tau = I \times (t-1) + i$.
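To make the indexing concrete, here is a small base-R sketch, assuming the volume is stored as an (n_bin, n_day) matrix with bins in rows and days in columns. Because R stores matrices column by column, `as.vector()` lists the observations in exactly the order of $\tau$. The helpers `tau_index()` and `day_bin()` are hypothetical illustrations, not package functions.

```r
# Hypothetical helpers: map (day t, bin i) to the flat index tau and back,
# for I bins per day, using tau = I * (t - 1) + i.
tau_index <- function(t, i, I) I * (t - 1) + i

day_bin <- function(tau, I) {
  c(t = (tau - 1) %/% I + 1,   # day index
    i = (tau - 1) %% I + 1)    # bin index within the day
}

# Toy data: 3 bins per day, 4 days, stored as an (n_bin, n_day) matrix
n_bin <- 3
vol <- matrix(seq_len(12), nrow = n_bin, ncol = 4)

# R matrices are column-major, so as.vector() orders observations by tau
y <- as.vector(vol)
tau <- tau_index(t = 2, i = 3, I = n_bin)
y[tau] == vol[3, 2]   # TRUE
day_bin(tau, n_bin)   # t = 2, i = 3
```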

Our package uses a state-space model to extract several components of intraday volume. These components include the daily component, which adjusts the mean level of the time series; the seasonal component, which captures the U-shaped intraday periodic pattern; and the intraday dynamic component, which represents movements within a day.

The observed intraday volume can be written as a multiplicative combination of the components [@brownlees2011intra]:

$$ \large \text{intraday volume} = \text{daily} \times \text{seasonal} \times \text{intraday dynamic} \times \text{noise}. \tag{1} \small $$

Alternatively, by taking the logarithm, the intraday volume can also be regarded as an additive combination of these components:

$$ \large y_{\tau} = \eta_{\tau} + \phi_i + \mu_{t,i} + v_{t,i}. \tag{2} \small $$
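A quick numerical check of the step from Equation (1) to Equation (2): the logarithm of a product of positive factors is the sum of their logarithms. The factor values below are arbitrary and only illustrative; `eta`, `phi`, `mu`, and `v` mirror the symbols in Equation (2) for a single bin.

```r
# Arbitrary positive factors for one bin (illustrative values only)
daily    <- 2.0e6   # daily level
seasonal <- 1.8     # seasonal factor for this bin
dynamic  <- 0.9     # intraday dynamic factor
noise    <- 1.05    # multiplicative noise

volume <- daily * seasonal * dynamic * noise   # Equation (1)

# Equation (2): log volume = eta + phi + mu + v
eta <- log(daily); phi <- log(seasonal); mu <- log(dynamic); v <- log(noise)
y <- log(volume)

all.equal(y, eta + phi + mu + v)   # TRUE (up to floating-point tolerance)
```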

The state-space model proposed by [@chen2016forecasting] builds on Equation (2) and is defined as

$$
\large
\begin{aligned}
\mathbf{x}_{\tau+1} &= \mathbf{A}_{\tau}\mathbf{x}_{\tau} + \mathbf{w}_{\tau},\\
y_{\tau} &= \mathbf{C}\mathbf{x}_{\tau} + \phi_{\tau} + v_\tau,
\end{aligned}
\tag{3}
\small
$$

where

- $\mathbf{x}_{\tau} = [\eta_{\tau}, \mu_{\tau}]^\top$ is the hidden state vector containing the log daily component and the log intraday dynamic component;
- $\mathbf{A}_{\tau} = \left[\begin{array}{cc}a_{\tau}^{\eta}&0\\0&a^{\mu}\end{array}\right]$ is the state transition matrix with $a_{\tau}^{\eta} = \begin{cases}a^{\eta}&\tau = kI,\ k = 1,2,\dots\\1&\text{otherwise},\end{cases}$ so that the daily component changes only across day boundaries and stays constant within a day;
- $\mathbf{C} = [1, 1]$ is the observation matrix;
- $\phi_{\tau}$ is the corresponding element from $\boldsymbol{\phi} = [\phi_1,\dots, \phi_I]^\top$, the log seasonal component (i.e., $\phi_{\tau} = \phi_i$ for the bin $i$ of observation $\tau$);
- $\mathbf{w}_{\tau} = \left[\epsilon_{\tau}^{\eta},\epsilon_{\tau}^{\mu}\right]^\top \sim \mathcal{N}(\mathbf{0}, \mathbf{Q}_{\tau})$ represents the independent Gaussian noise in the state transition, with a time-varying covariance matrix $\mathbf{Q}_{\tau} = \left[\begin{array}{cc}(\sigma_\tau^{\eta})^2&0\\0&(\sigma^{\mu})^2\end{array}\right]$ and $\sigma_\tau^{\eta} = \begin{cases}\sigma^{\eta}&\tau = kI,\ k = 1,2,\dots\\0&\text{otherwise};\end{cases}$
- $v_\tau \sim \mathcal{N}(0, r)$ is the i.i.d. Gaussian noise in the observation;
- $\mathbf{x}_1$ is the initial state at $\tau = 1$, and it follows $\mathcal{N}(\mathbf{x}_0, \mathbf{V}_0)$.

In this model, $\boldsymbol{\Theta} = \{a^{\eta}, a^{\mu}, (\sigma^{\eta})^2, (\sigma^{\mu})^2, r, \boldsymbol{\phi}, \mathbf{x}_0, \mathbf{V}_0\}$ is the set of model parameters.
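To make the roles of the parameters in $\boldsymbol{\Theta}$ concrete, the following base-R sketch simulates log volume from Equation (3). `simulate_log_volume()` and all parameter values are hypothetical and chosen only for illustration; $\mathbf{V}_0$ is assumed diagonal for simplicity. The sketch assumes the daily component is updated with $a^{\eta}$ and $(\sigma^{\eta})^2$ only at day boundaries ($\tau = kI$) and is carried forward unchanged between bins of the same day.

```r
# Minimal simulation of the univariate state-space model in Equation (3).
# All parameter values are hypothetical and for illustration only.
simulate_log_volume <- function(n_day, n_bin, a_eta, a_mu, var_eta, var_mu,
                                r, phi, x0, V0) {
  n <- n_day * n_bin
  x <- matrix(0, nrow = 2, ncol = n)     # hidden states [eta, mu] per bin
  y <- numeric(n)                        # observed log volume
  # Initial state x_1 ~ N(x0, V0); V0 is assumed diagonal here
  x[, 1] <- x0 + sqrt(diag(V0)) * rnorm(2)
  for (tau in seq_len(n)) {
    i <- (tau - 1) %% n_bin + 1          # bin index within the day
    # Observation: y = C x + phi_i + v, with C = [1, 1]
    y[tau] <- sum(x[, tau]) + phi[i] + rnorm(1, sd = sqrt(r))
    if (tau < n) {
      day_end <- (tau %% n_bin == 0)     # tau = kI
      # eta: AR(1) step across days, held constant within a day
      a_eta_tau   <- if (day_end) a_eta else 1
      var_eta_tau <- if (day_end) var_eta else 0
      x[1, tau + 1] <- a_eta_tau * x[1, tau] + rnorm(1, sd = sqrt(var_eta_tau))
      # mu: AR(1) step at every bin
      x[2, tau + 1] <- a_mu * x[2, tau] + rnorm(1, sd = sqrt(var_mu))
    }
  }
  list(y = matrix(y, nrow = n_bin), x = x)   # y reshaped to (n_bin, n_day)
}

set.seed(1)
n_bin <- 26
phi <- 0.5 * ((seq_len(n_bin) - 13.5) / 12.5)^2   # toy U-shaped seasonal pattern
sim <- simulate_log_volume(n_day = 5, n_bin = n_bin, a_eta = 0.9, a_mu = 0.6,
                           var_eta = 0.05, var_mu = 0.02, r = 0.01, phi = phi,
                           x0 = c(13, 0), V0 = diag(c(0.1, 0.1)))
dim(sim$y)   # 26 5
```

In the simulated states, the first row of `sim$x` (the log daily component) is piecewise constant by day, while the second row (the log intraday dynamic component) moves at every bin.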

## Datasets

Two data classes of intraday volume are supported:

- a 2D numeric matrix of size (n_bin, n_day);
- an xts object.

To help you get started, we provide two sample datasets: a matrix-class `volume_aapl` and an xts-class `volume_fdx`. Here, we elaborate on the latter.

```r
data(volume_fdx)
head(volume_fdx)
tail(volume_fdx)
```

## Fitting

```r
fit_volume(data, fixed_pars = NULL, init_pars = NULL, verbose = 0, control = NULL)
```

To fit a univariate state-space model to intraday volume, use the `fit_volume()` function. If you want to fix some parameters to specific values, provide them as a list via `fixed_pars`. If you have prior knowledge of initial values for the parameters that are not fixed, provide them via `init_pars`. In addition, `verbose` controls the amount of printed output, and further options can be set via `control`.

The fitting process stops when either the maximum number of iterations is reached or the termination criterion $\|\Delta \boldsymbol{\Theta}_i\| \le \text{abstol}$ is met, where $\Delta \boldsymbol{\Theta}_i$ is the change in the parameters at iteration $i$.
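The stopping rule can be illustrated with a generic iterative scheme. In this hypothetical sketch, `update` stands in for one parameter-update step of whatever fitting algorithm is used, and `abstol`/`maxit` play the roles of the tolerance and the maximum number of iterations; it is not the package's fitting code.

```r
# Generic iteration with the stopping rule ||Delta Theta_i|| <= abstol
# or a maximum number of iterations (hypothetical helper).
iterate_until_converged <- function(update, theta0, abstol = 1e-4, maxit = 1000) {
  theta <- theta0
  for (it in seq_len(maxit)) {
    theta_new <- update(theta)
    step <- sqrt(sum((theta_new - theta)^2))   # Euclidean norm of the change
    theta <- theta_new
    if (step <= abstol) {
      return(list(theta = theta, iterations = it, converged = TRUE))
    }
  }
  list(theta = theta, iterations = maxit, converged = FALSE)
}

# Toy update with fixed point theta = c(1, 2): each step halves the distance
toy_update <- function(theta) theta + 0.5 * (c(1, 2) - theta)
res <- iterate_until_converged(toy_update, theta0 = c(0, 0), abstol = 1e-6)
res$converged   # TRUE
```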

The following code shows how to fit the model to the FDX stock.

```r
# set fixed value
fixed_pars <- list()
fixed_pars$"x0" <- c(13.33, -0.37)

# set initial value
init_pars <- list()
init_pars$"a_eta" <- 1

volume_fdx_training <- volume_fdx['2019-07-01/2019-11-30']
model_fit <- fit_volume(volume_fdx_training, fixed_pars = fixed_pars, init_pars = init_pars,
                        verbose = 2, control = list(acceleration = TRUE))
```

Trading days with missing bins are automatically removed. In this period, these are 2019-07-03 (the day before Independence Day) and 2019-11-29 (the day after Thanksgiving), both of which had an early close.
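For matrix-shaped data, the rule of dropping incomplete days can be sketched in base R as below, assuming missing bins are stored as `NA`. This is an illustration of the idea on a toy matrix, not the package's internal implementation.

```r
# Toy (n_bin, n_day) volume matrix with one incomplete day (early close).
# matrix() fills column by column, so each line below is one day.
vol <- matrix(c(5, 3, 4,
                6, 2, NA,    # day 2 is missing its last bin
                7, 3, 5),
              nrow = 3,
              dimnames = list(NULL, c("2019-07-02", "2019-07-03", "2019-07-05")))

# Keep only the days (columns) in which every bin is observed
complete_days <- colSums(is.na(vol)) == 0
vol_clean <- vol[, complete_days, drop = FALSE]
colnames(vol_clean)   # "2019-07-02" "2019-07-05"
```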

## Decomposition

```r
decompose_volume(purpose, model, data, burn_in_days = 0)
```

The `decompose_volume()` function decomposes the intraday volume into its daily, seasonal, and intraday dynamic components.

With `purpose = "analysis"`, it applies Kalman smoothing to estimate the hidden states given all available observations. The daily and intraday dynamic components at time $\tau$ are the smoothed state estimates conditioned on all the data, denoted by $\mathbb{E}[\mathbf{x}_{\tau}\mid\{y_{j}\}_{j=1}^{M}]$, where $M$ is the total number of bins in the dataset. The seasonal component takes the value of $\boldsymbol{\phi}$.
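To show what the smoothed estimates $\mathbb{E}[\mathbf{x}_{\tau}\mid\{y_{j}\}_{j=1}^{M}]$ involve, here is a compact base-R sketch of a Kalman filter followed by a Rauch-Tung-Striebel (RTS) backward pass for Equation (3). It is written for illustration and is not the package's implementation: `kalman_smooth()` is a hypothetical name, the parameters are taken as known rather than fitted, and the daily component is assumed to change only at day boundaries (transition coefficient 1 and zero noise between bins of the same day).

```r
# Kalman filter + RTS smoother for the 2-state model in Equation (3).
# y: log volume vector ordered by tau; parameters are assumed known.
kalman_smooth <- function(y, n_bin, a_eta, a_mu, var_eta, var_mu, r, phi, x0, V0) {
  n <- length(y)
  C <- matrix(c(1, 1), nrow = 1)
  # A_tau and Q_tau: eta takes an AR step with noise only at day ends (tau = kI)
  A_of <- function(tau) diag(c(if (tau %% n_bin == 0) a_eta else 1, a_mu))
  Q_of <- function(tau) diag(c(if (tau %% n_bin == 0) var_eta else 0, var_mu))
  x_pred <- matrix(0, 2, n); x_filt <- matrix(0, 2, n)
  V_pred <- array(0, c(2, 2, n)); V_filt <- array(0, c(2, 2, n))
  xp <- x0; Vp <- V0                         # prior mean/covariance of x_1
  for (tau in seq_len(n)) {                  # forward pass (Kalman filter)
    x_pred[, tau] <- xp; V_pred[, , tau] <- Vp
    i <- (tau - 1) %% n_bin + 1
    S <- as.numeric(C %*% Vp %*% t(C)) + r   # innovation variance
    K <- Vp %*% t(C) / S                     # Kalman gain (2 x 1)
    e <- y[tau] - sum(xp) - phi[i]           # innovation: y - (C x_pred + phi_i)
    x_filt[, tau] <- xp + K * e
    V_filt[, , tau] <- Vp - K %*% C %*% Vp
    A <- A_of(tau)
    xp <- as.vector(A %*% x_filt[, tau])     # predicted mean of x_{tau+1}
    Vp <- A %*% V_filt[, , tau] %*% t(A) + Q_of(tau)
  }
  x_smooth <- x_filt                         # at tau = n, smoothed = filtered
  for (tau in rev(seq_len(n - 1))) {         # backward pass (RTS smoother)
    J <- V_filt[, , tau] %*% t(A_of(tau)) %*% solve(V_pred[, , tau + 1])
    x_smooth[, tau] <- x_filt[, tau] + J %*% (x_smooth[, tau + 1] - x_pred[, tau + 1])
  }
  list(filtered = x_filt, smoothed = x_smooth)
}

set.seed(42)
n_bin <- 4
phi <- c(0.4, 0, 0, 0.3)                     # toy seasonal pattern (log scale)
y <- 13 + rep(phi, 3) + rnorm(12, sd = 0.2)  # 3 toy days of log volume
sm <- kalman_smooth(y, n_bin, a_eta = 1, a_mu = 0.5, var_eta = 0.1,
                    var_mu = 0.05, r = 0.02, phi = phi,
                    x0 = c(13, 0), V0 = diag(c(1, 1)))
sm$smoothed[1, 1:4]   # smoothed daily component for day 1
```

The filter only sees past data while the smoother also uses later bins, so the two coincide at the last bin and generally differ elsewhere. Under the assumption above, the smoothed daily component is constant (up to floating-point error) within each day.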

```r
analysis_result <- decompose_volume(purpose = "analysis", model_fit, volume_fdx_training)

str(analysis_result)
```

The `generate_plots()` function visualizes the smoothed components and the smoothing performance.

```r
plots <- generate_plots(analysis_result)
plots$log_components
plots$original_and_smooth
```

With `purpose = "forecast"`, it applies the Kalman filter to predict the one-bin-ahead hidden state from the observations available so far, denoted by $\mathbb{E}[\mathbf{x}_{\tau+1}\mid\{y_{j}\}_{j=1}^{\tau}]$. Details can be found in the next subsection.

This function also evaluates the model performance with the following measures:

- Mean absolute error (MAE): $\frac{1}{M}\sum_{\tau=1}^M\lvert\hat{y}_\tau - y_\tau\rvert$.
- Mean absolute percent error (MAPE): $\frac{1}{M}\sum_{\tau=1}^M\frac{\lvert\hat{y}_\tau - y_\tau\rvert}{y_\tau}$.
- Root mean square error (RMSE): $\sqrt{\frac{1}{M}\sum_{\tau=1}^M\left(\hat{y}_\tau - y_\tau\right)^2}$.
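These measures translate directly into base R. The helper names below are hypothetical, and the numbers form a toy example that can be checked by hand (errors of 10, -20, and 0).

```r
# Forecast accuracy measures as defined above (y: observed, y_hat: forecast)
mae  <- function(y_hat, y) mean(abs(y_hat - y))
mape <- function(y_hat, y) mean(abs(y_hat - y) / y)
rmse <- function(y_hat, y) sqrt(mean((y_hat - y)^2))

y     <- c(100, 200, 400)
y_hat <- c(110, 180, 400)
mae(y_hat, y)    # 10
mape(y_hat, y)   # 0.06666667
rmse(y_hat, y)   # 12.90994
```

Note that MAPE divides by the observed value, so it is undefined when $y_\tau = 0$.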

## Forecasting

```r
forecast_volume(model, data, burn_in_days = 0)
```

The `forecast_volume()` function is a wrapper around `decompose_volume(purpose = "forecast", ...)`. It produces one-bin-ahead forecasts of intraday volume on a new dataset. The one-bin-ahead forecast is denoted by $\hat{y}_{\tau+1} = \mathbb{E}[y_{\tau+1}\mid\{y_{j}\}_{j=1}^{\tau}]$.
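A minimal base-R sketch of how a Kalman filter produces this forecast for Equation (3): after observing $y_\tau$, the state estimate is updated, propagated one bin forward through $\mathbf{A}_\tau$, and mapped to the observation via $\hat{y}_{\tau+1} = \mathbf{C}\hat{\mathbf{x}}_{\tau+1|\tau} + \phi_{\tau+1}$ (the observation noise has zero mean). `kalman_forecast()` and the toy parameter values are hypothetical, the parameters are taken as known, and the daily component is assumed to be carried forward unchanged between bins of the same day; this is not the package's implementation, and the forecasts are on the log scale.

```r
# One-bin-ahead forecasts of log volume from a Kalman filter (parameters known).
# Returns y_hat[tau] = E[y_tau | y_1, ..., y_{tau-1}]; y_hat[1] uses the prior.
kalman_forecast <- function(y, n_bin, a_eta, a_mu, var_eta, var_mu, r, phi, x0, V0) {
  n <- length(y)
  C <- matrix(c(1, 1), nrow = 1)
  y_hat <- numeric(n)
  xp <- x0; Vp <- V0                          # predicted state for tau = 1
  for (tau in seq_len(n)) {
    i <- (tau - 1) %% n_bin + 1
    y_hat[tau] <- sum(xp) + phi[i]            # C x_pred + phi_i
    S <- as.numeric(C %*% Vp %*% t(C)) + r    # innovation variance
    K <- Vp %*% t(C) / S                      # Kalman gain (2 x 1)
    x_f <- as.vector(xp + K * (y[tau] - y_hat[tau]))   # filtered state
    V_f <- Vp - K %*% C %*% Vp
    day_end <- (tau %% n_bin == 0)            # tau = kI
    A <- diag(c(if (day_end) a_eta else 1, a_mu))
    Q <- diag(c(if (day_end) var_eta else 0, var_mu))
    xp <- as.vector(A %*% x_f)                # E[x_{tau+1} | y_1, ..., y_tau]
    Vp <- A %*% V_f %*% t(A) + Q
  }
  y_hat
}

set.seed(7)
n_bin <- 4
phi <- c(0.4, 0, 0, 0.3)                      # toy seasonal pattern (log scale)
y <- 13 + rep(phi, 5) + rnorm(20, sd = 0.2)   # 5 toy days of log volume
y_hat <- kalman_forecast(y, n_bin, a_eta = 1, a_mu = 0.5, var_eta = 0.1,
                         var_mu = 0.05, r = 0.02, phi = phi,
                         x0 = c(13, 0), V0 = diag(c(1, 1)))
head(y_hat)
```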

When the model is applied to a new dataset with different statistical characteristics, or to a different stock, the Kalman filter may not start in a suitable state. To address this, the first `burn_in_days` days of the data can be used to warm up the Kalman filter so that its state estimates adapt to the new data. The forecasts for these initial days are discarded after this initialization.
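Conceptually, burn-in only changes which forecasts are kept. A hypothetical base-R sketch, assuming forecasts are stored as a vector ordered by $\tau$ with `n_bin` bins per day: the filter runs over all days, and the first `burn_in_days * n_bin` forecasts are dropped before evaluation. `drop_burn_in()` is an illustrative helper, not a package function.

```r
# Drop the forecasts produced during the burn-in period (hypothetical helper)
drop_burn_in <- function(x, n_bin, burn_in_days) {
  n_drop <- burn_in_days * n_bin
  # Guard: x[-integer(0)] would return an empty vector in R
  if (n_drop == 0) return(x)
  x[-seq_len(n_drop)]
}

n_bin <- 26
y     <- seq_len(26 * 3)                 # 3 days of toy observations
y_hat <- y + 1                           # toy forecasts
kept  <- drop_burn_in(y_hat, n_bin, burn_in_days = 2)
length(kept)                             # 26: only the third day remains
```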

```r
# use training data for burn-in
forecast_result <- forecast_volume(model_fit, volume_fdx, burn_in_days = 105)

str(forecast_result)
```

The `generate_plots()` function visualizes the one-bin-ahead forecast components and the forecasting performance.

```r
plots <- generate_plots(forecast_result)
plots$log_components
plots$original_and_forecast
```

# Next steps

This guide gives an overview of the package's main features. Check the manual for details on each function, including parameters and examples.

The current version only supports univariate state-space models for intraday trading volume. Soon, we'll add models for intraday volatility and their multivariate versions. We hope you find these resources helpful and that our package will continue to be a valuable tool for your work.