driftBursts: Inference on drift burst hypothesis
In highfrequency: Tools for Highfrequency Data Analysis

driftBursts

R Documentation


Description

Calculates the test statistic for the drift burst hypothesis.

Let the efficient log-price be defined as:

dX_{t} = \mu_{t}dt + \sigma_{t}dW_{t} + dJ_{t},

where \mu_{t}, \sigma_{t}, and J_{t} are the spot drift, the spot volatility, and a jump process respectively. However, due to microstructure noise, the observed log-price is

Y_{t} = X_{t} + \varepsilon_{t}

To make the results robust to market microstructure noise, pre-averaged returns are used:

\Delta_{i}^{n}\overline{Y} = \sum_{j=1}^{k_{n}-1}g_{j}^{n}\Delta_{i+j}^{n}Y,

where g(x) = \min(x, 1-x) is the weighting function and k_{n} is the pre-averaging horizon.
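As a rough illustration of the pre-averaging step, the base R sketch below applies the weights to a vector of log-prices. The helper `preAverageReturns` is hypothetical and not part of highfrequency, and it assumes the weights are evaluated as g_{j}^{n} = g(j / k_{n}), which the text above does not state explicitly.

```r
# Hypothetical helper for illustration only; not the package's C++ implementation.
preAverageReturns <- function(logPrices, kn) {
  stopifnot(kn >= 2, length(logPrices) > kn)
  r <- diff(logPrices)                 # raw log-returns Delta_i Y, i = 1..n
  j <- seq_len(kn - 1)                 # j = 1, ..., k_n - 1
  g <- pmin(j / kn, 1 - j / kn)        # assumed weights g(j / k_n), g(x) = min(x, 1 - x)
  starts <- 0:(length(r) - kn + 1)     # n - k_n + 2 pre-averaged returns in total
  vapply(starts, function(i) sum(g * r[i + j]), numeric(1))
}

logPrices <- c(0, 1, 3, 6)
preAvg2 <- preAverageReturns(logPrices, kn = 2)  # each return weighted by 0.5
preAvg3 <- preAverageReturns(logPrices, kn = 3)  # two adjacent returns, each weighted by 1/3
```

A larger `kn` smooths over more returns, which dampens noise at the cost of fewer pre-averaged observations.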

The test statistic for the Drift Burst Hypothesis can then be calculated as

\bar{T}_{t}^{n} = \sqrt{\frac{h_{n}}{K_{2}}}\frac{\hat{\bar{\mu}}_{t}^{n}}{\sqrt{\hat{\bar{\sigma}}_{t}^{n}}},

where

\hat{\bar{\mu}}_{t}^{n} = \frac{1}{h_{n}}\sum_{i=1}^{n-k_{n}+2}K\left(\frac{t_{i-1}-t}{h_{n}}\right)\Delta_{i-1}^{n}\overline{Y},

and

\hat{\bar{\sigma}}_{t}^{n} = \frac{1}{h_{n}'}\bigg[\sum_{i=1}^{n-k_{n}+2}\left(K\left(\frac{t_{i-1}-t}{h_{n}'}\right)\Delta_{i-1}^{n}\overline{Y}\right)^{2} + 2\sum_{L=1}^{L_{n}}\omega\left(\frac{L}{L_{n}}\right)\sum_{i=1}^{n-k_{n}-L+2}K\left(\frac{t_{i-1}-t}{h_{n}'}\right)K\left(\frac{t_{i+L-1}-t}{h_{n}'}\right)\Delta_{i-1}^{n}\overline{Y}\Delta_{i-1+L}^{n}\overline{Y}\bigg],

where \omega(\cdot) is a smooth kernel function, in this case the Parzen kernel, L_{n} is the lag length used to adjust for autocorrelation, and K(\cdot) is a kernel weighting function, in this case the left-sided exponential kernel.
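The two kernels named above can be sketched in base R as follows. Both functions are hypothetical helpers written for illustration. They assume the common textbook forms, K(x) = exp(x) for x <= 0 (zero otherwise) for the left-sided exponential kernel and the standard piecewise cubic for the Parzen kernel. The package's internal C++ code may differ in detail.

```r
# Assumed form of the left-sided exponential kernel: only past observations get weight.
leftExpKernel <- function(x) ifelse(x <= 0, exp(x), 0)

# Standard Parzen kernel on [-1, 1].
parzenKernel <- function(x) {
  ax <- abs(x)
  ifelse(ax <= 0.5, 1 - 6 * ax^2 + 6 * ax^3,
         ifelse(ax <= 1, 2 * (1 - ax)^3, 0))
}

# Illustrative use: weights of trades (in seconds) around a test time,
# using a bandwidth of 300 seconds. The trade times here are made up.
tradeTimes <- c(34200, 34500, 34800, 35100)
testTime <- 34800
kernelWeights <- leftExpKernel((tradeTimes - testTime) / 300)

# Autocorrelation weights omega(L / L_n) for lags 1..L_n, here with L_n = 4.
lagWeights <- parzenKernel((1:4) / 4)
```

Trades after the test time receive zero weight, so the statistic at time t uses only information available up to t.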

Usage

driftBursts(
  pData,
  testTimes = seq(34260, 57600, 60),
  preAverage = 5,
  ACLag = -1L,
  meanBandwidth = 300L,
  varianceBandwidth = 900L,
  parallelize = FALSE,
  nCores = NA,
  warnings = TRUE
)

Arguments

`pData`	Either a `data.table` or an `xts` object. If pData is a data.table, columns DT and PRICE must be present, containing timestamps of the trades and the price of the trades (in levels) respectively. If pData is an `xts` object and the number of columns is greater than one, PRICE must be present.
`testTimes`	A `numeric` containing the times at which to calculate the tests. The default of `seq(34260, 57600, 60)` calculates the test statistic once per minute, i.e. 390 times for a typical 6.5 hour trading day from 9:31:00 to 16:00:00. See details. Additionally, `testTimes` can be set to 'all', in which case the test statistic is calculated on each tick more than 5 seconds after opening.
`preAverage`	A positive `integer` denoting the length of the pre-averaging window for the log-prices. Default is `5`.
`ACLag`	A positive `integer` greater than 1 denoting how many lags are to be used for the HAC estimator of the variance. The value `-1` instead selects the lag length automatically for each iteration. Default is `-1L`.
`meanBandwidth`	An `integer` denoting the bandwidth for the left-sided exponential kernel for the mean. Default is `300L`.
`varianceBandwidth`	An `integer` denoting the bandwidth for the left-sided exponential kernel for the variance. Default is `900L`.
`parallelize`	A `logical` to determine whether to parallelize the underlying C++ code (using OpenMP). Default is `FALSE`. Note that the parallelized code is not interruptible, while the non-parallel code is interruptible, with interrupts checked every 100 iterations.
`nCores`	An `integer` denoting the number of cores to use when the computation is parallelized. If this argument is not provided, sequential evaluation will be used even if `parallelize` is TRUE. Default is `NA`.
`warnings`	A `logical` denoting whether warnings should be shown. Default is `TRUE`.
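The `testTimes` values read as seconds after midnight, which is how 34260 maps to 9:31:00. The short sketch below checks the default grid. `secondsToClock` is a hypothetical helper written for this illustration, not a function from highfrequency.

```r
# Hypothetical helper: format seconds after midnight as HH:MM:SS.
secondsToClock <- function(s) {
  s <- as.integer(s)
  sprintf("%02d:%02d:%02d", s %/% 3600L, (s %% 3600L) %/% 60L, s %% 60L)
}

testTimes <- seq(34260, 57600, 60)               # the default grid
nTests <- length(testTimes)                      # 390 test times
firstAndLast <- secondsToClock(range(testTimes)) # "09:31:00" "16:00:00"
```

The same conversion can help when building a custom grid, for example one that starts only after the bandwidths have elapsed.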

Details

If the testTimes vector contains entries before the first trade, or more than 15 minutes after the last trade, these entries are removed, as keeping them may cause crashes. The test statistic is unstable until max(meanBandwidth, varianceBandwidth) seconds have passed. The number of lags selected by the Newey-West algorithm is increased by 2 * (preAverage - 1), because with pre-averaging at least this many lags need to be corrected for. The maximum of 20 lags is increased by the same amount for the same reason.
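The two rules described above can be mimicked in a few lines of base R. This is only a sketch of the stated behaviour under the assumption that all times are in seconds. `filterTestTimes` and `adjustLagCount` are hypothetical names and do not correspond to functions in the package.

```r
# Drop test times before the first trade or more than 15 minutes (900 s) after the last.
filterTestTimes <- function(testTimes, tradeTimes, maxGap = 900) {
  keep <- testTimes >= min(tradeTimes) & testTimes <= max(tradeTimes) + maxGap
  testTimes[keep]
}

# Increase a lag count by 2 * (preAverage - 1) to account for pre-averaging.
adjustLagCount <- function(lags, preAverage) lags + 2 * (preAverage - 1)

tradeTimes <- c(34200, 34260, 50000)             # made-up trade timestamps
keptTimes <- filterTestTimes(c(34000, 34260, 50900, 51000), tradeTimes)
maxLags <- adjustLagCount(20, preAverage = 5)    # the cap of 20 becomes 28
```

In this made-up example, 34000 is dropped because it precedes the first trade and 51000 because it lies more than 900 seconds after the last one.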

Value

An object of class DBH, which is a list containing the series of drift burst hypothesis test statistics as well as the estimated spot drift and variance series. The list also contains the variance and mean bandwidths, the pre-averaging setting, and the number of observations. Additionally, the list records whether testing happened for all testTimes entries. Objects of class DBH have the methods print.DBH, plot.DBH, and getCriticalValues.DBH, which respectively print, plot, and retrieve critical values for the test as described in appendix B of Christensen, Oomen, and Reno (2020).

Author(s)

Emil Sjoerup

References

Christensen, K., Oomen, R., and Reno, R. (2020) The drift burst hypothesis. Journal of Econometrics. Forthcoming.

Examples



# Usage with data.table object
dat <- sampleTData[as.Date(DT) == "2018-01-02"]
# Testing every 60 seconds after 09:45:00
DBH1 <- driftBursts(dat, testTimes = seq(35100, 57600, 60), preAverage = 2, ACLag = -1L,
                    meanBandwidth = 300L, varianceBandwidth = 900L)
print(DBH1)

plot(DBH1, pData = dat)
# Usage with xts object (1 column)
library("xts")
dat <- xts(sampleTData[as.Date(DT) == "2018-01-03"]$PRICE, 
           order.by = sampleTData[as.Date(DT) == "2018-01-03"]$DT)
# Testing every 60 seconds after 09:45:00
DBH2 <- driftBursts(dat, testTimes = seq(35100, 57600, 60), preAverage = 2, ACLag = -1L,
                    meanBandwidth = 300L, varianceBandwidth = 900L)
plot(DBH2, pData = dat)

## Not run:  
# This block takes some time
dat <- xts(sampleTDataEurope$PRICE, 
           order.by = sampleTDataEurope$DT)
# Testing every 60 seconds from 09:15:00 (32400 + 900 seconds) to 17:30:00
system.time({DBH4 <- driftBursts(dat, testTimes = seq(32400 + 900, 63000, 60), preAverage = 2, 
             ACLag = -1L, meanBandwidth = 300L, varianceBandwidth = 900L)})

system.time({DBH4 <- driftBursts(dat, testTimes = seq(32400 + 900, 63000, 60), preAverage = 2, 
                                 ACLag = -1L, meanBandwidth = 300L, varianceBandwidth = 900L,
                                 parallelize = TRUE, nCores = 8)})
plot(DBH4, pData = dat)

# The print method for DBH objects takes an argument alpha that determines the confidence level
# of the test performed
print(DBH4, alpha = 0.99)
# Additionally, criticalValue can be passed directly
print(DBH4, criticalValue = 3)
max(abs(DBH4$tStat)) > getCriticalValues(DBH4, 0.99)$quantile

## End(Not run)

highfrequency documentation built on Oct. 4, 2023, 5:08 p.m.