README.md


Overview

The motivation for the package HighFreq is to create a library of functions designed for managing trade and quote (TAQ) and OHLC data, and for efficiently estimating various statistics, like volatility, skew, Hurst exponent, and Sharpe ratio, from that data.

There are several other packages that offer much of this functionality.

Unfortunately, many of the functions in these packages are too slow, lack some critical functionality, or produce data in inconsistent formats (with NA values, etc.). The package HighFreq aims to create a unified framework, with consistent data formats and naming conventions.

The package HighFreq relies on OHLC price and volume data formatted as xts time series, because the OHLC data format provides an efficient way of compressing TAQ data, while preserving information about price levels, volatility (range), and trading volumes. Most existing packages don’t rely on OHLC data, so their statistical estimators are much less efficient than those in package HighFreq.

Running and Rolling Statistics Over Time Series Data

Definitions of running and rolling statistics (aggregations):

Functions for data scrubbing, formatting, and aggregation

The package HighFreq contains several categories of functions designed for:

The package HighFreq contains functions for:

Installation and loading

Install package HighFreq from github:

install.packages("devtools")
devtools::install_github(repo="algoquant/HighFreq")
library(HighFreq)

Install package HighFreq from source on local drive:

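# Install package from source on local drive, from within R
install.packages(pkgs="C:/Develop/R/HighFreq", repos=NULL, type="source")
library(HighFreq)
# Alternatively, install from source using R CMD
# (run this in a system command prompt, not in the R console):
# R CMD INSTALL C:\Develop\R\HighFreq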

Build reference manual for package HighFreq from .Rd files:

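# Build the manual from within R
system("R CMD Rd2pdf C:/Develop/R/HighFreq")
# Or run the same command in a system command prompt (not in the R console):
# R CMD Rd2pdf C:\Develop\R\HighFreq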

Data

Trade and Quote (TAQ) data contains intraday trades and quotes on exchange-traded stocks and futures. TAQ data is spaced irregularly in time, with data recorded each time a new trade or quote arrives. The rows of TAQ data contain the quoted and traded prices, and the corresponding quote size or trade volume.

TAQ data can be aggregated into Open-High-Low-Close (OHLC) data. OHLC data is evenly spaced in time, with each row containing the Open, High, Low, and Close prices, and the trade Volume, recorded over the past time interval (called a bar of data). The Open and Close prices are the first and last trade prices recorded in the time bar. The High and Low prices are the highest and lowest trade prices recorded in the time bar. The Volume is the total trading volume recorded in the time bar.
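
The aggregation described above can be illustrated with a minimal base-R sketch. It is not the method used by HighFreq (which works on xts series); the tick values and the names ticks, bar, and ohlc are hypothetical and exist only for this illustration.

```r
# Toy trade ticks (hypothetical values): seconds after the open, price, size
ticks <- data.frame(
  secs   = c(5, 20, 58, 61, 90, 119, 130, 175),
  price  = c(100.0, 100.2, 99.9, 100.1, 100.4, 100.3, 100.2, 100.5),
  volume = c(200, 100, 300, 150, 250, 100, 400, 50)
)
# Assign each tick to a 1-minute bar (bar 1 covers seconds 0-59, and so on)
bar <- ticks$secs %/% 60 + 1
prices_by_bar <- split(ticks$price, bar)
# Open/Close are the first/last trades, High/Low the extremes, Volume the sum
ohlc <- data.frame(
  Open   = sapply(prices_by_bar, function(x) x[1]),
  High   = sapply(prices_by_bar, max),
  Low    = sapply(prices_by_bar, min),
  Close  = sapply(prices_by_bar, function(x) x[length(x)]),
  Volume = sapply(split(ticks$volume, bar), sum)
)
print(ohlc)
```

Eight irregularly spaced ticks collapse into three evenly spaced bars, which is the compression the paragraph above refers to.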

Aggregating TAQ data into OHLC data provides data compression, while preserving information about price levels, volatility (range), and trading volumes. In addition, evenly spaced data allows analysis of multiple time series, since all the prices are given at the same moments of time.

The package HighFreq includes three xts time series called SPY, TLT, and VXX, containing intraday 1-minute OHLC data for the SPY, TLT, and VXX ETFs. The package HighFreq also includes an xts time series called SPY_TAQ with a single day of TAQ data for the SPY ETF. The data is set up for lazy loading, so it can be used directly, without first calling data(hf_data) to load it.

The data source is Wharton Research Data Services (WRDS).

List all the data sets included in the HighFreq package:

# list all datasets in package HighFreq
data(package="HighFreq")

Examples

More examples can be found in the vignettes titled managing_time_series and estimating_statistics.

Aggregate TAQ data into a 1-minute bar OHLC time series:

# aggregate TAQ data to 1-min OHLC bar data, for a single symbol, and save to file
sym_bol <- "SPY"
save_scrub_agg(sym_bol, 
               data_dir="E:/mktdata/sec/", 
               output_dir="E:/output/data/")

Calculate daily trading volume:

daily_volume <- apply.daily(x=Vo(SPY), FUN=sum)
colnames(daily_volume) <- "SPY.Volume"
chart_Series(x=daily_volume, name="daily trading volumes for SPY")

Calculate daily average open to close variance from minutely OHLC prices:

# calculate daily average open to close variance
var_daily <- (6.5*60*60^2)*xts::apply.daily(x=SPY, FUN=agg_stats_r, 
                              calc_bars="run_variance", calc_method="rogers_satchell")
colnames(var_daily) <- "SPY.Var"
chart_Series(100*sqrt(var_daily["/2010"]), name="SPY daily standard deviation")
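
The calc_method="rogers_satchell" argument above names the Rogers-Satchell range-based variance estimator. The following base-R sketch shows the standard textbook form of that estimator on toy bars. It is an illustration only: it is not taken from the HighFreq source, it does not reproduce the time scaling applied above, and the price values and variable names are hypothetical.

```r
# Toy OHLC bars (hypothetical values)
op <- c(100.0, 100.3, 100.1)
hi <- c(100.5, 100.6, 100.4)
lo <- c(99.8, 100.0, 99.9)
cl <- c(100.3, 100.1, 100.2)
# Rogers-Satchell contribution of each bar:
# log(H/C)*log(H/O) + log(L/C)*log(L/O)
rs_bar <- log(hi/cl)*log(hi/op) + log(lo/cl)*log(lo/op)
# Variance estimate per bar interval: the average over bars
rs_var <- mean(rs_bar)
print(rs_var)
```

Each bar's contribution is non-negative because the High is at least as large as the Open and Close, and the Low is at most as large as both, so the two logarithms in each product share the same sign.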

Calculate daily skew from minutely OHLC prices:

skew_daily <- apply.daily(x=SPY, FUN=agg_stats_r, calc_bars="run_skew")
skew_daily <- skew_daily/(var_daily)^(1.5)
colnames(skew_daily) <- "SPY.Skew"
chart_Series(x=skew_daily, name="daily skew for SPY")

Calculate rolling average open prices over a 10-bar window:

roll_prices <- rutils::roll_sum(Op(SPY), win_dow=10)/10
colnames(roll_prices) <- "SPY.Rets"
# plot candle chart
chart_Series(SPY["2013-11-12", ], name="SPY Prices")
add_TA(roll_prices["2013-11-12"], on=1, col="red", lwd=2)
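
Dividing a rolling sum by the window length, as above, gives a rolling average. A minimal base-R sketch of a rolling sum computed from differences of cumulative sums follows. The function roll_sum_sketch is hypothetical and is not the implementation of rutils::roll_sum; in particular, the partial sums returned during the warm-up period are an assumption of this sketch.

```r
# Rolling sum over a trailing window of length win, via cumulative sums
roll_sum_sketch <- function(x, win) {
  stopifnot(length(x) >= win)
  cum <- cumsum(x)
  # Cumulative sum lagged by win elements, padded with zeros at the start
  lagged <- c(rep(0, win), cum[seq_len(length(x) - win)])
  # For the first win elements this is a partial (warm-up) sum
  cum - lagged
}
x <- 1:6
roll_mean <- roll_sum_sketch(x, win=3)/3
print(roll_mean)
```

The cumulative-sum approach needs a single pass over the data regardless of window length, which is why it is a common choice for long intraday series.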

Calculate rolling volume-weighted variance:

var_rolling <- roll_stats(oh_lc=SPY["2012"], calc_stats="run_variance", win_dow = 10)
# plot without overnight jump
chart_Series(var_rolling["2012-11-12", ][-(1:11)], name="SPY rolling volume-weighted variance")

Calculate daily seasonality of variance:

var_seasonal <- season_ality((24*60*60^2)*run_variance(oh_lc=SPY))
colnames(var_seasonal) <- "SPY.var_seasonal"
chart_Series(x=var_seasonal, name="SPY variance daily seasonality")
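
One way to read daily seasonality is as the average of a per-bar statistic taken over all days at the same time of day. The base-R sketch below illustrates that reading on toy values. It assumes this interpretation and is not taken from the source of season_ality; the data and the names bar_time, bar_var, and seasonal are hypothetical.

```r
# Hypothetical per-bar variances for three days, four intraday bars per day
bar_time <- rep(c("09:31", "09:32", "09:33", "09:34"), times=3)
bar_var <- c(4, 2, 1, 3,
             6, 2, 1, 5,
             5, 2, 1, 4)
# Average across days for each time of day
seasonal <- tapply(bar_var, bar_time, mean)
print(seasonal)
```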


algoquant/HighFreq documentation built on July 13, 2024, 8:26 p.m.