knitr::opts_chunk$set(
  collapse = TRUE,
  eval = FALSE,
  comment = "#>",
  fig.path = "man/figures/README-",
  out.width = "100%"
)
The goal of lobsteR is to provide a tidy framework for requesting data from lobsterdata.com and for downloading, unzipping, and cleaning it. The package focuses on the core functionality required to get LOBSTER data ready quickly. For subsequent typical high-frequency econometrics applications, we refer to the highfrequency package.
You can install the development version of lobsteR from GitHub with:
# install.packages("devtools")
devtools::install_github("voigtstefan/lobsteR")
library(lobsteR)
library(tidyverse)
With lobsteR you can easily connect to lobsterdata.com using your own credentials.
lobster_login <- account_login(
  login = Sys.getenv("user"), # Replace with your own account email address
  pwd = Sys.getenv("pwd") # Replace with your own account password
)
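Since the login call above reads the credentials from the environment variables user and pwd, it can help to confirm that both are set before logging in. The following is a minimal base R sketch; the helper name credentials_available is hypothetical and not part of lobsteR.

```r
# Hypothetical helper (not part of lobsteR): check that the environment
# variables read via Sys.getenv() in the login call are actually set.
credentials_available <- function(vars = c("user", "pwd")) {
  values <- Sys.getenv(vars, unset = "")
  all(nzchar(values))
}

if (!credentials_available()) {
  message("Set 'user' and 'pwd' (e.g. in ~/.Renviron) before calling account_login().")
}
```

One way to set the variables is to add lines of the form user=... and pwd=... to ~/.Renviron, which R reads at startup, so the credentials stay out of your scripts.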
Next, we request some data from lobsterdata.com, e.g., message-level data for MSFT for the period from May 1st, 2023 until May 3rd, 2023. The argument level corresponds to the requested number of order book snapshot levels.
data_request <- request_query(
  symbol = "MSFT",
  start_date = "2023-05-01",
  end_date = "2023-05-03",
  level = 10
)
data_request
request_submit(account_login = lobster_login, request = data_request)
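Before submitting a longer request, it can be useful to see how many days the date range covers. The base R sketch below enumerates the weekdays in the range requested above. It assumes that data are organised per trading day and it does not account for exchange holidays.

```r
# Enumerate the calendar days covered by the request above.
start_date <- as.Date("2023-05-01")
end_date <- as.Date("2023-05-03")
requested_days <- seq(start_date, end_date, by = "day")

# ISO weekday number: 1 = Monday, ..., 7 = Sunday (locale-independent).
iso_weekday <- as.integer(format(requested_days, "%u"))

# Keep Monday to Friday only; exchange holidays are NOT removed here.
weekdays_only <- requested_days[iso_weekday <= 5]
length(weekdays_only)
```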
After submitting the request, lobsterdata.com will work on providing the order book snapshots. Depending on the number of messages to process, this may take some time. Once done, the requested data is available in your account archive - ready to download!
lobster_archive <- account_archive(account_login = lobster_login)
When downloading, we automatically unzip the data (this can be disabled with unzip = FALSE).
data_download(
  requested_data = lobster_archive |> filter(symbol == "MSFT"),
  account_login = lobster_login,
  path = "../tmp_data"
)
After downloading the data, use cases for LOBSTER data may differ. A convenient way to work with order book snapshots is to first combine the message and order book data, which allows for follow-up cleaning procedures.
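To illustrate what combining message and order book data means, here is a small base R sketch on made-up data. The column names follow those used later in this README (ts, type, m_size, m_price, ask_price_1, ...). The values are invented, and the row-wise alignment (row i of the order book is the state after message i) is an assumption of this sketch, not a description of how lobsteR does it internally.

```r
# Toy message data: one row per event (values are made up for illustration).
messages <- data.frame(
  ts = as.POSIXct("2023-05-02 09:30:00", tz = "UTC") + c(0.1, 0.5, 1.2),
  type = c(1, 4, 3),
  m_size = c(100, 50, 100),
  m_price = c(305.10, 305.12, 305.10)
)

# Toy order book snapshots: row i is assumed to be the state after message i.
snapshots <- data.frame(
  ask_price_1 = c(305.12, 305.13, 305.13),
  ask_size_1 = c(50, 200, 200),
  bid_price_1 = c(305.10, 305.10, 305.09),
  bid_size_1 = c(100, 100, 300)
)

# Both tables must have the same number of rows before binding column-wise.
stopifnot(nrow(messages) == nrow(snapshots))
combined <- cbind(messages, snapshots)

# Example of a follow-up check that needs both tables: no crossed quotes.
all(combined$ask_price_1 > combined$bid_price_1)
```

Once each message sits next to the book state it produced, filters such as dropping crossed quotes or messages outside trading hours become simple row operations.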
The helper function process_data processes LOBSTER files, returns a clean, zipped .csv file, and removes redundant raw files.
process_data(path = "../tmp_data")
Finally, we clean the order book files. Such procedures may be use-case specific but typically, the following steps can be relevant:
Each of the points above is implemented in the function clean_data.
orderbook <- clean_data(path = "../tmp_data/MSFT_2023-05-02_10.csv.gz")
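When looping over several processed files, it may be handy to recover the symbol, date, and number of levels from the file name. The sketch below assumes the pattern symbol_date_level.csv.gz, which is inferred only from the single path shown above; the helper parse_processed_name is hypothetical and not part of lobsteR.

```r
# Hypothetical helper (not part of lobsteR): split a processed file name of the
# assumed form "<symbol>_<date>_<level>.csv.gz" into its components.
parse_processed_name <- function(path) {
  stem <- sub("\\.csv\\.gz$", "", basename(path))
  parts <- strsplit(stem, "_", fixed = TRUE)[[1]]
  stopifnot(length(parts) == 3)
  list(
    symbol = parts[1],
    date = as.Date(parts[2]),
    level = as.integer(parts[3])
  )
}

info <- parse_processed_name("../tmp_data/MSFT_2023-05-02_10.csv.gz")
str(info)
```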
Finally, we can analyse the orderbook files. The figure below shows the dynamics of the traded prices (red points) and the quoted prices at the higher order book levels.
orderbook_trades <- orderbook |>
  filter(type == 4 | type == 5) |>
  select(ts, m_price)

orderbook_quotes <- orderbook |>
  mutate(ts_new = floor_date(ts, "10 seconds")) |>
  group_by(ts_new) |>
  summarise_all(dplyr::last) |>
  select(-ts_new) |>
  mutate(id = row_number()) |>
  select(ts, id, matches("bid|ask")) |>
  gather(level, price, -ts, -id) |>
  separate(level, into = c("side", "variable", "level"), sep = "_") |>
  mutate(level = as.numeric(level)) |>
  pivot_wider(names_from = variable, values_from = price)

ggplot() +
  theme_bw() +
  geom_point(
    data = orderbook_quotes,
    aes(x = ts, y = price, color = as_factor(level), size = size / max(size)),
    alpha = 0.1
  ) +
  geom_step(data = orderbook_trades, aes(x = ts, y = m_price), linewidth = 0.1) +
  labs(title = "Orderbook Dynamics", y = "Price", x = "") +
  theme(
    panel.grid.major = element_blank(),
    panel.grid.minor = element_blank(),
    legend.position = "none"
  ) +
  scale_y_continuous()
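The quote series above is thinned by keeping the last observation in every 10-second interval. The base R sketch below reproduces that idea on made-up timestamps and prices. It is a stand-in for the floor_date and summarise_all(dplyr::last) steps, not the same code.

```r
# Toy timestamps and best bid prices (made-up values).
ts <- as.POSIXct("2023-05-02 09:30:00", tz = "UTC") + c(1, 4, 9, 12, 19, 25)
bid_price_1 <- c(305.10, 305.11, 305.09, 305.12, 305.13, 305.10)

# Floor each timestamp to the start of its 10-second bucket.
bucket <- as.POSIXct(
  floor(as.numeric(ts) / 10) * 10,
  origin = "1970-01-01", tz = "UTC"
)

# Keep only the last observation within each bucket.
keep <- !duplicated(bucket, fromLast = TRUE)
data.frame(ts = ts[keep], bucket = bucket[keep], bid_price_1 = bid_price_1[keep])
```

Here the six observations fall into three buckets, so three rows remain, each carrying the most recent quote of its interval.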
Next, we compute the midquote, bid-ask spread, aggregate volume, and depth (the number of tradable units in the order book).
orderbook_summaries <- orderbook |>
  transmute(
    ts,
    midquote = ask_price_1 / 2 + bid_price_1 / 2,
    trading_volume = if_else(type == 4 | type == 5, m_price * m_size, as.double(NA)),
    depth_bid = bid_size_1,
    depth_ask = ask_size_1,
    spread = 10000 * (ask_price_1 - bid_price_1) / midquote # spread in basis points
  ) |>
  mutate(
    ts_latency = as.numeric(lead(ts)) - as.numeric(ts), # time until the next message
    ts_latency = if_else(is.na(ts_latency), 0, ts_latency), # 0 for the last message
    ts_minute = floor_date(ts, "30 minutes") # 30-minute aggregation interval
  ) |>
  group_by(ts_minute) |>
  summarise(
    midquote = last(midquote),
    n_trades = sum(!is.na(trading_volume)),
    n = n(), # number of messages
    trading_volume = sum(trading_volume, na.rm = TRUE),
    trading_volume = if_else(is.na(trading_volume), 0, trading_volume),
    depth0_bid = weighted.mean(depth_bid, ts_latency),
    depth0_ask = weighted.mean(depth_ask, ts_latency),
    spread = weighted.mean(spread, ts_latency)
  )

orderbook_summaries |>
  knitr::kable()
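The summaries above weight depth and spread by the time each quote stays in force rather than by the number of messages. The following base R sketch works through the midquote, the spread in basis points, and the time-weighted mean on three made-up quotes; all numbers are invented for illustration.

```r
# Three made-up best quotes and their arrival times in seconds.
ask_price_1 <- c(100.02, 100.03, 100.02)
bid_price_1 <- c(100.00, 100.01, 100.01)
ts_seconds <- c(0, 2, 5)

# Midquote and relative spread in basis points, as in the pipeline above.
midquote <- ask_price_1 / 2 + bid_price_1 / 2
spread_bp <- 10000 * (ask_price_1 - bid_price_1) / midquote

# Time each quote is in force: gap to the next message, 0 for the last one.
ts_latency <- c(diff(ts_seconds), 0)

# Time-weighted versus simple average spread.
weighted.mean(spread_bp, ts_latency)
mean(spread_bp)
```

The last quote gets weight zero because nothing is known about how long it remains valid, so the time-weighted spread reflects only the first two quotes, while the simple mean also counts the narrower final quote.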