The goal of lobsteR is to provide a tidy framework to request data from
lobsterdata.com, to download, unzip, and clean the data. The package
focuses on the core functionalities required to get LOBSTER data ready
fast, for subsequent typical high-frequency econometrics applications,
we refer to the highfrequency
package.
You can install the development version of lobsteR from GitHub with:
# install.packages("devtools")
devtools::install_github("voigtstefan/lobsteR")
library(lobsteR)
library(tidyverse)
#> ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
#> ✔ dplyr 1.1.2 ✔ readr 2.1.4
#> ✔ forcats 1.0.0 ✔ stringr 1.5.0
#> ✔ ggplot2 3.4.2 ✔ tibble 3.2.1
#> ✔ lubridate 1.9.2 ✔ tidyr 1.3.0
#> ✔ purrr 1.0.1
#> ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
#> ✖ dplyr::filter() masks stats::filter()
#> ✖ dplyr::lag() masks stats::lag()
#> ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
With lobsteR
you can connect easily connect with lobsterdata.com using
your own credentials.
lobster_login <- account_login(
login = Sys.getenv("user"), # Replace with your own account mail adress
pwd = Sys.getenv("pwd") # Replace with your own account password
)
Next, we request some data from lobsterdata.com, e.g., message-level data from META for the period from May 1st, 2023 until May 3rd, 2023. ´level´ corresponds to the requested number of orderbook snapshot levels.
data_request <- request_query(
symbol = "MSFT",
start_date = "2023-05-01",
end_date = "2023-05-03",
level = 10)
data_request
#> symbol start_date end_date level
#> 1 MSFT 2023-05-01 2023-05-01 10
#> 2 MSFT 2023-05-02 2023-05-02 10
#> 3 MSFT 2023-05-03 2023-05-03 10
request_submit(account_login = lobster_login,
request = data_request)
After submitting the request, lobsterdata.com will work on providing the order book snapshots. Depending on the number of messages to process, this may take some time. Once done, the requested data is available in your account archive - ready to download!
lobster_archive <- account_archive(account_login = lobster_login)
When downloading, we automatically unzip the data (this can be omitted
using unzip = FALSE
)
data_download(
requested_data = lobster_archive |> filter(symbol == "MSFT"),
account_login = lobster_login,
path = "../tmp_data")
After downloading the data, use cases for Lobster data may differ. A
convenient way to work with order book snapshots is to combine message
and order book data first which allows for follow-up cleaning
procedures. The helper function process_data
processes lobster files,
returns a clean, zipped .csv
file and removes redundant raw files.
process_data(path = "../tmp_data")
Finally, we clean the order book files. Such procedures may be use-case specific but typically, the following steps can be relevant:
Each of the points above is implemented in the function clean_data
.
orderbook <- clean_data(path = "../tmp_data/MSFT_2023-05-02_10.csv.gz")
Finally, we can analyse the orderbook files. The figure below shows the dynamics of the traded prices (red points) and the quoted prices at the higher order book levels.
orderbook_trades <- orderbook |>
filter(type==4|type==5) |>
select(ts, m_price)
orderbook_quotes <- orderbook |>
mutate(ts_new = floor_date(ts, "10 seconds")) |>
group_by(ts_new) |>
summarise_all(dplyr::last) |>
select(-ts_new) |>
mutate(id = row_number()) |>
select(ts, id, matches("bid|ask")) |>
gather(level, price, -ts, -id) |>
separate(level, into=c("side", "variable", "level"), sep="_") |>
mutate(level = as.numeric(level)) |>
pivot_wider(names_from = variable, values_from = price)
ggplot() +
theme_bw() +
geom_point(data = orderbook_quotes,
aes(x=ts, y=price, color=as_factor(level), size = size/max(size)), alpha = 0.1) +
geom_step(data = orderbook_trades, aes(x=ts, y=m_price), linewidth = 0.1) +
labs(title="Orderbook Dynamics",
y= "Price",
x="") +
theme(panel.grid.major = element_blank(),
panel.grid.minor = element_blank(),
legend.position ="none") +
scale_y_continuous()
Next, I compute the midquote, bid-ask spread, aggregate volume, and depth (the amount of trade-able units in the order book).
orderbook_summaries <- orderbook |>
transmute(ts,
midquote = ask_price_1 / 2 + bid_price_1 / 2,
trading_volume = if_else(type == 4 | type == 5, m_price * m_size, as.double(NA)),
depth_bid = bid_size_1,
depth_ask = ask_size_1,
spread = 10000 * (ask_price_1 - bid_price_1) / midquote
) |>
mutate(
ts_latency = as.numeric(lead(ts)) - as.numeric(ts), # time between messages
ts_latency = if_else(is.na(ts_latency), 0, ts_latency), # 0 for first message
ts_minute = floor_date(ts, "30 minutes")
) |>
group_by(ts_minute) |>
summarise(
midquote = last(midquote),
n_trades = sum(!is.na(trading_volume)),
n = n(), # number of messages
trading_volume = sum(trading_volume, na.rm = TRUE),
trading_volume = if_else(is.na(trading_volume), 0, trading_volume),
depth0_bid = weighted.mean(depth_bid, ts_latency),
depth0_ask = weighted.mean(depth_ask, ts_latency),
spread = weighted.mean(spread, ts_latency)
)
orderbook_summaries |> knitr::kable()
| ts_minute | midquote | n_trades | n | trading_volume | depth0_bid | depth0_ask | spread | |:--------------------|---------:|---------:|-------:|---------------:|-----------:|-----------:|----------:| | 2023-05-02 09:30:00 | 308.095 | 1478 | 198354 | 229203080 | 143.02113 | 131.21635 | 1.3831351 | | 2023-05-02 10:00:00 | 306.400 | 1550 | 297456 | 204995226 | 59.42056 | 94.74389 | 0.6900655 | | 2023-05-02 10:30:00 | 305.225 | 1535 | 275886 | 143672592 | 66.57444 | 96.22889 | 0.6205947 | | 2023-05-02 11:00:00 | 305.095 | 1445 | 222459 | 121771369 | 67.51500 | 74.59222 | 0.6228345 | | 2023-05-02 11:30:00 | 304.165 | 1325 | 173694 | 90401857 | 63.86000 | 61.58611 | 0.6704186 | | 2023-05-02 12:00:00 | 304.760 | 1066 | 130202 | 58181014 | 82.62556 | 69.69167 | 0.6254496 | | 2023-05-02 12:30:00 | 305.095 | 1035 | 110693 | 59400355 | 109.66333 | 71.98778 | 0.5674270 | | 2023-05-02 13:00:00 | 306.105 | 1094 | 107860 | 62342934 | 96.43722 | 85.55944 | 0.5668499 | | 2023-05-02 13:30:00 | 306.190 | 1032 | 122258 | 73001960 | 123.25889 | 113.37333 | 0.5586895 | | 2023-05-02 14:00:00 | 305.755 | 1053 | 154777 | 56535238 | 103.20611 | 105.18444 | 0.5734766 | | 2023-05-02 14:30:00 | 305.810 | 1070 | 148002 | 57893729 | 126.25778 | 100.87444 | 0.5719214 | | 2023-05-02 15:00:00 | 305.600 | 1142 | 145454 | 64077020 | 117.65333 | 112.72778 | 0.5948185 | | 2023-05-02 15:30:00 | 305.430 | 1383 | 164758 | 183795397 | 141.63146 | 162.70206 | 0.5868798 |
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.