The goal of lobsteR is to provide a tidy framework to request, download, unzip, and clean data from LOBSTER. The package focuses on the core functionality required to get LOBSTER data ready quickly. For typical subsequent high-frequency econometrics applications, we refer to the highfrequency package.


You can install the development version of lobsteR from GitHub with:

# install.packages("devtools")

Example: Request and download data from LOBSTER


With lobsteR you can easily connect to LOBSTER using your own credentials.

lobster_login <- account_login(
  login = Sys.getenv("user"), # Replace with your own account email address
  pwd = Sys.getenv("pwd") # Replace with your own account password
)
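
The login call above reads the credentials from the environment variables user and pwd, and Sys.getenv returns an empty string when a variable is not set. A minimal base R sketch for failing early in that case; the helper get_required_env and the demo variable name are hypothetical and not part of lobsteR:

```r
# Hypothetical helper (not part of lobsteR): read a required environment
# variable and stop with a clear message if it is missing or empty.
get_required_env <- function(name) {
  value <- Sys.getenv(name, unset = "")
  if (!nzchar(value)) {
    stop("Environment variable '", name, "' is not set.", call. = FALSE)
  }
  value
}

# Demo with a throwaway variable; in practice you would set "user" and "pwd",
# for example in your .Renviron file, and never hard-code them in scripts.
Sys.setenv(LOBSTER_DEMO_USER = "me@example.com")
demo_user <- get_required_env("LOBSTER_DEMO_USER")
```

The returned values could then be passed to the login and pwd arguments in place of the direct Sys.getenv calls.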

Next, we request some data from LOBSTER, e.g., message-level data for MSFT for the period from May 1st, 2023 until May 3rd, 2023. The level argument corresponds to the requested number of order book snapshot levels.

data_request <- request_query(
  symbol = "MSFT",
  start_date = "2023-05-01",
  end_date = "2023-05-03",
  level = 10)

request_submit(account_login = lobster_login,
               request = data_request)

After submitting the request, LOBSTER will work on providing the order book snapshots. Depending on the number of messages to process, this may take some time. Once done, the requested data is available in your account archive, ready to download.

lobster_archive <- account_archive(account_login = lobster_login)

When downloading, the data is unzipped automatically (this can be disabled with unzip = FALSE).

library(dplyr) # provides filter()

data_download(
  requested_data = lobster_archive |> filter(symbol == "MSFT"),
  account_login = lobster_login,
  path = "../tmp_data")

Processing of Lobster files.

What happens after downloading depends on the use case. A convenient way to work with order book snapshots is to combine message and order book data first, which allows for follow-up cleaning procedures. The helper function process_data processes LOBSTER files, returns a clean, zipped .csv file, and removes redundant raw files.

process_data(path = "../tmp_data")
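
To illustrate what combining message and order book data can look like, here is a minimal base R sketch on toy data. It is not the implementation of process_data. It assumes the usual LOBSTER layout, in which row k of the order book file holds the book state right after message k and prices are stored as dollar prices times 10,000. The column names mirror those used later in this README, and all values are made up:

```r
# Toy message file: one row per event.
messages <- data.frame(
  ts      = c(34200.01, 34200.05, 34200.20), # seconds after midnight
  type    = c(1, 4, 3),                      # event type codes
  m_size  = c(100, 50, 100),
  m_price = c(3050000, 3051000, 3050000)     # assumed: dollar price * 10,000
)

# Toy order book file: row k is the book state after message k.
book <- data.frame(
  ask_price_1 = c(3051000, 3052000, 3052000),
  ask_size_1  = c(50, 200, 200),
  bid_price_1 = c(3050000, 3050000, 3049000),
  bid_size_1  = c(100, 100, 300)
)

# The files align row by row, so a column bind is sufficient once the
# row counts have been checked.
stopifnot(nrow(messages) == nrow(book))
combined <- cbind(messages, book)

# Convert every price column to dollars.
price_cols <- grep("price", names(combined), value = TRUE)
combined[price_cols] <- lapply(combined[price_cols], function(x) x / 10000)
```

With both files in one table, each message can be filtered or cleaned together with the book state it produced.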

Next, we clean the order book files. Such procedures may be use-case specific, but the following steps are typically relevant:

Each of the points above is implemented in the function clean_data.

orderbook <- clean_data(path = "../tmp_data/MSFT_2023-05-02_10.csv.gz")

Finally, we can analyse the orderbook files. The figure below shows the dynamics of the traded prices (red points) and the quoted prices at the higher order book levels.

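library(tidyverse) # dplyr, tidyr, ggplot2, forcats
library(lubridate) # floor_date()

# Keep only executions (message types 4 and 5)
orderbook_trades <- orderbook |>
  filter(type==4|type==5) |>
  select(ts, m_price)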

orderbook_quotes <- orderbook |>
    mutate(ts_new = floor_date(ts, "10 seconds")) |>
    group_by(ts_new) |>
    summarise_all(dplyr::last) |>
    select(-ts_new) |>
    mutate(id = row_number()) |>
    select(ts, id, matches("bid|ask")) |>
    gather(level, price, -ts, -id) |>
    separate(level, into=c("side", "variable", "level"), sep="_") |>
    mutate(level = as.numeric(level))  |>
    pivot_wider(names_from = variable, values_from = price)

ggplot() +
  theme_bw() +
  geom_point(data = orderbook_quotes, 
             aes(x=ts, y=price, color=as_factor(level), size = size/max(size)), alpha = 0.1) +
  geom_step(data = orderbook_trades, aes(x=ts, y=m_price), linewidth = 0.1) +
  labs(title="Orderbook Dynamics",
       y= "Price",
       x="") +
  theme(panel.grid.major = element_blank(),
        panel.grid.minor = element_blank(),
        legend.position ="none")
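
The reshaping step used to build orderbook_quotes above (gather, separate, pivot_wider) can also be written as a single tidyr::pivot_longer call with the special ".value" name. A hedged sketch on toy data, assuming column names of the form side_variable_level such as ask_price_1; the values are made up:

```r
library(tidyr)

# Toy wide order book snapshot with one level per side.
quotes_wide <- data.frame(
  ts          = c(1, 2),
  ask_price_1 = c(305.10, 305.20),
  ask_size_1  = c(50, 200),
  bid_price_1 = c(305.00, 304.90),
  bid_size_1  = c(100, 300)
)

# Split each column name at "_" into side, variable, and level.
# ".value" turns the variable part (price, size) into separate columns.
quotes_long <- pivot_longer(
  quotes_wide,
  cols = matches("bid|ask"),
  names_to = c("side", ".value", "level"),
  names_sep = "_",
  names_transform = list(level = as.integer)
)
```

The result has one row per timestamp, side, and level, with price and size as columns, which is the shape the plotting code expects.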

Next, we compute the midquote, bid-ask spread, aggregate volume, and depth (the number of tradable units in the order book).

orderbook_summaries <- orderbook |>
  mutate(
    midquote = ask_price_1 / 2 + bid_price_1 / 2,
    trading_volume = if_else(type == 4 | type == 5, m_price * m_size, as.double(NA)),
    depth_bid = bid_size_1,
    depth_ask = ask_size_1,
    spread = 10000 * (ask_price_1 - bid_price_1) / midquote # in basis points
  ) |>
  mutate(
    ts_latency = as.numeric(lead(ts)) - as.numeric(ts), # time until the next message
    ts_latency = if_else(is.na(ts_latency), 0, ts_latency), # 0 for the last message
    ts_minute = floor_date(ts, "30 minutes")
  ) |>
  group_by(ts_minute) |>
  summarise(
    midquote = last(midquote),
    n_trades = sum(!is.na(trading_volume)),
    n = n(), # number of messages
    trading_volume = sum(trading_volume, na.rm = TRUE),
    trading_volume = if_else(is.na(trading_volume), 0, trading_volume),
    depth0_bid = weighted.mean(depth_bid, ts_latency),
    depth0_ask = weighted.mean(depth_ask, ts_latency),
    spread = weighted.mean(spread, ts_latency)
  )

orderbook_summaries |> knitr::kable()
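
To make the time weighting in the summary above concrete, here is a small base R sketch with made-up numbers. Each quote is weighted by how long it stays in force, that is, the time until the next message, and the last observation gets weight zero:

```r
# Toy quotes: message timestamps in seconds and best bid/ask prices.
ts  <- c(0, 1, 4, 10)
bid <- c(99.99, 99.98, 99.97, 99.95)
ask <- c(100.01, 100.02, 100.03, 100.05)

midquote  <- ask / 2 + bid / 2
spread_bp <- 10000 * (ask - bid) / midquote # spread in basis points

# Time each quote is valid; the last message has no successor, so weight 0.
latency <- c(diff(ts), 0)

time_weighted_spread <- weighted.mean(spread_bp, latency)
simple_mean_spread   <- mean(spread_bp)
```

Here the time-weighted spread is about 5 basis points while the simple mean is about 5.5, because the widest quote arrives last and receives no weight.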

