trade_classification | R Documentation |
classify_trades()
classifies high-frequency trading data into
buyer-initiated and seller-initiated trades using one of several algorithms
and an optional time lag.
aggregate_trades()
aggregates high-frequency trading data at the provided
frequency of aggregation. The aggregation is preceded by
a trade classification step, which classifies trades using the chosen trade
classification algorithm and time lag.
classify_trades(data, algorithm = "Tick", timelag = 0, ..., verbose = TRUE)
aggregate_trades(
  data,
  algorithm = "Tick",
  timelag = 0,
  frequency = "day",
  unit = 1,
  ...,
  verbose = TRUE
)
data |
A dataframe with 4 variables in the following
order ( |
algorithm |
A character string referring to the algorithm used
to determine the trade initiator, a buyer or a seller. It takes one of four
values ( |
timelag |
A number referring to the time lag in milliseconds
used to calculate the lagged midquote, bid and ask for the algorithms
|
... |
Additional arguments passed on to the functions
|
verbose |
A binary variable that determines whether detailed
information about the progress of the trade classification is displayed.
No output is produced when |
frequency |
The frequency used to aggregate intraday data. It takes one
of the following values: |
unit |
An integer referring to the size of the aggregation window
used to aggregate intraday data. The default value is |
The argument algorithm
takes one of four values:
"Tick"
refers to the tick algorithm: a trade is classified as a
buy (sell) if its price
is above (below) the closest different price of a previous trade.
"Quote"
refers to the quote algorithm: it classifies a
trade as a buy (sell) if its price
is above (below) the midpoint of the bid-ask spread.
Trades executed at the mid-spread are not classified.
"LR"
refers to the LR
algorithm as in
Lee and Ready (1991). It classifies a trade
as a buy (sell) if its price is above (below) the mid-spread (quote
algorithm), and uses the tick algorithm if the trade price is at
the mid-spread.
"EMO"
refers to the EMO
algorithm as in
Ellis, Michaely, and O'Hara (2000).
It classifies trades at the bid (ask) as sells (buys) and uses the tick
algorithm to classify trades within the then prevailing bid-ask spread.
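The four rules described above can be sketched in base R. This is only an illustration of the textual definitions, not the package's implementation: the helper names `tick_rule()`, `quote_rule()`, `lr_rule()` and `emo_rule()` are hypothetical, unclassifiable trades are returned as `NA`, and the EMO sketch assumes that every trade not executed exactly at the bid or the ask falls back to the tick rule.

```r
# Hypothetical helpers illustrating the four rules; not part of PINstimation.

# Tick rule: compare each price with the closest different previous price.
tick_rule <- function(price) {
  isbuy <- rep(NA, length(price))
  last_diff <- NA_real_  # closest previous price that differs from the current one
  for (i in seq_along(price)) {
    if (i > 1 && price[i - 1] != price[i]) last_diff <- price[i - 1]
    if (!is.na(last_diff)) isbuy[i] <- price[i] > last_diff
  }
  isbuy
}

# Quote rule: compare each price with the midquote; trades at the midquote stay NA.
quote_rule <- function(price, bid, ask) {
  mid <- (bid + ask) / 2
  ifelse(price > mid, TRUE, ifelse(price < mid, FALSE, NA))
}

# LR rule: quote rule first, tick rule for trades at the midquote.
lr_rule <- function(price, bid, ask) {
  q <- quote_rule(price, bid, ask)
  ifelse(is.na(q), tick_rule(price), q)
}

# EMO rule: trades at the ask are buys, trades at the bid are sells,
# all remaining trades use the tick rule (assumption of this sketch).
emo_rule <- function(price, bid, ask) {
  ifelse(price == ask, TRUE, ifelse(price == bid, FALSE, tick_rule(price)))
}

# Toy data with prices and quotes that are exactly representable.
price <- c(10,   10,   11, 11,   10.5,  12)
bid   <- c(9.5,  9.5,  10, 10.5, 10,    11)
ask   <- c(10.5, 10.5, 11, 11.5, 10.75, 12)

data.frame(price, bid, ask,
           tick  = tick_rule(price),
           quote = quote_rule(price, bid, ask),
           lr    = lr_rule(price, bid, ask),
           emo   = emo_rule(price, bid, ask))
```

In the toy data, the fifth trade sits inside the spread, above the midquote but on a downtick, so the LR and EMO sketches disagree on it.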
Lee and Ready (1991)
recommend using the mid-spread from five seconds earlier (the '5-second'
rule), which mitigates trade misclassification for many of the 150
NYSE stocks they analyze. On the other hand, more recent studies such
as Piwowar and Wei (2006) and
Aktas and Kryzanowski (2014) show that
1-second lagged midquotes yield lower rates of
misclassification. The default value is set to 0
seconds (no time lag).
Considering the ultra-fast nature of today’s financial markets, the time lag
is expressed in milliseconds. Lags shorter than one second can therefore be
implemented by entering values such as 100 or 500.
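To make the role of the time lag concrete, the following base R sketch looks up, for each trade, the midquote that prevailed `timelag` milliseconds earlier. It is an illustration only: `lagged_midquote()` is a hypothetical helper, timestamps are assumed to be sorted numeric milliseconds, and trades with no sufficiently old quote get `NA`, which may differ from how the package treats them.

```r
# Hypothetical helper; not part of PINstimation.
# time_ms must be sorted in non-decreasing order (required by findInterval()).
lagged_midquote <- function(time_ms, bid, ask, timelag = 0) {
  mid <- (bid + ask) / 2
  # Index of the latest observation at or before (time - timelag); 0 if none.
  idx <- findInterval(time_ms - timelag, time_ms)
  idx[idx == 0] <- NA  # no quote is old enough for this trade
  mid[idx]
}

time_ms <- c(0, 400, 900, 1500, 2100)
bid     <- c(9, 10, 11, 12, 13)
ask     <- c(11, 12, 13, 14, 15)

lagged_midquote(time_ms, bid, ask, timelag = 0)     # contemporaneous midquotes
lagged_midquote(time_ms, bid, ask, timelag = 500)   # half-second lag
lagged_midquote(time_ms, bid, ask, timelag = 1000)  # one-second lag
```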
The function classify_trades() returns a dataframe of five variables. The
first four variables are obtained from the argument data: timestamp,
price, bid, ask. The fifth variable is isbuy, which takes the value
TRUE when the trade is classified as a buyer-initiated trade, and FALSE
when the trade is classified as a seller-initiated trade.
The function aggregate_trades() returns a dataframe of two
(or three) variables. If fullreport is set to TRUE, then
the returned dataframe has three variables {freq, b, s}. If
fullreport is set to FALSE, then the returned dataframe has
two variables {b, s}, and can therefore be used directly for the
estimation of the PIN and MPIN models.
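The shape of the aggregated output can be illustrated with a base R sketch that counts buys and sells per time window. It is not the package's implementation: `aggregate_sketch()` is a hypothetical helper, the window width is given in seconds, windows are aligned to the epoch, and unclassified trades (`NA`) are assumed to count as neither buys nor sells.

```r
# Hypothetical helper; not part of PINstimation.
aggregate_sketch <- function(timestamp, isbuy, width_secs) {
  # Assign each trade to a window of width_secs seconds.
  bucket <- as.numeric(timestamp) %/% width_secs
  b <- tapply(isbuy, bucket, function(x) sum(x, na.rm = TRUE))
  s <- tapply(!isbuy, bucket, function(x) sum(x, na.rm = TRUE))
  data.frame(b = as.vector(b), s = as.vector(s))
}

# Seven toy trades spread over three 15-minute windows.
timestamp <- as.POSIXct("2024-01-02 09:30:00", tz = "UTC") +
  c(0, 120, 600, 900, 1000, 1700, 1800)
isbuy <- c(TRUE, FALSE, TRUE, TRUE, TRUE, NA, FALSE)

out <- aggregate_sketch(timestamp, isbuy, width_secs = 15 * 60)
out
```

Each row of `out` holds the number of buyer-initiated (`b`) and seller-initiated (`s`) trades in one window, which is the two-column layout described above.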
# The package contains a preloaded dataset called 'hfdata'.
# It is an artificially created high-frequency trading dataset that
# contains 100 000 trades and five variables: 'timestamp', 'price',
# 'volume', 'bid', and 'ask'. For more information, type ?hfdata.
xdata <- hfdata
xdata$volume <- NULL
# Use the EMO algorithm with a timelag of 500 milliseconds to classify
# high-frequency trades in the dataset 'xdata'
ctrades <- classify_trades(xdata, algorithm = "EMO", timelag = 500, verbose = FALSE)
# Use the LR algorithm with a timelag of 1 second to aggregate intraday data
# in the dataset 'xdata' at a frequency of 15 minutes.
lrtrades <- aggregate_trades(xdata, algorithm = "LR", timelag = 1000,
frequency = "min", unit = 15, verbose = FALSE)
# Use the Quote algorithm with a timelag of 1 second to aggregate intraday data
# in the dataset 'xdata' at a daily frequency.
qtrades <- aggregate_trades(xdata, algorithm = "Quote", timelag = 1000,
frequency = "day", unit = 1, verbose = FALSE)
# Since the argument 'fullreport' is set to FALSE by default, the
# output 'qtrades' can be used directly for the estimation of the PIN
# model, namely using pin_ea().
estimate <- pin_ea(qtrades, verbose = FALSE)
# Show the estimate
show(estimate)