iStarPostTrade | R Documentation
The model is a cost allocation method that quantifies the market impact of financial transactions as a function of an agent's order size relative to the market volume. In the author's words it is theoretically based on the supply-demand principle, although it may be rather difficult to express ourselves precisely in those terms, and even so interpretations may differ across the several possible scenarios that take place in the market in response to imbalances.
iStarPostTrade(
MktData,
sessions = NULL,
yrBizdays = 250,
horizon = 30,
xtsfy = FALSE,
grouping = FALSE,
groupsBounds,
minGroupDps,
paramsBounds,
paramsInit,
OrdData = NULL,
...
)
MktData: A list of xts objects, one per security, each named after the security it refers to and providing its market data. See 'Details'
sessions: A character or vector of characters representing ISO time subsets used to split each trading day into "sessions". If not specified, sessions are assumed to be on a daily basis
yrBizdays: A numeric value, the number of business days in the year the data refers to. Default is 250 days
horizon: A numeric value, the number of sessions to compute the rolling variables over. Default is 30. See 'Details'
xtsfy: A boolean specifying whether the rolling variables computed should become xts objects
grouping: A boolean or vector of booleans specifying whether to group datapoints. Optionally, the second element specifies whether to average group values. Attention: the grouping may discard data. See 'Details'
groupsBounds: A vector with named elements 'ImSize', 'POV', 'Vol'. They have to be increasing sequences expressing the respective variable bounds, which are used to build datapoint groups. See 'Details'
minGroupDps: A numeric value, the minimum number of datapoints a group must contain to be included in the estimation process. Default is 25. See 'Details'
paramsBounds: A matrix providing model parameter bounds to pass to nls. See 'Details'
paramsInit: A list providing model parameter initial values to pass to nls. See 'Details'
OrdData: A data.frame or list of order data. See 'Details'
...: Any other passthrough parameter
Theoretically the I-Star model can be estimated from private order data, the very orders for which one intends to estimate the impact costs. The main limitations of this approach are, on one hand, the scarcity of such data and the neglect of market movements wider than those of the single security the order was placed on; on the other, potential opportunistic trading biases. Based on these considerations we follow Kissell's main line of discussion, focusing on the use of market "tic data" and derived quantities that represent proxies of the corresponding order-related variables.
The MktData input must be a list whose items are the market data of each security considered. Items must be named to match the security they refer to. Each item is required to be an xts object with at least 'MktPrice' and 'MktQty' columns. For theoretical accuracy of the arrival price it is recommended to input 'Bid' and 'Ask' columns as well. Similarly, providing a 'Reason' column allows trades to be classified by your preferred criterion; when this data is not available the Lee-Ready tick test is used to infer the trade direction.
If the items of the MktData list have different numbers of observations, only data up to the length of the item with the smallest number of observations is considered. Also, beware that in order to avoid overly strict restrictions on potentially mismatching intraday timestamps, no complete timestamp matching is performed: provide a dataset whose securities are observed on the same number of unique days, consistently across the full dataset. Our best suggestion is to use a dataset within the same timeframe, including the same number of days for each security involved in the analysis.
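As an illustration, a minimal conforming MktData object might be built as follows. The security names ("ABC", "XYZ"), the timestamps and all values are made up; the xts package is assumed available, with a plain matrix fallback so the sketch runs either way:

```r
## Build one illustrative intraday series with 'MktPrice' and 'MktQty' columns
make_security <- function(seed) {
  set.seed(seed)
  # one trading day of minute bars, purely illustrative
  idx <- as.POSIXct("2020-01-02 09:30:00", tz = "UTC") +
    seq(0, by = 60, length.out = 390)
  data <- cbind(
    MktPrice = 100 + cumsum(rnorm(390, sd = 0.05)),
    MktQty   = sample(100:1000, 390, replace = TRUE)
  )
  # iStarPostTrade expects xts items; fall back to a matrix if xts is absent
  if (requireNamespace("xts", quietly = TRUE)) {
    xts::xts(data, order.by = idx)
  } else {
    data
  }
}

## A named list, one item per security, same number of observations each
MktData <- list(ABC = make_security(1), XYZ = make_security(2))
```

Both items here cover the same single day with the same number of observations, consistent with the recommendation above.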
The horizon should be chosen according to the number of sessions a trading day is split into.
Parameters groupsBounds and minGroupDps regulate the grouping process. The minGroupDps threshold must be reached by each group in order for its datapoints to be included in the estimation process. It defaults to 25 datapoints, as suggested by the author. However, this appears to be a rule of thumb, as the parameter largely depends on the given original dataset and on other parameters such as the sessions and horizon specifications.
groupsBounds defaults to the following sequences:
Imbalance Size: 0.005, 0.01, 0.02, ..., 0.3
Annualized volatility: 0.1, 0.2, ..., 0.8
POV: 0.01, 0.05, 0.1, ..., 0.65
Each interval is considered to be left-open and right-closed. Again, these values are suggested by the author and appear to come from empirical findings.
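In base R these left-open, right-closed buckets correspond to the default behavior of cut(). A small sketch, with made-up imbalance size values, of how such bounds translate into buckets and how values outside the outermost bounds get dropped:

```r
## Default bounds sequences reported above
imsize_bounds <- c(0.005, seq(0.01, 0.3, by = 0.01))  # 31 breaks -> 30 buckets
vol_bounds    <- seq(0.1, 0.8, by = 0.1)
pov_bounds    <- c(0.01, seq(0.05, 0.65, by = 0.05))

## Toy imbalance size datapoints; 0.004 and 0.31 fall outside the bounds
imsize  <- c(0.004, 0.012, 0.25, 0.31)

## cut() with the default right = TRUE builds (a, b] intervals;
## out-of-bounds values become NA, i.e. they are effectively discarded
buckets <- cut(imsize, breaks = imsize_bounds)
```

Values landing in NA buckets are one way the grouping may discard data, as warned in the grouping argument description.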
For the estimation we use nls, specifying algorithm = 'port' in order to implement the constrained problem the author proposes. Parameter starting values are provided with paramsInit; if missing they are set to their respective lower bounds. Note that specified values must lie within the corresponding paramsBounds.
If missing, default values for the bounds are:
100 <= a_1 <= 1000
0.1 <= a_2 <= 1
0.1 <= a_3 <= 1
0.1 <= a_4 <= 1
0.7 <= b_1 <= 1
Note that by definition 0 <= b_1 <= 1; however the author reports using 0.7 as an empirical value. Nonetheless, the user is left free to specify the desired parameter bounds via paramsBounds, whose rows must either follow the a_1, a_2, a_3, a_4, b_1 order or be named accordingly.
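A hypothetical paramsBounds matrix and paramsInit list mirroring the default bounds above (the dimnames and values simply restate those defaults; nothing here is package-mandated beyond the row order or naming just described):

```r
## Bounds matrix: rows in a_1, a_2, a_3, a_4, b_1 order, named accordingly
paramsBounds <- matrix(
  c(100, 1000,
    0.1,    1,
    0.1,    1,
    0.1,    1,
    0.7,    1),
  ncol = 2, byrow = TRUE,
  dimnames = list(c("a_1", "a_2", "a_3", "a_4", "b_1"), c("lower", "upper"))
)

## Starting values; the documented default is the respective lower bounds
paramsInit <- as.list(paramsBounds[, "lower"])
```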
OrdData can be a data.frame or a list. When it is a data.frame, its columns are required to be: 'Side', a numeric value being 1 ("buy") or -1 ("sell"); 'Size', the order size expressed relative to the ADV, that is the ratio between the total number of traded units and the ADV on the day the order was traded; 'ArrPrice', a numeric value expressing the arrival price of the traded security (for theoretical accuracy it is recommended to use the corresponding bid-ask spread midpoint); 'AvgExecPrice', the average execution price over the order lifetime; 'POV' and 'AnnualVol', the order percentage of volume and annualized volatility, respectively.
Whereas, when OrdData is a list it has to contain two named elements: 'Order.Data', a data.frame with the same characteristics as above, and 'Params', a vector of named elements being the parameters to use in the I-Star equations to compute the impact costs and the error measures. This is useful when one already has estimated parameters for the model or simply wants to see what the I-Star model values would look like with different parameters, perhaps those coming from the sensitivity analysis carried out with iStarSensitivity.
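Both accepted OrdData forms can be sketched as follows; all order values and the parameter vector are made up for illustration:

```r
## data.frame form: one row per parent order
ord_df <- data.frame(
  Side         = c(1, -1),          # 1 = buy, -1 = sell
  Size         = c(0.05, 0.10),     # order size as a fraction of ADV
  ArrPrice     = c(100.00, 50.00),  # ideally the bid-ask spread midpoint
  AvgExecPrice = c(100.12, 49.90),  # average execution price
  POV          = c(0.10, 0.20),     # percentage of volume
  AnnualVol    = c(0.25, 0.40)      # annualized volatility, decimal units
)

## list form: same order data plus pre-estimated parameters, so the
## function can compute impacts without fitting the model itself
OrdData <- list(
  Order.Data = ord_df,
  Params = c(a_1 = 500, a_2 = 0.5, a_3 = 0.5, a_4 = 0.5, b_1 = 0.9)
)
```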
TODO: stock-specific analysis is a WIP (it shouldn't be hard to integrate into the function flow already in place; see it in light of further analyses such as error analysis. Also, for testing purposes other kinds of data, such as market capitalization, are needed)
A list whose elements depend on the chosen grouping and the usage of OrdData. It can contain:
'Rolling.Variables': A list whose elements are 'ADV', 'Annual.Vol', 'Arrival.Cost', 'Imb', 'Imb.Size', 'Imb.Side', 'POV' and 'VWAP', computed from the MktData dataset provided and over the specified horizon and sessions
'Groups.Buckets': A data.frame providing the per-group imbalance size, percentage of volume and annualized volatility bounds built from the provided sequences
'Rolling.Variables.Groups': A list of group compositions, by securities and their respective 'Rolling.Variables' indices
'Rolling.Variables.Samples': A list of group compositions, by securities and their respective 'Rolling.Variables' values
'Regression.Variables': A data.frame consisting of the nonlinear regression model data
'nls.impact.fit': The nls object resulting from the nonlinear model fitted on 'Regression.Variables'
'iStar.Impact.Estimates': A data.frame with I-Star model impact estimates, error measures and orders arrival cost for comparison
In its most general setting, the model is based on market "tic data" only. It is difficult to relate Kissell's notion of "tic data" to current data provision standards, which in turn may vary by data vendor. It should suffice to mention that an ideal intraday market dataset to input into the model includes, for each security involved in the analysis: trade prices and volumes, "bid" and "ask" prices in order to compute the spreads, and possibly the so-called "reason" (i.e., the classification of trades as "bid" or "ask").
All the historical variables needed by the model are computed internally from market data. Most of them are "rolling end-of-day quantities", meaning that they are based on previous variables over a specified horizon (t = 1,...,T) that rolls one step ahead as long as the available data allows. Some variables are annualized and hence need the total number of business days in a given market and within a given year (typically a factor of 252 days in the US markets, or of 250 days); we denote it T_{m}.
These and other quantities involved are defined below:
Arrival price (P_0): Ideally it is the first bid-ask spread midpoint. When spread data is missing, the first daily market price is used as a proxy.
Annualized volatility: The standard deviation of the close-to-close security returns, scaled by the number of business days in a given year:
\sigma = \sqrt{\frac{T_{m}}{T - 1} \cdot \sum_{t = 2}^{T}{(r_{t} - r_{avg})^{2}}}
It is expressed in decimal units.
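The formula translates directly into a few lines of R. The price series below is made up; T_m is taken as 250 per the yrBizdays default:

```r
## Toy close prices over T = 6 sessions
close <- c(100, 101, 100.5, 102, 101.8, 103)

r   <- diff(log(close))   # close-to-close log returns, t = 2,...,T
T_m <- 250                # business days in a year (yrBizdays default)
T_  <- length(close)      # number of sessions, T

## Annualized volatility per the formula above (sum of squared deviations
## scaled by T_m / (T - 1)), in decimal units
sigma <- sqrt(T_m / (T_ - 1) * sum((r - mean(r))^2))
```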
Average daily volume (ADV): Over the specified horizon:
ADV = \frac{1}{T} \cdot \sum_{t = 1}^{T} V_{t}
Market imbalance (Q): It is calculated from "buy initiated trades" and "sell initiated trades". When the trade 'Reason' is already available there is no need to explicitly infer the trade direction. When such a 'Reason' is missing, the Lee-Ready tick test is used to infer it. In essence, the test is based on determining the sign of price changes: uptick or zero-uptick trades are considered "buy initiated", whereas downtick or zero-downtick trades are counted as "sell initiated". We express it as:
Q = |\sum{Buy initiated trades volume} - \sum{Sell initiated trades volume}|
Note that, as the "reason" refers to each trade, "buy initiated trades" and "sell initiated trades" can only be deduced from intraday data and then aggregated to a daily scale.
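A minimal sketch of the tick test and the resulting daily imbalance, on made-up intraday trades (the first trade has no prior tick and is simply dropped here; the full Lee-Ready procedure also uses quote midpoints, which this sketch omits):

```r
## Toy intraday trade prices and quantities
price <- c(10.00, 10.01, 10.01, 10.00, 10.00, 10.02)
qty   <- c(100,   200,   150,   300,   250,   100)

## Tick test: +1 on upticks, -1 on downticks; zero ticks inherit the
## previous non-zero sign (zero-uptick / zero-downtick rule)
dir <- sign(diff(price))
for (i in seq_along(dir)) {
  if (dir[i] == 0 && i > 1) dir[i] <- dir[i - 1]
}

## Daily imbalance: |buy initiated volume - sell initiated volume|
buy_vol  <- sum(qty[-1][dir ==  1])
sell_vol <- sum(qty[-1][dir == -1])
Q <- abs(buy_vol - sell_vol)
```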
Imbalance size: It is defined as the ratio:
\frac{Q_{t}}{ADV}
It is expressed on a daily basis and the values are in decimal units. In the I-Star modeling context it represents a proxy of a private agent's order size.
Imbalance side: It is the sign of the imbalance and indicates which side of the market is prevailing: +1 or -1 for prevailing buy or sell initiated trades, respectively.
Percentage of volume (POV): The ratio between the market imbalance and the market volume traded over a given day:
\frac{Q_{t}}{V_{t}}
Volume-weighted average price (VWAP): Expressed as
VWAP = \frac{\sum{P_{t}Q_{t}}}{\sum{Q_{t}}}
it is commonly used as a proxy of the fair market price. In the present context it is specifically used as a proxy of the average execution price.
Arrival cost: The usual arrival cost benchmark metric. In a single-security analysis framework it refers to the arrival cost of private order transactions, whereas in the full model with market tic data only it is an analogous metric based on the VWAP as a proxy of a fair average execution price:
Arrival Cost = \ln(\frac{VWAP}{P_{0}}) \cdot Imbalance Side \cdot 10^{4}
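The last two quantities combine in a couple of lines. Prices, quantities and side below are made up:

```r
## Toy trade prices and quantities over a day
P    <- c(100.0, 100.2, 100.1)
Qty  <- c(500, 300, 200)
P0   <- 100.0   # arrival price, ideally the first bid-ask midpoint
side <- +1      # prevailing imbalance side

## VWAP as proxy of the average execution price
VWAP <- sum(P * Qty) / sum(Qty)

## VWAP-based arrival cost, in basis points
arrival_cost_bps <- log(VWAP / P0) * side * 1e4
```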
We start by calculating the total cost of transacting the entire order and then distribute this quantity across the single trade periods that took place. Also, with respect to each trade period impact we can distinguish between a temporary and a permanent market impact (Lee-Ready, 1991).
The I-Star model is made of three main components, all expressed in basis points:
Instantaneous impact (I): It is the theoretical impact of executing the entire order at once. We express it here in its "power" functional form, suggested by the author as the empirically most robust, stable and accurate over time with respect to linear and non-linear alternatives:
I = a_1 \cdot (\frac{Q}{ADV})^{a_2} \cdot \sigma^{a_3}
where the parameter a_1 is the sensitivity to trade size, a_2 is the order shape parameter and a_3 the volatility shape parameter.
Market impact (MI): It represents the period-by-period impact cost due to a given trading strategy and is expressed as:
MI = b_1 \cdot POV^{a_4} \cdot I + (1 - b_1) \cdot I
where a_4 is the POV shape parameter and b_1 is the percentage of total temporary market impact.
Timing risk (TR): It is a proxy for the uncertainty surrounding the cost estimate:
TR = \sigma \cdot \sqrt{\frac{S \cdot (1 - POV)}{3 \cdot T_{m} \cdot ADV \cdot POV}} \cdot 10^{4}
where S is the private order size.
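The three equations are straightforward to write as plain R functions. The parameter values used in the example call are fabricated, within the default bounds, purely for illustration:

```r
## Instantaneous impact, power functional form (basis points)
istar_I <- function(size, vol, a_1, a_2, a_3) {
  a_1 * size^a_2 * vol^a_3          # size = Q/ADV, vol = annualized sigma
}

## Market impact: temporary share scaled by POV^a_4 plus permanent share
istar_MI <- function(I, pov, a_4, b_1) {
  b_1 * pov^a_4 * I + (1 - b_1) * I
}

## Timing risk (basis points); S is the order size in units
istar_TR <- function(vol, S, pov, ADV, T_m = 250) {
  vol * sqrt(S * (1 - pov) / (3 * T_m * ADV * pov)) * 1e4
}

## Illustrative parameter values (made up, within the default bounds)
I  <- istar_I(size = 0.1, vol = 0.25, a_1 = 500, a_2 = 0.5, a_3 = 0.75)
MI <- istar_MI(I, pov = 0.2, a_4 = 0.5, b_1 = 0.9)
```

Since POV < 1 dampens the temporary share, MI is always at most I under these bounds.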
The first two equations are part of the model estimation, whereas the last one is used as a measure of risk exposure for a given order.
TODO: add outliers criteria (consistency still under discussion)
The grouping may be carried out before proceeding with the nonlinear regression estimation. It is based on buckets built with respect to three variables: the imbalance size, the POV and the annualized volatility, irrespective of the security whose values fall into the buckets. A datapoint threshold in each bucket has to be reached in order to include the corresponding group in the estimation process.
Several aspects are worth emphasizing. First of all, in Kissell's words, "too fine increments [lead to] excessive groupings surface and we have found that a substantially large data grouping does not always uncover a statistical relationship between cost and our set of explanatory factors." This in turn points to an important consideration: depending also on the datapoint threshold specified, the grouping may discard data, which allows anomalous observations (outliers) with respect to the explanatory variables to be excluded. On one hand it is therefore clear how this step offers improvement margins to the nonlinear least squares estimation procedure; on the other, it may cause convergence issues depending on the effective shrinkage the datapoints go through.
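The threshold step itself reduces to counting group memberships. A sketch with toy group labels and a deliberately small threshold (the default is 25):

```r
## Toy bucket labels assigned to datapoints and a small threshold
min_dps <- 3
groups  <- c("g1", "g1", "g1", "g2", "g1", "g2")

## Keep only groups reaching the minGroupDps-style threshold
counts  <- table(groups)
keep    <- names(counts)[counts >= min_dps]

## Indices of the datapoints that survive into the estimation
kept_ix <- which(groups %in% keep)
```

Here group "g2" has only 2 datapoints and is discarded, illustrating how the grouping can shrink the estimation sample.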
The author suggests three methods to estimate the model parameters from the instantaneous and market impact equations.
Not implemented at present.
Not implemented at present.
The full model parameters are estimated by means of nonlinear least squares. There is a wide theory behind this approach, rich in pros and cons inherent to the specific iterative procedure used and its peculiarities in achieving convergence. The interested reader may consult Venables and Ripley (2002). A general warning in estimating this model comes from the author himself: "Analysts choosing to solve the parameters of the model via non-linear regression of the full model need to thoroughly understand the repercussions of non-linear regression analysis as well as the sensitivity of the parameters, and potential solution ranges for the parameters."
In his modeling context the author sets up a constrained problem, providing bounds on the parameters in order to ensure feasible estimated values. The author's suggested bounds are implemented by default to follow his methodology, as reported in 'Details'. However, the opportunity to provide bounds is supported and left to the user. Likewise, the initial parameter values from which to start the iterative constrained minimization are left to the user: to my knowledge at the time of writing, the author does not provide any specific clue about the estimation procedure used and, especially, there is no suggestion of particular starting values to begin with. It is valuable for a user to control the starting values, as a way to check whether the estimated parameters come from a local optimum or whether a global optimum may reasonably have been achieved.
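A minimal sketch of the kind of constrained fit nls performs with algorithm = 'port'. The simulated sample, the "true" parameters and the starting values are all fabricated for illustration; the bounds are the defaults reported above:

```r
## Simulate data roughly consistent with the full model form
set.seed(42)
n    <- 200
size <- runif(n, 0.01, 0.3)   # imbalance size (Q/ADV)
vol  <- runif(n, 0.1, 0.8)    # annualized volatility
pov  <- runif(n, 0.05, 0.6)   # percentage of volume
cost <- (0.95 * pov^0.5 + 0.05) * 600 * size^0.4 * vol^0.8 +
  rnorm(n, sd = 1)            # "arrival cost" with some noise

## Constrained nonlinear least squares via the 'port' algorithm;
## bounds follow the defaults, starting values are a user's choice here
fit <- nls(
  cost ~ (b_1 * pov^a_4 + (1 - b_1)) * a_1 * size^a_2 * vol^a_3,
  start     = list(a_1 = 500, a_2 = 0.5, a_3 = 0.5, a_4 = 0.5, b_1 = 0.85),
  lower     = c(100, 0.1, 0.1, 0.1, 0.7),
  upper     = c(1000, 1, 1, 1, 1),
  algorithm = "port"
)
```

Re-running the fit from different admissible starting values is exactly the local-versus-global optimum check suggested above.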
Once the parameters have been estimated, the I-Star best-fit equations provide impact cost estimates for a given market parent order, specified by its size, POV, annualized volatility, side and arrival price. The instantaneous impact, market impact (both temporary and permanent) and timing risk are described by the I-Star model equations explained above. The cost error is assessed as the difference between the arrival cost of the order and the market impact estimate. The z-score is a "risk-adjusted error", expressed as the ratio between the cost error and the timing risk. The author reports that the most accurate models possess z-score distributions with mean zero and unit variance.
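The error measures just described are simple elementwise operations; all numbers below are made up:

```r
## Toy per-order quantities, all in basis points
arrival_cost <- c(35, 12, -8)   # realized arrival costs
mi_estimate  <- c(30, 15, -5)   # model market impact estimates
timing_risk  <- c(10, 8, 6)     # timing risk estimates

## Cost error and risk-adjusted error (z-score)
cost_error <- arrival_cost - mi_estimate
z_score    <- cost_error / timing_risk
```

For an accurate model, z_score computed over many orders should be roughly standard-normally distributed, per the author's remark above.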
Vito Lestingi
The Science of Algorithmic Trading and Portfolio Management (Kissell, 2013), Elsevier Science.
A Practical Framework for Estimating Transaction Costs and Developing Optimal Trading Strategies to Achieve Best Execution (Kissell, Glantz and Malamut, 2004), Finance Research Letters.
Inferring Trade Direction from Intraday Data (Lee and Ready, 1991), The Journal of Finance.
Modern Applied Statistics with S (Venables and Ripley, 2002), Springer.
Return.calculate, sd.annualized, nls