iStarPostTrade | R Documentation
The model is a cost allocation method that quantifies the market impact of financial transactions as a function of an agent's order size relative to the market volume. In the author's words it is theoretically based on the supply-demand principle, although it may be rather difficult to express ourselves precisely in those terms, and even so interpretations may differ across the several possible scenarios that take place in the market in response to imbalances.
iStarPostTrade(
MktData,
sessions = NULL,
yrBizdays = 250,
horizon = 30,
xtsfy = FALSE,
grouping = FALSE,
groupsBounds,
minGroupDps,
paramsBounds,
paramsInit,
OrdData = NULL,
...
)
MktData: A list of xts objects, one per security, each named after the security it refers to and providing its market data. See 'Details'
sessions: A character or vector of characters representing ISO time subsets used to split each trading day into "sessions". If not specified, sessions are assumed to be on a daily basis
yrBizdays: A numeric value, the number of business days in the year the data refers to. Default is 250 days
horizon: A numeric value, the number of sessions to compute the rolling variables over. Default is 30. See 'Details'
xtsfy: A boolean specifying whether the rolling variables computed should become xts objects
grouping: A boolean or vector of booleans specifying whether to group datapoints. Optionally, the second element specifies whether to average group values. Attention: the grouping may discard data. See 'Details'
groupsBounds: A vector with named elements 'ImSize', 'POV', 'Vol'. They have to be increasing sequences expressing the respective variable bounds, which are used to build datapoint groups. See 'Details'
minGroupDps: A numeric value, the minimum number of datapoints a group must contain to be included in the estimation process. Default is 25. See 'Details'
paramsBounds: A matrix providing model parameter bounds to pass to nls. See 'Details'
paramsInit: A list providing model parameter initial values to pass to nls. See 'Details'
OrdData: A data.frame or list of order data. See 'Details'
...: Any other passthrough parameter
Theoretically the I-Star model can be estimated from private order data, the very orders for which one intends to estimate the impact costs. The main limitations of this approach are, on one hand, the scarcity of such data and the neglect of market movements wider than those of the single security the order was placed on; on the other, potential opportunistic trading biases. Based on these considerations we follow Kissell's main line of discussion, focusing on the use of market "tic data" and derived quantities that represent proxies of the corresponding order-related variables.
The MktData input must be a list whose items are the market data of each security considered. Items must be named to match the security they refer to. Each item is required to be an xts object with at least 'MktPrice' and 'MktQty' columns. For theoretical accuracy of the arrival price it is recommended to input 'Bid' and 'Ask' columns as well. Similarly, providing a 'Reason' column allows trades to be classified by your preferred criterion; when this data is not available the Lee-Ready tick test is used to infer the trade direction.
If the items of the MktData list have different numbers of observations, only data up to the length of the item with the smallest number of observations is considered. Also, beware that in order to avoid overly strict restrictions on potentially mismatching intraday timestamps, no complete timestamp matching is performed: provide a dataset whose securities are observed on the same number of unique days, consistently across the full dataset. Our best suggestion is to use a dataset within the same timeframe, including the same number of days for each security involved in the analysis.
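As an illustration, a minimal conforming MktData object might be built as follows. The security names ("ABC", "XYZ"), the timestamps and all values are made up; the xts package is assumed available, with a plain matrix fallback so the sketch runs either way:

```r
## Build one illustrative intraday series with 'MktPrice' and 'MktQty' columns
make_security <- function(seed) {
  set.seed(seed)
  # one trading day of minute bars, purely illustrative
  idx <- as.POSIXct("2020-01-02 09:30:00", tz = "UTC") +
    seq(0, by = 60, length.out = 390)
  data <- cbind(
    MktPrice = 100 + cumsum(rnorm(390, sd = 0.05)),
    MktQty   = sample(100:1000, 390, replace = TRUE)
  )
  # iStarPostTrade expects xts items; fall back to a matrix if xts is absent
  if (requireNamespace("xts", quietly = TRUE)) {
    xts::xts(data, order.by = idx)
  } else {
    data
  }
}

## A named list, one item per security, same number of observations each
MktData <- list(ABC = make_security(1), XYZ = make_security(2))
```

Both items here cover the same single day with the same number of observations, consistent with the recommendation above.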
The horizon should be chosen according to the number of sessions a trading day is split into.
Parameters groupsBounds and minGroupDps regulate the grouping process. The minGroupDps threshold must be reached by each group in order for its datapoints to be included in the estimation process. It defaults to 25 datapoints, as suggested by the author. However, this appears to be a rule of thumb, as the parameter largely depends on the given original dataset and on other parameters such as the sessions and horizon specifications.
groupsBounds defaults to the following sequences:
Imbalance Size: 0.005, 0.01, 0.02, ..., 0.3
Annualized volatility: 0.1, 0.2, ..., 0.8
POV: 0.01, 0.05, 0.1, ..., 0.65
Each interval is considered to be left-open and right-closed. Again, these values are suggested by the author and appear to come from empirical findings.
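In base R these left-open, right-closed buckets correspond to the default behavior of cut(). A small sketch, with made-up imbalance size values, of how such bounds translate into buckets and how values outside the outermost bounds get dropped:

```r
## Default bounds sequences reported above
imsize_bounds <- c(0.005, seq(0.01, 0.3, by = 0.01))  # 31 breaks -> 30 buckets
vol_bounds    <- seq(0.1, 0.8, by = 0.1)
pov_bounds    <- c(0.01, seq(0.05, 0.65, by = 0.05))

## Toy imbalance size datapoints; 0.004 and 0.31 fall outside the bounds
imsize  <- c(0.004, 0.012, 0.25, 0.31)

## cut() with the default right = TRUE builds (a, b] intervals;
## out-of-bounds values become NA, i.e. they are effectively discarded
buckets <- cut(imsize, breaks = imsize_bounds)
```

Values landing in NA buckets are one way the grouping may discard data, as warned in the grouping argument description.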
For the estimation we use nls, specifying algorithm = 'port' in order to implement the constrained problem the author proposes. Parameter starting values are provided with paramsInit; if missing they are set to their respective lower bounds. Note that specified values must lie within the corresponding paramsBounds.
If missing, default values for the bounds are:
100 <= a_1 <= 1000
0.1 <= a_2 <= 1
0.1 <= a_3 <= 1
0.1 <= a_4 <= 1
0.7 <= b_1 <= 1
Note that by definition 0 <= b_1 <= 1; however the author reports using 0.7 as an empirical value. Nonetheless, the user is left free to specify the desired parameter bounds via paramsBounds, whose rows must either follow the a_1, a_2, a_3, a_4, b_1 order or be named accordingly.
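A hypothetical paramsBounds matrix and paramsInit list mirroring the default bounds above (the dimnames and values simply restate those defaults; nothing here is package-mandated beyond the row order or naming just described):

```r
## Bounds matrix: rows in a_1, a_2, a_3, a_4, b_1 order, named accordingly
paramsBounds <- matrix(
  c(100, 1000,
    0.1,    1,
    0.1,    1,
    0.1,    1,
    0.7,    1),
  ncol = 2, byrow = TRUE,
  dimnames = list(c("a_1", "a_2", "a_3", "a_4", "b_1"), c("lower", "upper"))
)

## Starting values; the documented default is the respective lower bounds
paramsInit <- as.list(paramsBounds[, "lower"])
```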
OrdData can be a data.frame or a list. When it is a data.frame, its columns are required to be: 'Side', a numeric value being 1 ("buy") or -1 ("sell"); 'Size', the order size expressed relative to the ADV, that is the ratio between the total number of traded units and the ADV on the day the order was traded; 'ArrPrice', a numeric value expressing the arrival price of the traded security (for theoretical accuracy it is recommended to use the corresponding bid-ask spread midpoint); 'AvgExecPrice', the average execution price over the order lifetime; 'POV' and 'AnnualVol', the order percentage of volume and annualized volatility, respectively.
Whereas, when OrdData is a list it has to contain two named elements: 'Order.Data', a data.frame with the same characteristics as above, and 'Params', a vector of named elements being the parameters to use in the I-Star equations to compute the impact costs and the error measures. This is useful when one already has estimated parameters for the model or simply wants to see what the I-Star model values would look like with different parameters, perhaps those coming from the sensitivity analysis carried out with iStarSensitivity.
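Both accepted OrdData forms can be sketched as follows; all order values and the parameter vector are made up for illustration:

```r
## data.frame form: one row per parent order
ord_df <- data.frame(
  Side         = c(1, -1),          # 1 = buy, -1 = sell
  Size         = c(0.05, 0.10),     # order size as a fraction of ADV
  ArrPrice     = c(100.00, 50.00),  # ideally the bid-ask spread midpoint
  AvgExecPrice = c(100.12, 49.90),  # average execution price
  POV          = c(0.10, 0.20),     # percentage of volume
  AnnualVol    = c(0.25, 0.40)      # annualized volatility, decimal units
)

## list form: same order data plus pre-estimated parameters, so the
## function can compute impacts without fitting the model itself
OrdData <- list(
  Order.Data = ord_df,
  Params = c(a_1 = 500, a_2 = 0.5, a_3 = 0.5, a_4 = 0.5, b_1 = 0.9)
)
```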
TODO: stock-specific analysis is a WIP (it shouldn't be hard to integrate into the function flow already in place; see it in light of further analyses such as error analysis. Also, for testing purposes other kinds of data, such as market capitalization, are needed)
A list whose elements depend on the chosen grouping and the usage of OrdData. It can contain:
'Rolling.Variables': A list whose elements are 'ADV', 'Annual.Vol', 'Arrival.Cost', 'Imb', 'Imb.Size', 'Imb.Side', 'POV' and 'VWAP', computed from the MktData dataset provided and over the specified horizon and sessions
'Groups.Buckets': A data.frame providing the per-group imbalance size, percentage of volume and annualized volatility bounds built from the provided sequences
'Rolling.Variables.Groups': A list of group compositions, by securities and their respective 'Rolling.Variables' indices
'Rolling.Variables.Samples': A list of group compositions, by securities and their respective 'Rolling.Variables' values
'Regression.Variables': A data.frame consisting of the nonlinear regression model data
'nls.impact.fit': The nls object resulting from the nonlinear model fitted on 'Regression.Variables'
'iStar.Impact.Estimates': A data.frame with I-Star model impact estimates, error measures and orders arrival cost for comparison
In its most general setting, the model is based on market "tic data" only. It is difficult to relate Kissell's notion of "tic data" to current data provision standards, which in turn may vary by data vendor. It should suffice to mention that an ideal intraday market dataset to input into the model includes, for each security involved in the analysis: trade prices and volumes, "bid" and "ask" prices in order to compute the spreads, and possibly the so-called "reason" (i.e., the classification of trades as "bid" or "ask").
All the historical variables needed by the model are computed internally from market data. Most of them are "rolling end-of-day quantities", meaning that they are based on previous variables over a specified horizon (t = 1,...,T) that rolls one step ahead as long as the available data allows. Some variables are annualized and hence need the total number of business days in a given market and within a given year (typically a factor of 252 days in the US markets, or of 250 days); we denote it T_{m}.
These and other quantities involved are defined below:
Arrival price (P_0): Ideally it is the first bid-ask spread midpoint. When spread data is missing, the first daily market price is used as a proxy.
Annualized volatility: The standard deviation of the close-to-close security returns, scaled by the number of business days in a given year:
\sigma = \sqrt{\frac{T_{m}}{T - 1} \cdot \sum_{t = 2}^{T}{(r_{t} - r_{avg})^{2}}}
It is expressed in decimal units.
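The formula translates directly into a few lines of R. The price series below is made up; T_m is taken as 250 per the yrBizdays default:

```r
## Toy close prices over T = 6 sessions
close <- c(100, 101, 100.5, 102, 101.8, 103)

r   <- diff(log(close))   # close-to-close log returns, t = 2,...,T
T_m <- 250                # business days in a year (yrBizdays default)
T_  <- length(close)      # number of sessions, T

## Annualized volatility per the formula above (sum of squared deviations
## scaled by T_m / (T - 1)), in decimal units
sigma <- sqrt(T_m / (T_ - 1) * sum((r - mean(r))^2))
```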
Average daily volume (ADV): Over the specified horizon:
ADV = \frac{1}{T} \cdot \sum_{t = 1}^{T} V_{t}
Market imbalance (Q): It is calculated from "buy initiated trades" and "sell initiated trades". When the trade 'Reason' is already available there is no need to explicitly infer the trade direction. When such a 'Reason' is missing, the Lee-Ready tick test is used to infer it. In essence, the test is based on determining the sign of price changes: uptick or zero-uptick trades are considered "buy initiated", whereas downtick or zero-downtick trades are counted as "sell initiated". We express it as:
Q = |\sum{Buy initiated trades volume} - \sum{Sell initiated trades volume}|
Note that, as the "reason" refers to each trade, "buy initiated trades" and "sell initiated trades" can only be deduced from intraday data and then aggregated to a daily scale.
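A minimal sketch of the tick test and the resulting daily imbalance, on made-up intraday trades (the first trade has no prior tick and is simply dropped here; the full Lee-Ready procedure also uses quote midpoints, which this sketch omits):

```r
## Toy intraday trade prices and quantities
price <- c(10.00, 10.01, 10.01, 10.00, 10.00, 10.02)
qty   <- c(100,   200,   150,   300,   250,   100)

## Tick test: +1 on upticks, -1 on downticks; zero ticks inherit the
## previous non-zero sign (zero-uptick / zero-downtick rule)
dir <- sign(diff(price))
for (i in seq_along(dir)) {
  if (dir[i] == 0 && i > 1) dir[i] <- dir[i - 1]
}

## Daily imbalance: |buy initiated volume - sell initiated volume|
buy_vol  <- sum(qty[-1][dir ==  1])
sell_vol <- sum(qty[-1][dir == -1])
Q <- abs(buy_vol - sell_vol)
```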
Imbalance size: It is defined as the ratio:
\frac{Q_{t}}{ADV}
It is expressed on a daily basis and the values are in decimal units. In the I-Star modeling context it represents a proxy of a private agent's order size.
Imbalance side: It is the sign of the imbalance and indicates which side of the market is prevailing: +1 or -1 for prevailing buy or sell initiated trades, respectively.
Percentage of volume (POV): The ratio between the market imbalance and the market volume traded over a given day:
\frac{Q_{t}}{V_{t}}
Volume-weighted average price (VWAP): Expressed as
VWAP = \frac{\sum{P_{t}Q_{t}}}{\sum{Q_{t}}}
it is commonly used as a proxy of the fair market price. In the present context it is specifically used as a proxy of the average execution price.
Arrival cost: The usual arrival cost benchmark metric. In a single-security analysis framework it refers to the arrival cost of private order transactions, whereas in the full model with market tic data only it is an analogous metric based on the VWAP as a proxy of a fair average execution price:
Arrival Cost = \ln(\frac{VWAP}{P_{0}}) \cdot Imbalance Side \cdot 10^{4}
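The last two quantities combine in a couple of lines. Prices, quantities and side below are made up:

```r
## Toy trade prices and quantities over a day
P    <- c(100.0, 100.2, 100.1)
Qty  <- c(500, 300, 200)
P0   <- 100.0   # arrival price, ideally the first bid-ask midpoint
side <- +1      # prevailing imbalance side

## VWAP as proxy of the average execution price
VWAP <- sum(P * Qty) / sum(Qty)

## VWAP-based arrival cost, in basis points
arrival_cost_bps <- log(VWAP / P0) * side * 1e4
```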
We start by calculating the total cost of transacting the entire order and then distribute this quantity across the single trade periods that took place. Also, with respect to each trade period impact we can distinguish between a temporary and a permanent market impact (Lee-Ready, 1991).
The I-Star model is made of three main components, all expressed in basis points:
Instantaneous impact (I): It is the theoretical impact of executing the entire order at once. We express it here in its "power" functional form, suggested by the author as the empirically most robust, stable and accurate over time with respect to linear and non-linear alternatives:
I = a_1 \cdot (\frac{Q}{ADV})^{a_2} \cdot \sigma^{a_3}
where the parameter a_1 is the sensitivity to trade size, a_2 is the order shape parameter and a_3 the volatility shape parameter.
Market impact (MI): It represents the period-by-period impact cost due to a given trading strategy and is expressed as:
MI = b_1 \cdot POV^{a_4} \cdot I + (1 - b_1) \cdot I
where a_4 is the POV shape parameter and b_1 is the percentage of total temporary market impact.
Timing risk (TR): It is a proxy for the uncertainty surrounding the cost estimate:
TR = \sigma \cdot \sqrt{\frac{S \cdot (1 - POV)}{3 \cdot T_{m} \cdot ADV \cdot POV}} \cdot 10^{4}
where S is the private order size.
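The three equations are straightforward to write as plain R functions. The parameter values used in the example call are fabricated, within the default bounds, purely for illustration:

```r
## Instantaneous impact, power functional form (basis points)
istar_I <- function(size, vol, a_1, a_2, a_3) {
  a_1 * size^a_2 * vol^a_3          # size = Q/ADV, vol = annualized sigma
}

## Market impact: temporary share scaled by POV^a_4 plus permanent share
istar_MI <- function(I, pov, a_4, b_1) {
  b_1 * pov^a_4 * I + (1 - b_1) * I
}

## Timing risk (basis points); S is the order size in units
istar_TR <- function(vol, S, pov, ADV, T_m = 250) {
  vol * sqrt(S * (1 - pov) / (3 * T_m * ADV * pov)) * 1e4
}

## Illustrative parameter values (made up, within the default bounds)
I  <- istar_I(size = 0.1, vol = 0.25, a_1 = 500, a_2 = 0.5, a_3 = 0.75)
MI <- istar_MI(I, pov = 0.2, a_4 = 0.5, b_1 = 0.9)
```

Since POV < 1 dampens the temporary share, MI is always at most I under these bounds.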
The first two equations are part of the model estimation, whereas the last one is used as a measure of risk exposure for a given order.
TODO: add outliers criteria (consistency still under discussion)
The grouping may be carried out before proceeding with the nonlinear regression estimation. It is based on buckets built with respect to three variables: the imbalance size, the POV and the annualized volatility, irrespective of the security whose values fall into the buckets. A datapoint threshold in each bucket has to be reached in order to include the corresponding group in the estimation process.
Several aspects are worth emphasizing. First of all, in Kissell's words, "too fine increments [lead to] excessive groupings surface and we have found that a substantially large data grouping does not always uncover a statistical relationship between cost and our set of explanatory factors." This in turn points to an important consideration: depending also on the datapoint threshold specified, the grouping may discard data, which allows anomalous observations (outliers) with respect to the explanatory variables to be excluded. On one hand it is therefore clear how this step offers improvement margins to the nonlinear least squares estimation procedure; on the other, it may cause convergence issues depending on the effective shrinkage the datapoints go through.
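The threshold step itself reduces to counting group memberships. A sketch with toy group labels and a deliberately small threshold (the default is 25):

```r
## Toy bucket labels assigned to datapoints and a small threshold
min_dps <- 3
groups  <- c("g1", "g1", "g1", "g2", "g1", "g2")

## Keep only groups reaching the minGroupDps-style threshold
counts  <- table(groups)
keep    <- names(counts)[counts >= min_dps]

## Indices of the datapoints that survive into the estimation
kept_ix <- which(groups %in% keep)
```

Here group "g2" has only 2 datapoints and is discarded, illustrating how the grouping can shrink the estimation sample.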
The author suggests three methods to estimate the model parameters from the instantaneous and market impact equations.
Not implemented at present.
Not implemented at present.
The full model parameters are estimated by means of nonlinear least squares. There is a wide theory behind this approach, rich in pros and cons inherent to the specific iterative procedure used and its peculiarities in achieving convergence. The interested reader may consult Venables and Ripley (2002). A general warning in estimating this model comes from the author himself: "Analysts choosing to solve the parameters of the model via non-linear regression of the full model need to thoroughly understand the repercussions of non-linear regression analysis as well as the sensitivity of the parameters, and potential solution ranges for the parameters."
In his modeling context the author sets up a constrained problem, providing bounds on the parameters in order to ensure feasible estimated values. The author's suggested bounds are implemented by default to follow his methodology, as reported in 'Details'. However, the opportunity to provide bounds is supported and left to the user. Likewise, the initial parameter values from which to start the iterative constrained minimization are left to the user: to my knowledge at the time of writing, the author does not provide any specific clue about the estimation procedure used and, especially, there is no suggestion of particular starting values to begin with. It is valuable for a user to control the starting values, as a way to check whether the estimated parameters come from a local optimum or whether a global optimum may reasonably have been achieved.
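A minimal sketch of the kind of constrained fit nls performs with algorithm = 'port'. The simulated sample, the "true" parameters and the starting values are all fabricated for illustration; the bounds are the defaults reported above:

```r
## Simulate data roughly consistent with the full model form
set.seed(42)
n    <- 200
size <- runif(n, 0.01, 0.3)   # imbalance size (Q/ADV)
vol  <- runif(n, 0.1, 0.8)    # annualized volatility
pov  <- runif(n, 0.05, 0.6)   # percentage of volume
cost <- (0.95 * pov^0.5 + 0.05) * 600 * size^0.4 * vol^0.8 +
  rnorm(n, sd = 1)            # "arrival cost" with some noise

## Constrained nonlinear least squares via the 'port' algorithm;
## bounds follow the defaults, starting values are a user's choice here
fit <- nls(
  cost ~ (b_1 * pov^a_4 + (1 - b_1)) * a_1 * size^a_2 * vol^a_3,
  start     = list(a_1 = 500, a_2 = 0.5, a_3 = 0.5, a_4 = 0.5, b_1 = 0.85),
  lower     = c(100, 0.1, 0.1, 0.1, 0.7),
  upper     = c(1000, 1, 1, 1, 1),
  algorithm = "port"
)
```

Re-running the fit from different admissible starting values is exactly the local-versus-global optimum check suggested above.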
Once the parameters have been estimated, the I-Star best-fit equations provide impact cost estimates for a given market parent order, specified by its size, POV, annualized volatility, side and arrival price. The instantaneous impact, market impact (both temporary and permanent) and timing risk are described by the I-Star model equations explained above. The cost error is assessed as the difference between the arrival cost of the order and the market impact estimate. The z-score is a "risk-adjusted error", expressed as the ratio between the cost error and the timing risk. The author reports that the most accurate models possess z-score distributions with mean zero and unit variance.
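The error measures just described are simple elementwise operations; all numbers below are made up:

```r
## Toy per-order quantities, all in basis points
arrival_cost <- c(35, 12, -8)   # realized arrival costs
mi_estimate  <- c(30, 15, -5)   # model market impact estimates
timing_risk  <- c(10, 8, 6)     # timing risk estimates

## Cost error and risk-adjusted error (z-score)
cost_error <- arrival_cost - mi_estimate
z_score    <- cost_error / timing_risk
```

For an accurate model, z_score computed over many orders should be roughly standard-normally distributed, per the author's remark above.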
Vito Lestingi
The Science of Algorithmic Trading and Portfolio Management (Kissell, 2013), Elsevier Science.
A Practical Framework for Estimating Transaction Costs and Developing Optimal Trading Strategies to Achieve Best Execution (Kissell, Glantz and Malamut, 2004), Finance Research Letters.
Inferring Trade Direction from Intraday Data (Lee and Ready, 1991), The Journal of Finance.
Modern Applied Statistics with S (Venables and Ripley, 2002), Springer.
Return.calculate, sd.annualized, nls