benchTradeStats: Trade Execution Performance Benchmarks Statistical Testing
In braverock/blotter: Tools for Transaction-Oriented Trading Systems P&L

benchTradeStats

R Documentation

Description

This function gathers statistical tests to be carried out on the benchmarked performance of trades, as obtained via benchTradePerf(). When testing trading strategies on Symbols, it is of interest to assess whether their performance differs in a statistically significant way. In other words, the goal is to determine whether one strategy outperformed the other or whether the two are statistically indistinguishable in terms of performance. All the statistical tests included are non-parametric, that is, distribution-free. These tests allow great flexibility, but in turn require the data to satisfy some assumptions for their results to be meaningful.

Usage

benchTradeStats(
  Portfolio,
  benchmark,
  side,
  type,
  metric,
  POV,
  OrdersMktData,
  approach = c("paired", "independent"),
  test = c("Sign", "Wilcoxon", "Median", "WMW"),
  dgptest = c("ChiSq", "KS"),
  conf.level,
  alternative
)

Arguments

`Portfolio`	A vector of character strings identifying initialized Portfolio objects with the Symbol(s) to test. See 'Details'
`benchmark`	A character string identifying one of the benchmarks in `benchTradePerf` (except 'RPM'), plus 'ArrCost' when `approach == 'independent'`. The default depends on the specified `approach`. See 'Details'
`side`	A numeric value indicating the side of the trade, either 1 or -1: `side = 1` (default) means "Buy" and `side = -1` means "Sell"
`type`	A list with a named element `price` or `vwap`, holding a character string. Relevant only for the corresponding `benchmark = 'MktBench'` and `benchmark = 'VWAP'`. When `benchmark = 'MktBench'`, it is only pasted into the corresponding console output column and does not influence the PnL metric computation. When `benchmark = 'VWAP'`, it specifies the VWAP benchmark and defaults to `type = list(vwap = 'interval')`. See `benchTradePerf` 'Details'
`metric`	A numeric value, either 1 or -1, meaning "performance metric" or "cost metric", respectively. See 'Note'
`POV`	A numeric value between 0 and 1, specifying the POV rate for the 'PWP' benchmark
`OrdersMktData`	A list or nested list of `benchTradePerf` compliant `MktData` objects. See 'Details'
`approach`	A character string identifying the statistical testing approach. Either 'paired' or 'independent'
`test`	A character string identifying the statistical test to run. If `approach == 'paired'`, either 'Sign' or 'Wilcoxon'; if `approach == 'independent'`, either 'Median' or 'WMW'
`dgptest`	A string identifying the distribution analysis test to run. Either 'ChiSq' or 'KS'
`conf.level`	A numeric value, the confidence level of the interval (see `stats` documentation)
`alternative`	A string identifying the statistical test tail (see `stats` documentation). Not used for `test=='Median'`

Details

There exists a wide range of algorithmic trading strategies, and even within the same category of strategies many slightly different versions may exist (they often differ by brokerage firm). In a post-trade analysis framework, starting from the transaction prices of the trading strategies, we compare the overall performance of multiple orders, each executed under two different categories, to ultimately test whether the data supports a difference in their medians. In other words, the basic statistical hypothesis testing setup is to test the null hypothesis of equal medians against the alternative hypothesis of different medians.
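
In symbols, writing m_A and m_B for the medians of the performance (or cost) metric under the two categories, the two-sided setup described above reads as follows. The notation is introduced here for illustration only and is not taken from the package.

```latex
H_0 : m_A = m_B
\qquad \text{versus} \qquad
H_1 : m_A \neq m_B
```

One-sided versions replace the inequality in H_1 with m_A > m_B or m_A < m_B, which is what the alternative argument selects for the tests that support it.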

Two statistical testing approaches are contemplated; which one is suitable ultimately depends on the analyst's specific research questions. In turn, each of them critically depends on how the transactional data was obtained in the first place and on its distributional properties. First, the paired samples approach, where trades are two equal-length child orders belonging to the same parent order on a Symbol, each executed over exactly the same timeframe. Following Kissell, the preferred benchmark against which to compare trades is the VWAP benchmark. In this context, the tests included are the Wilcoxon Signed Rank test and the Sign test. Second, the independent samples approach, where trades can be on different Symbols and may have occurred over different periods, possibly with different frequency; here Kissell suggests the Arrival Cost as the preferred benchmark metric. In this context, the tests included are the Median test and the Wilcoxon-Mann-Whitney test.
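
To make the four tests concrete, here is a minimal sketch using only base R's stats package on simulated per-order metrics. It is not the code benchTradeStats runs internally: the data, the variable names, and the choice of a chi-square formulation for the Median test are assumptions made for illustration.

```r
# Illustrative only: simulated per-order metrics in bps, not blotter output.
set.seed(42)

## Paired samples: both strategies measured on the same orders
perf_a <- rnorm(30, mean = 2, sd = 5)
perf_b <- rnorm(30, mean = 0, sd = 5)
diffs  <- perf_a - perf_b

# Sign test: under H0 the count of positive differences is Binomial(n, 0.5)
sign_test <- binom.test(sum(diffs > 0), n = sum(diffs != 0), p = 0.5,
                        alternative = "two.sided", conf.level = 0.95)

# Wilcoxon signed-rank test on the paired differences
wilcoxon_test <- wilcox.test(perf_a, perf_b, paired = TRUE,
                             alternative = "two.sided",
                             conf.int = TRUE, conf.level = 0.95)

## Independent samples: sample sizes may differ
cost_x <- rnorm(40, mean = 10, sd = 8)
cost_y <- rnorm(25, mean = 12, sd = 8)

# Wilcoxon-Mann-Whitney (rank-sum) test
wmw_test <- wilcox.test(cost_x, cost_y, paired = FALSE,
                        alternative = "two.sided")

# Median test (assumed formulation): chi-square test on the 2x2 table of
# counts above / not above the pooled median
pooled_median <- median(c(cost_x, cost_y))
counts <- table(group = rep(c("x", "y"), c(length(cost_x), length(cost_y))),
                above = c(cost_x, cost_y) > pooled_median)
median_test <- chisq.test(counts, correct = FALSE)
```

Each result is a standard "htest" object, so p-values can be read with, for example, sign_test$p.value.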

In addition to the statistical tests above, one may be interested in studying the distribution of the overall performance/cost across the orders, in order to assess whether they come from the same distribution or not. Such an analysis is called a distribution analysis, which for our purposes reduces to a comparison of data generating processes (DGP). The statistical tests implemented in this framework are the Chi-Square goodness-of-fit test and the Kolmogorov-Smirnov goodness-of-fit test.
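
As a rough illustration of a distribution comparison, the sketch below applies base R's ks.test and chisq.test to two simulated samples that share a centre but differ in spread. The binning on pooled quartiles and the homogeneity-style chi-square are assumptions made for this example and may differ from how benchTradeStats bins the data.

```r
# Illustrative only: simulated metrics, not blotter output.
set.seed(7)
perf_a <- rnorm(60, mean = 0, sd = 5)
perf_b <- rnorm(60, mean = 0, sd = 9)   # same centre, wider spread

# Kolmogorov-Smirnov two-sample test: compares the two empirical CDFs
ks_out <- ks.test(perf_a, perf_b, alternative = "two.sided")

# Chi-square comparison (assumed binning): cut both samples on the
# quartiles of the pooled data, then test the 2 x 4 table of bin counts
breaks <- quantile(c(perf_a, perf_b), probs = seq(0, 1, by = 0.25))
bin_a  <- table(cut(perf_a, breaks, include.lowest = TRUE))
bin_b  <- table(cut(perf_b, breaks, include.lowest = TRUE))
chisq_out <- chisq.test(rbind(bin_a, bin_b))
```

A location test on these two samples would have little to detect, since the medians coincide by construction; a distribution test targets the difference in shape instead.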

In the paired samples approach we seek to test different trading strategies (or brokers) on the same symbol, whereas in the independent samples approach this may or may not be the case. Because of the way a portfolio object is created and updated with transactions in blotter, each Symbol is always kept unique to prevent data corruption. Therefore, even when testing strategies on the same symbol, one must make the distinction up front by initializing the Symbol under different fictitious names (for instance 'ABC.A' and 'ABC.B', as in the Examples).

Also, note that the market data needed across the scenarios one may be interested in analyzing is tied to the statistical testing approach. Because of its assumptions, in the paired approach each pair of fictitious symbols shares the same market data. Hence, OrdersMktData is a list with length equal to the number of orders. This is not necessarily true in the independent approach, where the purpose may be a cross-sectional analysis over different assets that may have been traded over different periods. In such cases, the OrdersMktData needed is much richer in variety, as the reference MktData can differ completely in kind and in period. Other conditions being met, one must simply build the input OrdersMktData with copies of the target MktData as needed.
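
The two shapes of OrdersMktData can be sketched with plain lists. The data frames below are stand-ins only (real use would pass benchTradePerf-compliant MktData objects), and reading the inner list as one element per Symbol is an assumption based on the Examples section.

```r
# Stand-ins for MktData objects; names and values are hypothetical.
mkt_1 <- data.frame(MktPrice = c(10.0, 10.1), MktQty = c(100, 200))
mkt_2 <- data.frame(MktPrice = c(20.0, 19.9), MktQty = c(150, 50))
mkt_3 <- data.frame(MktPrice = c(30.0, 30.2), MktQty = c(300, 100))

# Paired approach: one MktData per order, shared by both fictitious symbols
paired_mkt <- list(order.1 = mkt_1, order.2 = mkt_2, order.3 = mkt_3)

# Independent approach: a nested list, one inner list per order holding
# one MktData per symbol, so each symbol may have its own market data
indep_mkt <- list(list(mkt_1, mkt_2),
                  list(mkt_1, mkt_3),
                  list(mkt_2, mkt_3))
```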

Value

A list whose elements depend on specified parameters.

`Bench.Data`	A list whose elements are data.frames of benchTradePerf outputs, one for each Portfolio-Symbol combination, all under the specified parameters
`Bench.Test.Data`	A data.frame with the statistical testing input data
`*.Test.Output`	An "htest" output object of the selected statistical test, except for the 'Median' test
`*.Report`	A string with a comment, only for the 'Median' test
`*.DGP.Report`	A string with a comment, only for the 'ChiSq' dgptest

Note

In the independent testing approach, cost metrics are suggested instead of performance metrics (used in benchTradePerf). The only difference between these two kinds of metric is their sign, and thus the interpretation of their values: positive values of a cost metric entail underperformance of the execution with respect to the benchmark, whereas negative values indicate outperformance. All the benchmarks can be expressed in cost or performance terms, and this is what the metric parameter allows. More often than not, the Arrival Cost is suggested as the preferred metric for benchmarking the transactions under test. It is, however, nothing other than the Trading PnL performance metric expressed as a cost metric. Hence, using benchmark = "TradeBench" and metric = -1 selects the arrival cost benchmark. This happens by default if these parameters are not provided.
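
The sign convention can be illustrated with a small helper. The function below is hypothetical and not part of blotter; it assumes the usual basis-point form of the Trading PnL metric, -side * (average execution price - benchmark price) / benchmark price * 10^4, and simply multiplies it by metric.

```r
# Hypothetical helper, not a blotter function.
# side:   1 = buy, -1 = sell
# metric: 1 = performance metric, -1 = cost metric
bench_metric_bps <- function(avg_exec_price, bench_price, side = 1, metric = 1) {
  # Assumed performance formula, expressed in basis points
  performance <- -1 * side * (avg_exec_price - bench_price) / bench_price * 1e4
  metric * performance
}

# A buy order filled at 100.50 against an arrival price of 100.00
perf <- bench_metric_bps(100.50, 100.00, side = 1, metric = 1)
cost <- bench_metric_bps(100.50, 100.00, side = 1, metric = -1)
# perf is about -50 bps (the buy paid more than the benchmark),
# cost is about +50 bps: same magnitude, opposite sign
```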

For both test categories, test and dgptest, the same conf.level and alternative are used, where relevant for the use case. Note also that, as the reports make clear, the 'Median' and 'ChiSq' tests currently allow only a two-sided alternative, regardless of the alternative passed for the other test. In particular, with test = 'Median' and dgptest = 'ChiSq' both tests are two-sided, whatever the value of alternative.

Author(s)

Vito Lestingi

References

Kissell, R. (2013). The Science of Algorithmic Trading and Portfolio Management. ISBN 978-0-12-401689-7.
Kissell, R. (2007). Statistical Methods to Compare Algorithmic Performance. The Journal of Trading.

Examples


library(blotter)
set.seed(333)
.blotter <- new.env()
data(ABC)
ABC.day <- ABC[which(as.Date(index(ABC)) == "2019-02-01"), ]
colnames(ABC.day) <- c('MktPrice', 'MktQty')
# silly MktData subsettings to get data
OrdersMktData <- list(OrdersMktData1 = ABC.day[1:1000], 
                      OrdersMktData2 = ABC.day[(nrow(ABC.day) - 2999):(nrow(ABC.day) - 1999)], 
                      OrdersMktData3 = ABC.day[(nrow(ABC.day) - 999):nrow(ABC.day)])
inds1 <- sample(1:500, 50)
inds2 <- sample(501:1000, 50)
txns.1 <- OrdersMktData$OrdersMktData1[sort(inds1)]; colnames(txns.1) <- c('TxnPrice', 'TxnQty')
txns.2 <- OrdersMktData$OrdersMktData1[sort(inds2)]; colnames(txns.2) <- c('TxnPrice', 'TxnQty')
txns.3 <- OrdersMktData$OrdersMktData2[sort(inds1)]; colnames(txns.3) <- c('TxnPrice', 'TxnQty')
txns.4 <- OrdersMktData$OrdersMktData2[sort(inds2)]; colnames(txns.4) <- c('TxnPrice', 'TxnQty')
txns.5 <- OrdersMktData$OrdersMktData3[sort(inds1)]; colnames(txns.5) <- c('TxnPrice', 'TxnQty')
txns.6 <- OrdersMktData$OrdersMktData3[sort(inds2)]; colnames(txns.6) <- c('TxnPrice', 'TxnQty')
# Build 'orders' as portfolios
ordNames <- c('order.1', 'order.2', 'order.3')
symNames <- c('ABC.A', 'ABC.B')
currency('USD')
stock(symNames[1], currency = 'USD')
stock(symNames[2], currency = 'USD')
initPortf(ordNames[1], symbols = symNames) # Order 1
addTxns('order.1', symNames[1], TxnData = txns.1)
addTxns('order.1', symNames[2], TxnData = txns.2)
initPortf(ordNames[2], symbols = symNames) # Order 2
addTxns('order.2', symNames[1], TxnData = txns.3)
addTxns('order.2', symNames[2], TxnData = txns.4)
initPortf(ordNames[3], symbols = symNames) # Order 3
addTxns(ordNames[3], symNames[1], TxnData = txns.5)
addTxns(ordNames[3], symNames[2], TxnData = txns.6)

## Paired observations approach tests 
# Sign test, VWAP full and VWAP interval
benchTradeStats(Portfolio = ordNames, benchmark = "VWAP", side = 1, 
                type = list(vwap = 'full'), metric = 1,
                OrdersMktData = OrdersMktData, approach = 'paired', 
                test = 'Sign', conf.level = 0.95, alternative = "two.sided")
                
benchTradeStats(Portfolio = ordNames, benchmark = "VWAP", side = 1, 
                type = list(vwap = 'interval'), OrdersMktData = OrdersMktData, 
                approach = 'paired', test = 'Sign', conf.level = 0.95, 
                alternative = "two.sided")
                
# Wilcoxon test, VWAP full and VWAP interval
benchTradeStats(Portfolio = ordNames, benchmark = "VWAP", side = 1, 
                type = list(vwap = 'full'), metric = 1, OrdersMktData = OrdersMktData, 
                approach = 'paired', test = 'Wilcoxon', conf.level = 0.95, 
                alternative = "two.sided")

benchTradeStats(Portfolio = ordNames, benchmark = "VWAP", side = 1, 
                type = list(vwap = 'interval'), OrdersMktData = OrdersMktData, 
                approach = 'paired', test = 'Wilcoxon', conf.level = 0.95, 
                alternative = "two.sided")

# Sign test, ChiSq test on VWAP interval
benchTradeStats(Portfolio = ordNames, benchmark = "VWAP", side = 1, 
                type = list(vwap = 'interval'), metric = 1, OrdersMktData = OrdersMktData, 
                approach = 'paired', test = 'Sign', dgptest = 'ChiSq', 
                conf.level = 0.95, alternative = "two.sided")

# Sign test and KS test on VWAP interval
benchTradeStats(Portfolio = ordNames, benchmark = "VWAP", side = 1, 
                type = list(vwap = 'interval'), metric = 1,
                OrdersMktData = OrdersMktData, approach = 'paired', test = 'Sign', dgptest = 'KS', 
                conf.level = 0.95, alternative = "two.sided")

## Independent observations approach tests
# silly multiplications to make them differ
OrdersMktDataIndp <- list(list(OrdersMktData$OrdersMktData1, OrdersMktData$OrdersMktData2), 
                          list(OrdersMktData$OrdersMktData1 * 2, OrdersMktData$OrdersMktData2 * 3),
                          list(OrdersMktData$OrdersMktData1 * 4, OrdersMktData$OrdersMktData2 * 5)) 

# Median test, TradeBench
benchTradeStats(Portfolio = ordNames, benchmark = "TradeBench", side = 1, metric = -1, 
                OrdersMktData = OrdersMktDataIndp, approach = 'independent',
                test = 'Median', conf.level = 0.95, alternative = "two.sided")

# Wilcoxon-Mann-Whitney test, TradeBench
benchTradeStats(Portfolio = ordNames, benchmark = "TradeBench", side = 1, metric = -1,
                OrdersMktData = OrdersMktDataIndp, approach = 'independent',
                test = 'WMW', conf.level = 0.95, alternative = "two.sided")

# Median test, ChiSq test on TradeBench (two reports produced)
benchTradeStats(Portfolio = ordNames, benchmark = "TradeBench", side = 1, metric = -1,
                OrdersMktData = OrdersMktDataIndp, approach = 'independent', 
                test = 'Median', dgptest = 'ChiSq', conf.level = 0.95, 
                alternative = "two.sided")

# WMW test and KS test on TradeBench 
benchTradeStats(Portfolio = ordNames, benchmark = "TradeBench", side = 1, metric = -1,
                OrdersMktData = OrdersMktDataIndp, approach = 'independent', 
                test = 'WMW', dgptest = 'KS', conf.level = 0.95, 
                alternative = "two.sided")

braverock/blotter documentation built on Dec. 16, 2024, 1:02 p.m.