View source: R/benchTradeStats.R
benchTradeStats (R Documentation)
The main scope of the function is to gather statistical tests carried out on benchmarked trades performance, as obtained via benchTradePerf().
When testing trading strategies on Symbols, assessing whether there is a statistically significant difference in their performance is of interest. In other words, the goal is to determine whether a given strategy outperformed the other, or whether they statistically bear the same results in terms of performance. All the statistical tests included are non-parametric, that is, distribution-free tests. These tests allow great flexibility, but in turn require the data to satisfy some assumptions in order for their results to be meaningful.
benchTradeStats(Portfolio, benchmark, side, type, metric, POV, OrdersMktData,
                approach = c("paired", "independent"),
                test = c("Sign", "Wilcoxon", "Median", "WMW"),
                dgptest = c("ChiSq", "KS"),
                conf.level, alternative)
Portfolio: A vector of character strings identifying initialized Portfolio objects with the Symbol(s) to test. See 'Details'
benchmark: A character string identifying one of the benchmarks in
side: A numeric value indicating the side of the trade. Either 1 or -1,
type: A list with named element
metric: A numeric value, either 1 or -1, meaning "performance metric" or "cost metric", respectively. See 'Notes'
POV: A numeric value between 0 and 1, specifying the POV rate for the 'PWP' benchmark
OrdersMktData: A list or nested list of
approach: A character string identifying the statistical testing approach. Either 'paired' or 'independent'
test: A character string identifying the statistical test to run. If
dgptest: A string identifying the distribution analysis test to run. Either 'ChiSq' or 'KS'
conf.level: A numeric value, the confidence level of the interval (see
alternative: A string identifying the statistical test tail (see
There exists a wide range of algorithmic trading strategies, and even within the same category of strategies many slightly different versions may exist (they often differ by brokerage firm). In a post-trade analysis framework, starting from trading strategies' transaction prices, we compare the overall performance of multiple orders, each executed under two different categories, to ultimately test whether the data supports a difference in their medians. In other words, the basic statistical hypothesis setting tests the null hypothesis of equal medians against the alternative hypothesis of different medians.
Two statistical testing approaches are contemplated; the suitability of each ultimately relies on the analyst's specific research questions. In turn, each of them critically depends on how the transactional data was obtained in the first place and on its distributional properties. First, the paired samples approach, where trades are two equal-length child orders belonging to the same parent order on a Symbol, each occurring over the same exact timeframe. Following Kissell, the preferred comparison metric against which trades have to be benchmarked is the VWAP benchmark. In this context, the tests included are the Wilcoxon Signed Rank test and the Sign test. Second, the independent samples approach, where trades can be on different Symbols and may have occurred over different periods, possibly with different frequency; here Kissell suggests the Arrival Cost as the preferred trades benchmark metric. In this context, the tests included are the Median test and the Wilcoxon-Mann-Whitney test.
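For intuition on the non-parametric tests named above, here is a sketch with simulated data of their standard base-R counterparts (an illustration only, not the package internals; the mapping to base-R functions is an assumption suggested by the 'See Also' section):

```r
# Sketch: base-R analogues of the 'test' options, applied to simulated
# per-order performance metrics of two strategies, x and y.
set.seed(42)
x <- rnorm(30)
y <- rnorm(30, mean = 0.2)

## Paired approach (same orders, same timeframe)
binom.test(sum(x > y), n = length(x))   # 'Sign': binomial test on paired signs
wilcox.test(x, y, paired = TRUE)        # 'Wilcoxon': signed rank test

## Independent approach
wilcox.test(x, y)                       # 'WMW': Wilcoxon-Mann-Whitney rank sum
# 'Median' (Mood's median test): counts above/below the pooled median
m <- median(c(x, y))
chisq.test(rbind(c(sum(x > m), sum(x <= m)),
                 c(sum(y > m), sum(y <= m))))
```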
In addition to the statistical tests above, one may be interested in studying the distribution of the overall performance/cost across the orders, in order to assess whether they come from the same distribution or not. Such an analysis is called a distribution analysis, which for our purposes reduces to a data generating process (DGP) comparison. The statistical tests implemented in this framework are the Chi-Square goodness-of-fit test and the Kolmogorov-Smirnov goodness-of-fit test.
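The two DGP comparisons can likewise be sketched with base R on simulated samples (an illustration under an assumed common binning, not the package's internal implementation):

```r
# Sketch: goodness-of-fit comparisons behind the 'dgptest' options.
set.seed(1)
x <- rnorm(100)
y <- rnorm(100)

# 'KS': two-sample Kolmogorov-Smirnov test on the empirical CDFs
ks.test(x, y)

# 'ChiSq': bin both samples on a common grid and test homogeneity
breaks <- quantile(c(x, y), probs = seq(0, 1, by = 0.2))
obs.x <- table(cut(x, breaks, include.lowest = TRUE))
obs.y <- table(cut(y, breaks, include.lowest = TRUE))
chisq.test(cbind(as.vector(obs.x), as.vector(obs.y)))
```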
In the paired samples approach we seek to test different trading strategies (or brokers) on the same symbol, whereas in the independent samples approach this may or may not be the case. By and large, because of the way a portfolio object is created and updated with transactions in blotter, a single Symbol is always kept unique to prevent data corruption. Therefore, even when testing strategies on the same symbol, one must nonetheless make the distinction in the first place, by initializing the Symbol under different fictitious names.
Also, note that the market data required across the possible analysis scenarios is bound to the statistical testing approach. Because of its assumptions, in the paired approach each couple of fictitious symbols shares the same market data. Hence, OrdersMktData represents a list with length equal to the number of orders. This is not necessarily true in the independent approach, where there could be cross-sectional analyses over different assets that may have been traded over different periods. In all these cases, the OrdersMktData needed would be much richer in variety, as the reference MktData can be completely different in kind and in periods. Other conditions being met, one must simply build the input OrdersMktData with copies of the target MktData as needed.
A list whose elements depend on the specified parameters:

Bench.Data: A list whose elements are data.frames of benchTradePerf outputs, one for each Portfolio-Symbol combination, all under the specified parameters
Bench.Test.Data: A data.frame with the statistical testing input data
*.Test.Output: An "htest" output object of the selected statistical test, except for the 'Median' test
*.Report: A string with a comment, only for the 'Median' test
*.DGP.Report: A string with a comment, only for the 'ChiSq' dgptest
In the independent testing approach, cost metrics are suggested instead of the performance metrics (used in benchTradePerf). The only difference between these kinds of metrics is their sign, and thus the interpretation of their values: positive values of a cost metric entail underperformance of the execution with respect to the benchmark, whereas negative values indicate outperformance. All the benchmarks can be expressed in cost or performance terms, and this is what the parameter metric allows one to do.
More often than not, the Arrival Cost is suggested as the preferred metric for benchmarking the transactions under testing. It is, however, nothing else than the Trading PnL performance metric expressed as a cost metric. Hence, using benchmark = "TradeBench" and metric = -1 means selecting the arrival cost benchmark. This will happen by default if these parameters are not provided.
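The sign convention can be illustrated with a trivial numerical sketch (the values are made up for illustration):

```r
# Trading PnL as a performance metric, in basis points:
# positive = outperformance versus the benchmark.
perf <- c(12.5, -3.0, 7.1)
# The same quantity expressed as an arrival cost (metric = -1):
# positive = underperformance versus the benchmark.
cost <- -perf
```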
For both test categories, test and dgptest, the same conf.level and alternative are almost always used, where relevant for the use case. Please also note that, as should be clear from the reports, the 'Median' and 'ChiSq' tests only allow a two-sided alternative at the moment, regardless of the alternative input used for the other tests. In the specific case test = 'Median' and dgptest = 'ChiSq', the function will perform a two-sided test in both cases, regardless of alternative.
Vito Lestingi
The Science of Algorithmic Trading and Portfolio Management (Kissell, 2013), ISBN 978-0-12-401689-7. Statistical Methods to Compare Algorithmic Performance (Kissell, 2007), The Journal of Trading.
benchTradePerf, binom.test, wilcox.test, ks.test
library(blotter)
set.seed(333)
.blotter <- new.env()
data(ABC)
ABC.day <- ABC[which(as.Date(index(ABC)) == "2019-02-01"), ]
colnames(ABC.day) <- c('MktPrice', 'MktQty')
# silly MktData subsettings to get data
OrdersMktData <- list(
  OrdersMktData1 = ABC.day[1:1000],
  OrdersMktData2 = ABC.day[(nrow(ABC.day) - 2999):(nrow(ABC.day) - 1999)],
  OrdersMktData3 = ABC.day[(nrow(ABC.day) - 999):nrow(ABC.day)]
)
inds1 <- sample(1:500, 50)
inds2 <- sample(501:1000, 50)
txns.1 <- OrdersMktData$OrdersMktData1[sort(inds1)]; colnames(txns.1) <- c('TxnPrice', 'TxnQty')
txns.2 <- OrdersMktData$OrdersMktData1[sort(inds2)]; colnames(txns.2) <- c('TxnPrice', 'TxnQty')
txns.3 <- OrdersMktData$OrdersMktData2[sort(inds1)]; colnames(txns.3) <- c('TxnPrice', 'TxnQty')
txns.4 <- OrdersMktData$OrdersMktData2[sort(inds2)]; colnames(txns.4) <- c('TxnPrice', 'TxnQty')
txns.5 <- OrdersMktData$OrdersMktData3[sort(inds1)]; colnames(txns.5) <- c('TxnPrice', 'TxnQty')
txns.6 <- OrdersMktData$OrdersMktData3[sort(inds2)]; colnames(txns.6) <- c('TxnPrice', 'TxnQty')

# Build 'orders' as portfolios
ordNames <- c('order.1', 'order.2', 'order.3')
symNames <- c('ABC.A', 'ABC.B')
currency('USD')
stock(symNames[1], currency = 'USD')
stock(symNames[2], currency = 'USD')
initPortf(ordNames[1], symbols = symNames) # Order 1
addTxns('order.1', symNames[1], TxnData = txns.1)
addTxns('order.1', symNames[2], TxnData = txns.2)
initPortf(ordNames[2], symbols = symNames) # Order 2
addTxns('order.2', symNames[1], TxnData = txns.3)
addTxns('order.2', symNames[2], TxnData = txns.4)
initPortf(ordNames[3], symbols = symNames) # Order 3
addTxns(ordNames[3], symNames[1], TxnData = txns.5)
addTxns(ordNames[3], symNames[2], TxnData = txns.6)

## Paired observations approach tests
# Sign test, VWAP full and VWAP interval
benchTradeStats(Portfolio = ordNames, benchmark = "VWAP", side = 1,
                type = list(vwap = 'full'), metric = 1, OrdersMktData = OrdersMktData,
                approach = 'paired', test = 'Sign',
                conf.level = 0.95, alternative = "two.sided")
benchTradeStats(Portfolio = ordNames, benchmark = "VWAP", side = 1,
                type = list(vwap = 'interval'), OrdersMktData = OrdersMktData,
                approach = 'paired', test = 'Sign',
                conf.level = 0.95, alternative = "two.sided")
# Wilcoxon test, VWAP full and VWAP interval
benchTradeStats(Portfolio = ordNames, benchmark = "VWAP", side = 1,
                type = list(vwap = 'full'), metric = 1, OrdersMktData = OrdersMktData,
                approach = 'paired', test = 'Wilcoxon',
                conf.level = 0.95, alternative = "two.sided")
benchTradeStats(Portfolio = ordNames, benchmark = "VWAP", side = 1,
                type = list(vwap = 'interval'), OrdersMktData = OrdersMktData,
                approach = 'paired', test = 'Wilcoxon',
                conf.level = 0.95, alternative = "two.sided")
# Sign test, ChiSq test on VWAP interval
benchTradeStats(Portfolio = ordNames, benchmark = "VWAP", side = 1,
                type = list(vwap = 'interval'), metric = 1, OrdersMktData = OrdersMktData,
                approach = 'paired', test = 'Sign', dgptest = 'ChiSq',
                conf.level = 0.95, alternative = "two.sided")
# Sign test and KS test on VWAP interval
benchTradeStats(Portfolio = ordNames, benchmark = "VWAP", side = 1,
                type = list(vwap = 'interval'), metric = 1, OrdersMktData = OrdersMktData,
                approach = 'paired', test = 'Sign', dgptest = 'KS',
                conf.level = 0.95, alternative = "two.sided")

## Independent observations approach tests
# silly multiplications to make them differ
OrdersMktDataIndp <- list(
  list(OrdersMktData$OrdersMktData1, OrdersMktData$OrdersMktData2),
  list(OrdersMktData$OrdersMktData1 * 2, OrdersMktData$OrdersMktData2 * 3),
  list(OrdersMktData$OrdersMktData1 * 4, OrdersMktData$OrdersMktData2 * 5)
)
# Median test, TradeBench
benchTradeStats(Portfolio = ordNames, benchmark = "TradeBench", side = 1, metric = -1,
                OrdersMktData = OrdersMktDataIndp, approach = 'independent',
                test = 'Median', conf.level = 0.95, alternative = "two.sided")
# Wilcoxon-Mann-Whitney test, TradeBench
benchTradeStats(Portfolio = ordNames, benchmark = "TradeBench", side = 1, metric = -1,
                OrdersMktData = OrdersMktDataIndp, approach = 'independent',
                test = 'WMW', conf.level = 0.95, alternative = "two.sided")
# Median test, ChiSq test on TradeBench (two reports produced)
benchTradeStats(Portfolio = ordNames, benchmark = "TradeBench", side = 1, metric = -1,
                OrdersMktData = OrdersMktDataIndp, approach = 'independent',
                test = 'Median', dgptest = 'ChiSq',
                conf.level = 0.95, alternative = "two.sided")
# WMW test and KS test on TradeBench
benchTradeStats(Portfolio = ordNames, benchmark = "TradeBench", side = 1, metric = -1,
                OrdersMktData = OrdersMktDataIndp, approach = 'independent',
                test = 'WMW', dgptest = 'KS',
                conf.level = 0.95, alternative = "two.sided")