knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>",
  echo = FALSE,
  cache = TRUE
)
require(quantstrat)
require(knitr)
require(pander)
panderOptions("digits", 2)
source('tiingoapi.R')
# this file contains only one line:
# setDefaults(getSymbols.tiingo, api.key='MYAPIKEY')
require(doParallel)
registerDoParallel()
options("getSymbols.warning4.0"=FALSE)
options("getSymbols.yahoo.warning"=FALSE)
Brian Peterson:
Proprietary Trading:
Back-testing. I hate it - it's just optimizing over history. You never see a bad back-test. Ever. In any strategy. - Josh Diedesch (2014), CalSTRS
Every trading system is in some form an optimization. - Emilio Tomasini [-@Tomasini2009]
Many system developers consider "I hypothesize that this strategy idea will make money" to be adequate.
Instead, strive to:
Constraints
Benchmarks
Objectives
knitr::include_graphics("hypothesis_process.png")
To create a testable idea (a hypothesis):
good/complete Hypothesis Statements include:
knitr::include_graphics("toolchain.png")
knitr::include_graphics("building_blocks.png")
Filters
Indicators
Signals
Rules
install.packages('devtools') # if you don't have it installed
install.packages('PerformanceAnalytics')
install.packages('FinancialInstrument')
devtools::install_github('braverock/blotter')
devtools::install_github('braverock/quantstrat')
stock.str <- 'EEM'
currency('USD')
stock(stock.str,currency='USD',multiplier=1)
startDate='2003-12-31'
initEq=100000
portfolio.st='macd'
account.st='macd'
initPortf(portfolio.st,symbols=stock.str)
initAcct(account.st,portfolios=portfolio.st,initEq = initEq)
initOrders(portfolio=portfolio.st)
strategy.st<-portfolio.st
# define the strategy
strategy(strategy.st, store=TRUE)

## get data
getSymbols(stock.str, from=startDate, adjust=TRUE, src='tiingo')
Far better an approximate answer to the right question, which is often vague, than an exact answer to the wrong question, which can always be made precise. - John Tukey [-@Tukey1962] p. 13
No matter how beautiful your theory, no matter how clever you are or what your name is, if it disagrees with experiment, it’s wrong. - Richard P. Feynman [-@Feynman1965]
#MA parameters for MACD
fastMA = 12
slowMA = 26
signalMA = 9
maType="EMA"

#one indicator
add.indicator(strategy.st, name = "MACD",
              arguments = list(x=quote(Cl(mktdata)),
                               nFast=fastMA,
                               nSlow=slowMA),
              label='_'
)
MACD is a two moving average cross system that seeks to measure:
Classical technical analysis, for example only, not widely deployed in production
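As a rough illustration of what the indicator computes, here is a minimal base-R sketch of a MACD line and its signal line. The helper names `ema` and `macd_sketch` are hypothetical and are not part of TTR or quantstrat. The sketch seeds each EMA with the first observation and reports a plain price difference, whereas `TTR::MACD` reports a percentage difference by default, so the numbers will not match the package output exactly.

```r
# Exponential moving average with smoothing ratio 2 / (n + 1),
# seeded with the first observation (an assumption of this sketch).
ema <- function(x, n) {
  alpha <- 2 / (n + 1)
  out <- numeric(length(x))
  out[1] <- x[1]
  for (i in seq_along(x)[-1]) {
    out[i] <- alpha * x[i] + (1 - alpha) * out[i - 1]
  }
  out
}

# MACD line: fast EMA minus slow EMA; signal line: EMA of the MACD line.
macd_sketch <- function(price, n_fast = 12, n_slow = 26, n_sig = 9) {
  macd <- ema(price, n_fast) - ema(price, n_slow)
  signal <- ema(macd, n_sig)
  data.frame(macd = macd, signal = signal)
}

# Synthetic random-walk prices, for illustration only.
set.seed(42)
price <- 100 + cumsum(rnorm(250))
m <- macd_sketch(price)
head(m)
```

In a steadily rising series the fast EMA sits above the slow EMA, so the MACD line is positive; that is the sense in which the indicator measures trend direction.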
What do you think you're measuring? A good indicator measures something in the market:
Make sure the indicator is testable:
facts to support or refute
general tests
specific tests
#two signals
add.signal(strategy.st,
           name="sigThreshold",
           arguments = list(column="signal._",
                            relationship="gt",
                            threshold=0,
                            cross=TRUE),
           label="signal.gt.zero"
)

add.signal(strategy.st,
           name="sigThreshold",
           arguments = list(column="signal._",
                            relationship="lt",
                            threshold=0,
                            cross=TRUE),
           label="signal.lt.zero"
)
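With `cross=TRUE`, a threshold signal is meant to fire only on the bar where the column moves through the threshold, not on every bar beyond it. The base-R sketch below illustrates that crossing logic under the assumption that the first bar can never be a cross; `cross_above` is a hypothetical helper, not quantstrat's implementation.

```r
# TRUE only on bars where x moves from at-or-below the threshold to above it.
cross_above <- function(x, threshold = 0) {
  above <- x > threshold
  # Compare each bar with the previous one; the first bar has no predecessor.
  c(FALSE, above[-1] & !above[-length(above)])
}

x <- c(-1, -0.5, 0.2, 0.4, -0.1, 0.3)
cross_above(x)
# bars 3 and 6 are crossings; bar 4 is above the threshold but not a cross
```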
Signals are often combined:
This is a composite signal, and serves to reduce the dimensionality of the decision space.
A lower-dimensional space is easier to measure, but is at higher risk of overfitting.
Avoid overfitting while combining signals by making sure that your process has a strong economic or theoretical basis before writing code or running tests.
Signals make predictions so all the literature on forecasting is applicable:
add.distribution(strategy.st,
                 paramset.label = 'signal_analysis',
                 component.type = 'indicator',
                 component.label = '_',
                 variable = list(n = fastMA),
                 label = 'nFAST'
)
add.distribution(strategy.st,
                 paramset.label = 'signal_analysis',
                 component.type = 'indicator',
                 component.label = '_',
                 variable = list(n = slowMA),
                 label = 'nSLOW'
)

sa_buy <- apply.paramset.signal.analysis(
            strategy.st,
            paramset.label='signal_analysis',
            portfolio.st=portfolio.st,
            sigcol = 'signal.gt.zero',
            sigval = 1,
            on=NULL,
            forward.days=50,
            cum.sum=TRUE,
            include.day.of.signal=FALSE,
            obj.fun=signal.obj.slope,
            decreasing=TRUE,
            verbose=TRUE)

sa_sell <- apply.paramset.signal.analysis(
            strategy.st,
            paramset.label='signal_analysis',
            portfolio.st=portfolio.st,
            sigcol = 'signal.lt.zero',
            sigval = 1,
            on=NULL,
            forward.days=10,
            cum.sum=TRUE,
            include.day.of.signal=FALSE,
            obj.fun=signal.obj.slope,
            decreasing=TRUE,
            verbose=TRUE)
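The idea behind the signal analysis calls above is to look at what the market did in the days following each signal, before any rules are applied. A minimal base-R sketch of that idea follows, assuming a plain vector of daily returns and a vector of signal positions; `forward_returns` is a hypothetical helper and does not reproduce quantstrat's internals. It mirrors the `cum.sum=TRUE` and `include.day.of.signal=FALSE` choices by cumulating returns starting the day after the signal.

```r
# Cumulative forward returns after each signal.
# Returns a matrix with one row per forward day and one column per signal.
forward_returns <- function(ret, signal_idx, forward_days) {
  # Drop signals too close to the end of the sample to have a full window.
  keep <- signal_idx[signal_idx + forward_days <= length(ret)]
  paths <- vapply(
    keep,
    function(i) cumsum(ret[(i + 1):(i + forward_days)]),
    numeric(forward_days)
  )
  matrix(paths, nrow = forward_days)
}

# Synthetic returns and signal positions, for illustration only.
set.seed(1)
ret <- rnorm(500, mean = 0, sd = 0.01)
fr <- forward_returns(ret, signal_idx = c(20, 150, 300, 495), forward_days = 10)
dim(fr)          # 10 forward days, 3 usable signals
rowMeans(fr)     # average cumulative path after a signal
```

If the average path is indistinguishable from the unconditional drift of the series, the signal has little predictive content on its own.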
signal.plot(sa_buy$sigret.by.asset$EEM)
beanplot.signals(sa_buy$sigret.by.paramset$paramset.12.26)
distributional.boxplot(sa_buy$sigret.by.paramset$paramset.12.26$EEM)
signal.plot(sa_sell$sigret.by.asset$EEM)
# entry
add.rule(strategy.st,
         name='ruleSignal',
         arguments = list(sigcol="signal.gt.zero",
                          sigval=TRUE,
                          orderqty=100,
                          ordertype='market',
                          orderside='long',
                          threshold=NULL),
         type='enter',
         label='enter',
         storefun=FALSE
)

# exit
add.rule(strategy.st,name='ruleSignal',
         arguments = list(sigcol="signal.lt.zero",
                          sigval=TRUE,
                          orderqty='all',
                          ordertype='market',
                          orderside='long',
                          threshold=NULL,
                          orderset='exit2'),
         type='exit',
         label='exit'
)
Beware of Rule Burden:
start_t<-Sys.time()
out<-applyStrategy(strategy.st ,
                   portfolios=portfolio.st,
                   parameters=list(nFast=fastMA,
                                   nSlow=slowMA,
                                   nSig=signalMA,
                                   maType=maType),
                   verbose=TRUE)
end_t<-Sys.time()

start_pt<-Sys.time()
updatePortf(Portfolio=portfolio.st)
end_pt<-Sys.time()

print("Running the backtest (applyStrategy):")
print(end_t-start_t)
print("trade blotter portfolio update (updatePortf):")
print(end_pt-start_pt)

chart.Posn(Portfolio=portfolio.st,Symbol=stock.str)
plot(add_MACD(fast=fastMA, slow=slowMA, signal=signalMA,maType="EMA"))
good parameters are parsimonious
production strategies have additional parameters that are specific to the production environment
limiting free parameters
more parameters lowers your degrees of freedom, and
goal should be to eliminate free parameters before running parameter optimization
does this still count as a free parameter?
quantstrat parameters are added via add.distribution
relationships (constraints) between parameters are set via add.distribution.constraint
.FastMA = (3:15)
.SlowMA = (20:60)
# .nsamples = 200
#for random parameter sampling,
# less important if you're using doParallel or doMC

### MA paramset
add.distribution(strategy.st,
                 paramset.label = 'MA',
                 component.type = 'indicator',
                 component.label = '_', #this is the label given to the indicator in the strat
                 variable = list(n = .FastMA),
                 label = 'nFAST'
)

add.distribution(strategy.st,
                 paramset.label = 'MA',
                 component.type = 'indicator',
                 component.label = '_', #this is the label given to the indicator in the strat
                 variable = list(n = .SlowMA),
                 label = 'nSLOW'
)

add.distribution.constraint(strategy.st,
                            paramset.label = 'MA',
                            distribution.label.1 = 'nFAST',
                            distribution.label.2 = 'nSLOW',
                            operator = '<',
                            label = 'MA'
)

.paramaudit <- new.env()
ps_start <- Sys.time()
paramset.results <- apply.paramset(strategy.st,
                                   paramset.label='MA',
                                   portfolio.st=portfolio.st,
                                   account.st=account.st,
                                   # nsamples=.nsamples,
                                   audit=.paramaudit,
                                   store=TRUE,
                                   verbose=FALSE)
ps_end <- Sys.time()

cat("Running the parameter search (apply.paramset): \n ")
print(ps_end-ps_start)
cat("Total trials:",.strategy$macd$trials,"\n")
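To get a feel for how many trials a paramset like this implies, the search space can be sketched in base R as a full grid filtered by the `nFAST < nSLOW` constraint. This is only an illustration of the combinatorics under the ranges shown above; it is not how quantstrat builds its parameter combinations internally.

```r
# Full grid of fast/slow moving average lengths.
grid <- expand.grid(nFAST = 3:15, nSLOW = 20:60)

# Keep only combinations that satisfy the constraint fast < slow.
grid <- grid[grid$nFAST < grid$nSLOW, ]

nrow(grid)  # number of parameter combinations (trials) in the search
```

With these ranges the constraint never binds, because the largest fast length (15) is already below the smallest slow length (20). It starts removing combinations only once the two ranges overlap.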
plot(paramset.results$cumPL[-1,], major.ticks = 'years', grid.ticks.on = 'years')
z <- tapply(X=paramset.results$tradeStats$Profit.To.Max.Draw,
            INDEX=list(paramset.results$tradeStats$nFAST,
                       paramset.results$tradeStats$nSLOW),
            FUN=median)
x <- as.numeric(rownames(z))
y <- as.numeric(colnames(z))
filled.contour(x=x,y=y,z=z,color = heat.colors,
               xlab="Fast MA",ylab="Slow MA")
title("Return to MaxDrawdown")
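The contour plot above is built from a return-to-maximum-drawdown statistic. As a rough sketch of what such a ratio measures, the function below computes net profit divided by the largest peak-to-trough decline of cumulative P&L. `profit_to_max_draw` is a hypothetical helper written for illustration; blotter's own calculation of `Profit.To.Max.Draw` may differ in details such as how the starting equity is treated.

```r
# Net profit divided by the absolute maximum drawdown of cumulative P&L.
# Returns Inf when the equity curve never falls below a prior peak.
profit_to_max_draw <- function(pl) {
  equity <- cumsum(pl)
  # Running peak, counting the starting equity of zero as the first peak.
  peak <- cummax(c(0, equity))[-1]
  max_dd <- min(equity - peak)
  sum(pl) / abs(max_dd)
}

# Equity path 10, 5, 0, 20: net profit 20, worst drawdown 10.
profit_to_max_draw(c(10, -5, -5, 20))
```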
Look Ahead Bias
Data Mining Bias
Data Snooping
Pardo [-@Pardo2008, p. 130-131] describes the degrees of freedom of a strategy as:
In parameter optimization, we should consider the sum of observations used by all different parameter combinations.
Goal should be to have 95% or more free parameters or 'free observations' even after parameter search.
degrees.of.freedom(strategy = 'macd', portfolios = 'macd', paramset.method='trial')
degrees.of.freedom(strategy = 'macd', portfolios = 'macd', paramset.method='sum')
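A back-of-the-envelope version of the same bookkeeping can be written in a few lines of base R. The sketch assumes, as a simplification, that each indicator parameter consumes as many observations as its lookback length; `pct_free_df` is a hypothetical helper and the bar count is made up for illustration. It is not the calculation performed by `degrees.of.freedom`.

```r
# Percentage of observations left over after subtracting those consumed
# by the strategy's indicators and rules.
pct_free_df <- function(n_obs, consumed) {
  stopifnot(n_obs > 0, all(consumed >= 0))
  100 * (n_obs - sum(consumed)) / n_obs
}

# Illustrative only: 3500 daily bars and the default MACD lookbacks.
pct_free_df(3500, c(nFast = 12, nSlow = 26, nSig = 9))
```

A parameter search consumes far more than a single run, since the observations used by every combination are summed, which is why the two `paramset.method` settings can give very different answers.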
to increase degrees of freedom, you may:
torture and training data sets should be large enough to still have reasonable statistical confidence when you move to walk forward
@Bailey2014deSharpe describes a way of adjusting the observed Sharpe Ratio of a candidate strategy by taking the variance of the trials and the skewness and kurtosis into account.
establishes theoretical maximum Sharpe for a series of related trials
we have implemented a version with @Kipnis2017 as SharpeRatio.deflated
dsr <- SharpeRatio.deflated(portfolios='macd',strategy='macd', audit=.paramaudit)
kable(dsr)
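For reference, the two formulas at the core of the deflated Sharpe ratio can be sketched in base R. The function names are hypothetical and this is not the code behind `SharpeRatio.deflated`. Assumptions: Sharpe ratios are per-period (not annualized), `kurt` is raw kurtosis (3 for a normal distribution), and `n_trials` is greater than 1.

```r
# Expected maximum Sharpe ratio across n_trials trials whose Sharpe
# ratios have variance var_trials, under the null of zero true skill.
expected_max_sharpe <- function(n_trials, var_trials) {
  emc <- 0.5772156649  # Euler-Mascheroni constant
  sqrt(var_trials) * ((1 - emc) * qnorm(1 - 1 / n_trials) +
                        emc * qnorm(1 - 1 / (n_trials * exp(1))))
}

# Probability that the observed Sharpe ratio sr exceeds the benchmark sr0,
# adjusting for sample length, skewness, and kurtosis of returns.
deflated_sharpe <- function(sr, sr0, n_obs, skew, kurt) {
  z <- (sr - sr0) * sqrt(n_obs - 1) /
    sqrt(1 - skew * sr + (kurt - 1) / 4 * sr^2)
  pnorm(z)
}

# Illustrative inputs only: 533 trials, made-up variance and moments.
sr0 <- expected_max_sharpe(n_trials = 533, var_trials = 0.0004)
deflated_sharpe(sr = 0.08, sr0 = sr0, n_obs = 3500, skew = -0.3, kurt = 5)
```

The benchmark rises with the number of trials, so the same observed Sharpe ratio is less convincing after a large parameter search than after a single test.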
hsr <- SharpeRatio.haircut(portfolios='macd',strategy='macd',audit=.paramaudit)
print(hsr)
Sampling Without replacement:
Sampling With Replacement:
Disadvantages of Sampling from portfolio P&L:
rsim <- mcsim( Portfolio = "macd"
             , Account = "macd"
             , n=1000
             , replacement=TRUE
             , l=1, gap=10)

rblocksim <- mcsim( Portfolio = "macd"
                  , Account = "macd"
                  , n=1000
                  , replacement=TRUE
                  , l=10, gap=10)
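The difference between resampling single days and resampling blocks can be illustrated with a small base-R sketch. It draws fixed-length blocks of consecutive daily P&L with replacement, which preserves short-run autocorrelation inside each block. `block_resample` is a hypothetical helper and the P&L series is synthetic; this is not the resampling code used by `mcsim`.

```r
# Resample a P&L series in blocks of consecutive observations.
# block_len = 1 reduces to ordinary sampling with replacement.
block_resample <- function(pl, block_len = 10) {
  n <- length(pl)
  stopifnot(block_len >= 1, block_len <= n)
  n_blocks <- ceiling(n / block_len)
  # Random start positions for each block, drawn with replacement.
  starts <- sample.int(n - block_len + 1, n_blocks, replace = TRUE)
  idx <- unlist(lapply(starts, function(s) s:(s + block_len - 1)))
  # Trim so the replicate has the same length as the original series.
  pl[idx[seq_len(n)]]
}

# Synthetic daily P&L, for illustration only.
set.seed(7)
pl <- rnorm(250, mean = 5, sd = 100)

# Distribution of total P&L across 500 block-resampled replicates.
totals <- replicate(500, sum(block_resample(pl, block_len = 10)))
quantile(totals)
```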
P&L Quantiles:
pander(quantile(rsim))
pander(quantile(rblocksim))

plot(rsim)
lines (cumsum(dailyEqPL('macd.241',Symbols = stock.str, envir=.paramaudit)),on = 0,col = 'blue')
hist(rsim, cex=.5, methods='maxDD')
plot(rblocksim)
hist(rblocksim, methods='maxDD')
rm(.paramaudit, paramset.results, ps_start, ps_end)
White’s Data Mining Reality Check from @White2000 http://www.cristiandima.com/white-s-reality-check-for-data-snooping-in-r/
bootstrap optimization as an option in LSTM [@Vince2009]
discussion in Aronson [-@Aronson2006, p. 230-240]
Disadvantages:
Advantages:
effectively creates simulated traders with the same style as strategy but no skill
best for modeling "skill vs. luck"
Extract Stylized Facts from the observed series:
For each replicate:
For the collections of start/qty/duration:
# nrtxsim <- txnsim( Portfolio = "macd"
#                  , n=250
#                  , replacement=FALSE
#                  , tradeDef = 'increased.to.reduced')

wrtxsim <- txnsim( Portfolio = "macd"
                 , n=250
                 , replacement=TRUE
                 , tradeDef = 'increased.to.reduced')
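The general shape of round-turn trade resampling can be sketched in base R: draw (duration, quantity) pairs from the observed trades with replacement until the replicate covers the backtest period, trim the last one, then lay the periods end to end. This is a simplified illustration under stated assumptions, with a hypothetical helper `sample_trades` and made-up trade data; `txnsim` handles many more cases (flat periods, layered positions, long and short sides) than this sketch does.

```r
# Build one replicate by sampling observed trade periods with replacement.
# trades: data.frame with positive `duration` and a `qty` column (0 = flat).
sample_trades <- function(trades, total_duration) {
  stopifnot(total_duration > 0, all(trades$duration > 0))
  picked <- trades[0, ]
  while (sum(picked$duration) < total_duration) {
    picked <- rbind(picked, trades[sample.int(nrow(trades), 1), ])
  }
  # Trim the final period so the replicate spans exactly total_duration.
  overshoot <- sum(picked$duration) - total_duration
  last_row <- nrow(picked)
  picked$duration[last_row] <- picked$duration[last_row] - overshoot
  # Start offsets follow from laying the periods end to end.
  picked$start <- cumsum(c(0, head(picked$duration, -1)))
  rownames(picked) <- NULL
  picked
}

# Made-up stylized facts: durations in days, qty 0 marks a flat period.
trades <- data.frame(duration = c(5, 12, 3, 20, 8),
                     qty      = c(100, 0, 100, 0, 100))
set.seed(3)
replicate_one <- sample_trades(trades, total_duration = sum(trades$duration))
replicate_one
```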
Comments:
wrtxsim.pl <- plot(wrtxsim)
P&L Quantiles:
pander(quantile(wrtxsim))
wrplvector<-as.numeric(last(wrtxsim.pl))
btpl<-wrplvector[1]
hist(wrplvector, main='Histogram of Cumulative P&L', breaks=25)
abline(v = btpl, col = "red", lty = 2)
b.label = ("Backtest P&L")
h = rep(0.2 * par("usr")[3] + 1 * par("usr")[4], length(wrplvector))
text(btpl, h, b.label, offset = 0.2, pos = 2, cex = 0.8, srt = 90)
abline(v=median(wrplvector), col = "darkgray", lty = 2)
c.label = ("Sample Median")
text(median(wrplvector), h, c.label, offset = 0.2, pos = 2, cex = 0.8, srt = 90)
consider choice of objective
be careful about performing walk forward analysis then making changes
more trials increases bias
wfportfolio <- "wf.macd"
initPortf(wfportfolio,symbols=stock.str)
initOrders(portfolio=wfportfolio)

wf_start <- Sys.time()
wfaresults <- walk.forward(strategy.st,
                           paramset.label='MA',
                           portfolio.st=wfportfolio,
                           account.st=account.st,
                           # nsamples=100,
                           period='months',
                           k.training = 48,
                           k.testing = 12,
                           verbose = FALSE,
                           anchored = FALSE,
                           audit.prefix = NULL,
                           savewf = FALSE,
                           include.insamples = TRUE,
                           psgc=TRUE
)
wf_end <-Sys.time()

cat("\n Running the walk forward search: \n ")
print(wf_end-wf_start)
cat(" Total trials:",.strategy$macd$trials,"\n")
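The rolling (non-anchored) window scheme implied by `k.training = 48` and `k.testing = 12` can be sketched in base R by enumerating training and testing ranges over a count of monthly periods. `walk_forward_windows` is a hypothetical helper; `walk.forward` itself works on dated endpoints and may treat a partial final window differently.

```r
# Enumerate rolling training/testing windows over n_periods periods.
# Each step moves forward by k_testing so test windows do not overlap.
walk_forward_windows <- function(n_periods, k_training, k_testing) {
  stopifnot(n_periods >= k_training + k_testing)
  starts <- seq(1, n_periods - k_training - k_testing + 1, by = k_testing)
  data.frame(
    train_start = starts,
    train_end   = starts + k_training - 1,
    test_start  = starts + k_training,
    test_end    = starts + k_training + k_testing - 1
  )
}

# 180 months of data with 48 training and 12 testing months per window.
w <- walk_forward_windows(180, k_training = 48, k_testing = 12)
w
```

Each window reruns the parameter search on its training range, so the trial count grows with the number of windows.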
kable(wfaresults$tradeStats)
chart.forward(wfaresults)
kable(wfaresults$testing.parameters[c(3,1,2)])
Strong hypotheses guard against risk of ruin.
I hypothesize that this strategy idea will make money.
Specifying hypotheses at the beginning reduces the urge to modify them later and:

- adjust expectations while testing
- revise the objectives
- construct ad hoc hypotheses
seek to answer what and why before going too far
multiple adjustments exist, examine
stay skeptical of your results
Thanks to all the contributors to quantstrat and blotter, especially Ross Bennett, Peter Carl, Jasen Mackie, Joshua Ulrich, my team, and my family, who make it possible.
©2018 Brian G. Peterson brian\@braverock.com
This work is licensed under a Creative Commons Attribution 4.0 International License
The rmarkdown [@Rmarkdown] source code for this document may be found on github
si<-sessionInfo()
cat( 'prepared using blotter:',si$otherPkgs$blotter$Version
   ,' and quantstrat:', si$otherPkgs$quantstrat$Version,'\n')
All views expressed in this presentation are those of Brian Peterson, and do not necessarily reflect the opinions, policies, or practices of Brian's employers.
All remaining errors or omissions should be attributed to the author.