cscv: Combinatorially Symmetric Cross-validation (CSCV)

View source: R/cscv.R

Description

Performs combinatorially symmetric cross-validation (CSCV) given a true matrix 'M' and the number 'S' of submatrices into which 'M' is split.

Usage

cscv(M, S = 16, FUN, digits = 3L, parallel = FALSE, relax = TRUE)

Arguments

M

A true matrix whose columns correspond to trials (i.e., model parameterizations) and whose rows correspond to observations (e.g., returns).

S

An even number giving how many submatrices 'M' is split into to form the training (J) and testing (Jbar) sets; default 16.

FUN

A function that evaluates a vector of observations and returns a single performance score (e.g., a Sharpe ratio: FUN = function(x, rf = 0.02 / 252) mean(x - rf) / sd(x - rf)).

digits

The number of reported digits; default 3.

parallel

Whether to compute in parallel; default FALSE.

relax

In the original implementation, Bailey et al. (2015) require 'S' to evenly divide the number of rows of 'M'. If relax is TRUE (the default), 'S' need not evenly divide the rows of 'M'; the leftover rows are distributed appropriately among the training sets J and testing sets Jbar.
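As a sketch of what relaxing the divisibility requirement involves, the hypothetical helper `split_rows()` below labels each row of 'M' with one of 'S' contiguous blocks and hands the leftover rows to the first blocks, one each, so block sizes differ by at most one. This rule is an assumption chosen for illustration, not the package's code; 'cscv' may distribute the remainder differently.

```r
# Hypothetical helper, not part of the cscv package: label each row with one
# of S contiguous blocks, giving any leftover rows to the first blocks.
split_rows <- function(n_rows, S) {
  stopifnot(S %% 2 == 0, n_rows >= S)
  base  <- n_rows %/% S                  # rows every block receives
  extra <- n_rows %% S                   # leftover rows when S does not divide n_rows
  sizes <- base + (seq_len(S) <= extra)  # first 'extra' blocks get one more row
  rep(seq_len(S), times = sizes)         # block label for each row, in row order
}

block <- split_rows(1000, 6)
tabulate(block)   # 167 167 167 167 166 166
```

With 1000 rows and S = 6, the four leftover rows go to blocks 1 to 4, so any choice of S/2 blocks for J differs from its complement Jbar by at most a few rows.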

Details

'cscv' performs the CSCV algorithm as described by Bailey et al. (2015), "The Probability of Backtest Overfitting." Given a true matrix 'M', cscv (1) splits 'M' into 'S' submatrices, (2) forms all combinations of those submatrices taken S/2 at a time, and (3) performs CSCV using the evaluation function 'FUN'.
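The three steps can be sketched in base R as follows. `cscv_sketch()` is a hypothetical name, and several details are assumptions rather than the package's behavior: contiguous row blocks with leftover rows assigned to the first blocks, `rank()` for the relative OOS rank with omega = rank / (N + 1) as in Bailey et al., and phi estimated as the share of combinations with lambda <= 0. Parallelism, `digits` and the CDFs are omitted.

```r
# Hypothetical cscv_sketch(), not the package's implementation.
cscv_sketch <- function(M, S, FUN) {
  stopifnot(is.matrix(M), S %% 2 == 0)
  n <- nrow(M)
  N <- ncol(M)
  # (1) label rows with S contiguous blocks; leftover rows go to the first blocks
  sizes <- n %/% S + (seq_len(S) <= n %% S)
  block <- rep(seq_len(S), times = sizes)
  # (2) each column of combos is one choice of S/2 blocks for the training set J
  combos <- combn(S, S / 2)
  # (3) per combination: pick the best IS trial, then score that trial OOS
  pairs <- t(apply(combos, 2, function(J) {
    in_J  <- block %in% J
    R_is  <- apply(M[in_J, , drop = FALSE], 2, FUN)   # IS score of every trial
    R_oos <- apply(M[!in_J, , drop = FALSE], 2, FUN)  # OOS score of every trial
    best  <- which.max(R_is)                          # trial chosen in-sample
    omega <- rank(R_oos)[[best]] / (N + 1)            # its relative OOS rank
    c(R = R_is[[best]], Rbar = R_oos[[best]],
      lambda = log(omega / (1 - omega)))              # logit of the rank
  }))
  # PBO estimate: share of combinations where the IS winner ranks at or
  # below the OOS median (lambda <= 0)
  list(pairs = pairs, phi = mean(pairs[, "lambda"] <= 0))
}

set.seed(1)
M <- replicate(10, rnorm(1000))
sharpe <- function(x, rf = 0.02 / 252) mean(x - rf) / sd(x - rf)
res <- cscv_sketch(M, S = 6, FUN = sharpe)
nrow(res$pairs)  # choose(6, 3) = 20 combinations
res$phi
```

Because every split of the S blocks into J and its complement Jbar is used, each block serves as training data in exactly half of the combinations, which is the symmetry the method's name refers to.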

Value

A list of class 'cscv' containing:

cumdistf_Rbar_rank:

The cumulative distribution function over all strategies.

cumdistf_Rbar_all:

The cumulative distribution function over optimized (ranked) strategies.

pairs:

One row per combination, S!/[(S/2)! * (S/2)!] rows in total, recording the IS performance of the trial chosen in-sample (R), the OOS performance of that same trial (Rbar), and lambda, the logit of that trial's relative OOS rank.

insample_neg:

The proportion of combinations in which the trial chosen in-sample has negative IS performance.

outsample_neg:

The probability of loss: the probability that the model selected as optimal IS will deliver a loss OOS.

num_submatrices:

The number of formed submatrices.

beta:

The slope of the regression of OOS performance (Rbar) on IS performance (R), which measures performance degradation from IS to OOS.

phi:

The probability of backtest overfitting (PBO).
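To make the combination count and the summary statistics concrete, here is a small worked example. `summarize_pairs()` is hypothetical, the `pairs` values are toy numbers chosen only for easy arithmetic (not real results), and estimating beta by ordinary least squares with `lm()` is an assumption about how the slope is fitted.

```r
# Number of train/test combinations for the default S = 16
S <- 16
choose(S, S / 2)                     # 12870

# Logit of the relative rank: N = 10 trials, IS winner has OOS rank 3 (1 = worst)
N <- 10
omega  <- 3 / (N + 1)
lambda <- log(omega / (1 - omega))   # log(3/8), about -0.981

# Hypothetical helper: summary statistics from a pairs matrix with
# columns R (IS), Rbar (OOS) and lambda
summarize_pairs <- function(pairs) {
  fit <- lm(Rbar ~ R, data = as.data.frame(pairs))  # OLS of OOS on IS
  list(
    outsample_neg = mean(pairs[, "Rbar"] < 0),     # probability of OOS loss
    beta          = coef(fit)[["R"]],              # degradation slope
    phi           = mean(pairs[, "lambda"] <= 0)   # PBO estimate
  )
}

# Toy values, not real results
pairs <- cbind(R      = c(0.10, 0.20, 0.30, 0.40),
               Rbar   = c(0.05, -0.02, 0.01, -0.04),
               lambda = c(0.5, -0.3, 0.2, -1.1))
summ <- summarize_pairs(pairs)
summ$beta                            # -0.24
```

A negative beta, as in this toy case, means trials that look better in-sample tend to do worse out-of-sample, which is the degradation pattern the statistic is meant to flag.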

Author(s)

Nathan Matare

References

Bailey et al. (2015). "The Probability of Backtest Overfitting." https://papers.ssrn.com/sol3/papers.cfm?abstract_id=2326253

See Also

plot.cscv(), summary.cscv()

Examples

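## Not run: 
library(cscv)  # assumes the package built from nmatare/cscv is installed

set.seed(1)    # reproducible simulated returns
# 1000 observations (rows) for each of 10 trials (columns)
M            <- replicate(10, rnorm(1000, mean = 0, sd = 1))
S            <- 6L       # number of submatrices (must be even)
trials       <- ncol(M)  # number of models (parameterizations)
observations <- nrow(M)  # number of observations (e.g., returns)

# Evaluation function: Sharpe ratio with a daily risk-free rate of 0.02 / 252
sharpe.ratio <- function(x, rf = 0.02 / 252) mean(x - rf) / sd(x - rf)

# 1000 rows are not divisible by S = 6, so this relies on relax = TRUE (the default)
result <- cscv(
  M        = M,
  S        = S,
  FUN      = sharpe.ratio,
  parallel = FALSE
)
result

## End(Not run)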

nmatare/cscv documentation built on Dec. 17, 2017, 12:06 p.m.