cscv: Combinatorially Symmetric Cross-validation (CSCV)


View source: R/cscv.R

Description

Performs combinatorially symmetric cross-validation (CSCV) on a true matrix 'M' that is split into 'S' sub-matrices.

Usage

cscv(M, S = 16, FUN, digits = 3L, parallel = FALSE, relax = TRUE)

Arguments

M

A true matrix in which each column is a trial (i.e., a set of model parameters) and each row is an observation (e.g., a return).

S

An even number giving how many sub-matrices of 'M' are formed and divided between the training (J) and testing (Jbar) sets; default 16.

FUN

A function that evaluates a vector of observations (e.g., FUN = function(x, rf = 0.02 / 252) mean(x - rf) / sd(x - rf)).

digits

The number of reported digits; default 3.

parallel

Whether to compute in parallel; default FALSE.

relax

In the original implementation, Bailey et al. (2015) require 'S' to evenly divide the number of rows of 'M'. If relax is TRUE, 'S' need not evenly divide the rows of 'M'; the left-over rows are distributed among the training sets J and testing sets Jbar.
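
The page does not show how the left-over rows are distributed when relax is TRUE. As a minimal sketch of one possible approach, the hypothetical helper below (split_rows is not part of the package) assigns rows to 'S' contiguous blocks whose sizes differ by at most one row:

```r
# Hypothetical helper, not the package's implementation: assign each of
# n rows to one of S contiguous blocks whose sizes differ by at most one.
split_rows <- function(n, S) {
  stopifnot(S %% 2 == 0, S <= n)
  base  <- n %/% S                 # minimum number of rows per block
  extra <- n %% S                  # left-over rows to spread out
  # The first `extra` blocks each receive one additional row.
  sizes <- rep(base, S) + c(rep(1, extra), rep(0, S - extra))
  block <- rep(seq_len(S), times = sizes)
  split(seq_len(n), block)         # list of row indices, one entry per block
}

blocks <- split_rows(n = 1000, S = 6)
lengths(blocks)                    # block sizes; they sum to n
```

With equal-as-possible blocks, any S/2 of them form a training set whose size is close to that of the complementary testing set.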

Details

'cscv' performs the CSCV algorithm as detailed by Bailey et al. (2015), "The Probability of Backtest Overfitting". Given a true matrix 'M', cscv will (1) split 'M' into 'S' sub-matrices, (2) form all combinations of sub-matrices taken in groups of size S/2, and (3) perform CSCV given an evaluation function, 'FUN'.
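
The three steps can be sketched in base R as follows. This is an illustration under stated assumptions, not the package's source: the row split uses cut(), the "chosen" trial is taken to be the one with the highest in-sample value of FUN, and the names block_id, combos and results are made up for the example.

```r
set.seed(1)
M   <- replicate(10, rnorm(1000))        # 1000 observations x 10 trials
S   <- 6L
FUN <- function(x) mean(x) / sd(x)       # simple Sharpe-like statistic

# (1) Split the rows of M into S contiguous blocks.
block_id <- cut(seq_len(nrow(M)), breaks = S, labels = FALSE)
blocks   <- split(seq_len(nrow(M)), block_id)

# (2) Form every way of choosing S/2 blocks for the training set J;
#     the remaining S/2 blocks make up the testing set Jbar.
combos <- combn(S, S / 2, simplify = FALSE)

# (3) For each combination, evaluate every trial in sample (IS) and
#     out of sample (OOS), then record the trial that looks best IS.
results <- t(sapply(combos, function(j) {
  is_rows  <- unlist(blocks[j],  use.names = FALSE)
  oos_rows <- unlist(blocks[-j], use.names = FALSE)
  R    <- apply(M[is_rows,  , drop = FALSE], 2, FUN)   # IS performance
  Rbar <- apply(M[oos_rows, , drop = FALSE], 2, FUN)   # OOS performance
  best <- which.max(R)                                 # assumed selection rule
  c(trial = best, R = R[[best]], Rbar = Rbar[[best]],
    oos_rank = rank(Rbar)[[best]])
}))

head(results)
```

Each row of results corresponds to one training/testing split, so the matrix has choose(S, S/2) rows.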

Value

a list of class 'cscv' containing:

cumdistf_Rbar_rank:

The cumulative distribution function over all strategies.

cumdistf_Rbar_all:

The cumulative distribution function over optimized (ranked) strategies.

pairs:

The S!/[(S/2)!*(S/2)!] combinations (one per row), each including the performance of the chosen IS trial (R) and of the chosen OOS trial (Rbar), with lambda transformed via the logit function.

insample_neg:

The proportion of models chosen IS whose performance is negative.

outsample_neg:

The probability of loss: the probability that the model selected as optimal IS will deliver a loss OOS.

num_submatrices:

The number of formed sub-matrices.

beta:

The slope of IS to OOS performance degradation.

phi:

The probability of backtest overfitting (PBO).
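
As a rough illustration of how several of these quantities relate, the sketch below computes them from toy data. It follows the definitions in Bailey et al. (2015) as understood here (relative OOS rank divided by the number of trials plus one, then a logit); the toy data frame and its column oos_rank are assumptions for the example and need not match the columns of the returned 'pairs' element.

```r
set.seed(42)
S        <- 16L
n_trials <- 10L
n_pairs  <- choose(S, S / 2)       # S!/[(S/2)!*(S/2)!] = 12870 for S = 16

# Toy stand-ins for the per-combination results (not real cscv output).
toy <- data.frame(
  R        = rnorm(n_pairs, mean = 0.5),   # IS performance of chosen trial
  Rbar     = rnorm(n_pairs, mean = 0.0),   # OOS performance of that trial
  oos_rank = sample(n_trials, n_pairs, replace = TRUE)  # 1 = worst OOS
)

# Relative OOS rank in (0, 1) and its logit, lambda.
w      <- toy$oos_rank / (n_trials + 1)
lambda <- log(w / (1 - w))

# phi: share of combinations where the IS-optimal trial ranks in the
# lower half OOS (lambda <= 0).
phi <- mean(lambda <= 0)

# beta: slope of the regression of OOS on IS performance.
beta <- coef(lm(Rbar ~ R, data = toy))[["R"]]

# outsample_neg: share of combinations with an OOS loss.
outsample_neg <- mean(toy$Rbar < 0)

c(phi = phi, beta = beta, outsample_neg = outsample_neg)
```

Because the toy data are independent random draws, the printed values carry no meaning beyond showing the mechanics.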

Author(s)

Nathan Matare <email: nmatare@chicagobooth.com>

References

Bailey et al. (2015), "The Probability of Backtest Overfitting", https://papers.ssrn.com/sol3/papers.cfm?abstract_id=2326253

See Also

plot.cscv(), summary.cscv()

Examples

## Not run: 
M               <- replicate(10, rnorm(1000, 0, 1))
S               <- 6L # number of sub matrices
trials          <- ncol(M) # number of models (parameters)
observations    <- nrow(M) # number of observations (e.g., returns)
# evaluation function: daily Sharpe ratio net of a risk-free rate
evaluation.function <- sharpe.ratio <-
   function(x, rf = 0.02 / 252) mean(x - rf) / sd(x - rf)

result <- cscv(
   M        = M,
   S        = S,
   FUN      = evaluation.function,
   parallel = FALSE
)
result

## End(Not run)

nmatare/quanttools documentation built on May 23, 2019, 9:32 a.m.