# cscv: Combinatorially Symmetric Cross-validation (CSCV)

In nmatare/cscv: Combinatorially Symmetric Cross-validation

## Description

Performs combinatorially symmetric cross-validation given a true matrix 'M' and 'S', the number of composite submatrices.

## Usage

```r
cscv(M, S = 16, FUN, digits = 3L, parallel = FALSE, relax = TRUE)
```

## Arguments

`M` A true matrix in which each column is a trial (i.e., a set of model parameters) and each row is an observation (e.g., a return).

`S` An even number giving how many submatrices of 'M' are formed for the training (J) and testing (Jbar) sets; default 16.

`FUN` A function that evaluates a vector of observations (e.g., FUN = function(x, rf = 0.02 / 252) mean(x - rf) / sd(x - rf)).

`digits` The number of reported digits; default 3.

`parallel` Whether to compute in parallel; default FALSE.

`relax` In the original implementation, Bailey et al (2015) restrict 'S' to evenly divide the rows of 'M'. If relax is set to TRUE (the default), one can choose an 'S' that does not evenly divide them, while ensuring that the left-over splits are appropriately distributed among the training sets J and testing sets Jbar.
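To make the `FUN` and `relax` arguments more concrete, here is a minimal base-R sketch. It applies a Sharpe-ratio style evaluation function to each column of a simulated matrix and then shows one possible way to cut the rows into `S` contiguous blocks when `S` does not evenly divide the number of rows. The names `sharpe_ratio`, `block_id`, and `block_sizes` are illustrative, and the splitting rule is an assumption; the package may distribute left-over rows differently.

```r
# Illustrative only: not the package's internal code.
set.seed(1)
M <- replicate(4, rnorm(1000, mean = 0.0005, sd = 0.01))  # 1000 observations x 4 trials

# An evaluation function in the form expected by FUN: one vector in, one number out
sharpe_ratio <- function(x, rf = 0.02 / 252) mean(x - rf) / sd(x - rf)
scores <- apply(M, 2, sharpe_ratio)   # one score per trial (column)

# Assumed splitting rule: contiguous row blocks whose sizes differ by at most one
S <- 6L                               # 1000 is not divisible by 6
n <- nrow(M)
block_id <- sort(rep_len(seq_len(S), n))
block_sizes <- table(block_id)
```

With 1000 rows and `S = 6`, this rule gives four blocks of 167 rows and two of 166.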

## Details

'cscv' performs the CSCV algorithm as detailed by Bailey et al (2015), The Probability of Backtest Overfitting. Given a true matrix 'M', cscv will (1) split 'M' into 'S' sub-matrices, (2) form all combinations of sub-matrices taken in groups of size S/2, and (3) perform CSCV given an evaluation function, 'FUN'.
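The three steps above can be sketched in base R as follows. This is a simplified illustration under stated assumptions, not the package's implementation: the rows are split into contiguous blocks, the relative rank is taken as rank / (N + 1) following the paper, and the variable names (`blocks`, `combos`, `pairs`, `omega`) are chosen here for readability.

```r
# Illustrative sketch of the CSCV steps; not the package's internal code.
set.seed(42)
M <- replicate(10, rnorm(1000))        # 1000 observations x 10 trials
S <- 6L                                # even number of row blocks
FUN <- function(x, rf = 0.02 / 252) mean(x - rf) / sd(x - rf)

# (1) split the row indices of M into S contiguous blocks
n <- nrow(M)
blocks <- split(seq_len(n), sort(rep_len(seq_len(S), n)))

# (2) every way of choosing S/2 blocks for the training set J
combos <- combn(S, S / 2, simplify = FALSE)

# (3) for each combination, evaluate all trials in-sample and out-of-sample
pairs <- do.call(rbind, lapply(combos, function(j) {
  is_rows  <- unlist(blocks[j],  use.names = FALSE)   # training set J
  oos_rows <- unlist(blocks[-j], use.names = FALSE)   # testing set Jbar
  R    <- apply(M[is_rows,  , drop = FALSE], 2, FUN)
  Rbar <- apply(M[oos_rows, , drop = FALSE], 2, FUN)
  best  <- which.max(R)                               # trial chosen in-sample
  omega <- rank(Rbar)[best] / (ncol(M) + 1)           # relative OOS rank in (0, 1)
  data.frame(R = R[best], Rbar = Rbar[best],
             lambda = log(omega / (1 - omega)))       # logit of the relative rank
}))
```

Each row of `pairs` corresponds to one training/testing combination, so with `S = 6` the table has `choose(6, 3)` rows.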

## Value

A list of class 'cscv' containing:

cumdistf_Rbar_rank:

The cumulative distribution function over all strategies.

cumdistf_Rbar_all:

The cumulative distribution function over optimized (ranked) strategies.

pairs:

One row per combination, S!/[(S-S/2)!*(S/2)!] rows in total, including the performance of the chosen in-sample (IS) trial (R) and of the chosen out-of-sample (OOS) trial (Rbar), with lambda, the relative rank transformed by a logit function.
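As a quick check of the row count, the factorial expression above is the binomial coefficient "S choose S/2". The short sketch below evaluates it for the default `S = 16` and also writes out the logit transform applied to a relative rank; `logit` and `omega` are names assumed here for illustration.

```r
S <- 16L

# The factorial expression from the description
n_formula <- factorial(S) / (factorial(S - S / 2) * factorial(S / 2))

# The same count via the built-in binomial coefficient
n_choose <- choose(S, S / 2)   # 12870 combinations for S = 16

# Logit transform of a relative rank omega in (0, 1)
logit <- function(omega) log(omega / (1 - omega))
logit(0.5)                     # a median OOS rank maps to lambda = 0
```

Because the count grows quickly with `S`, larger values of `S` make the run noticeably more expensive.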

insample_neg:

The proportion of combinations in which the chosen in-sample performance is negative.

outsample_neg:

The probability of loss, i.e., the probability that the model selected as optimal will deliver a loss OOS.

num_submatrices:

The number of formed submatrices.

beta:

The slope of the regression of OOS performance on IS performance (performance degradation).

phi:

The probability of backtest overfitting (PBO).
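To illustrate how the summary values relate to the table of pairs, the sketch below derives them from a small, made-up table. The data are hypothetical, and the formulas are assumptions based on the descriptions above and on Bailey et al (2015): phi as the share of non-positive logits, beta as the slope of a linear fit of Rbar on R. The package may compute them differently.

```r
# Hypothetical pairs table (values invented for illustration)
pairs <- data.frame(
  R      = c(0.12, 0.10, 0.15, 0.09, 0.11),   # IS performance of the chosen trial
  Rbar   = c(0.03, -0.02, 0.01, -0.04, 0.05), # OOS performance
  lambda = c(0.8, -0.5, 0.2, -1.1, 1.3)       # logit of the relative OOS rank
)

# Assumed definitions, following the descriptions above
phi           <- mean(pairs$lambda <= 0)  # share of combinations ranked at or below the median OOS
outsample_neg <- mean(pairs$Rbar < 0)     # share of combinations with an OOS loss
insample_neg  <- mean(pairs$R < 0)        # share of combinations with an IS loss
beta          <- unname(coef(lm(Rbar ~ R, data = pairs))[2])  # slope of OOS on IS performance
```

In this toy table two of the five logits are non-positive, so `phi` is 0.4.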

## Author(s)

Nathan Matare

## References

Bailey et al (2015) "The Probability of Backtest Overfitting" https://papers.ssrn.com/sol3/papers.cfm?abstract_id=2326253

## Examples

```r
## Not run: 
library(cscv)

M <- replicate(10, rnorm(1000, 0, 1))
S <- 6L                      # number of sub matrices
trials <- ncol(M)            # number of models (parameters)
observations <- nrow(M)      # number of observations (e.g., returns)
evaluation.function <- sharpe.ratio <- function(x, rf = 0.02 / 252) mean(x - rf) / sd(x - rf)
result <- cscv(
  M = M,
  S = S,
  FUN = evaluation.function,
  parallel = FALSE
)
result

## End(Not run)
```