TrainValSplit: Form the train/validation split by generating all...

View source: R/PBOFun.R

Description

This function splits the data into a training set and a validation set once for every way of choosing n/2 of the n partitions. The number of such combinations is Bin(n, n/2), the binomial coefficient of n items taken n/2 at a time. Each combination supplies indices into the list Ms; stacking the selected submatrices vertically forms a training matrix, and the submatrices not selected in that combination form the corresponding validation matrix.
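The procedure described above can be sketched in base R. This is a minimal illustration, not the package's source: `split_sketch` is a hypothetical stand-in for TrainValSplit(), and it assumes the validation submatrices are stacked the same way as the training ones.

```r
# Hypothetical stand-in for TrainValSplit(), using base R only.
split_sketch <- function(Ms) {
  n <- length(Ms)
  stopifnot(n %% 2 == 0)                   # n/2 must be a whole number
  cmb <- combn(n, n / 2)                   # each column is one choice of n/2 indices
  Train <- vector("list", ncol(cmb))
  Val <- vector("list", ncol(cmb))
  for (j in seq_len(ncol(cmb))) {
    idx <- cmb[, j]
    Train[[j]] <- do.call(rbind, Ms[idx])  # stack the selected partitions
    Val[[j]] <- do.call(rbind, Ms[-idx])   # stack the remaining partitions
  }
  list(Train = Train, Val = Val)
}

# Four 2 x 3 partitions give choose(4, 2) = 6 splits.
Ms <- lapply(1:4, function(k) matrix(k, nrow = 2, ncol = 3))
res <- split_sketch(Ms)
length(res$Train)    # 6
dim(res$Train[[1]])  # 4 3
```

Filling each toy partition with its own index makes it easy to see which partitions ended up in a given training or validation matrix.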

Usage

TrainValSplit(Ms)

Arguments

Ms

list of equal-size matrices

Details

If the rows of the original matrix are in chronological order, then every training matrix and every validation matrix preserves that order.
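The order-preservation property can be checked on a toy matrix. The sketch below assumes the partitions are consecutive row blocks stacked in increasing index order, as the description suggests; the block construction is a hypothetical stand-in for DivideMat(), and the column name `t` is only an illustrative time index.

```r
# Eight observations in time order; column "t" is the time index.
M <- cbind(t = 1:8, x = (1:8) * 10)

# Hypothetical stand-in for DivideMat(M, 4): four consecutive 2-row blocks.
blocks <- split(seq_len(nrow(M)), rep(1:4, each = 2))
Ms <- unname(lapply(blocks, function(rows) M[rows, , drop = FALSE]))

# combn() lists the indices of each combination in increasing order,
# and Ms[-idx] keeps the remaining partitions in their original order.
cmb <- combn(length(Ms), length(Ms) / 2)
ordered <- apply(cmb, 2, function(idx) {
  train <- do.call(rbind, Ms[idx])
  val <- do.call(rbind, Ms[-idx])
  !is.unsorted(train[, "t"]) && !is.unsorted(val[, "t"])
})
all(ordered)  # TRUE
```

The rows within a training or validation matrix are not necessarily contiguous in time; only their relative order is kept.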

WARNING: The resulting lists grow roughly exponentially with the number of partitions n. For example, when n = 10 there are Bin(10, 5) = 252 splits, so Train and Val together hold 504 matrices. Doubling n to 20 gives Bin(20, 10) = 184,756 splits, or 369,512 matrices in total.
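The growth can be tabulated with base R's choose() before calling the function. This is a quick sizing check; the particular values of n are arbitrary.

```r
n <- c(4, 6, 8, 10, 12, 16, 20)   # candidate numbers of partitions
splits <- choose(n, n / 2)        # number of train/validation splits
growth <- data.frame(partitions = n, splits = splits, matrices = 2 * splits)
growth

choose(10, 5)    # 252
choose(20, 10)   # 184756
```

The `matrices` column counts the training and validation matrices together, which is the total the warning refers to.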

Value

A list of two lists, Train and Val. Each contains one matrix per combination, and each matrix is built by stacking n/2 of the matrices in the given Ms. Train corresponds to J and Val to J_bar in Bailey et al.

Author(s)

Horace W. Tso horacetso@gmail.com

References

Bailey, D. H., Borwein, J., Lopez de Prado, M., & Zhu, Q. J. (2016). The probability of backtest overfitting. https://www.carma.newcastle.edu.au/jon/backtest2.pdf

Lopez de Prado (2018), Advances in Financial Machine Learning, John Wiley & Sons.

See Also

PBO::CalcLambda()

Examples

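   library(PBO)

   N = 20     # number of strategies (columns)
   TT = 1000  # number of observations (rows)
   S = 20     # number of partitions; choose(20, 10) = 184,756 splits, so reduce S for a quick run
   M = matrix(rnorm(N*TT, mean=0.1, sd=1), ncol=N, nrow=TT)
   Ms = DivideMat(M, S)        # list of S equal-size submatrices
   res <- TrainValSplit(Ms)
   length(res$Train)           # number of training matrices
   length(res$Val)             # number of validation matrices
   head(res$Train[[1]])        # first rows of the first training matrix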

htso/PBO documentation built on Jan. 31, 2020, 4:20 p.m.