dataSplit: A procedure to split the whole dataset into multiple folds.
In rModeling: A Framework of Cross-Validation

View source: R/functions.R

The whole dataset is split into multiple folds, either randomly (`batch = NULL`) or according to the batch information (when `batch` is specified). In the former case, the number of folds is defined by `nFold`. In the latter case, if `nBatch = 0`, the data belonging to each batch are used as one fold; otherwise, the dataset is split into `nBatch` folds according to the batch information (i.e., all data from the same batch are assigned to exactly one fold).
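The splitting rules described above can be sketched in base R as follows. This is a minimal illustration, not the rModeling source: the function name `split_folds_sketch`, the unnamed list of index vectors it returns, and the round-robin assignment of whole batches when `nBatch > 0` are all assumptions made for the example.

```r
# Hypothetical sketch of the documented splitting rules; NOT the rModeling code.
# Assumption: folds are returned as an unnamed list of index vectors.
split_folds_sketch <- function(ixData, batch = NULL, nBatch = 0, nFold = 10,
                               seed = NULL) {
  if (is.null(batch)) {
    # Random k-fold split: shuffle the indices, then deal them into nFold folds
    stopifnot(length(ixData) >= nFold)
    if (!is.null(seed)) set.seed(seed)
    shuffled <- ixData[sample.int(length(ixData))]
    foldId <- rep_len(seq_len(nFold), length(shuffled))
    return(unname(split(shuffled, foldId)))
  }
  stopifnot(length(batch) == length(ixData))
  batches <- unique(batch)
  if (nBatch == 0) {
    # One fold per batch
    batchFold <- seq_along(batches)
  } else {
    # Assumption: whole batches are dealt round-robin (in order of first
    # appearance) into nBatch folds, so no batch is split across folds
    stopifnot(nBatch <= length(batches))
    batchFold <- rep_len(seq_len(nBatch), length(batches))
  }
  # Every observation inherits the fold of its batch
  foldId <- batchFold[match(batch, batches)]
  unname(split(ixData, foldId))
}
```

For example, `split_folds_sketch(1:20, batch = rep(paste0("p", 1:5), each = 4), nBatch = 2)` puts patients p1, p3 and p5 into the first fold and p2 and p4 into the second, so every patient's data stay together.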

 dataSplit(ixData, batch = NULL, 
           nBatch = 0, nFold = 10, 
           verbose = TRUE, seed = NULL)

`ixData`	a vector of integers giving the indices of the spectra.
`batch`	a vector of sample identifiers (e.g., batch or patient IDs); it must have the same length as `ixData`. Ideally, it should identify the samples at the highest level of the hierarchy (e.g., the patient ID rather than the spectrum ID). If `NULL`, the data are split randomly into `nFold` folds.
`nBatch`	an integer, the number of data folds for batch-wise cross-validation (if `nBatch = 0`, each batch is used as one fold). Ignored if `batch` is `NULL`.
`nFold`	an integer, the number of data folds for ordinary k-fold cross-validation. Ignored if `batch` is given.
`verbose`	a logical value; whether to print logging information.
`seed`	an integer; if given, it is used as the random seed for splitting the data in k-fold cross-validation. Ignored if `batch` is given.
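The argument descriptions determine which parameters take effect in each mode. The following helper, `resolve_split_mode`, is hypothetical and not part of rModeling; it is a minimal sketch of that dispatch which reports the mode, the expected number of folds and whether `seed` is used, and it rejects a `batch` whose length differs from `ixData`.

```r
# Hypothetical helper (not part of rModeling): reports which arguments take
# effect and how many folds to expect, following the argument descriptions.
resolve_split_mode <- function(ixData, batch = NULL, nBatch = 0, nFold = 10,
                               seed = NULL) {
  if (is.null(batch)) {
    # k-fold mode: nFold and seed are used; nBatch is ignored
    return(list(mode = "kfold", nFolds = nFold, usesSeed = !is.null(seed)))
  }
  if (length(batch) != length(ixData))
    stop("`batch` must have the same length as `ixData`")
  # Batch mode: nFold and seed are ignored
  list(mode = "batch",
       nFolds = if (nBatch == 0) length(unique(batch)) else nBatch,
       usesSeed = FALSE)
}
```

For instance, with six spectra from three patients, `resolve_split_mode(1:6, batch = c(1, 1, 2, 2, 3, 3), nBatch = 0, nFold = 5, seed = 42)` reports batch mode with three folds, because `nFold` and `seed` are ignored once `batch` is supplied.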