Randomly partitions data for cross-validation.

Description

Given a vector of training labels, cv_partition returns a list of cross-validation folds, where each fold contains the indices of the observations to leave out in that fold. For classification error rate estimation, one can think of a fold as the observations to hold out as a test sample. Either the hold-out size, hold_out, or the number of folds, num_folds, can be specified. The number of folds defaults to 10, but if hold_out is specified, num_folds is ignored.
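The partitioning idea described above can be illustrated with a minimal base R sketch. This is not the package's implementation: the helper name sketch_partition, the rule that a hold-out size implies ceiling(n / hold_out) folds, and the element names training and test are all assumptions made for illustration.

```r
# Hypothetical helper for illustration only; NOT the package's code.
sketch_partition <- function(y, num_folds = 10, hold_out = NULL, seed = NULL) {
  if (!is.null(seed)) set.seed(seed)
  n <- length(y)
  # Assumption: a hold-out size implies ceiling(n / hold_out) folds,
  # which is why num_folds is ignored when hold_out is given.
  if (!is.null(hold_out)) num_folds <- ceiling(n / hold_out)
  # Shuffle the observation indices, then deal them into folds
  # as evenly as possible.
  shuffled <- sample(seq_len(n))
  fold_id <- rep(seq_len(num_folds), length.out = n)
  test_sets <- split(shuffled, fold_id)
  # Each fold holds the left-out (test) indices and the remaining
  # (training) indices.
  lapply(test_sets, function(test) {
    list(training = setdiff(seq_len(n), test), test = test)
  })
}

folds <- sketch_partition(iris$Species, hold_out = 15, seed = 42)
length(folds)                          # 10 folds for n = 150
lengths(lapply(folds, `[[`, "test"))   # 15 held-out observations per fold
```

In this sketch every observation is held out exactly once across the folds, which is the property that cross-validated error rate estimation relies on.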

Usage

cv_partition(y, num_folds = 10, hold_out = NULL, seed = NULL)

Arguments

y

a vector of class labels

num_folds

the number of cross-validation folds. Ignored if hold_out is not NULL. See Details.

hold_out

the hold-out size for cross-validation. See Details.

seed

optional random number seed for splitting the data for cross-validation

Details

We partition the vector y based on its length, which we treat as the sample size, 'n'. If an object other than a vector is used in y, its length can yield unexpected results. For example, the output of length(diag(3)) is 9.
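The length caveat can be checked directly in base R. The calls below use only the built-in iris data set and show why passing a matrix or a data frame as y would give a sample size other than the number of observations.

```r
length(diag(3))        # 9: a 3 x 3 matrix has nine elements
length(iris)           # 5: a data frame's length is its number of columns
length(iris$Species)   # 150: one label per observation
```

Passing a single column such as iris$Species, rather than the whole data frame, keeps 'n' equal to the number of observations.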

Value

A list containing the indices of the training and test observations for each fold.

Examples

# The following three calls to cv_partition yield the same partitions.
set.seed(42)
cv_partition(iris$Species)
cv_partition(iris$Species, num_folds = 10, seed = 42)
cv_partition(iris$Species, hold_out = 15, seed = 42)
