cv_partition: Partitions data for cross-validation.


View source: R/helper-partition-data.r

Description

For a vector of training labels, we return a list of cross-validation folds, where each fold gives the indices of the observations to leave out in that fold. In terms of classification error rate estimation, one can think of a fold as the set of observations held out as a test sample.
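
As a minimal sketch, the calls below inspect the returned folds for the iris species labels; the element names inside each fold (for example, training and test) are assumptions based on the Value section below.

library(sortinghat)
folds <- cv_partition(iris$Species, num_folds = 5, seed = 42)
length(folds)    # number of folds (5 here)
str(folds[[1]])  # indices for the first fold; names such as 'training' and 'test' are assumed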

Usage

  cv_partition(y, num_folds = 10, hold_out = NULL, seed = NULL)

Arguments

y

a vector of class labels to partition

num_folds

the number of cross-validation folds. Ignored if hold_out is not NULL. See Details.

hold_out

the hold-out size for cross-validation. See Details.

seed

optional random number seed for splitting the data for cross-validation

Details

Either the hold_out size or num_folds can be specified. The number of folds defaults to 10, but if the hold_out size is specified, then num_folds is ignored.
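
For example, with the 150 iris species labels, a hold-out size of 15 corresponds to 10 folds of 15 held-out observations each, so the two calls below produce the same partition for the same seed; the num_folds argument in the second call is ignored.

cv_partition(iris$Species, num_folds = 10, seed = 42)
cv_partition(iris$Species, num_folds = 3, hold_out = 15, seed = 42)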

We partition the vector y based on its length, which we treat as the sample size n. If an object other than a vector is passed as y, its length can yield unexpected results; for example, the output of length(diag(3)) is 9.
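
A quick illustration of this caveat:

length(diag(3))  # 9, because a 3 x 3 matrix has 9 elements
# cv_partition(diag(3)) would therefore treat the matrix as 9 observations.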

Value

A list containing the indices of the training and test observations for each fold.
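
A minimal sketch of using one fold's indices to split a data set; the training and test element names within each fold are assumptions based on the description above.

folds <- cv_partition(iris$Species, seed = 42)
train_data <- iris[folds[[1]]$training, ]  # assumed element name: training
test_data  <- iris[folds[[1]]$test, ]      # assumed element name: test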

Examples

library(MASS)
# The following three calls to cv_partition yield the same partitions.
set.seed(42)
cv_partition(iris$Species)
cv_partition(iris$Species, num_folds = 10, seed = 42)
cv_partition(iris$Species, hold_out = 15, seed = 42)
