vfold_cv: V-Fold Cross-Validation

Description

V-fold cross-validation randomly splits the data into V groups of roughly equal size (called "folds"). For each resample, the analysis set consists of V-1 of the folds, while the assessment set contains the remaining fold. In basic V-fold cross-validation (i.e., no repeats), the number of resamples is equal to V.
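
As a rough illustration of the idea described above, the fold assignment can be sketched in base R. This is only a sketch under simple assumptions, not the package's actual implementation; the names 'n', 'v', 'fold_id', and 'splits' are hypothetical and chosen for this example.

```r
# Minimal sketch of V-fold assignment in base R (illustrative only;
# this is not presented as how vfold_cv() is implemented).
set.seed(1)
n <- nrow(mtcars)   # number of rows in the data
v <- 5              # number of folds

# Repeat the labels 1..v until there are n of them, then shuffle.
fold_id <- sample(rep(seq_len(v), length.out = n))

# Each fold in turn serves as the assessment set; the remaining
# v - 1 folds form the analysis set.
splits <- lapply(seq_len(v), function(k) {
  list(analysis   = mtcars[fold_id != k, ],
       assessment = mtcars[fold_id == k, ])
})

table(fold_id)  # fold sizes differ by at most one row
```

Every row appears in exactly one assessment set, so the V resamples together cover the whole data set.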

Usage

vfold_cv(data, v = 10, repeats = 1, strata = NULL, ...)

Arguments

data

A data frame.

v

The number of partitions of the data set.

repeats

The number of times to repeat the V-fold partitioning.

strata

A variable that is used to conduct stratified sampling to create the folds. This should be a single character value.

...

Not currently used.

Details

The 'strata' argument causes the random sampling to be conducted *within the stratification variable*. This can help ensure that the proportions of the stratification variable's levels in the analysis data are close to those in the original data set.

When more than one repeat is requested, the basic V-fold cross-validation is conducted each time. For example, if three repeats are used with 'v = 10', there are a total of 30 splits: three groups of 10 that are generated separately.
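
The repeat arithmetic above can be checked with a short sketch. It assumes the rsample package is installed; the object name 'rep_folds' and the seed are arbitrary choices for this example.

```r
library(rsample)

set.seed(42)
rep_folds <- vfold_cv(mtcars, v = 10, repeats = 3)

# Three repeats of 10 folds give 30 splits in total.
nrow(rep_folds)

# Each repeat contributes its own group of 10 splits.
table(rep_folds$id)
```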

Value

A tibble with classes 'vfold_cv', 'rset', 'tbl_df', 'tbl', and 'data.frame'. The results include a column for the data split objects and one or more identification variables. For a single repeat, there will be one column called 'id' that has a character string with the fold identifier. For multiple repeats, 'id' is the repeat number and an additional column called 'id2' contains the fold information (within each repeat).
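
A brief sketch of how the returned object might be inspected, assuming the rsample package is installed. It uses rsample's 'analysis()' and 'assessment()' accessors to pull the two data sets out of a single split; the names 'folds' and 'first_split' are hypothetical.

```r
library(rsample)

set.seed(7)
folds <- vfold_cv(mtcars, v = 4)

class(folds)   # includes "vfold_cv" and "rset"
names(folds)   # the split column plus the 'id' column

# Each element of the split column is one resample.
first_split <- folds$splits[[1]]

# Rows used for model fitting (three of the four folds)...
nrow(analysis(first_split))
# ...and rows held out for evaluation (the remaining fold).
nrow(assessment(first_split))
```

Together, the analysis and assessment sets of any one split account for every row of the original data.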

Examples

library(rsample)

# Basic 10-fold cross-validation, then the same with two repeats
vfold_cv(mtcars, v = 10)
vfold_cv(mtcars, v = 10, repeats = 2)

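library(purrr)
# Keep the first 130 rows so that the classes are unbalanced
iris2 <- iris[1:130, ]

# Without stratification, the proportion of virginica in each
# analysis set can vary from fold to fold
set.seed(13)
folds1 <- vfold_cv(iris2, v = 5)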
map_dbl(folds1$splits,
        function(x) {
          dat <- as.data.frame(x)$Species
          mean(dat == "virginica")
        })

# With stratification on Species, the proportions should be more consistent
set.seed(13)
folds2 <- vfold_cv(iris2, strata = "Species", v = 5)
map_dbl(folds2$splits,
        function(x) {
          dat <- as.data.frame(x)$Species
          mean(dat == "virginica")
        })

topepo/rsample documentation built on May 4, 2019, 4:25 p.m.