cv_kfold_strata: Stratified K-fold cross validation folds generation

View source: R/utils.R

cv_kfold_strata    R Documentation

Stratified K-fold cross validation folds generation

Description

Generates folds for stratified k-fold cross validation. The records are split into k mutually exclusive folds; in each iteration the training phase uses k − 1 folds and the testing phase uses the remaining one, so every individual is part of the testing set exactly once. Given a categorical variable, this type of cross validation keeps the proportion of each class in every fold approximately equal to its proportion in the whole data, which makes it a good option when folds must be balanced across classes.
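The procedure described above can be illustrated with a minimal base R sketch. It is not the implementation used by SKM: the helper name stratified_folds_sketch is hypothetical, and the sketch assumes data has no missing values. It shuffles the records of each class and deals them to the k folds in turn, so each fold receives a near-equal share of every class.

```r
# Hypothetical helper, NOT part of SKM: a base R sketch of stratified folds.
# Assumes `data` has no missing values.
stratified_folds_sketch <- function(data, k = 5) {
  fold_of <- integer(length(data))
  # Process the record indices of one class at a time
  for (idx in split(seq_along(data), data)) {
    # Shuffle the records of this class; indexing with sample.int() avoids
    # the sample() pitfall when a class has a single record
    idx <- idx[sample.int(length(idx))]
    # Deal the shuffled records to folds 1..k in turn
    fold_of[idx] <- rep_len(seq_len(k), length(idx))
  }
  # Fold i is tested on its own records and trained on all the others
  lapply(seq_len(k), function(i) {
    list(training = which(fold_of != i), testing = which(fold_of == i))
  })
}

set.seed(1)
data <- c(rep("A", 10), rep("B", 20), rep("C", 30))
folds <- stratified_folds_sketch(data, 5)
# Class counts in the testing set of fold 1
table(data[folds[[1]]$testing])
```

Because records are dealt in turn, a class whose size is not a multiple of k leaves its extra records in the first folds; a production implementation may distribute that remainder differently.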

Usage

cv_kfold_strata(data, k = 5)

Arguments

data

(vector) The categorical variable used to stratify the folds.

k

(numeric(1)) The number of folds. 5 by default.

Value

A list with k elements, where each element is a named list with two elements: training, which holds the indices of the records that form the training set, and testing, which holds the indices of the records that form the testing set. The training and testing sets of each fold are exhaustive and mutually exclusive.
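The properties stated above can be checked on any object with this structure. The following sketch uses a hypothetical helper, folds_are_valid, and a small hand-built list (neither comes from SKM) to verify that each fold's sets are disjoint and cover all records, and that every record is tested exactly once.

```r
# Hypothetical helper, NOT part of SKM: checks the documented properties
# of a list of folds built over records 1..n.
folds_are_valid <- function(folds, n) {
  per_fold_ok <- vapply(folds, function(fold) {
    # Mutually exclusive: no index appears in both sets
    length(intersect(fold$training, fold$testing)) == 0 &&
      # Exhaustive: together the two sets cover every record
      setequal(c(fold$training, fold$testing), seq_len(n))
  }, logical(1))
  # Across folds, every record must appear in testing exactly once
  all_testing <- unlist(lapply(folds, function(fold) fold$testing))
  all(per_fold_ok) &&
    !anyDuplicated(all_testing) &&
    setequal(all_testing, seq_len(n))
}

# Hand-built example with the documented structure (6 records, k = 3)
example_folds <- list(
  list(training = c(3, 4, 5, 6), testing = c(1, 2)),
  list(training = c(1, 2, 5, 6), testing = c(3, 4)),
  list(training = c(1, 2, 3, 4), testing = c(5, 6))
)
folds_are_valid(example_folds, 6)
```

The same check should apply to the output of cv_kfold_strata by passing length(data) as n.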

Examples

## Not run: 
# Generates 5 folds of 12 elements (60 / 5) in testing set
data <- c(rep("A", 10), rep("B", 20), rep("C", 30))
folds <- cv_kfold_strata(data, 5)
# Indices of training set in fold 1
folds[[1]]$training
# Indices of testing set in fold 1
folds[[1]]$testing
# Verify fold 1 is balanced in training
table(data[folds[[1]]$training])
# Verify fold 1 is balanced in testing
table(data[folds[[1]]$testing])
# Verify fold 2 is balanced in training
table(data[folds[[2]]$training])
# Verify fold 2 is balanced in testing
table(data[folds[[2]]$testing])

folds <- cv_kfold_strata(iris$Species, 30)
# List with indices of training and testing of fold 1
folds[[1]]
# List with indices of training and testing of fold 2
folds[[2]]
# List with indices of training and testing of fold 3
folds[[3]]
# ...
folds[[30]]

## End(Not run)


brandon-mosqueda/SKM documentation built on Feb. 8, 2025, 5:24 p.m.