kfold: (Un)Stratified k-fold for any type of label


Description

This function creates stratified or unstratified folds from a label vector.

Usage

kfold(y, k = 5, stratified = TRUE, seed = 0, named = TRUE)

Arguments

y

Type: vector. The label vector. Any type of label is accepted.

k

Type: integer. The number of folds to create. Causes issues if length(y) < k (i.e., more folds than samples). Defaults to 5.

stratified

Type: boolean. Whether the folds should be stratified (keep the same label proportions) or not. Defaults to TRUE.

seed

Type: integer. The seed for the random number generator. Defaults to 0.

named

Type: boolean. Whether the folds should be named. Defaults to TRUE.
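
To make the stratified and k arguments concrete, here is a minimal base R sketch of one way stratified folds can be built for class labels. It is an illustration only and not the code behind kfold: the helper name stratified_folds_sketch is hypothetical, it treats every distinct value of y as a class (so it does not cover the regression case), and the length(y) >= k guard is an assumption about how the "more folds than samples" problem could be caught early.

```r
# Hypothetical helper, not the package's implementation: shows what
# "keep the same label proportions" means for a vector of class labels.
stratified_folds_sketch <- function(y, k = 5, seed = 0) {
  # Guard against the length(y) < k case noted for the k argument.
  stopifnot(length(y) >= k)
  set.seed(seed)
  fold_id <- integer(length(y))
  for (idx in split(seq_along(y), y)) {
    # Shuffle the rows of this class, then deal them out round-robin.
    shuffled <- idx[sample.int(length(idx))]
    fold_id[shuffled] <- rep_len(seq_len(k), length(shuffled))
  }
  # One integer vector of row numbers per fold (always k list elements).
  unname(split(seq_along(y), factor(fold_id, levels = seq_len(k))))
}

y <- c(rep(0, 250), rep(1, 250), rep(2, 250))
folds <- stratified_folds_sketch(y, k = 5, seed = 111)
lengths(folds)        # 150 rows in each of the 5 folds
table(y[folds[[1]]])  # 50 rows of each class in the first fold
```

Dealing each class out separately is what keeps the class proportions equal across folds; a plain shuffle of all rows would only match them on average.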

Value

A list with one integer vector per fold. Each integer is a row number, i.e. an index into y.
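
The sketch below shows how a list of row-number vectors like this is commonly used in cross-validation, assuming each fold serves once as the held-out set. So that it runs without the package, the folds list is built by hand as a stand-in for the kfold output, and the data frame df and the linear model are made up for illustration.

```r
# Stand-in for the list kfold() returns: 5 vectors of row numbers.
# Built by hand here so the snippet runs without the package.
set.seed(1)
n <- 100
folds <- unname(split(sample.int(n), rep_len(1:5, n)))

# Toy data, purely illustrative.
df <- data.frame(x = rnorm(n))
df$y <- 2 * df$x + rnorm(n)

rmse <- numeric(length(folds))
for (i in seq_along(folds)) {
  test_rows <- folds[[i]]          # rows held out in this round
  train <- df[-test_rows, ]
  test <- df[test_rows, ]
  fit <- lm(y ~ x, data = train)
  rmse[i] <- sqrt(mean((test$y - predict(fit, newdata = test))^2))
}
mean(rmse)  # cross-validated RMSE
```

Negative indexing with the fold's row numbers gives the training rows, so no separate list of training indices is needed.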

Examples

# Reproducible stratified folds
data <- 1:5000
folds1 <- kfold(y = data, k = 5, stratified = TRUE, seed = 111)
folds2 <- kfold(y = data, k = 5, stratified = TRUE, seed = 111)
identical(folds1, folds2)  # expected TRUE: both calls use the same seed

# Stratified regression: fold means should stay close to the overall mean (2500.5)
data <- 1:5000
folds <- kfold(y = data, k = 5, stratified = TRUE)
for (i in seq_along(folds)) {
  print(mean(data[folds[[i]]]))
}

# Stratified multi-class classification: the mean label per fold should stay close to 1
data <- c(rep(0, 250), rep(1, 250), rep(2, 250))
folds <- kfold(y = data, k = 5, stratified = TRUE)
for (i in seq_along(folds)) {
  print(mean(data[folds[[i]]]))
}

# Unstratified regression: fold means may vary more around 2500.5
data <- 1:5000
folds <- kfold(y = data, k = 5, stratified = FALSE)
for (i in seq_along(folds)) {
  print(mean(data[folds[[i]]]))
}

# Unstratified multi-class classification: the mean label per fold may vary more around 1
data <- c(rep(0, 250), rep(1, 250), rep(2, 250))
folds <- kfold(y = data, k = 5, stratified = FALSE)
for (i in seq_along(folds)) {
  print(mean(data[folds[[i]]]))
}
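
The examples above compare folds through the mean label, which is only a rough summary for classification. A more direct check is to tabulate class proportions per fold. The helper below is a hypothetical sketch (fold_class_proportions is not part of the package), and the two fold lists are built by hand as stand-ins for kfold output so that the snippet runs on its own.

```r
# Hypothetical helper: returns a matrix with one row per fold and one
# column per class; entries are within-fold class proportions.
fold_class_proportions <- function(y, folds) {
  classes <- sort(unique(y))
  props <- vapply(folds, function(rows) {
    as.numeric(prop.table(table(factor(y[rows], levels = classes))))
  }, numeric(length(classes)))
  # vapply() puts classes in rows; transpose so that folds are rows.
  out <- t(matrix(props, nrow = length(classes)))
  colnames(out) <- as.character(classes)
  out
}

y <- c(rep(0, 250), rep(1, 250), rep(2, 250))
# Hand-built folds standing in for kfold() output.
even_folds <- list(c(1:50, 251:300, 501:550), c(51:100, 301:350, 551:600))
skewed_folds <- list(1:150, 601:750)
fold_class_proportions(y, even_folds)    # every entry is 1/3
fold_class_proportions(y, skewed_folds)  # each fold holds a single class
```

Passing the list returned by kfold as the folds argument should work the same way, given that it is a list of row-number vectors.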

ablanda/Esame documentation built on May 30, 2019, 6:11 p.m.