nkfold: (Un)Stratified Repeated k-fold for any type of label

View source: R/nkfold.R

Description

This function creates (un)stratified repeated folds from a label vector.

Usage

nkfold(y, n = 2, k = 5, type = "random", seed = 0, named = TRUE,
  weight = FALSE)

Arguments

y

Type: vector. The label vector.

n

Type: integer. The number of repeated fold computations to perform. Defaults to 2.

k

Type: integer or vector of integers. The number of folds to create. Causes issues if length(y) < k (i.e. more folds than samples). If a vector of integers is supplied, then for repeat N, k[N] is used as the number of folds. Defaults to 5.
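
The per-repeat rule for k can be sketched in base R as follows. The helper resolve_k is hypothetical and is not part of LauraeDS; it only illustrates the documented behaviour (a single k reused for every repeat, k[N] used for repeat N) and adds an explicit check for the length(y) < k case, which is an assumption about how one might guard against it.

```r
# Hypothetical helper (not part of LauraeDS): resolve the fold count for each
# repeat and refuse fold counts larger than the number of samples.
resolve_k <- function(y, n, k) {
  if (length(k) == 1) {
    k <- rep(k, n)  # a single k is reused for every repeat
  }
  stopifnot(length(k) >= n)
  k <- k[seq_len(n)]  # k[N] is the fold count of repeat N
  if (any(k > length(y))) {
    stop("more folds requested than samples available")
  }
  k
}

y <- c(rep(0, 250), rep(1, 250), rep(2, 250))
resolve_k(y, n = 3, k = c(3, 5, 10))  # fold counts for repeats 1, 2 and 3
resolve_k(y, n = 2, k = 5)            # the same k for both repeats
```

Checking the fold counts up front gives a clear error message instead of the unspecified issues mentioned above.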

type

Type: character or vector of characters. How the folds are generated: stratified (keeps the same label proportions in each fold, for classification), treatment (makes each fold exclusive with respect to the label vector, which is then treated as a vector of treatment groups), pseudo (pseudo-random, attempts to minimize the variance between folds, for regression), or random (fully random folds). If a vector of characters is supplied, then for repeat N, type[N] is used as the fold generation type. Defaults to random.
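
To make the difference between stratified and random concrete, here is a minimal base R sketch. The functions random_folds and stratified_folds are hypothetical illustrations written for this page; they are not the package's implementation and may differ from it in how rows are shuffled and assigned.

```r
# Illustrative only: hypothetical helpers, not the code used by nkfold.
random_folds <- function(y, k, seed = 0) {
  set.seed(seed)
  # Shuffle the row numbers, then deal them into k folds round-robin.
  idx <- sample(seq_along(y))
  unname(split(idx, rep_len(seq_len(k), length(idx))))
}

stratified_folds <- function(y, k, seed = 0) {
  set.seed(seed)
  folds <- vector("list", k)
  for (rows in split(seq_along(y), y)) {
    # Shuffle the rows of one label, then deal them across the k folds.
    rows <- rows[sample.int(length(rows))]
    fold_id <- rep_len(seq_len(k), length(rows))
    for (f in seq_len(k)) {
      folds[[f]] <- c(folds[[f]], rows[fold_id == f])
    }
  }
  folds
}

y <- c(rep(0, 250), rep(1, 250))
strat <- stratified_folds(y, k = 5)
rand <- random_folds(y, k = 5)
sapply(strat, function(f) mean(y[f]))  # identical label proportion in every fold
sapply(rand, function(f) mean(y[f]))   # proportion may drift from fold to fold
```

In this sketch every stratified fold receives the same number of rows from each label, whereas the random folds only match the overall proportions on average.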

seed

Type: integer or vector of integers. The seed for the random number generator. If a vector of integers is provided, its length should be at least n. Otherwise (if a single integer is supplied), the first repeat uses the provided seed and 1 is added to the seed for every subsequent repeat. Defaults to 0.
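
The seed rule can be written out as a small base R sketch. The helper expand_seed is hypothetical (it is not exported by LauraeDS) and only mirrors the rule described above: a single seed is incremented by 1 per repeat, and a vector of seeds is used as given.

```r
# Hypothetical helper mirroring the documented seed rule; not the package's code.
expand_seed <- function(seed, n) {
  if (length(seed) == 1) {
    seed + seq_len(n) - 1  # seed, seed + 1, ..., one value per repeat
  } else {
    stopifnot(length(seed) >= n)
    seed[seq_len(n)]       # seed[N] is used for repeat N
  }
}

expand_seed(111, n = 2)       # one seed per repeat, starting at 111
expand_seed(c(0, 10), n = 2)  # seeds are taken as supplied
```

Under this rule, seed = 111 with n = 2 and seed = c(111, 112) describe the same sequence of seeds.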

named

Type: boolean. Whether the folds should be named. Defaults to TRUE.

weight

Type: boolean. Whether to also return a weight for each fold, scaled so that the weights sum to 1. Defaults to FALSE.

Value

A list with one vector per fold, where each integer is a row number, or a list of lists containing Folds and Weights if weight = TRUE.
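
A list of row-number vectors can be consumed as in the sketch below. The fold list here is built by hand as a stand-in for nkfold output, and the sketch assumes that each vector holds the rows held out for that fold; check this against your own use before relying on it.

```r
# Hand-built stand-in for one repeat of 5 folds over 20 rows (not nkfold output).
y <- 1:20
folds <- split(seq_along(y), rep(1:5, times = 4))

for (i in seq_along(folds)) {
  test_rows <- folds[[i]]                          # rows held out in fold i
  train_rows <- setdiff(seq_along(y), test_rows)   # all remaining rows
  cat("fold", i, "test:", length(test_rows), "train:", length(train_rows), "\n")
}
```

With repeated folds, the same loop simply runs over every element of the returned list, one iteration per fold of every repeat.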

Examples

# Reproducible Stratified Repeated folds
data <- 1:5000
folds1 <- nkfold(y = data, n = 2, k = 5, type = "pseudo", seed = 111)
folds2 <- nkfold(y = data, n = 2, k = 5, type = "pseudo", seed = c(111, 112))
identical(folds1, folds2)

# Repeated Treatments
data <- rep(1:50, each = 50)
str(nkfold(y = data, n = 2, k = 5, type = "treatment"))

# Stratified Repeated Classification
data <- c(rep(0, 250), rep(1, 250))
folds <- nkfold(y = data, n = 2, k = 5, type = "stratified")
for (i in seq_along(folds)) {
  print(mean(data[folds[[i]]]))
}

# Stratified Repeated Regression
data <- 1:5000
folds <- nkfold(y = data, n = 2, k = 5, type = "pseudo")
for (i in seq_along(folds)) {
  print(mean(data[folds[[i]]]))
}

# Stratified Repeated Multi-class Classification
data <- c(rep(0, 250), rep(1, 250), rep(2, 250))
folds <- nkfold(y = data, n = 2, k = 5, type = "stratified")
for (i in seq_along(folds)) {
  print(mean(data[folds[[i]]]))
}

# Unstratified Repeated Regression
data <- 1:5000
folds <- nkfold(y = data, n = 2, k = 5, type = "random")
for (i in seq_along(folds)) {
  print(mean(data[folds[[i]]]))
}

# Unstratified Repeated Multi-class Classification
data <- c(rep(0, 250), rep(1, 250), rep(2, 250))
folds <- nkfold(y = data, n = 2, k = 5, type = "random")
for (i in seq_along(folds)) {
  print(mean(data[folds[[i]]]))
}

# Unstratified Repeated 3-5-10 fold Cross-Validation all in one
data <- c(rep(0, 250), rep(1, 250), rep(2, 250))
str(nkfold(data, n = 3, k = c(3, 5, 10), type = "random"))

# Repeated 3-5 fold Cross-Validation all in one
# with different types (random, then stratified)
data <- c(rep(0, 250), rep(1, 250), rep(2, 250))
str(nkfold(data, n = 2, k = c(3, 5), type = c("random", "stratified")))

# Unstratified Repeated 3-5 fold Cross-Validation all in one
# with different seeds
data <- c(rep(0, 250), rep(1, 250), rep(2, 250))
str(nkfold(data, n = 2, k = c(3, 5), type = "random", seed = c(0, 10)))

Laurae2/LauraeDS documentation built on May 29, 2019, 2:25 p.m.