A series of test/training partitions is created using createDataPartition, while createResample creates one or more bootstrap samples. createFolds splits the data into k groups, while createTimeSlices creates cross-validation sample information to be used with time series data.
createDataPartition(y,
                    times = 1,
                    p = 0.5,
                    list = TRUE,
                    groups = min(5, length(y)))
createResample(y, times = 10, list = TRUE)
createFolds(y, k = 10, list = TRUE, returnTrain = FALSE)
createMultiFolds(y, k = 10, times = 5)
createTimeSlices(y, initialWindow, horizon = 1, fixedWindow = TRUE)
y | a vector of outcomes. For createTimeSlices, these should be in chronological order.
times | the number of partitions to create
p | the percentage of data that goes to training
list | logical - should the results be in a list (TRUE) or a matrix (FALSE)
groups | for numeric y, the number of breaks in the quantiles (see below)
k | an integer for the number of folds.
returnTrain | a logical. When true, the values returned are the sample positions corresponding to the data used during training. This argument only works in conjunction with list = TRUE
initialWindow | The initial number of consecutive values in each training set sample
horizon | The number of consecutive values in each test set sample
fixedWindow | A logical: if FALSE, the training set always starts at the first sample.
For bootstrap samples, simple random sampling is used. For other data splitting, the random sampling is done within the levels of y when y is a factor in an attempt to balance the class distributions within the splits.
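The idea of sampling within factor levels can be sketched in base R. This is an illustration only, not caret's implementation; the toy factor, the value of p, and the use of ceiling() for the per-level sample size are all assumptions made for the example.

```r
set.seed(7)
# Toy outcome with an imbalanced class distribution (assumed data)
y <- factor(rep(c("A", "B"), times = c(30, 10)))
p <- 0.5

# Sample a fraction p of the row positions within each level of y
by_level <- split(seq_along(y), y)
picked <- lapply(by_level, function(idx) {
  idx[sample.int(length(idx), ceiling(p * length(idx)))]
})
in_train <- sort(unlist(picked, use.names = FALSE))

# Class proportions in the training rows mirror those of the full vector
prop.table(table(y[in_train]))
prop.table(table(y))
```

Because the draw happens separately inside each level, the minority class cannot be left out of the training rows by chance, which is the balancing effect described above.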
For numeric y, the sample is split into groups based on percentiles and sampling is done within these subgroups. For createDataPartition, the number of percentiles is set via the groups argument. For createFolds and createMultiFolds, the number of groups is set dynamically based on the sample size and k. For smaller sample sizes, these two functions may not do stratified splitting and, at most, will split the data into quartiles.
Also, for createDataPartition, when class sizes are very small (<= 3), the classes may not show up in both the training and test data.
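A rough base R sketch of percentile-based stratification for a numeric outcome follows. The helper name stratified_partition is hypothetical and is not part of caret; the binning and rounding choices are assumptions intended only to illustrate the description above.

```r
# Hypothetical helper, not part of caret: bin a numeric outcome at evenly
# spaced percentiles, then sample a fraction p of the rows within each bin.
stratified_partition <- function(y, p = 0.5, groups = 5) {
  # Percentile break points; unique() guards against tied quantiles
  breaks <- unique(quantile(y, probs = seq(0, 1, length.out = groups + 1)))
  bins <- cut(y, breaks = breaks, include.lowest = TRUE)
  # Draw ceiling(p * n) row positions inside each bin
  picked <- lapply(split(seq_along(y), bins), function(idx) {
    idx[sample.int(length(idx), ceiling(p * length(idx)))]
  })
  sort(unlist(picked, use.names = FALSE))
}

set.seed(1)
y <- rgamma(50, 3, 0.5)
in_train <- stratified_partition(y, p = 0.5, groups = 5)

# Training and held-out values should cover a similar range of y
summary(y[in_train])
summary(y[-in_train])
```

Sampling inside each percentile bin keeps both the low and high ends of the outcome represented on each side of the split, which plain random sampling does not guarantee for small samples.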
For multiple k-fold cross-validation, completely independent folds are created. The names of the list objects will denote the fold membership using the pattern "Foldi.Repj", meaning the ith section (of k) of the jth cross-validation set (of times). Note that this function calls createFolds with list = TRUE and returnTrain = TRUE.
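The structure of the result can be sketched in base R as follows. The helper repeated_folds is hypothetical and unstratified, so it is not what caret does internally; the exact number formatting in the element names is also an assumption.

```r
# Hypothetical helper, not caret's implementation: repeated k-fold
# training indices named after the "Foldi.Repj" pattern.
repeated_folds <- function(n, k = 10, times = 5) {
  out <- list()
  for (j in seq_len(times)) {
    # Fresh, independent random fold assignment for each repeat
    fold_id <- sample(rep_len(seq_len(k), n))
    for (i in seq_len(k)) {
      # Training rows are everything outside fold i
      out[[sprintf("Fold%d.Rep%d", i, j)]] <- which(fold_id != i)
    }
  }
  out
}

set.seed(42)
folds <- repeated_folds(n = 20, k = 4, times = 2)
names(folds)
lengths(folds)
```

Each element holds training row positions, mirroring the returnTrain = TRUE behavior noted above; the held-out rows for an element are the positions missing from it.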
Hyndman and Athanasopoulos (2013) discuss rolling forecasting origin techniques that move the training and test sets in time. createTimeSlices can create the indices for this type of splitting.
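A minimal base R sketch of rolling-origin index generation is shown below. The helper rolling_origin is hypothetical, takes the series length n rather than y, and is meant to mimic the behavior visible in the example output on this page, not to reproduce caret's code.

```r
# Hypothetical helper, not caret's implementation: rolling forecasting
# origin indices for a series of length n.
rolling_origin <- function(n, initialWindow, horizon = 1, fixedWindow = TRUE) {
  # Last row position of each training window
  stops <- seq(initialWindow, n - horizon)
  # Fixed windows slide forward; growing windows always start at row 1
  starts <- if (fixedWindow) stops - initialWindow + 1 else rep(1, length(stops))
  train <- lapply(seq_along(stops), function(i) seq(starts[i], stops[i]))
  # Each test window holds the next `horizon` rows after the training window
  test <- lapply(stops, function(s) seq(s + 1, s + horizon))
  names(train) <- paste0("Training", stops)
  names(test) <- paste0("Testing", stops)
  list(train = train, test = test)
}

fixed <- rolling_origin(9, initialWindow = 5, horizon = 3, fixedWindow = TRUE)
growing <- rolling_origin(9, initialWindow = 5, horizon = 3, fixedWindow = FALSE)
fixed$train
growing$train
```

With nine observations, an initial window of 5, and a horizon of 3, the fixed version yields training windows 1:5 and 2:6 while the growing version yields 1:5 and 1:6; both use test windows 6:8 and 7:9.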
A list or matrix of row position integers corresponding to the training data
Max Kuhn, createTimeSlices by Tony Cooper
http://caret.r-forge.r-project.org/splitting.html
Hyndman and Athanasopoulos (2013), Forecasting: principles and practice. https://www.otexts.org/fpp
library(caret)

data(oil)
# Two stratified training partitions of the oil type factor
createDataPartition(oilType, 2)

# Stratified split of a numeric outcome; compare the two densities
x <- rgamma(50, 3, .5)
inA <- createDataPartition(x, list = FALSE)

plot(density(x[inA]))
rug(x[inA])

points(density(x[-inA]), type = "l", col = 4)
rug(x[-inA], col = 4)

# Bootstrap samples
createResample(oilType, 2)

# k-fold splits, as a list and as a vector of fold numbers
createFolds(oilType, 10)
createFolds(oilType, 5, FALSE)

createFolds(rnorm(21))

# Time slices with growing and fixed training windows
createTimeSlices(1:9, 5, 1, fixedWindow = FALSE)
createTimeSlices(1:9, 5, 1, fixedWindow = TRUE)

createTimeSlices(1:9, 5, 3, fixedWindow = TRUE)
createTimeSlices(1:9, 5, 3, fixedWindow = FALSE)
Loading required package: lattice
Loading required package: ggplot2
$Resample1
[1] 1 10 11 12 13 14 15 16 22 23 24 25 26 27 30 31 36 37 39 40 41 47 48 50 52
[26] 53 54 55 59 60 61 62 66 70 72 73 74 75 77 78 79 80 81 83 86 89 90 91 93 96
$Resample2
[1] 3 4 9 10 11 13 14 15 16 19 22 23 24 26 27 28 29 32 34 37 38 39 40 42 44
[26] 45 48 50 51 54 55 56 57 60 61 63 64 65 68 69 70 76 77 80 81 85 87 88 91 94
$Resample1
[1] 1 1 2 2 4 5 7 7 7 9 10 10 12 13 13 13 14 15 17 18 19 20 21 23 24
[26] 24 25 29 29 32 33 34 34 35 36 36 38 38 40 40 44 46 49 50 50 50 51 52 53 53
[51] 54 55 56 56 57 58 59 61 61 62 63 64 66 66 66 67 69 70 71 71 71 74 74 74 75
[76] 76 76 76 76 77 78 78 78 79 80 81 81 82 85 87 89 91 92 92 94 95
$Resample2
[1] 2 4 5 5 7 8 8 8 9 10 11 11 12 13 14 15 15 16 16 17 19 20 21 22 22
[26] 23 27 28 29 32 33 34 34 34 34 35 35 37 37 38 39 40 41 42 42 43 44 45 46 48
[51] 48 48 51 52 56 57 58 59 60 61 62 62 62 62 62 64 66 66 67 68 68 69 70 70 71
[76] 72 73 74 75 76 77 77 83 87 87 89 90 90 91 92 93 93 94 94 94 94
$Fold01
[1] 12 14 29 35 36 44 53 59 68 76
$Fold02
[1] 1 9 15 22 54 57 80 96
$Fold03
[1] 18 27 46 60 63 64 72 77 84 89
$Fold04
[1] 17 33 37 49 52 67 87 95
$Fold05
[1] 3 7 20 24 40 41 42 47 86 92 94
$Fold06
[1] 4 11 23 32 48 56 61 71 73 88 90
$Fold07
[1] 10 16 19 34 39 51 70 81 91
$Fold08
[1] 5 6 13 30 31 43 50 62 75 83 93
$Fold09
[1] 8 25 28 66 74 79 82
$Fold10
[1] 2 21 26 38 45 55 58 65 69 78 85
[1] 4 2 5 3 5 4 2 5 5 4 4 1 2 2 1 2 2 5 5 2 1 4 1 5 4 5 1 4 5 1 5 5 2 4 1 2 3 4
[39] 5 2 3 1 5 5 1 1 1 4 2 2 3 1 2 4 5 3 5 1 5 3 2 4 4 4 1 3 4 1 3 5 1 3 3 1 1 3
[77] 2 5 3 3 1 2 4 4 3 4 1 3 2 1 2 5 4 3 2 3
$Fold01
[1] 3 7
$Fold02
[1] 8 10
$Fold03
[1] 16 20
$Fold04
[1] 2 13 17
$Fold05
[1] 4 12
$Fold06
[1] 11 21
$Fold07
[1] 5 19
$Fold08
[1] 1 15
$Fold09
[1] 6 9
$Fold10
[1] 14 18
$train
$train$Training5
[1] 1 2 3 4 5
$train$Training6
[1] 1 2 3 4 5 6
$train$Training7
[1] 1 2 3 4 5 6 7
$train$Training8
[1] 1 2 3 4 5 6 7 8
$test
$test$Testing5
[1] 6
$test$Testing6
[1] 7
$test$Testing7
[1] 8
$test$Testing8
[1] 9
$train
$train$Training5
[1] 1 2 3 4 5
$train$Training6
[1] 2 3 4 5 6
$train$Training7
[1] 3 4 5 6 7
$train$Training8
[1] 4 5 6 7 8
$test
$test$Testing5
[1] 6
$test$Testing6
[1] 7
$test$Testing7
[1] 8
$test$Testing8
[1] 9
$train
$train$Training5
[1] 1 2 3 4 5
$train$Training6
[1] 2 3 4 5 6
$test
$test$Testing5
[1] 6 7 8
$test$Testing6
[1] 7 8 9
$train
$train$Training5
[1] 1 2 3 4 5
$train$Training6
[1] 1 2 3 4 5 6
$test
$test$Testing5
[1] 6 7 8
$test$Testing6
[1] 7 8 9