createDataPartition | R Documentation |

`createDataPartition` creates a series of test/training partitions, while `createResample` creates one or more bootstrap samples. `createFolds` splits the data into `k` groups, while `createTimeSlices` creates cross-validation splits for time series data. `groupKFold` splits the data based on a grouping factor.

```
createDataPartition(
y,
times = 1,
p = 0.5,
list = TRUE,
groups = min(5, length(y))
)
createFolds(y, k = 10, list = TRUE, returnTrain = FALSE)
createMultiFolds(y, k = 10, times = 5)
createTimeSlices(y, initialWindow, horizon = 1, fixedWindow = TRUE, skip = 0)
groupKFold(group, k = length(unique(group)))
createResample(y, times = 10, list = TRUE)
```

| Argument | Description |
|---|---|
| `y` | a vector of outcomes. For |
| `times` | the number of partitions to create |
| `p` | the percentage of data that goes to training |
| `list` | logical - should the results be in a list ( |
| `groups` | for numeric |
| `k` | an integer for the number of folds. |
| `returnTrain` | a logical. When true, the values returned are the sample positions corresponding to the data used during training. This argument only works in conjunction with |
| `initialWindow` | The initial number of consecutive values in each training set sample |
| `horizon` | the number of consecutive values in test set sample |
| `fixedWindow` | logical, if |
| `skip` | integer, how many (if any) resamples to skip to thin the total amount |
| `group` | a vector of groups whose length matches the number of rows in the overall data set. |

For bootstrap samples, simple random sampling is used.
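
As a minimal sketch of the bootstrap case (assuming the caret package is installed; the outcome vector `y` and the seed are made up for illustration), each resample drawn by `createResample` should contain as many row indices as `y`, usually with repeats:

```r
library(caret)

set.seed(1)                      # arbitrary seed, for reproducibility only
y <- rnorm(20)                   # hypothetical outcome vector

# Two bootstrap samples, returned as a list of row indices
boot <- createResample(y, times = 2, list = TRUE)

lengths(boot)                    # number of indices in each resample
anyDuplicated(boot[[1]]) > 0     # repeated indices indicate sampling with replacement
```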

For other data splitting, the random sampling is done within the levels of `y` when `y` is a factor in an attempt to balance the class distributions within the splits.
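
The effect of sampling within factor levels can be checked with a small, hedged example. The imbalanced outcome `y` below is hypothetical, and the sketch assumes caret is installed; the class proportions in the training and test sets should stay close to those of the full data:

```r
library(caret)

set.seed(2)                      # arbitrary seed
# Hypothetical imbalanced two-class outcome: 80 "a" and 20 "b"
y <- factor(rep(c("a", "b"), times = c(80, 20)))

# One partition with roughly 80% of the rows in training
in_train <- createDataPartition(y, p = 0.8, list = FALSE)

# Class proportions in the full data, the training set, and the test set
prop.table(table(y))
prop.table(table(y[in_train]))
prop.table(table(y[-in_train]))
```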

For numeric `y`, the sample is split into groups based on percentiles and sampling is done within these subgroups. For `createDataPartition`, the number of percentiles is set via the `groups` argument. For `createFolds` and `createMultiFolds`, the number of groups is set dynamically based on the sample size and `k`. For smaller sample sizes, these two functions may not do stratified splitting and, at most, will split the data into quartiles.
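
A short sketch of the numeric case, assuming caret is installed. The gamma-distributed `x` is invented for illustration, and `groups = 5` simply mirrors the default shown in the usage. Because sampling happens within percentile-based groups, the training and test sets should cover a similar range of values:

```r
library(caret)

set.seed(3)                      # arbitrary seed
x <- rgamma(100, 3, 0.5)         # hypothetical skewed numeric outcome

# Sample within percentile-based groups of x; keep about 80% for training
in_train <- createDataPartition(x, p = 0.8, groups = 5, list = FALSE)

# Compare the distribution of x in the two sets
summary(x[in_train])
summary(x[-in_train])
```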

Also, for `createDataPartition`, when class sizes are very small (<= 3), the classes may not show up in both the training and test data.

For multiple k-fold cross-validation, completely independent folds are created. The names of the list objects will denote the fold membership using the pattern "Foldi.Repj", meaning the ith section (of k) of the jth cross-validation set (of `times`). Note that `createMultiFolds` calls `createFolds` with `list = TRUE` and `returnTrain = TRUE`.
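
The naming scheme can be inspected directly. This is a sketch assuming caret is installed, with a made-up three-class outcome; the exact zero-padding of the fold and repeat numbers may vary with `k` and `times`:

```r
library(caret)

set.seed(4)                                     # arbitrary seed
y <- factor(rep(c("a", "b", "c"), each = 10))   # hypothetical outcome

# 3-fold cross-validation repeated twice
folds <- createMultiFolds(y, k = 3, times = 2)

names(folds)      # one entry per fold and repeat, named Fold<i>.Rep<j>
lengths(folds)    # each element holds training-row indices
```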

Hyndman and Athanasopoulos (2013) discuss rolling forecasting origin techniques that move the training and test sets in time. `createTimeSlices` can create the indices for this type of splitting.
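
A minimal illustration of rolling-origin indices, assuming caret is installed. The series `1:9` is a stand-in for real data, and the `train` and `test` element names are those of the list that `createTimeSlices` returns:

```r
library(caret)

y <- 1:9   # hypothetical series, already in time order

# Fixed-width window: 5 training values, followed by 1 test value
fixed <- createTimeSlices(y, initialWindow = 5, horizon = 1, fixedWindow = TRUE)
# Growing window: every training set starts at the first observation
growing <- createTimeSlices(y, initialWindow = 5, horizon = 1, fixedWindow = FALSE)

fixed$train[[2]]     # 2 3 4 5 6
fixed$test[[2]]      # 7
growing$train[[2]]   # 1 2 3 4 5 6
```

With a fixed window the training set slides forward one step per slice; with a growing window it extends from the start of the series.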

For group k-fold cross-validation, the data are split such that no group is contained in both the modeling and holdout sets. One or more groups could be left out, depending on the value of `k`.
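
The group separation can be verified with a short check. This is a sketch assuming caret is installed; the grouping vector and the helper list `shared` are hypothetical names used only here:

```r
library(caret)

set.seed(5)                                  # arbitrary seed
# Hypothetical grouping variable: 4 groups of 5 rows each
group <- rep(c("g1", "g2", "g3", "g4"), each = 5)

folds <- groupKFold(group, k = 2)            # each element holds training-row indices

# For every fold, list the groups found in both the training and holdout rows
shared <- lapply(folds, function(train) {
  intersect(unique(group[train]), unique(group[-train]))
})
shared   # expected to be empty for every fold
```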

A list or matrix of row position integers corresponding to the training data. For `createTimeSlices`, subsamples are named by the end index of each training subsample.

Max Kuhn, `createTimeSlices` by Tony Cooper

http://topepo.github.io/caret/data-splitting.html

Hyndman and Athanasopoulos (2013), Forecasting: principles and practice. https://otexts.com/fpp2/

```
library(caret)

# Stratified partitions of a factor outcome
data(oil)
createDataPartition(oilType, 2)

# Partition a numeric outcome and compare training and test densities
x <- rgamma(50, 3, .5)
inA <- createDataPartition(x, list = FALSE)

plot(density(x[inA]))
rug(x[inA])

points(density(x[-inA]), type = "l", col = 4)
rug(x[-inA], col = 4)

# Bootstrap samples
createResample(oilType, 2)

# k-fold splits, returned as a list or as a vector of fold assignments
createFolds(oilType, 10)
createFolds(oilType, 5, FALSE)

createFolds(rnorm(21))

# Time slices with growing and fixed windows, different horizons, and skips
createTimeSlices(1:9, 5, 1, fixedWindow = FALSE)
createTimeSlices(1:9, 5, 1, fixedWindow = TRUE)
createTimeSlices(1:9, 5, 3, fixedWindow = TRUE)
createTimeSlices(1:9, 5, 3, fixedWindow = FALSE)

createTimeSlices(1:15, 5, 3)
createTimeSlices(1:15, 5, 3, skip = 2)
createTimeSlices(1:15, 5, 3, skip = 3)

# Group k-fold: tabulate the groups present in each training set
set.seed(131)
groups <- sort(sample(letters[1:4], size = 20, replace = TRUE))
table(groups)
folds <- groupKFold(groups)
lapply(folds, function(x, y) table(y[x]), y = groups)
```

caret documentation built on March 31, 2023, 9:49 p.m.
