create_folds | R Documentation |
This function provides a list of row indices used for k-fold cross-validation (basic, stratified, grouped, or blocked). Repeated fold creation is supported as well. By default, in-sample indices are returned.
create_folds(
y,
k = 5L,
type = c("stratified", "basic", "grouped", "blocked"),
n_bins = 10L,
m_rep = 1L,
use_names = TRUE,
invert = FALSE,
shuffle = FALSE,
seed = NULL
)
y |
Either the variable used for "stratified" or "grouped" splits. For other split types, any vector of the same length as the data to be split. |
k |
Number of folds. |
type |
Split type. One of "stratified" (default), "basic", "grouped", "blocked". |
n_bins |
Approximate number of bins for numeric y (relevant for stratified splits). |
m_rep |
How many times should the data be split into k folds? Default is 1, i.e., no repetitions. |
use_names |
Should folds be named? Default is TRUE. |
invert |
Set to TRUE if out-of-sample indices should be returned. Default is FALSE. |
shuffle |
Should row indices be randomly shuffled within folds? Default is FALSE. |
seed |
Integer random seed. |
By default, the function uses stratified splitting. This will balance the folds regarding the distribution of the input vector y. (Numeric input is first binned into n_bins quantile groups.)
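The balancing effect of stratified splitting can be checked by tabulating the outcome within each fold. The following is a minimal sketch, assuming create_folds() is loaded from the splitTools package (this page does not name its package); the vector y and the class labels are made up for illustration.

```r
library(splitTools)  # assumption: the package that provides create_folds()

# Hypothetical imbalanced outcome: 80 "no" and 20 "yes"
y <- rep(c("no", "yes"), times = c(80, 20))

# Out-of-sample indices make the per-fold class counts easy to read
folds <- create_folds(y, k = 5, invert = TRUE, seed = 1)

# Class counts per fold (rows: classes, columns: folds)
counts <- sapply(folds, function(idx) {
  table(factor(y[idx], levels = c("no", "yes")))
})
print(counts)
```

With stratification, each column should show roughly the same ratio of "no" to "yes" as the full vector.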
If type = "grouped", groups specified by y are kept together when splitting. This is relevant for clustered or panel data.
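To see what "kept together" means in practice, one can verify that no group appears both in-sample and out-of-sample within the same fold. This is a sketch under the assumption that create_folds() comes from the splitTools package; the subject identifiers are hypothetical.

```r
library(splitTools)  # assumption: the package that provides create_folds()

# Hypothetical panel data: 6 subjects with 4 rows each
subject <- rep(paste0("id", 1:6), each = 4)

# In-sample indices of a grouped 3-fold split
folds <- create_folds(subject, k = 3, type = "grouped", seed = 1)

# Per fold: number of subjects found on both sides of the split
overlap <- sapply(folds, function(in_idx) {
  out_idx <- setdiff(seq_along(subject), in_idx)
  length(intersect(subject[in_idx], subject[out_idx]))
})
print(overlap)
```

If groups are respected, every entry of overlap is zero.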
In contrast to basic splitting, type = "blocked" does not sample indices at random, but rather keeps them in sequential groups.
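The sequential structure of blocked folds is easiest to see on the out-of-sample indices. The sketch below assumes create_folds() is provided by the splitTools package and uses a toy vector of 12 rows.

```r
library(splitTools)  # assumption: the package that provides create_folds()

# Toy data with 12 ordered rows, e.g. consecutive time points
y <- 1:12

# Out-of-sample indices of a blocked 3-fold split (no shuffling)
folds <- create_folds(y, k = 3, type = "blocked", invert = TRUE)
print(folds)

# Each fold should be an unbroken run of consecutive row numbers
is_run <- sapply(folds, function(idx) all(diff(idx) == 1))
print(is_run)
```

Because no random sampling is involved, repeated calls without shuffling should give the same blocks.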
If invert = FALSE (the default), the function returns a list of in-sample row indices.
If invert = TRUE, it returns a list of out-of-sample row indices.
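In a typical cross-validation loop, the in-sample indices select the training rows and the remaining rows form the validation set. A minimal sketch, again assuming create_folds() comes from the splitTools package, with variable names chosen for illustration:

```r
library(splitTools)  # assumption: the package that provides create_folds()

y <- rep(letters[1:4], each = 5)

# Default: in-sample (training) indices per fold
folds <- create_folds(y, k = 4, seed = 1)

# Derive the validation rows as the complement of the training rows
sizes <- t(sapply(folds, function(train_idx) {
  valid_idx <- setdiff(seq_along(y), train_idx)
  c(train = length(train_idx), valid = length(valid_idx))
}))
print(sizes)
```

Calling create_folds() with invert = TRUE returns the validation indices directly, which avoids the setdiff() step when only those are needed.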
partition(), create_timefolds()
y <- rep(c(letters[1:4]), each = 5)
create_folds(y)
create_folds(y, k = 2)
create_folds(y, k = 2, m_rep = 2)
create_folds(y, k = 3, type = "blocked")