multi_strata | R Documentation |
Creates a stratification vector based on multiple columns of a data.frame that can then be passed to the splitting functions.
Currently, the function offers two strategies to create the strata:
"kmeans": k-means cluster analysis on scaled input. (Ordered factors are integer-encoded first; unordered factors and character columns are one-hot encoded.)
"interaction": All combinations (after binning numeric columns into approximately k bins).
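The two strategies can be approximated in base R as follows. This is a rough sketch for intuition only, not the package's internal code: the helpers `one_hot()` and `bin_numeric()` are hypothetical, and quantile-based binning is an assumption, since the description only says "approximately k bins". The example data contains no ordered factors, so the integer-encoding step is not shown.

```r
# Illustrative sketch only; helper names are hypothetical and this is
# not how the package itself is implemented.
set.seed(1)
df <- data.frame(
  A = rep(letters[1:4], each = 20),
  B = factor(sample(c(0, 1), 80, replace = TRUE)),
  c = rnorm(80)
)
k <- 3L

# One-hot encode a character or unordered factor column.
one_hot <- function(x, prefix) {
  f <- factor(x)
  m <- outer(as.integer(f), seq_len(nlevels(f)), "==") * 1
  colnames(m) <- paste0(prefix, levels(f))
  m
}

# "kmeans"-like: encode, scale, then cluster the rows into k groups.
X <- cbind(one_hot(df$A, "A_"), one_hot(df$B, "B_"), c = df$c)
strata_kmeans <- factor(kmeans(scale(X), centers = k, nstart = 5)$cluster)

# Bin a numeric column into roughly k groups (assumption: quantile breaks).
bin_numeric <- function(x, k) {
  breaks <- unique(quantile(x, probs = seq(0, 1, length.out = k + 1)))
  cut(x, breaks = breaks, include.lowest = TRUE)
}

# "interaction"-like: every observed combination becomes one stratum.
strata_inter <- interaction(df$A, df$B, bin_numeric(df$c, k), drop = TRUE)

table(strata_kmeans)
nlevels(strata_inter)
```

The k-means variant yields at most k strata regardless of how many columns are used, while the interaction variant can produce many small strata as columns are added, which matters when the strata feed a stratified split.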
multi_strata(df, strategy = c("kmeans", "interaction"), k = 3L)
df | A data.frame |
strategy | A string (either "kmeans" or "interaction") to compute the strata, see description. |
k | An integer. For |
Factor with strata as levels.
partition(), create_folds()
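library(splitTools)

set.seed(1)  # make the random columns reproducible
y_multi <- data.frame(
  A = rep(letters[1:4], each = 20),
  B = factor(sample(c(0, 1), 80, replace = TRUE)),
  c = rnorm(80)
)
# Build one stratum label per row from all three columns
y <- multi_strata(y_multi, k = 3)
# Create 5 cross-validation folds stratified on the combined strata
folds <- create_folds(y, k = 5)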
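The strata can be passed to partition() in the same way. The following is a minimal sketch, assuming the functions on this page come from the splitTools package and that it is installed; the 80/20 proportions and the seed are arbitrary choices for illustration.

```r
library(splitTools)

set.seed(1)
y_multi <- data.frame(
  A = rep(letters[1:4], each = 20),
  B = factor(sample(c(0, 1), 80, replace = TRUE)),
  c = rnorm(80)
)

# One stratum label per row, built from all three columns
y <- multi_strata(y_multi, strategy = "kmeans", k = 3)

# Stratified 80/20 split; the result is a list of row indices
inds <- partition(y, p = c(train = 0.8, test = 0.2), type = "stratified")

# Stratum shares should be similar in both partitions
round(prop.table(table(y[inds$train])), 2)
round(prop.table(table(y[inds$test])), 2)
```

Comparing the two tables is a quick check that each stratum is represented in roughly the same proportion in the training and test rows.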