group_vfold_cv | R Documentation |
Group V-fold cross-validation creates splits of the data based on some grouping variable (which may have more than a single row associated with it). The function can create as many splits as there are unique values of the grouping variable or it can create a smaller set of splits where more than one group is left out at a time. A common use of this kind of resampling is when you have repeated measures of the same subject.
group_vfold_cv(
  data,
  group = NULL,
  v = NULL,
  repeats = 1,
  balance = c("groups", "observations"),
  ...,
  strata = NULL,
  pool = 0.1
)
data |
A data frame. |
group |
A variable in |
v |
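The number of partitions of the data set. If left as NULL (the default), v is set to the number of unique values of the grouping variable, which gives leave-one-group-out splits. |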
repeats |
The number of times to repeat the V-fold partitioning. |
balance |
If |
... |
These dots are for future extensions and must be empty. |
strata |
A variable in |
pool |
A proportion of data used to determine if a particular group is too small and should be pooled into another group. We do not recommend decreasing this argument below its default of 0.1 because of the dangers of stratifying groups that are too small. |
A tibble with classes group_vfold_cv, rset, tbl_df, tbl, and data.frame. The results include a column for the data split objects and an identification variable.
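As a rough illustration of the returned object, the sketch below builds a small made-up data frame and inspects the result. The names toy, subject, and y are hypothetical and exist only for this example; it assumes the rsample package is installed.

```r
library(rsample)

# Hypothetical toy data: 6 subjects with 3 repeated measures each
toy <- data.frame(
  subject = rep(paste0("s", 1:6), each = 3),
  y = seq_len(18)
)

set.seed(1)
folds <- group_vfold_cv(toy, group = subject, v = 3)

# One row per resample: a list column of split objects plus an
# identification column
names(folds)
class(folds)

# Extract the two data sets held in the first split
first_split <- folds$splits[[1]]
analysis(first_split)    # rows used for fitting
assessment(first_split)  # rows held out for evaluation
```

Every row of toy lands in exactly one of the two sets for a given split, so the row counts of analysis() and assessment() add up to nrow(toy).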
data(ames, package = "modeldata")
set.seed(123)
group_vfold_cv(ames, group = Neighborhood, v = 5)
group_vfold_cv(
  ames,
  group = Neighborhood,
  v = 5,
  balance = "observations"
)
group_vfold_cv(ames, group = Neighborhood, v = 5, repeats = 2)
# Leave-one-group-out CV
group_vfold_cv(ames, group = Neighborhood)
library(dplyr)
data(Sacramento, package = "modeldata")

# Bin cities into strata by the quartiles of their mean price
city_strata <- Sacramento %>%
  group_by(city) %>%
  summarize(strata = mean(price)) %>%
  mutate(strata = cut(strata, quantile(strata), include.lowest = TRUE))

# Attach each city's stratum to its rows
sacramento_data <- Sacramento %>%
  full_join(city_strata, by = "city")

group_vfold_cv(sacramento_data, city, strata = strata)
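The point of grouping is that all rows of a group stay together on one side of each split. A minimal sketch of how one might check this, again on hypothetical toy data (toy, subject, and y are made up for illustration) and assuming rsample is installed:

```r
library(rsample)

# Hypothetical toy data: 5 subjects with 4 rows each
toy <- data.frame(
  subject = rep(letters[1:5], each = 4),
  y = seq_len(20)
)

# Leaving `v` unset gives one resample per unique group
folds <- group_vfold_cv(toy, group = subject)

# For each split, TRUE when no subject appears in both the analysis
# and the assessment set
no_overlap <- vapply(
  folds$splits,
  function(split) {
    held_in <- unique(analysis(split)$subject)
    held_out <- unique(assessment(split)$subject)
    length(intersect(held_in, held_out)) == 0
  },
  logical(1)
)

all(no_overlap)
nrow(folds) == length(unique(toy$subject))
```

The same check can be run on the ames or Sacramento resamples above by swapping in the relevant grouping column.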