make_groups: Make groupings for grouped rsplits


make_groups    R Documentation

Make groupings for grouped rsplits

Description

This function powers grouped resampling by splitting the data based upon a grouping variable and returning the assessment set indices for each split.

Usage

make_groups(
  data,
  group,
  v,
  balance = c("groups", "observations", "prop"),
  strata = NULL,
  ...
)

Arguments

data

A data frame.

group

A variable in data (single character or name) used to group observations. Observations with the same value of this variable are kept together, in either the analysis set or the assessment set, within a fold.

v

The number of partitions of the data set.

balance

If v is less than the number of unique groups, how should groups be combined into folds? Should be one of "groups", "observations", "prop".

strata

A variable in data (single character or name) used to conduct stratified sampling. When not NULL, each resample is created within the stratification variable. Numeric strata are binned into quartiles.

...

Arguments passed to balance functions.
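
The effect of balance is easiest to see through an exported wrapper such as group_vfold_cv(). The sketch below is illustrative only: the toy data frame, its grp column, and the fold_summary() helper are assumptions made for this example and are not part of rsample. It compares how balance = "groups" and balance = "observations" combine six unevenly sized groups into v = 3 folds.

```r
library(rsample)

set.seed(123)

# Hypothetical toy data: six groups with very different sizes.
group_sizes <- c(a = 2, b = 2, c = 2, d = 10, e = 10, f = 10)
toy <- data.frame(
  grp = rep(names(group_sizes), times = group_sizes),
  x   = rnorm(sum(group_sizes))
)

# v = 3 is less than the 6 unique groups, so groups must be combined.
by_groups <- group_vfold_cv(toy, group = grp, v = 3, balance = "groups")
by_obs    <- group_vfold_cv(toy, group = grp, v = 3, balance = "observations")

# Hypothetical helper: groups and rows in each fold's assessment set.
fold_summary <- function(rset) {
  data.frame(
    fold     = rset$id,
    n_groups = vapply(rset$splits,
                      function(s) length(unique(assessment(s)$grp)),
                      integer(1)),
    n_rows   = vapply(rset$splits,
                      function(s) nrow(assessment(s)),
                      integer(1))
  )
}

groups_summary <- fold_summary(by_groups)
obs_summary    <- fold_summary(by_obs)

print(groups_summary)  # group counts per fold are evened out
print(obs_summary)     # row counts per fold are evened out instead
```

With balance = "groups" the folds hold the same number of groups but may differ widely in row count; with balance = "observations" the row counts are brought closer together at the cost of uneven group counts.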

Details

Not all balance options are accepted – or make sense – for all resampling functions. For instance, balance = "prop" assigns groups to folds at random, meaning that any given observation is not guaranteed to be in one (and only one) assessment set. That means balance = "prop" can't be used with group_vfold_cv(), and so isn't an option available for that function.
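
The "one (and only one) assessment set" property of group_vfold_cv() can be checked directly. This is a minimal sketch on hypothetical data (the toy data frame and its grp and row columns are assumptions for illustration): it counts how many assessment sets each row lands in across the folds.

```r
library(rsample)

set.seed(42)

# Hypothetical toy data: 8 groups of 5 rows, with a row identifier.
toy <- data.frame(
  grp = rep(letters[1:8], each = 5),
  row = seq_len(40)
)

folds <- group_vfold_cv(toy, group = grp, v = 4)

# Collect the row identifiers of every assessment set.
assessed_rows <- unlist(lapply(folds$splits, function(s) assessment(s)$row))

# Count how often each original row is assessed across all folds.
times_assessed <- as.integer(table(factor(assessed_rows, levels = toy$row)))

# Under v-fold cross-validation every row is assessed exactly once.
print(all(times_assessed == 1L))
```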

Similarly, group_mc_cv() and its derivatives don't assign data to one (and only one) assessment set, but rather allow each observation to be in an assessment set zero-or-more times. As a result, those functions don't have a balance argument, and under the hood always specify balance = "prop" when they call make_groups().
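
For contrast, the same count can be made for group_mc_cv(). The sketch below again uses hypothetical toy data (the grp and row columns are assumptions for illustration). Each resample draws its assessment groups independently, so a row may be assessed anywhere from zero times up to the number of resamples, while each group still stays entirely on one side of every split.

```r
library(rsample)

set.seed(42)

# Hypothetical toy data: 8 groups of 5 rows, with a row identifier.
toy <- data.frame(
  grp = rep(letters[1:8], each = 5),
  row = seq_len(40)
)

n_resamples <- 5
mc <- group_mc_cv(toy, group = grp, prop = 3/4, times = n_resamples)

# Count how often each original row appears in an assessment set.
assessed_rows <- unlist(lapply(mc$splits, function(s) assessment(s)$row))
times_assessed <- as.integer(table(factor(assessed_rows, levels = toy$row)))

# Counts can be 0 or greater than 1, unlike v-fold cross-validation.
print(table(times_assessed))

# Within each split, no group is shared between analysis and assessment.
groups_intact <- all(vapply(
  mc$splits,
  function(s) length(intersect(analysis(s)$grp, assessment(s)$grp)) == 0,
  logical(1)
))
print(groups_intact)
```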


tidymodels/rsample documentation built on Sept. 29, 2024, 10:48 p.m.