group_vfold_cv: Group V-Fold Cross-Validation


View source: R/groups.R

Description

Group V-fold cross-validation creates splits of the data based on a grouping variable, where each value of that variable may be associated with more than one row. The function can create as many splits as there are unique values of the grouping variable, or it can create a smaller set of splits in which more than one value is left out at a time.
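The idea of assigning whole groups, rather than individual rows, to folds can be sketched in base R. This is an illustration only and is not how rsample implements it; the data frame `toy`, the column `subject`, and the value of `n_folds` are hypothetical names chosen for the example.

```r
# Illustrative sketch of grouped fold assignment in base R.
# This is NOT the rsample implementation; all names here are made up.
set.seed(1)
toy <- data.frame(
  subject = rep(c("a", "b", "c", "d", "e", "f"), times = c(3, 1, 4, 2, 2, 3)),
  y = rnorm(15),
  stringsAsFactors = FALSE
)

n_folds <- 3  # hypothetical choice; must not exceed the number of groups
groups <- unique(toy$subject)

# Assign each *group* (not each row) to a fold at random.
fold_of_group <- sample(rep_len(seq_len(n_folds), length(groups)))
names(fold_of_group) <- groups

# Every row inherits the fold of its group.
toy$fold <- unname(fold_of_group[toy$subject])

# Count how many distinct folds each subject lands in.
folds_per_subject <- tapply(toy$fold, toy$subject, function(f) length(unique(f)))
folds_per_subject
```

Because the fold is looked up per group, no subject can end up with rows in two different folds, which is the property that group V-fold cross-validation is meant to guarantee.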

Usage

group_vfold_cv(data, group = NULL, v = NULL, ...)

Arguments

data

A data frame.

group

A single character value or a variable name that corresponds to a variable in the data frame.

v

The number of partitions of the data set. If left NULL, v will be set to the number of unique values in the group.

...

Not currently used.

Value

A tibble with classes group_vfold_cv, rset, tbl_df, tbl, and data.frame. The results include a column for the data split objects and an identification variable.
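As a rough sketch of what the returned object looks like in practice, the snippet below builds a small data set and inspects the result. The data frame `toy` and its column `subject` are hypothetical; `analysis()` and `assessment()` are the rsample accessors also used in the Examples section.

```r
library(rsample)

# Hypothetical toy data: 5 subjects with 4 rows each.
set.seed(42)
toy <- data.frame(
  subject = rep(1:5, each = 4),
  x = runif(20)
)

# v is left NULL, so one split is created per unique subject.
folds <- group_vfold_cv(toy, group = "subject")

class(folds)   # should include "group_vfold_cv" and "rset"
names(folds)   # the split objects plus an identification column
nrow(folds)    # one row per split

# Check that no subject appears on both sides of the first split.
first <- folds$splits[[1]]
train_ids <- unique(analysis(first)$subject)
test_ids <- unique(assessment(first)$subject)
shared <- intersect(train_ids, test_ids)
length(shared)
```

Each element of the splits column can be passed to `analysis()` or `assessment()` to recover the corresponding rows as a data frame.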

Examples

library(rsample)
library(purrr)

# Simulate 80 rows whose ids are drawn from 1:20
set.seed(3527)
test_data <- data.frame(id = sort(sample(1:20, size = 80, replace = TRUE)))
test_data$dat <- runif(nrow(test_data))

# Default (v = NULL): one split per unique id
set.seed(5144)
split_by_id <- group_vfold_cv(test_data, group = "id")

# Return the id value(s) held out in a split's assessment set
get_id_left_out <- function(x) {
  unique(assessment(x)$id)
}

# how often each id is held out:
table(map_int(split_by_id$splits, get_id_left_out))

# Fewer splits than ids: several ids are held out together
set.seed(5144)
split_by_some_id <- group_vfold_cv(test_data, group = "id", v = 7)
held_out <- map(split_by_some_id$splits, get_id_left_out)
table(unlist(held_out))
# number held out per resample:
map_int(held_out, length)

Example output

Loading required package: tidyr

 1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 
 1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1 

 1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 
 1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1 
1 2 3 4 5 6 7 
3 3 3 3 3 3 2 

rsample documentation built on May 8, 2021, 9:06 a.m.