group_vfold_cv: Group V-Fold Cross-Validation


Description

Group V-fold cross-validation creates splits of the data based on a grouping variable, where a single value of the group may be associated with more than one row. All rows that share a group value are placed in the same assessment set. The function can create as many splits as there are unique values of the grouping variable, or it can create a smaller set of splits in which more than one value is left out at a time.
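To make the splitting rule concrete, here is a minimal base-R sketch of how rows could be assigned to folds by group. The helper `assign_group_folds()` is hypothetical and is not rsample's internal code; it assumes the unique groups are shuffled and dealt out to folds as evenly as possible, which is one reasonable strategy but not necessarily the one rsample uses.

```r
# Hypothetical helper (illustration only, not rsample's implementation):
# assign every row to a fold so that all rows of a group share one fold.
assign_group_folds <- function(groups, v = NULL) {
  unique_groups <- unique(groups)
  # Default: one fold per unique group (leave-one-group-out)
  if (is.null(v)) v <- length(unique_groups)
  # This sketch refuses more folds than there are groups
  if (v > length(unique_groups)) {
    stop("`v` cannot exceed the number of unique groups")
  }
  # Shuffle fold labels 1..v across the unique groups, as evenly as possible
  fold_of_group <- sample(rep(seq_len(v), length.out = length(unique_groups)))
  names(fold_of_group) <- as.character(unique_groups)
  # Each row inherits the fold of its group
  unname(fold_of_group[as.character(groups)])
}

set.seed(1)
groups <- c("a", "a", "b", "c", "c", "c", "d", "e")
folds <- assign_group_folds(groups, v = 2)

# Row indices held out (assessment set) for each fold
split(seq_along(groups), folds)
```

Because the fold is chosen per group and then copied to that group's rows, no group can ever be split between the analysis and assessment sets of the same resample.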

Usage

group_vfold_cv(data, group = NULL, v = NULL, ...)

Arguments

data

A data frame.

group

A single character value naming the column of the data that will be used to create the splits.

v

The number of partitions of the data set. If left NULL, 'v' will be set to the number of unique values in the group.
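The effect of 'v' can be checked directly by counting the resamples in the returned object. This sketch assumes rsample is installed and reuses the simulated data from the Examples section; with 'v' left NULL there is one resample per unique id, while an explicit 'v' gives exactly that many resamples.

```r
library(rsample)

# Simulated data: 80 rows spread across up to 20 ids
set.seed(3527)
test_data <- data.frame(id = sort(sample(1:20, size = 80, replace = TRUE)))

# v left NULL: one resample per unique value of `id`
loo_groups <- group_vfold_cv(test_data, group = "id")
nrow(loo_groups) == length(unique(test_data$id))

# v = 5: the unique ids are pooled into 5 assessment sets
five_fold <- group_vfold_cv(test_data, group = "id", v = 5)
nrow(five_fold)
```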

...

Not currently used.

Value

A tibble with classes 'group_vfold_cv', 'rset', 'tbl_df', 'tbl', and 'data.frame'. The results include a column for the data split objects and an identification variable.
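The sketch below inspects the returned object, assuming rsample is installed. The split objects live in the `splits` column and the identification variable in the `id` column; `analysis()` and `assessment()` extract the modeling rows and the held-out rows of a single split.

```r
library(rsample)

set.seed(3527)
test_data <- data.frame(id = sort(sample(1:20, size = 80, replace = TRUE)))
test_data$dat <- runif(nrow(test_data))

folds <- group_vfold_cv(test_data, group = "id", v = 4)

class(folds)    # includes "group_vfold_cv" and "rset"
names(folds)    # the split column and the identification column

# Rows used for fitting and rows held out in the first resample
first_split <- folds$splits[[1]]
dim(analysis(first_split))
dim(assessment(first_split))
```

Within one split, the analysis and assessment rows together make up the full data set.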

Examples

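library(rsample)
library(purrr)

# Simulated data: 80 rows spread across up to 20 ids, so most ids have several rows
set.seed(3527)
test_data <- data.frame(id = sort(sample(1:20, size = 80, replace = TRUE)))
test_data$dat <- runif(nrow(test_data))

# Leave-one-group-out: v defaults to the number of unique ids
set.seed(5144)
split_by_id <- group_vfold_cv(test_data, group = "id")

# Return the id value(s) held out in a split's assessment set
get_id_left_out <- function(x) {
  unique(assessment(x)$id)
}

# Each id is held out exactly once
table(map_int(split_by_id$splits, get_id_left_out))

# Fewer folds: the unique ids are pooled into 7 assessment sets
set.seed(5144)
split_by_some_id <- group_vfold_cv(test_data, group = "id", v = 7)
held_out <- map(split_by_some_id$splits, get_id_left_out)
table(unlist(held_out))
# number held out per resample:
map_int(held_out, length)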
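A natural follow-up check is that grouping really prevents leakage: no id should appear in both the analysis and assessment sets of the same resample, and the assessment sets together should cover every row exactly once. This sketch assumes rsample and purrr are installed; `shared_ids()` is a hypothetical helper written only for this illustration.

```r
library(rsample)
library(purrr)

set.seed(3527)
test_data <- data.frame(id = sort(sample(1:20, size = 80, replace = TRUE)))
test_data$dat <- runif(nrow(test_data))

set.seed(5144)
split_by_some_id <- group_vfold_cv(test_data, group = "id", v = 7)

# Hypothetical helper: ids present in both the analysis and assessment sets
shared_ids <- function(x) {
  intersect(unique(analysis(x)$id), unique(assessment(x)$id))
}

# Number of overlapping ids per resample (expected to be zero everywhere)
overlap <- map_int(split_by_some_id$splits, function(x) length(shared_ids(x)))
overlap

# Total held-out rows across resamples versus rows in the data
held_rows <- sum(map_int(split_by_some_id$splits, function(x) nrow(assessment(x))))
held_rows == nrow(test_data)
```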

topepo/rsample documentation built on May 4, 2019, 4:25 p.m.