holdout_frac: Generate K cross-validation test-training pairs
In jrnold/resamplr: Resampling methods


holdout_frac splits the data so that a proportion size of the observations is in the test set and the remaining 1 - size is in the training set. Likewise, holdout_n splits the data so that size elements are in the test set and the remainder are in the training set.
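The difference between the two sizing conventions can be sketched in base R. This is an illustration only, not the package's implementation: `split_indices` is a hypothetical helper, and rounding the fractional size with `round()` is an assumption about how a non-integer test-set size might be handled.

```r
# Hypothetical helper: return row indices for one test/train split.
# `n_test` is the number of rows to place in the test set.
split_indices <- function(n, n_test, shuffle = TRUE) {
  idx <- if (shuffle) sample.int(n) else seq_len(n)
  list(test = head(idx, n_test),
       train = tail(idx, n - n_test))
}

n <- nrow(mtcars)  # 32 rows

# Count-based holdout: exactly 3 rows in the test set.
by_n <- split_indices(n, n_test = 3)

# Fraction-based holdout: about 30% of the rows in the test set.
# Rounding with round() is an assumption made for this sketch.
by_frac <- split_indices(n, n_test = round(0.3 * n))

length(by_n$test)     # 3
length(by_frac$test)  # 10
```

With `shuffle = FALSE` the helper takes the first `n_test` rows as the test set, which mirrors the behaviour described for the `shuffle` argument.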

holdout_frac(data, ...)

## S3 method for class 'data.frame'
holdout_frac(data, size = 0.3, K = 1L,
  shuffle = TRUE, prob = NULL, ...)

## S3 method for class 'grouped_df'
holdout_frac(data, size = 0.3, K = 1L,
  shuffle = TRUE, stratify = FALSE, prob = NULL, ...)

holdout_n(data, ...)

## S3 method for class 'data.frame'
holdout_n(data, size = 1L, K = 1L, shuffle = TRUE,
  prob = NULL, ...)

## S3 method for class 'grouped_df'
holdout_n(data, size = 1L, K = 1L, shuffle = TRUE,
  stratify = FALSE, prob = NULL, ...)

`data`	A data frame
`...`	Arguments passed to methods.
`size`	For `holdout_n`, the number of elements in the test set. For `holdout_frac`, the fraction of elements in the test set.
`K`	Number of test/train splits to generate.
`shuffle`	If `TRUE`, the observations are randomly assigned to the test and training sets. If `FALSE`, then the first `size` elements are assigned to the test set, and the remainder of the observations are assigned to the training set.
`prob`	Probability weight that an element is in the `test` set. If non-`NULL`, this is a numeric vector with `nrow(data)` row weights if `data` is a data frame, or a grouped data frame with `stratify = TRUE`, or `n_groups(data)` group weights if `data` is a grouped data frame and `stratify = FALSE`.
`stratify`	If `TRUE`, then test-train splits are made within each group, so that the final test and training subsets have approximately equal proportions of each group. If `FALSE`, then whole groups are split into the test and training sets.
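The two behaviours of `stratify` on grouped data can be illustrated with a base R sketch. It does not call the package and is not its implementation; the variable names are hypothetical, and the test-set sizes (two rows per group, one whole group) are chosen only for the example.

```r
set.seed(1)
groups <- mtcars$cyl   # three groups: 4, 6, 8

# stratify = TRUE analogue: sample test rows within every group,
# so each group appears in both the test and training sets.
rows_by_group <- split(seq_along(groups), groups)
test_within <- unlist(lapply(rows_by_group, function(rows) {
  rows[sample.int(length(rows), size = 2)]
}), use.names = FALSE)

# stratify = FALSE analogue: sample whole groups, so every row of a
# chosen group goes to the test set.
test_groups <- sample(unique(groups), size = 1)
test_between <- which(groups %in% test_groups)

table(groups[test_within])            # two rows from each of the three groups
length(unique(groups[test_between]))  # 1
```

In the first case `size` counts rows per group; in the second it counts groups, which is why the length of `prob` differs between the two settings.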

A data frame with K rows and the following columns:

train: A list of resample objects. Training sets.
test: A list of resample objects. Test sets.
.id: An integer vector of identifiers.
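The shape of the return value, one row per split with list columns, can be mocked up in base R. This sketch is an assumption-laden stand-in: plain integer index vectors replace the package's resample objects, and the column names `train` and `test` are taken from how the examples access `cv1$train` and `cv1$test`.

```r
set.seed(42)
K <- 4
n <- nrow(mtcars)

# One list element per split, each holding the test-row indices.
test_idx <- replicate(K, sample.int(n, size = 3), simplify = FALSE)
train_idx <- lapply(test_idx, function(i) setdiff(seq_len(n), i))

# Assigning a list whose length equals nrow() creates a list column.
splits <- data.frame(.id = seq_len(K))
splits$train <- train_idx
splits$test <- test_idx

nrow(splits)          # 4
lengths(splits$test)  # 3 3 3 3
```

Because each row carries its own training and test set, the result can be iterated over row-wise, for example with `purrr::map` and `purrr::map2_dbl`.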

data.frame: Splits the rows of a data frame into test and training sets.
grouped_df: If stratify = TRUE, splits the rows within each group of a grouped data frame into test and training sets, so that the test and training sets have approximately equal proportions of each group. If stratify = FALSE, splits the groups themselves into test and training sets.

These functions are similar to the modelr function crossv_mc, but add count-based holdout, support for grouped data frames, stratification, and sampling weights.

# Example originally from modelr::crossv_mc
library("resamplr")
library("purrr")
library("dplyr")

# holdout three obs, repeat 10 times
cv1 <- holdout_n(mtcars, size = 3, K = 10)
models <- map(cv1$train, ~ lm(mpg ~ wt, data = .))
summary(map2_dbl(models, cv1$test, modelr::rmse))

# holdout two groups at a time in the test set
# repeat four times.
cv2 <- holdout_n(group_by(mtcars, cyl), size = 2, K = 4)
models <- map(cv2$train, ~ lm(mpg ~ wt, data = .))
summary(map2_dbl(models, cv2$test, modelr::rmse))

# stratified holdout
# hold out 1 obs from each group, repeat 5 times.
cv3 <- holdout_n(group_by(mtcars, am), size = 1, K = 5, stratify = TRUE)
models <- map(cv3$train, ~ lm(mpg ~ wt, data = .))
summary(map2_dbl(models, cv3$test, modelr::rmse))

# Holdout fraction of the data

# holdout 30% of observations, repeat 10 times
cv4 <- holdout_frac(mtcars, size = 0.3, K = 10)
models <- map(cv4$train, ~ lm(mpg ~ wt, data = .))
summary(map2_dbl(models, cv4$test, modelr::rmse))

# holdout 30% of groups at a time in the test set
cv5 <- holdout_frac(group_by(mtcars, cyl), size = 0.3, K = 10)
models <- map(cv5$train, ~ lm(mpg ~ wt, data = .))
summary(map2_dbl(models, cv5$test, modelr::rmse))

# stratified holdout
# hold out 30% of obs within each group, repeat 10 times.
cv6 <- holdout_frac(group_by(mtcars, am), size = 0.3, K = 10, stratify = TRUE)
models <- map(cv6$train, ~ lm(mpg ~ wt, data = .))
summary(map2_dbl(models, cv6$test, modelr::rmse))