holdout_frac: Generate K cross-validation test-training pairs


Description

holdout_frac splits the data so that a proportion size of the observations is in the test set and the remaining 1 - size is in the training set. Likewise, holdout_n splits the data so that size elements are in the test set and the remainder are in the training set.
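As a rough illustration of how the two size conventions relate, the sketch below computes test and training set sizes in base R. It does not call resamplr; in particular, the use of round() to turn a fraction into a row count is an assumption, and the package may round differently.

```r
# Illustrative arithmetic only; NOT resamplr's implementation.
n <- nrow(mtcars)                 # 32 observations

# holdout_frac-style: `size` is a fraction of the observations
frac <- 0.3
n_test_frac <- round(frac * n)    # assumed rounding rule
n_train_frac <- n - n_test_frac

# holdout_n-style: `size` is a count of observations
n_test_n <- 3L
n_train_n <- n - n_test_n
```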

Usage

holdout_frac(data, ...)

## S3 method for class 'data.frame'
holdout_frac(data, size = 0.3, K = 1L,
  shuffle = TRUE, prob = NULL, ...)

## S3 method for class 'grouped_df'
holdout_frac(data, size = 0.3, K = 1L,
  shuffle = TRUE, stratify = FALSE, prob = NULL, ...)

holdout_n(data, ...)

## S3 method for class 'data.frame'
holdout_n(data, size = 1L, K = 1L, shuffle = TRUE,
  prob = NULL, ...)

## S3 method for class 'grouped_df'
holdout_n(data, size = 1L, K = 1L, shuffle = TRUE,
  stratify = FALSE, prob = NULL, ...)

Arguments

data

A data frame

...

Arguments passed to methods.

size

For holdout_n, the number of elements in the test set. For holdout_frac, the fraction of elements in the test set.

K

Number of test/train splits to generate.

shuffle

If TRUE, the observations are randomly assigned to the test and training sets. If FALSE, then the first size elements are assigned to the test set, and the remainder of the observations are assigned to the training set.

prob

Probability weights for an element being in the test set. If non-NULL, this is a numeric vector containing either nrow(data) row weights, if data is a data frame or if data is a grouped data frame and stratify = TRUE, or n_groups(data) group weights, if data is a grouped data frame and stratify = FALSE.

stratify

If TRUE, then test-train splits are made within each group, so that the final test and train subsets have approximately equal proportions of each group. If FALSE, then whole groups are assigned to either the testing or the training set.
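The behaviour described by shuffle, prob, and stratify can be sketched on row indices with base R. This is a conceptual illustration under stated assumptions, not resamplr's implementation: every variable name below is hypothetical, and the package may draw its samples differently.

```r
# Illustrative base-R sketch only; NOT resamplr's implementation.
set.seed(1)
n <- nrow(mtcars)
size <- 3L

# shuffle = FALSE: the first `size` rows form the test set
test_fixed <- seq_len(size)

# shuffle = TRUE: `size` rows are drawn at random
test_random <- sample.int(n, size)

# prob: one weight per row makes some rows more likely to be drawn
test_weighted <- sample.int(n, size, prob = mtcars$wt)

# Grouped data, stratify = TRUE: draw rows within each group
am_groups <- split(seq_len(n), mtcars$am)
test_stratified <- unlist(
  lapply(am_groups, function(idx) idx[sample.int(length(idx), 1L)]),
  use.names = FALSE
)

# Grouped data, stratify = FALSE: draw whole groups into the test set
cyl_groups <- split(seq_len(n), mtcars$cyl)
test_group_ids <- sample(names(cyl_groups), 2L)
test_by_group <- unlist(cyl_groups[test_group_ids], use.names = FALSE)

# In every case the training set is the complement of the test set
train_by_group <- setdiff(seq_len(n), test_by_group)
```

With stratify = TRUE the test set contains rows from every group, whereas with stratify = FALSE each group lands entirely in either the test set or the training set.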

Value

A data frame with K rows and the following columns:

sample

A list of resample objects. Training sets.

.id

An integer vector of identifiers.

Methods (by class)

See Also

This function is similar to the modelr function crossv_mc, but with more features.

Examples

# Example originally from modelr::crossv_mc
library("resamplr")
library("purrr")
library("dplyr")

# holdout three obs, repeat 10 times
cv1 <- holdout_n(mtcars, size = 3, K = 10)
models <- map(cv1$train, ~ lm(mpg ~ wt, data = .))
summary(map2_dbl(models, cv1$test, modelr::rmse))

# holdout two groups at a time in the test set
# repeat four times.
cv2 <- holdout_n(group_by(mtcars, cyl), size = 2, K = 4)
models <- map(cv2$train, ~ lm(mpg ~ wt, data = .))
summary(map2_dbl(models, cv2$test, modelr::rmse))

# stratified holdout
# holdout 1 obs from each group. repeat 5 times.
cv3 <- holdout_n(group_by(mtcars, am), size = 1, K = 5, stratify = TRUE)
models <- map(cv3$train, ~ lm(mpg ~ wt, data = .))
summary(map2_dbl(models, cv3$test, modelr::rmse))

# Holdout fraction of the data

# holdout 30% of observations, repeat 10 times
cv4 <- holdout_frac(mtcars, size = 0.3, K = 10)
models <- map(cv4$train, ~ lm(mpg ~ wt, data = .))
summary(map2_dbl(models, cv4$test, modelr::rmse))

# holdout 30% of groups at a time in the test set
cv5 <- holdout_frac(group_by(mtcars, cyl), size = 0.3, K = 10)
models <- map(cv5$train, ~ lm(mpg ~ wt, data = .))
summary(map2_dbl(models, cv5$test, modelr::rmse))

# stratified holdout
# holdout 30% of obs within each group.
cv6 <- holdout_frac(group_by(mtcars, am), size = 0.3, K = 10, stratify = TRUE)
models <- map(cv6$train, ~ lm(mpg ~ wt, data = .))
summary(map2_dbl(models, cv6$test, modelr::rmse))

jrnold/resamplr documentation built on May 20, 2019, 1:05 a.m.