holdout_n: Generate test/train splits

View source: R/holdout.R

Description

Functions to generate test/train splits. The function holdout_n() generates splits such that size observations are in the test set and the remaining n - size observations are in the training set. The function holdout_frac() splits the data such that a proportion frac of the observations is in the test set and the remaining 1 - frac is in the training set.

Usage

holdout_n(n, times = 1L, size = 1L, shuffle = TRUE, prob = NULL)

holdout_frac(n, times = 1L, frac = 0.3, shuffle = TRUE, prob = NULL)

crossv_mc(n, times = 25, frac = 0.3, prob = NULL)

holdout_idx(n, train = NULL, test = NULL)

Arguments

n

A positive, scalar integer representing the number of observations (items to choose from).

times

A positive, scalar integer representing the number of test/train splits to generate.

size

A scalar integer representing the number of elements in the test set.

shuffle

A logical scalar indicating whether to shuffle the items prior to splitting into test/train sets. This should be used whenever times > 1.

prob

A numeric vector with observation-specific probabilities that an observation is in the test set. If NULL, all observations have equal probabilities.

frac

A numeric scalar between 0 and 1 representing the proportion of items in the test set.

test, train

A list of integer vectors, each containing the indexes in the test (train) splits. If test (train) is NULL, then its index values are set to the complement of the train (test) indexes.
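
The effect of prob can be pictured with base R's sample.int(), which accepts per-element weights. The following is only a sketch of weighted test-set selection, not the package's implementation: the helper name weighted_test_idx is hypothetical, and whether holdout_n() draws its samples this way internally is an assumption.

```r
# Hypothetical helper: draw `size` test indexes out of 1..n, where prob[i]
# is the relative weight of observation i being selected for the test set.
weighted_test_idx <- function(n, size, prob = NULL) {
  idx <- seq_len(n)
  # sample.int() draws without replacement by default;
  # prob = NULL gives every observation the same weight.
  test <- sort(sample.int(n, size = size, prob = prob))
  list(train = setdiff(idx, test), test = test)
}

set.seed(1)
# Observations 1 and 2 are far more likely to land in the test set
split <- weighted_test_idx(10, size = 2, prob = c(50, 50, rep(1, 8)))
```

The train and test indexes in the result are disjoint and together cover 1..n.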

Details

Either holdout_frac() or holdout_n(), when combined with shuffle = TRUE and times > 1, can be used to generate test/train splits for Monte Carlo cross-validation. The function crossv_mc() is a convenience function for Monte Carlo cross-validation.
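
As a rough illustration of Monte Carlo cross-validation, the base R sketch below draws each split independently at random. It is not the package's code: the helper name mc_splits is hypothetical, and rounding frac * n to the nearest integer is an assumption about how the test-set size might be derived.

```r
# Hypothetical helper: `times` independent random splits of 1..n, each
# holding out a fraction `frac` of the observations as the test set.
mc_splits <- function(n, times = 25, frac = 0.3) {
  # Assumption: test-set size is frac * n rounded to the nearest integer
  size <- round(frac * n)
  lapply(seq_len(times), function(i) {
    test <- sort(sample.int(n, size = size))
    list(train = setdiff(seq_len(n), test), test = test)
  })
}

set.seed(42)
splits <- mc_splits(10, times = 3, frac = 0.3)
```

Because every split is drawn independently, an observation may appear in the test set of several splits or of none, unlike k-fold cross-validation, where each observation is tested exactly once.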

The function holdout_idx() generates test/train splits from manually specified indexes.
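
The complement rule for manually specified indexes can be sketched in base R with setdiff(). The helper name complete_splits is hypothetical, and the returned structure (a list of train/test pairs) is an assumption made for illustration, not the object that holdout_idx() returns.

```r
# Hypothetical helper: fill in whichever of `train` / `test` is NULL with
# the complement of the other, relative to the indexes 1..n.
complete_splits <- function(n, train = NULL, test = NULL) {
  idx <- seq_len(n)
  if (is.null(train)) {
    train <- lapply(test, function(i) setdiff(idx, i))
  } else if (is.null(test)) {
    test <- lapply(train, function(i) setdiff(idx, i))
  }
  # Pair up the k-th train and test index vectors
  Map(function(tr, te) list(train = tr, test = te), train, test)
}

splits <- complete_splits(10, test = list(1:2, 2:3, 4:5))
splits[[1]]$train  # 3 4 5 6 7 8 9 10
```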

Examples

# Test/train splits using the number of observations
holdout_n(10, times = 5, size = 2)

# Test/train splits without shuffling
holdout_n(10, times = 1, size = 2, shuffle = FALSE)

# Test/train splits using the fraction of observations
holdout_frac(10, frac = 0.3, times = 3)

# Monte-Carlo cross-validation
crossv_mc(10, frac = 0.3, times = 3)

# Manual test/train splits
holdout_idx(10, test = list(1:2, 2:3, 4:5))

jrnold/ramsleep documentation built on May 29, 2019, 11:43 a.m.