cv_index: Create Index for Cross-Validation


Description

Creates an index with randomized order, useful for splitting data into training and test subsets for cross-validation.

Usage

cv_index(divisions, n_obs, rng_seed = NULL, x = NULL)

Arguments

divisions

The number of partitions into which to split the data. If less than 1, it instead denotes the proportion of data to withhold for the test sample.
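The two interpretations of divisions can be illustrated with a minimal sketch in base R. The function sketch_index below is a hypothetical stand-in, not the binclass implementation; the rounding of the test-sample size and the use of index 1 for the retained observations are assumptions made for illustration.

```r
# Hypothetical stand-in for illustration; NOT the binclass implementation.
sketch_index <- function(divisions, n_obs) {
  if (divisions < 1) {
    # Proportion mode: 2 marks the withheld test sample.
    # Assumptions: the test size is rounded, and 1 marks the remainder.
    n_test <- round(divisions * n_obs)
    labels <- c(rep(2, n_test), rep(1, n_obs - n_test))
  } else {
    # Fold mode: recycle 1..divisions so fold sizes differ by at most one
    labels <- rep(seq_len(divisions), length.out = n_obs)
  }
  # Shuffle so that membership is random with respect to row order
  labels[sample.int(length(labels))]
}

table(sketch_index(3, 90))    # three folds of 30 observations each
table(sketch_index(0.3, 100)) # 70 observations labeled 1, 30 labeled 2
```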

n_obs

The number of observations in the data.

rng_seed

An optional integer (or pair of integers, if x is provided) used to set the random number generator (RNG) state so that results are reproducible.
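As a rough illustration of why fixing the RNG state gives reproducible results, the base R sketch below shuffles the same fold labels twice after calling set.seed with the same value. How cv_index applies rng_seed internally is not shown in this page, so this only demonstrates the general principle; the seed value is arbitrary.

```r
n_obs <- 12
labels <- rep(1:3, length.out = n_obs)

set.seed(2019)  # arbitrary seed chosen for illustration
first <- labels[sample.int(n_obs)]

set.seed(2019)  # resetting the same seed restores the RNG state
second <- labels[sample.int(n_obs)]

identical(first, second)  # TRUE
```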

x

An optional binary variable of length n_obs. If provided, the function attempts to balance the two values of the variable so that each is represented more evenly across folds.
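One common way to achieve this kind of balance is to assign folds separately within each value of x. The sketch below shows that approach in base R; sketch_balanced_index is a hypothetical name, and the package may balance folds differently.

```r
# Hypothetical stratified assignment; NOT the binclass implementation.
sketch_balanced_index <- function(divisions, x) {
  index <- integer(length(x))
  for (level in unique(x)) {
    rows <- which(x == level)
    # Recycle fold labels within this level, then shuffle them
    folds <- rep(seq_len(divisions), length.out = length(rows))
    index[rows] <- folds[sample.int(length(rows))]
  }
  index
}

x <- c(rep(1, 10), rep(0, 90))
index <- sketch_balanced_index(10, x)
table(index[x == 1])  # one observation with x == 1 in each of the 10 folds
table(index[x == 0])  # nine observations with x == 0 in each fold
```

Without stratification, a rare value such as x == 1 here could be missing entirely from some folds.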

Value

An index of length n_obs with values from 1 to divisions, giving the fold to which each observation belongs. Indices are distributed so that each fold has approximately the same number of observations. If divisions is less than 1, the folds are not equal in size; instead, the index 2 marks the observations assigned to the test sample.
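A returned index is typically used to subset the data once per fold. The sketch below shows that pattern; to stay self-contained, it builds the index with base R as a stand-in for the output of cv_index( 3, 90 ), and the data frame and its column y are made up for illustration.

```r
set.seed(42)
n_obs <- 90
data <- data.frame(y = rnorm(n_obs))

# Stand-in for cv_index(3, 90): shuffled fold labels 1, 2, 3
index <- sample(rep(1:3, length.out = n_obs))

fold_sizes <- integer(3)
for (k in 1:3) {
  # Fold k is held out for testing; the other folds form the training set
  train <- data[index != k, , drop = FALSE]
  test <- data[index == k, , drop = FALSE]
  fold_sizes[k] <- nrow(test)
}
fold_sizes  # 30 30 30
```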

Examples

# 3 folds for 90 observations
index = cv_index( 3, 90 )
table( index )
# Create unbalanced data
x = c( rep( 1, 10 ), rep( 0, 90 ) )
index = cv_index( 10, length(x), x = x )
table( index[ x == 1 ] )
table( index[ x == 0 ] )
# Withhold 30% for test sample
index = cv_index( .3, 100 )
table( index )
# Create unbalanced data
x = c( rep( 1, 10 ), rep( 0, 90 ) )
index = cv_index( .3, length(x), x = x )
table( index[ x == 1 ] )
table( index[ x == 0 ] )

rettopnivek/binclass documentation built on May 13, 2019, 4:46 p.m.