cv_index: Create Index for Cross-Validation
In rettopnivek/binclass: Personalized functions for binary classification.


Creates an index with randomized order, useful for splitting data into training and test subsets for cross-validation.

cv_index(divisions, n_obs, rng_seed = NULL, x = NULL)

`divisions`	The number of partitions into which to split the data. If less than 1, it instead denotes the proportion of data to withhold for the test sample.
`n_obs`	The number of observations in the data.
`rng_seed`	An optional integer (or pair of integers, if `x` is provided) to set the random number generator (RNG) state to ensure reproducibility.
`x`	An optional binary variable of length `n_obs`. If provided, the function attempts to balance the two values of the variable so that each is represented more evenly across folds.
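The balancing described for `x` can be pictured as assigning folds separately within each value of the variable. The block below is only a minimal sketch of that idea, not the package's implementation: the function name `sketch_balanced_index` is hypothetical, and using one seed per value of `x` is an assumption suggested by the "pair of integers" wording for `rng_seed`.

```r
# Hypothetical illustration only; the actual cv_index() code may differ.
sketch_balanced_index <- function(divisions, x, rng_seed = NULL) {
  index <- integer(length(x))
  values <- unique(x)
  for (i in seq_along(values)) {
    # Positions of the observations sharing the current value of x
    sel <- which(x == values[i])
    # Assumption: rng_seed, if given, holds one seed per value of x
    if (!is.null(rng_seed)) set.seed(rng_seed[i])
    # Recycle 1..divisions so fold sizes differ by at most one,
    # then shuffle the labels within this value of x
    labels <- rep(seq_len(divisions), length.out = length(sel))
    index[sel] <- labels[sample.int(length(sel))]
  }
  index
}

x <- c(rep(1, 10), rep(0, 90))
idx <- sketch_balanced_index(10, x, rng_seed = c(1, 2))
table(idx[x == 1])
table(idx[x == 0])
```

Assigning folds within each value keeps a rare class from ending up concentrated in just a few folds, which is the situation the `x` argument is meant to address.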

An index from 1 to `divisions` indicating the fold to which each observation belongs. Indices are distributed so that each fold has approximately equal numbers of observations. If `divisions` is less than 1, the index 2 instead marks observations assigned to the test sample.
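To make the two kinds of return value concrete, here is a minimal base R sketch of how an index of this shape could be produced. It is not the package's code: the name `sketch_cv_index` is hypothetical, and labelling the training sample as 1 in the proportion case is an assumption (the documentation only states that 2 marks the test sample).

```r
# Hypothetical stand-in for cv_index(); shown for illustration only.
sketch_cv_index <- function(divisions, n_obs, rng_seed = NULL) {
  if (!is.null(rng_seed)) set.seed(rng_seed)
  if (divisions < 1) {
    # Proportion case: 2 marks the test sample; 1 (assumed) marks the rest
    n_test <- round(divisions * n_obs)
    labels <- c(rep(2L, n_test), rep(1L, n_obs - n_test))
  } else {
    # Fold case: recycle 1..divisions so fold sizes differ by at most one
    labels <- rep(seq_len(divisions), length.out = n_obs)
  }
  # Shuffle so that membership is random with respect to row order
  labels[sample.int(n_obs)]
}

folds <- sketch_cv_index(3, 90, rng_seed = 1)
table(folds)
holdout <- sketch_cv_index(0.3, 100, rng_seed = 1)
table(holdout)
```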

# 3 folds for 90 observations
index = cv_index( 3, 90 )
table( index )
# Create unbalanced data
x = c( rep( 1, 10 ), rep( 0, 90 ) )
index = cv_index( 10, length(x), x = x )
table( index[ x == 1 ] )
table( index[ x == 0 ] )
# Withhold 30% for test sample
index = cv_index( .3, 100 )
table( index )
# Create unbalanced data
x = c( rep( 1, 10 ), rep( 0, 90 ) )
index = cv_index( .3, length(x), x = x )
table( index[ x == 1 ] )
table( index[ x == 0 ] )
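Once an index is available, it can drive a cross-validation loop by holding out one fold at a time. The sketch below uses simulated data and a base R stand-in for the index so that it runs without the package installed; the data frame `dat`, its columns, and the logistic regression model are all illustrative assumptions.

```r
set.seed(42)
n <- 90
# Simulated binary outcome and a single predictor (illustrative only)
dat <- data.frame(y = rbinom(n, 1, 0.5), z = rnorm(n))

# Stand-in for: index = cv_index( 3, n )
index <- sample(rep(1:3, length.out = n))

acc <- numeric(3)
for (k in 1:3) {
  # Fold k is held out; the remaining folds form the training set
  train <- dat[index != k, ]
  test <- dat[index == k, ]
  fit <- glm(y ~ z, family = binomial, data = train)
  # Classify as 1 when the predicted probability exceeds 0.5
  pred <- as.numeric(predict(fit, newdata = test, type = "response") > 0.5)
  acc[k] <- mean(pred == test$y)
}
# Average held-out accuracy across the three folds
mean(acc)
```

With an index created for a proportion (for example `cv_index( .3, 100 )`), the same subsetting applies with `index == 2` selecting the test sample.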