In CrossValidate: Classes and Methods for Cross Validation of "Class Prediction" Algorithms

View source: R/xv00-utility.R

balancedSplit


Split a dataset into training and testing sets, balancing a factor

Description

When performing cross-validation on a dataset, it is often necessary to split the data into training and test sets that are balanced for a factor, so that each level of the factor is represented in the same proportion in both sets. This function implements such a balanced split.

Usage

balancedSplit(fac, size)

Arguments

`fac`	A factor that should be balanced between the two subsets.
`size`	A number between 0 and 1 indicating the fraction of the dataset to be used for training.

Details

This function randomly samples the same fraction, `size`, of the items at each level of `fac` to include in the training set; the remaining items form the test set. In most cases, `fac` will be a binary factor (and might even be the outcome that one wants to predict). However, the implementation works for factors with an arbitrary number of levels.
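The per-level sampling described above is a form of stratified random sampling. The following is a minimal sketch of that idea, not the package's actual source (see R/xv00-utility.R for that). The function name `balancedSplitSketch` is hypothetical, and the use of `round()` to turn `size` times the level count into a whole number of items is an assumption; the real implementation may round differently.

```r
# Hypothetical sketch of a balanced (stratified) split; not the CrossValidate source.
# Assumption: the number of training items per level is round(size * count).
balancedSplitSketch <- function(fac, size) {
  fac <- as.factor(fac)
  stopifnot(is.numeric(size), length(size) == 1, size >= 0, size <= 1)
  selected <- rep(FALSE, length(fac))      # FALSE = test set by default
  for (lev in levels(fac)) {
    idx <- which(fac == lev)               # positions belonging to this level
    nTrain <- round(size * length(idx))    # same fraction taken from every level
    # Sample positions via seq_along() so a level with a single item is not
    # misinterpreted by sample() as sample(1:idx).
    chosen <- idx[sample(seq_along(idx), nTrain)]
    selected[chosen] <- TRUE               # TRUE = training set
  }
  selected
}
```

Because sampling is done separately within each level, every level contributes (up to rounding) the same fraction of its items to the training set, regardless of how unbalanced the level sizes are overall.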

Value

Returns a logical vector with length equal to the length of `fac`. `TRUE` values designate samples selected for the training set.
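Since the result is a logical vector aligned with `fac`, it can index the samples directly, and `!` selects the test set. The sketch below uses a hand-built logical vector in place of a real `balancedSplit()` result so that it runs without the package; the helper name `trainingFractionByLevel` is hypothetical and simply reports the fraction of each level assigned to training.

```r
# Hypothetical helper: fraction of each factor level assigned to the training set.
trainingFractionByLevel <- function(fac, split) {
  stopifnot(is.logical(split), length(split) == length(fac))
  tapply(split, fac, mean)                 # mean of a logical = proportion TRUE
}

groups <- factor(rep(c("A", "B"), each = 10))
# Stand-in for a balancedSplit() result: first 6 of each group go to training.
split <- rep(c(rep(TRUE, 6), rep(FALSE, 4)), times = 2)

# Samples are columns, so the logical vector indexes columns.
dataset <- matrix(rnorm(40 * 20), ncol = 20)
trainData <- dataset[, split, drop = FALSE]
testData <- dataset[, !split, drop = FALSE]
trainingFractionByLevel(groups, split)
```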

Author(s)

Kevin R. Coombes <krc@silicovore.com>

Examples

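nFeatures <- 40
nSamples <- 2*10
# rows are features, columns are samples
dataset <- matrix(rnorm(nSamples*nFeatures), ncol=nSamples)
groups <- factor(rep(c("A", "B"), each=10))
# put 60% of the samples from each group into the training set
trainSet <- balancedSplit(groups, 0.6)
trainData <- dataset[, trainSet]
testData <- dataset[, !trainSet]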

CrossValidate documentation built on April 11, 2025, 3:08 p.m.