# divideUp: Partition data into multiple nearly equal subsets

In hddplot: Use Known Groups in High-Dimensional Data to Derive Scores for Plots

## Description

Randomly partition data into `nset` subsets of nearly equal size. If `balanced=TRUE`, the subsets are additionally required to be balanced, as far as possible, with respect to a classifying factor, so that each level of the factor is spread as evenly as it can be across the subsets. The resulting subsets are suitable for use as the folds in a cross-validation.
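To make "balanced" concrete, here is a minimal base-R sketch of the idea. It follows the approach visible in the function definition listed under Examples, but it is an illustration rather than the package's code: the helper name `balanced_folds` is hypothetical. Subset labels are cycled over the observations after sorting them by class, so each class contributes to every subset as evenly as possible.

```r
# Hypothetical helper (not part of hddplot): spread each class evenly over folds.
balanced_folds <- function(cl, nset = 2) {
  ord <- order(cl)                        # observation positions, sorted by class
  # Cycle a random permutation of the fold labels along the sorted observations.
  labels <- rep(sample(seq_len(nset)), length.out = length(cl))
  fold <- integer(length(cl))
  fold[ord] <- labels                     # map labels back to the original order
  fold
}

set.seed(1)
cl <- rep(1:3, c(17, 14, 8))
fold <- balanced_folds(cl, nset = 10)
counts <- table(cl, fold)                 # rows: classes, columns: folds
print(counts)
```

Within each row of `counts`, the entries differ by at most one, and the overall fold sizes also differ by at most one.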

## Usage

```r
divideUp(cl, nset = 2, seed = NULL, balanced = TRUE)
```

## Arguments

| Argument | Description |
|---|---|
| `cl` | classifying factor |
| `nset` | number of subsets into which to partition the data |
| `seed` | set the seed, if required, in order to obtain reproducible results |
| `balanced` | logical: should the subsets be, as far as possible, balanced with respect to the classifying factor? |

## Value

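A vector with one entry per element of `cl`. Each entry is an index from 1 to `nset` that identifies the subset to which the corresponding observation is assigned.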
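Assuming the returned value holds one subset label per observation (as the function body under Examples suggests), the labels can drive a cross-validation loop as sketched below. To keep the sketch runnable without the package, `foldid` is generated by a base-R stand-in that matches the unbalanced branch of that function body. The values of `n` and `nset` are illustrative.

```r
set.seed(42)
n <- 39       # number of observations (illustrative)
nset <- 10    # number of folds (illustrative)

# Stand-in for the value of divideUp(): one fold label per observation.
foldid <- sample(rep(seq_len(nset), length.out = n))

held_out <- integer(0)
for (k in seq_len(nset)) {
  test_idx  <- which(foldid == k)   # observations held out in fold k
  train_idx <- which(foldid != k)   # observations used for fitting
  # A real analysis would fit a model on train_idx and score it on test_idx.
  held_out <- c(held_out, test_idx)
}
```

Each observation is held out exactly once over the `nset` iterations, which is what makes the labels usable as cross-validation folds.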

## Author(s)

John Maindonald

## Examples

```r
foldid <- divideUp(cl=rep(1:3, c(17,14,8)), nset=10)
table(rep(1:3, c(17,14,8)), foldid)
foldid <- divideUp(cl=rep(1:3, c(17,14,8)), nset=10,
                   balanced=FALSE)
table(rep(1:3, c(17,14,8)), foldid)

## The function is currently defined as
function(cl = rep(1:3, c(7, 4, 8)), nset=2, seed=NULL, balanced=TRUE){
    if(!is.null(seed))set.seed(seed)
    if(balanced){
        ## sort the observations by class
        ord <- order(cl)
        ordcl <- cl[ord]
        ## cycle a random permutation of the subset labels over the sorted observations
        gp0 <- rep(sample(1:nset), length.out=length(cl))
        ## group the labels by class, then flatten back to one vector
        gp <- unlist(split(gp0,ordcl), function(x)sample(x))
        ## put the labels back in the original observation order
        gp[ord] <- gp
    } else
        gp <- sample(rep(1:nset, length.out=length(cl)))
    as.vector(gp)
}
```

hddplot documentation built on Sept. 3, 2017, 5:02 p.m.