Randomly partition data into nearly equal subsets. If `balanced=TRUE`, the subsets are additionally required to be balanced, as far as possible, with respect to a classifying factor. The resulting subsets are suitable for determining the folds in a cross-validation.
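
To make the idea of balance concrete, here is a minimal base R sketch. It is not the package's implementation: it simply deals fold labels out separately within each level of the classifying factor, so that each level contributes nearly equally to every subset. The helper name `balanced_folds_sketch` is hypothetical.

```r
# Hypothetical helper (not part of the package): assign fold labels
# class by class so that each class is spread evenly across the folds.
balanced_folds_sketch <- function(cl, nset) {
  gp <- integer(length(cl))
  for (idx in split(seq_along(cl), cl)) {
    # Cycle through a random permutation of the fold labels ...
    labels <- rep(sample.int(nset), length.out = length(idx))
    # ... and assign them to this class's observations in random order.
    gp[idx[sample.int(length(idx))]] <- labels
  }
  gp
}

set.seed(1)
cl <- rep(1:3, c(17, 14, 8))
foldid <- balanced_folds_sketch(cl, nset = 5)
# Within each class (row), the fold counts differ by at most one.
table(cl, foldid)
```

Because the labels are recycled from a permutation of `1:nset`, no fold can receive more than one extra observation from any single class.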

| Argument | Description |
|---|---|
| `cl` | classifying factor |
| `nset` | number of subsets into which to partition data |
| `seed` | set the seed, if required, in order to obtain reproducible results |
| `balanced` | logical: should subsets be as far as possible balanced with respect to the classifying factor? |
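
As an illustration of what supplying `seed` achieves, the sketch below uses base R only. `partition_sketch` is a hypothetical stand-in that follows the same pattern of calling `set.seed()` only when a seed is given; it is not the package function.

```r
# Hypothetical stand-in: random partition of n items into nset subsets.
partition_sketch <- function(n, nset, seed = NULL) {
  if (!is.null(seed)) set.seed(seed)   # only touch the RNG state on request
  sample(rep(1:nset, length.out = n))
}

a <- partition_sketch(39, nset = 10, seed = 29)
b <- partition_sketch(39, nset = 10, seed = 29)
identical(a, b)  # the same seed reproduces the same partition
```

Leaving `seed = NULL` keeps whatever random number state the session already has, which is usually what is wanted when the partition is one step in a larger simulation.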

A vector with one entry per element of `cl`, giving the index (from 1 to `nset`) of the subset to which that element is assigned.
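
A vector of subset indices of this kind might drive a cross-validation loop as sketched below. The fold vector is generated here with base R `sample()` as a stand-in for the function's return value, and the choice of the built-in `mtcars` data and of `lm()` is an illustrative assumption only.

```r
set.seed(42)
nset <- 4
# Stand-in for the returned fold ids: one label per row of the data.
foldid <- sample(rep(1:nset, length.out = nrow(mtcars)))

# Row indices belonging to each subset.
folds <- split(seq_len(nrow(mtcars)), foldid)

# Hold out each subset in turn, fit on the rest, predict the held-out rows.
pred <- numeric(nrow(mtcars))
for (k in seq_len(nset)) {
  test <- folds[[k]]
  fit <- lm(mpg ~ wt, data = mtcars[-test, ])
  pred[test] <- predict(fit, newdata = mtcars[test, ])
}
mean((mtcars$mpg - pred)^2)  # cross-validated mean squared error
```

Every row is predicted exactly once, by a model that never saw it during fitting.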

John Maindonald

```r
## Balanced (default): each class is spread as evenly as possible across folds
foldid <- divideUp(cl=rep(1:3, c(17,14,8)), nset=10)
table(rep(1:3, c(17,14,8)), foldid)
## Unbalanced: a purely random partition that ignores the classes
foldid <- divideUp(cl=rep(1:3, c(17,14,8)), nset=10,
                   balanced=FALSE)
table(rep(1:3, c(17,14,8)), foldid)

## The function is currently defined as
function(cl = rep(1:3, c(7, 4, 8)), nset=2, seed=NULL, balanced=TRUE){
  if(!is.null(seed))set.seed(seed)
  if(balanced){
    ord <- order(cl)
    ordcl <- cl[ord]
    gp0 <- rep(sample(1:nset), length.out=length(cl))
    ## NB: the function below is matched to unlist()'s `recursive` argument,
    ## so it is not applied to each group
    gp <- unlist(split(gp0,ordcl), function(x)sample(x))
    gp[ord] <- gp
  } else
    gp <- sample(rep(1:nset, length.out=length(cl)))
  as.vector(gp)
}
```
