splitStratify: Split by Stratified Sampling
In tpq/exprso: Rapid Deployment of Machine Learning Algorithms

splitStratify builds a training set and a validation set by stratified random sampling. It uses the strata function from the sampling package together with the cut function from the base package; cut bins continuous data before stratified random sampling. See the parameter descriptions for how to apply binning, although it may be easier to bin annotations beforehand. When applied to an ExprsMulti object, this function stratifies subjects across all classes found in that dataset.
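The idea of binning a continuous annotation and then sampling within each stratum can be sketched in base R alone. This is an illustration of the technique, not the package's implementation: it does not call the strata function, and the data frame `annot`, its columns, and the break points are all hypothetical.

```r
# Base R sketch of stratified sampling with binning (not exprso's own code).
# All names here (annot, age_bin, stratum, train_ids) are hypothetical.
set.seed(1)

annot <- data.frame(
  id = paste0("s", 1:60),
  class = rep(c("Control", "Case"), each = 30),
  age = round(runif(60, min = 20, max = 80)),
  stringsAsFactors = FALSE
)

# Bin the continuous annotation into three intervals with cut().
annot$age_bin <- cut(annot$age, breaks = c(0, 40, 60, 100))

# A stratum is one observed combination of class label and age bin.
annot$stratum <- interaction(annot$class, annot$age_bin, drop = TRUE)

# Sample roughly 67% of the subjects within each stratum.
percent.include <- 67
train_ids <- unlist(lapply(split(annot$id, annot$stratum), function(ids) {
  n <- round(length(ids) * percent.include / 100)
  ids[sample.int(length(ids), n)]
}))

# Subjects not drawn for training form the validation set.
train <- annot[annot$id %in% train_ids, ]
valid <- annot[!annot$id %in% train_ids, ]
```

Because sampling happens per stratum, the training and validation sets keep approximately the same mix of classes and age bins as the full data.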

splitStratify(object, percent.include = 67, colBy = NULL,
  bin = rep(FALSE, length(colBy)), breaks = rep(list(NA),
  length(colBy)), ...)

`object`	An `ExprsArray` object to split.
`percent.include`	Specifies the percent of the total number of subjects to include in the training set.
`colBy`	Specifies a vector of column names by which to stratify, in addition to the class label annotation. If `colBy = NULL`, random sampling occurs across the class label annotation only. For `splitStratify` only.
`bin`	A logical vector indicating whether to bin the respective `colBy` column using `cut` (e.g., `bin = c(FALSE, TRUE)`). For `splitStratify` only.
`breaks`	A list. Each element of the list should correspond to a `breaks` argument passed to `cut` for the respective `colBy` column. Set an element to `NA` when not binning that `colBy`. For `splitStratify` only.
`...`	For `splitSample`: additional arguments passed along to `sample`. For `splitStratify`: additional arguments passed along to `cut`.
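The way `colBy`, `bin`, and `breaks` line up position by position can be illustrated with a small base R sketch. It only mimics the pairing described in the arguments above; how the package applies these arguments internally is not shown here, and the annotation table `annot` with its `sex` and `age` columns is hypothetical.

```r
# Hypothetical annotation table; column names are illustrative only.
annot <- data.frame(
  sex = c("F", "M", "F", "M", "F", "M"),
  age = c(23, 35, 47, 52, 68, 71),
  stringsAsFactors = FALSE
)

colBy  <- c("sex", "age")
bin    <- c(FALSE, TRUE)                # bin only the second colBy column
breaks <- list(NA, c(0, 40, 60, 100))   # NA for the column that is not binned

# Walk the three arguments in parallel, calling cut() only where bin is TRUE.
strat <- Map(function(col, do_bin, brk) {
  if (do_bin) cut(annot[[col]], breaks = brk) else annot[[col]]
}, colBy, bin, breaks)

# One column per colBy entry; binned columns become factors of intervals.
strat <- as.data.frame(strat, stringsAsFactors = FALSE)
```

Each `breaks` element may be anything `cut` accepts for its `breaks` argument, including a single number giving the number of intervals.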