mksampsize: Generate sample size information for use with 'gensemble'
In gensemble: Generalized Ensemble Methods


Translates the `sampsize` argument of `gensemble` into the form used internally.

mksampsize(Y, sampsize = NULL, proportion = FALSE)

`Y`	The response vector.
`sampsize`	The desired sample size(s). Can be NULL, a single value, a vector or a list. See the details section for more information.
`proportion`	A `logical` indicating whether the values in `sampsize` represent proportions rather than absolute sizes.

For regression, `sampsize` indicates how much of the underlying data should be used in the bagged model. It should be either NULL or a single value. If it is NULL, roughly 80% of the data is used.
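As a rough illustration of the regression default, the sketch below computes an 80% sample size in base R. The helper `default_sampsize_sketch` is hypothetical and not part of gensemble, and rounding down is an assumption; the package may round differently.

```r
# Hypothetical helper, not part of gensemble: approximates the default
# regression sample size as a fraction of the number of observations.
default_sampsize_sketch <- function(Y, fraction = 0.8) {
  # Rounding down is an assumption made for this sketch.
  floor(fraction * length(Y))
}

Y <- trees[, 3]             # third column of the built-in trees data (31 rows)
default_sampsize_sketch(Y)  # 24 under the rounding assumption above
```

The value returned by `mksampsize(Y)` itself may differ slightly if the package uses a different rounding rule.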

For classification, the internals of `gensemble` require a list of each class and the size of the sample from each class. If `sampsize` is NULL, this list will be built using the levels present in `Y`, and roughly 80% of each class is used.
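The per-class list described here can be pictured with a short base R sketch. The helper `class_sampsize_sketch` is hypothetical and not part of gensemble. It assumes that sizes are rounded down and that every level of the factor gets an entry, neither of which is confirmed by this page.

```r
# Hypothetical helper, not part of gensemble: builds a named list with one
# sample size per class, taking a fixed fraction of each class.
class_sampsize_sketch <- function(Y, fraction = 0.8) {
  stopifnot(is.factor(Y))
  # Count the observations in each level of the factor.
  counts <- tabulate(Y, nbins = nlevels(Y))
  # Rounding down is an assumption made for this sketch.
  sizes <- floor(fraction * counts)
  setNames(as.list(sizes), levels(Y))
}

Y <- iris[, 5]            # Species factor: 50 observations per class
class_sampsize_sketch(Y)  # one entry per species, each 40
```

A named list of this shape matches the form passed explicitly in the classification examples, such as `list(setosa=20, versicolor=30, virginica=40)`.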

If `Y` is a factor, returns a list of each class and the number of data points to sample for that class. Otherwise returns a single value.

Peter Werner <gensemble.r@gmail.com>

gensemble

#regression
Y <- trees[,3]
#use roughly 80% for each training iteration
mksampsize(Y)
#the same thing using proportion
mksampsize(Y, 0.8, TRUE)

#classification
Y <- iris[,5]
#use roughly 80% of each class
mksampsize(Y)
#specify the size of each class in absolute terms
mksampsize(Y, list(setosa=20, versicolor=30, virginica=40))
#use about 70% of each class
mksampsize(Y, 0.7, proportion=TRUE)
#specify the proportion for each class
mksampsize(Y, c(0.5, 0.6, 0.7), proportion=TRUE)