Translates the sampsize argument of gensemble into the form used internally.
mksampsize(Y, sampsize = NULL, proportion = FALSE)
Y | The response vector.
sampsize | The desired sample size(s). Can be NULL, a single value, a vector, or a list. See the Details section for more information.
proportion | A
For regression, sampsize indicates how much of the underlying data should be used in the bagged model. It should either be NULL or a single value. If it is NULL, roughly 80
For classification, the internals of gensemble require a list of each class and the size of the sample from each class. If sampsize is NULL, this list will be built using the levels present in Y, and roughly 80
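The default behaviour described above can be pictured with a small base-R sketch. This is an illustration only, not gensemble's code: the helper name default_sampsize, its frac argument, and the use of round() are assumptions made for the example, and the real function may round differently or apply further rules.

```r
# Hypothetical helper, NOT part of gensemble: a sketch of how a default
# sample size could be derived from the response Y.
default_sampsize <- function(Y, frac = 0.8) {
  if (is.factor(Y)) {
    # Classification: count the observations in each level of Y ...
    counts <- tabulate(Y, nbins = nlevels(Y))
    # ... and keep roughly `frac` of each class, as a list named by level.
    # Assumption: sizes are rounded to the nearest integer.
    setNames(as.list(round(frac * counts)), levels(Y))
  } else {
    # Regression: a single value, roughly `frac` of all observations.
    round(frac * length(Y))
  }
}

default_sampsize(trees[, 3])  # a single value
default_sampsize(iris[, 5])   # a list with one entry per class
```

With the built-in datasets, this sketch gives 25 for trees (31 rows) and 40 for each of the three iris classes (50 rows each); the values returned by mksampsize itself may differ.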
If Y is a factor, returns a list with one entry per class giving the number of data points to sample for that class. Otherwise, returns a single value.
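To show what a per-class list of this shape is for, the following base-R sketch draws a stratified sample from it. The function draw_stratified is hypothetical and is not gensemble's internal code; in particular, sampling without replacement within each class is an assumption made for the example.

```r
# Hypothetical consumer of a per-class size list; NOT gensemble's internals.
draw_stratified <- function(Y, sizes) {
  idx <- lapply(names(sizes), function(cls) {
    # Row indices of the observations belonging to this class.
    pool <- which(Y == cls)
    # Assumption: sample without replacement within each class.
    pool[sample.int(length(pool), sizes[[cls]])]
  })
  unlist(idx)
}

Y <- iris[, 5]
# Same shape as the list passed to mksampsize in the examples.
sizes <- list(setosa = 20, versicolor = 30, virginica = 40)
idx <- draw_stratified(Y, sizes)
table(Y[idx])  # per-class counts of the drawn sample
```

The counts printed by table() match the requested sizes, which is the property a list of per-class sample sizes is meant to guarantee.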
Peter Werner <gensemble.r@gmail.com>
#regression
Y <- trees[,3]
#use roughly 80% for each training iteration
mksampsize(Y)
#the same thing using proportion
mksampsize(Y, 0.8, TRUE)
#classification
Y <- iris[,5]
#use roughly 80% of each class
mksampsize(Y)
#specify the size of each class in absolute terms
mksampsize(Y, list(setosa=20, versicolor=30, virginica=40))
#use about 70% of each class
mksampsize(Y, 0.7, proportion=TRUE)
#specify the proportion for each class
mksampsize(Y, c(0.5, 0.6, 0.7), proportion=TRUE)