This translates the sampsize argument to gensemble into a form for internal use.
mksampsize(Y, sampsize = NULL, proportion = FALSE)
Y | The response vector.
sampsize | The desired sample size(s). Can be NULL, a single value, a vector, or a list. See the details section for more information.
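proportion | A logical value. Judging by the examples, if TRUE the values in sampsize are interpreted as proportions rather than absolute counts.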
For regression, sampsize indicates how much of the underlying data should be used in the bagged model. It should be either NULL or a single value. If it is NULL, roughly 80% of the data is used.
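As a rough illustration of the regression default, the sample size can be thought of as about 80% of the observations in Y. The helper approx_sampsize below is hypothetical and not part of gensemble, and the floor() rounding is an assumption, so the value mksampsize actually returns may differ by one.

```r
# Hypothetical helper, NOT part of gensemble: approximates a default
# regression sample size as a fixed proportion of the observations.
approx_sampsize <- function(Y, proportion = 0.8) {
  # floor() is an assumed rounding rule; the package may round differently
  floor(proportion * length(Y))
}

Y <- trees[, 3]          # the built-in trees data has 31 observations
n <- approx_sampsize(Y)
print(n)
```

Passing a different proportion (for example 0.5) to this sketch mirrors calling mksampsize with a single value and proportion = TRUE.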
For classification, the internals of gensemble require a list giving each class and the size of the sample to draw from that class. If sampsize is NULL, this list is built using the levels present in Y, with roughly 80% of each class used.
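The per-class list described above might be built along the following lines. This is a minimal sketch, not the package's code: approx_class_sampsize is a hypothetical name, the floor() rounding is an assumption, and table() reports every factor level, including any with zero observations.

```r
# Hypothetical helper, NOT part of gensemble: builds a named list with one
# sample size per class, each a fixed proportion of that class's count.
approx_class_sampsize <- function(Y, proportion = 0.8) {
  counts <- table(Y)                    # observations per factor level
  sizes <- floor(proportion * counts)   # assumed rounding rule
  as.list(setNames(as.integer(sizes), names(counts)))
}

Y <- iris[, 5]                          # 50 observations in each of 3 species
sizes <- approx_class_sampsize(Y)
str(sizes)
```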
If Y is a factor, returns a list of each class and the number of data points to sample for that class. Otherwise, returns a single value.
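To show how a per-class list of this shape could be consumed, the sketch below draws a stratified sample of row indices. The function draw_stratified is hypothetical and not part of gensemble, and sampling without replacement is an assumption; the package's internal sampling may work differently.

```r
# Hypothetical illustration, NOT part of gensemble: draw row indices for a
# stratified sample given a named list of per-class sample sizes.
draw_stratified <- function(Y, sizes) {
  idx <- lapply(names(sizes), function(cls) {
    pool <- which(Y == cls)                        # rows belonging to this class
    pool[sample.int(length(pool), sizes[[cls]])]   # assumed: without replacement
  })
  unlist(idx)
}

set.seed(1)
Y <- iris[, 5]
sizes <- list(setosa = 20, versicolor = 30, virginica = 40)
idx <- draw_stratified(Y, sizes)
table(Y[idx])   # per-class counts of the drawn sample
```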
Peter Werner <gensemble.r@gmail.com>
# load the package that provides mksampsize
library(gensemble)

# regression
Y <- trees[, 3]
# use roughly 80% for each training iteration
mksampsize(Y)
# the same thing using proportion
mksampsize(Y, 0.8, TRUE)

# classification
Y <- iris[, 5]
# use roughly 80% of each class
mksampsize(Y)
# specify the size of each class in absolute terms
mksampsize(Y, list(setosa = 20, versicolor = 30, virginica = 40))
# use about 70% of each class
mksampsize(Y, 0.7, proportion = TRUE)
# specify the proportion for each class
mksampsize(Y, c(0.5, 0.6, 0.7), proportion = TRUE)