mksampsize: Generate sample size information for use with 'gensemble'


Description

Translates the sampsize argument of gensemble into the form used internally.

Usage

mksampsize(Y, sampsize = NULL, proportion = FALSE)

Arguments

Y

The response vector.

sampsize

The desired sample size(s). Can be NULL, a single value, a vector or a list. See the details section for more information.

proportion

A logical indicating whether the values in sampsize represent proportions.

Details

For regression, sampsize indicates how much of the underlying data should be used in the bagged model. It should be either NULL or a single value. If it is NULL, roughly 80% of the data is used.

For classification, the internals of gensemble require a list giving each class and the size of the sample to draw from that class. If sampsize is NULL, this list is built from the levels present in Y, and roughly 80% of each class is sampled.
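
The translation described above can be illustrated with a minimal sketch in base R. The function sketch_sampsize is hypothetical and is not the gensemble implementation; in particular, the use of round() and the recycling of a single value across classes are assumptions made for illustration only.

```r
# Hypothetical helper, NOT part of gensemble: mimics the behaviour
# described in the Details section under stated assumptions.
sketch_sampsize <- function(Y, sampsize = NULL, proportion = FALSE) {
  # Default from the text: roughly 80% when sampsize is NULL
  if (is.null(sampsize)) {
    sampsize <- 0.8
    proportion <- TRUE
  }

  if (!is.factor(Y)) {
    # Regression: a single value is expected
    stopifnot(length(sampsize) == 1)
    if (proportion) return(round(sampsize * length(Y)))  # rounding is an assumption
    return(sampsize)
  }

  # Classification: build one entry per class level
  classes <- levels(Y)
  counts <- as.vector(table(Y))  # class counts, in level order
  sizes <- unlist(sampsize)
  # Assumption: a single value applies to every class
  if (length(sizes) == 1) sizes <- rep(unname(sizes), length(classes))
  stopifnot(length(sizes) == length(classes))
  # Named input is matched to the class levels by name
  if (!is.null(names(sizes))) sizes <- sizes[classes]
  if (proportion) sizes <- round(sizes * counts)
  setNames(as.list(sizes), classes)
}

# Usage: a numeric response gives one number, a factor gives a named list
sketch_sampsize(trees[, 3])
sketch_sampsize(iris[, 5], c(0.5, 0.6, 0.7), proportion = TRUE)
```

The sketch only shows the shape of the result; the exact sizes returned by mksampsize may differ.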

Value

If Y is a factor, returns a list giving each class and the number of data points to sample from that class. Otherwise, returns a single value.
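
To show how a per-class list of this shape could be consumed, the following sketch draws a stratified sample of row indices using base R only. The sizes list is written by hand rather than produced by mksampsize, and sampling without replacement is an assumption; the sampling that gensemble performs internally may differ.

```r
# Hand-written per-class sizes, shaped like the list described above
sizes <- list(setosa = 20, versicolor = 30, virginica = 40)
Y <- iris[, 5]

set.seed(1)  # make the draw reproducible

# For each class, pick the requested number of row indices
# (without replacement; this is an assumption)
idx <- unlist(lapply(names(sizes), function(cls) {
  rows <- which(Y == cls)
  rows[sample.int(length(rows), sizes[[cls]])]
}))

# Number of sampled rows per class
table(Y[idx])
```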

Author(s)

Peter Werner <gensemble.r@gmail.com>

See Also

gensemble

Examples

#regression
Y <- trees[,3]
#use roughly 80% for each training iteration
mksampsize(Y)
#the same thing using proportion
mksampsize(Y, 0.8, TRUE)

#classification
Y <- iris[,5]
#use roughly 80% of each class
mksampsize(Y)
#specify the size of each class in absolute terms
mksampsize(Y, list(setosa=20, versicolor=30, virginica=40))
#use about 70% of each class
mksampsize(Y, 0.7, proportion=TRUE)
#specify the proportion for each class
mksampsize(Y, c(0.5, 0.6, 0.7), proportion=TRUE)

gensemble documentation built on May 2, 2019, 1:02 p.m.