gensemble: Generalized ensemble methods


Description

Gensemble is a generalisation of random forests that allows arbitrary underlying models to be used.

Usage

gensemble(abm, X, Y, sampsize = NULL, sampsize_prop = FALSE, nmods = 100, 
	perturb_val = 0.1, Xtest = NULL, Ytest = NULL, do.trace = TRUE, 
	stepsize = 10)

Arguments

abm

An object of type AbstractModel-class

X

A data frame or matrix of predictors

Y

A response vector. If Y is a factor, classification is assumed; otherwise regression is assumed. See the notes for more details.

sampsize

A list or vector of sample sizes used when creating a bagged sample. If not supplied, all input data will be used to build the models. See mksampsize for details on how this will be interpreted.

sampsize_prop

A boolean indicating whether the values in sampsize should be interpreted as proportions.

nmods

How many models to build.

perturb_val

The proportion of input data to perturb.

Xtest

Optional test set of X values.

Ytest

Optional test set of Y values.

do.trace

If TRUE, summary statistics will be printed. The information printed is as follows:

  1. For classification, the per-class accuracy is printed, along with the proportion of training points not yet included in any model, and the total accuracy.

  2. For regression, the variance, MSE, scaled MSE and estimated R^2 are printed, along with the proportion of training points not yet included in any model.

stepsize

If do.trace is TRUE, specifies how often to print trace information. For example, a value of 10 will print every 10 models. A value of 1 will print after every model.
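
For reference, the regression quantities listed under do.trace can be computed along the following lines. This is a sketch under assumptions: this page does not give the exact formulas, so the definitions below (scaled MSE as MSE divided by the variance of Y, and estimated R^2 as one minus that ratio) follow a common convention and may differ from what gensemble actually prints. The function name reg_trace_stats is hypothetical and not part of the package.

```r
# Hypothetical helper, not part of gensemble: summary statistics for a
# regression fit, using assumed (conventional) definitions.
reg_trace_stats <- function(y, pred) {
  v   <- var(y)               # sample variance of the response
  mse <- mean((y - pred)^2)   # mean squared error of the predictions
  c(variance = v, mse = mse, scaled_mse = mse / v, r2 = 1 - mse / v)
}

# Made-up values, purely for illustration
y     <- c(1, 2, 3, 4, 5)
pred  <- c(1.1, 1.9, 3.2, 3.8, 5.0)
stats <- reg_trace_stats(y, pred)
print(stats)
```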

Details

This is a general implementation of bagging. It enables (in theory) any underlying modelling/learning algorithm to be used, via the AbstractModel-class.
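
To illustrate the idea only, the following base R sketch bags an arbitrary fit/predict pair by hand. It is not gensemble's implementation: the helper bag_fit_predict and its arguments are hypothetical, it ignores sampsize and perturb_val, and it uses lm on the built-in trees data purely as a stand-in model.

```r
# Hypothetical helper, not part of gensemble: bag any fit/predict pair
# and average the predictions (regression case).
bag_fit_predict <- function(fit_fun, predict_fun, X, Y, Xnew, nmods = 25) {
  n <- nrow(X)
  per_model <- lapply(seq_len(nmods), function(i) {
    idx <- sample.int(n, n, replace = TRUE)          # bootstrap sample of rows
    mod <- fit_fun(X[idx, , drop = FALSE], Y[idx])   # fit on the bagged sample
    as.numeric(predict_fun(mod, Xnew))               # predict the new points
  })
  # one column per model, one row per new observation
  preds <- matrix(unlist(per_model), nrow = nrow(Xnew))
  rowMeans(preds)
}

set.seed(1)
X <- trees[, 1:2]
Y <- trees[, 3]
fit_lm  <- function(X, Y) lm(y ~ ., data = cbind(X, y = Y))
pred_lm <- function(mod, Xnew) predict(mod, newdata = Xnew)
bagged  <- bag_fit_predict(fit_lm, pred_lm, X, Y, Xnew = X)
head(cbind(bagged, Y))
```

Passing the fit and predict steps as functions is what lets the same loop work with any model, which is the role AbstractModel-class plays for gensemble.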

Value

An object of class gensemble-class.

Wrapping the model function

The first argument to gensemble is an instance of an AbstractModel-class. You will need to wrap the model you wish to use in this class before using gensemble.

First, make sure the model function works on the data you will pass to gensemble. For example, suppose we are using ksvm from kernlab on the iris data set. You might have something that looks like this:

library(kernlab)
X <- iris[,1:4]
Y <- iris[,5]
cnt <- nrow(iris)
samp <- sample(1:cnt, cnt * 0.7)
mod <- ksvm(as.matrix(X[samp,]), Y[samp], type="C-svc", C=1, epsilon=0.1)
preds <- predict(mod, X[-samp,])

We can wrap this up in an instance of AbstractModel-class as follows:

abm <- ab.create(model.call="ksvm", model.args=list(type="C-svc", C=1, 
	epsilon=0.1), xtrans=as.matrix)

The arguments we would otherwise pass directly to ksvm are now supplied through the model.args argument of ab.create. It is simply a list of the arguments and their values.

Note that we define the X transform (xtrans) to be as.matrix, which means the X values passed to ksvm by AbstractModel will first be run through as.matrix.
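
As a quick illustration of what such a transform does, the base R snippet below applies as.matrix to the iris predictors by hand. It only demonstrates the conversion itself; how AbstractModel invokes the transform internally is not shown here.

```r
X  <- iris[, 1:4]     # a data frame of predictors
Xm <- as.matrix(X)    # the conversion applied before the model sees X

is.data.frame(X)      # the input is a data frame
is.matrix(Xm)         # the converted object is a numeric matrix
dim(Xm)               # same rows and columns as the data frame
```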

We can check this is working as expected using ab.model and ab.predict.

mod <- ab.model(abm, X[samp,], Y[samp])
preds <- ab.predict(abm, mod, X[-samp,])

Classification with gensemble requires a probability matrix to be returned by the underlying model. We will need to pass some extra arguments to ksvm to make sure this is present.

abm <- ab.create(model.call="ksvm", model.args=list(prob.model=TRUE, 
	type="C-svc", C=1, epsilon=0.1), predict.args=list(type="probabilities"), 
	xtrans=as.matrix)

We have added two things. First, we pass prob.model=TRUE to the ksvm model function, telling it to build a model that can produce class probabilities. Second, we added predict.args to ab.create, so that when the predict function for ksvm is called it is passed type="probabilities", telling it to return a matrix of class probabilities.
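
To make the shape of that output concrete, here is a hand-built probability matrix and one way to turn it into class labels. The values are made up for illustration, and probs is a hypothetical stand-in for what a model's predict function would return; the which.max step mirrors the comparison used in the Examples section.

```r
# Made-up class probabilities: one row per observation, one column per class
probs <- matrix(c(0.8, 0.1, 0.1,
                  0.2, 0.5, 0.3,
                  0.1, 0.2, 0.7),
                nrow = 3, byrow = TRUE,
                dimnames = list(NULL, c("setosa", "versicolor", "virginica")))

rowSums(probs)   # each row of a probability matrix should sum to 1

# Pick the most probable class for each row and map it back to its name
labels <- colnames(probs)[apply(probs, 1, which.max)]
print(labels)
```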

We now have an AbstractModel-class instance ready to use with gensemble. Please see the documentation for AbstractModel-class for further examples and information.

Note

This is still relatively experimental code. In particular, I expect that AbstractModel will turn out not to be abstract enough at some point in the near future, and will be unable to model some normal usage. We welcome bug reports or any other feedback.

Author(s)

Peter Werner and Eugene Dubossarsky gensemble.r@gmail.com

References

http://www.stat.berkeley.edu/~breiman/RandomForests/cc_home.htm

See Also

mksampsize, AbstractModel-class, predict.gensemble

Examples

## Not run: 
#classification with kernlab
library(kernlab)
#make our abstract model object
abm <- ab.create(model.call="ksvm", model.args=list(prob.model=TRUE, 
	type="C-svc", C=1, epsilon=0.1), predict.args=list(type="probabilities"), 
	xtrans=as.matrix)
#the example data
X <- iris[,1:4]
Y <- iris[,5]
#create a training/test set
samp <- sample(1:nrow(iris), nrow(iris) * 0.8)        
#train the model
gmod <- gensemble(abm, X[samp,], Y[samp], sampsize=0.8, sampsize_prop=TRUE)
#test it out
gpreds <- predict(gmod, X[-samp,])
#compare
cbind(apply(gpreds, 1, which.max), Y[-samp])


#regression with rpart
library(rpart)
abm <- ab.create(model.call="rpart", model.args=list(control=rpart.control(minsplit=2)))
X <- trees[,1:2]
Y <- trees[,3]
#generate a training set
samp <- sample(1:nrow(trees), nrow(trees) * 0.8)
#build the model
gmod <- gensemble(abm, X[samp,], Y[samp])
#use it to predict with the test set
gpreds <- predict(gmod, X[-samp,])
#compare
cbind(gpreds, Y[-samp])

## End(Not run)

gensemble documentation built on May 2, 2019, 1:02 p.m.