Description
Gensemble is a generalisation of random forests that allows arbitrary underlying models to be used.
Arguments
abm
An object of type AbstractModelclass, as created by ab.create.
X 
A data frame or matrix of predictors 
Y 
A response vector. If Y is a factor, classification is assumed; otherwise regression. See the notes for more details.
sampsize 
A list or vector of sample sizes used when creating a bagged sample. If not supplied, all input data will be used to build the models. See mksampsize for details on how this will be interpreted. 
sampsize_prop 
A logical value indicating whether the values in sampsize should be interpreted as proportions.
nmods 
How many models to build. 
perturb_val 
The proportion of input data to perturb. 
Xtest 
Optional test set of predictors, in the same form as X.
Ytest
Optional test set of responses, in the same form as Y.
do.trace 
If

stepsize 
If 
Details
This is a general implementation of bagging. It allows (in theory) any underlying modelling/learning algorithm to be used, via the AbstractModelclass.
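The bagging idea itself can be sketched in a few lines of base R. This is a minimal illustration, not gensemble's implementation: it fits nmods linear models with lm.fit to bootstrap samples of the trees data and averages their predictions. The helper names bag_fit and bag_predict are hypothetical, and linear models stand in for an arbitrary underlying model.

```r
# Minimal bagging sketch for regression, base R only.
# Illustrative: bag_fit/bag_predict are hypothetical helpers, not gensemble code.
bag_fit <- function(X, Y, nmods = 25, sampsize = nrow(X)) {
  X <- cbind(1, as.matrix(X))                         # add an intercept column
  lapply(seq_len(nmods), function(i) {
    idx <- sample(nrow(X), sampsize, replace = TRUE)  # bootstrap sample of rows
    lm.fit(X[idx, , drop = FALSE], Y[idx])$coefficients
  })
}

bag_predict <- function(coefs, Xnew) {
  Xnew <- cbind(1, as.matrix(Xnew))
  # one column of predictions per model, then average across the ensemble
  preds <- do.call(cbind, lapply(coefs, function(b) drop(Xnew %*% b)))
  rowMeans(preds)
}

set.seed(42)
fits <- bag_fit(trees[, 1:2], trees[, 3], nmods = 10)
pred <- bag_predict(fits, trees[, 1:2])
head(pred)
```

Swapping lm.fit for another learner only changes the body of the lapply call, which is the flexibility AbstractModel is meant to provide.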
Value
An object of class gensembleclass.
Wrapping the model function
The first argument to gensemble is an instance of AbstractModelclass. You will need to wrap the model you wish to use in this class before using gensemble.
First, you should make sure the model function works for the data you will pass to gensemble. For example, suppose we are using ksvm from kernlab on the iris data set. You might start by calling ksvm directly on a training sample of the data.
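A minimal sketch of such a direct call, assuming the same data and ksvm settings as the Examples section: train on a random 80% of iris and predict the remaining rows. Only standard kernlab calls (ksvm, predict) are used.

```r
# Sketch: call ksvm directly before wrapping it (settings assumed from the Examples).
library(kernlab)

X <- iris[, 1:4]
Y <- iris[, 5]
samp <- sample(1:nrow(iris), nrow(iris) * 0.8)   # training rows; the rest are held out

# ksvm is given a numeric matrix, so the data frame is converted first
mod <- ksvm(as.matrix(X[samp, ]), Y[samp], type = "C-svc", C = 1, epsilon = 0.1)
preds <- predict(mod, as.matrix(X[-samp, ]))     # predicted class labels
table(predicted = preds, actual = Y[-samp])
```

The explicit as.matrix calls here are exactly what the xtrans argument takes over once the model is wrapped.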
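We can wrap this up in an instance of AbstractModelclass with ab.create, for example ab.create(model.call="ksvm", model.args=list(type="C-svc", C=1, epsilon=0.1), xtrans=as.matrix).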
We now pass the arguments we would pass to ksvm via the model.args argument to ab.create. It is simply a list of the arguments and their values.
Note that we define the X transform to be as.matrix, which means the X values passed to ksvm by AbstractModel will first be run through as.matrix.
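To make the roles of model.call, model.args and xtrans concrete, here is a hypothetical base-R sketch of how a wrapper of this kind could combine them. It is not AbstractModel's actual code: make_model is an invented name, and lm.fit (which expects a numeric matrix) stands in for ksvm so the sketch needs no extra packages.

```r
# Hypothetical sketch, not gensemble code: a closure that remembers the model
# function name, its extra arguments and an X transform.
make_model <- function(model.call, model.args = list(), xtrans = identity) {
  function(X, Y) {
    # apply the X transform, then call the model with X, Y and model.args
    do.call(model.call, c(list(xtrans(X), Y), model.args))
  }
}

# lm.fit expects a numeric design matrix, so xtrans = as.matrix converts
# the data frame before the model function sees it
fit_fun <- make_model("lm.fit", model.args = list(tol = 1e-7), xtrans = as.matrix)
fit <- fit_fun(trees[, 1:2], trees[, 3])
fit$coefficients
```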
We can check that this works as expected using ab.model and ab.predict:
mod <- ab.model(abm, X[samp,], Y[samp])
preds <- ab.predict(abm, mod, X[samp,])

Classification with gensemble requires the underlying model to return a matrix of class probabilities, so we need to pass some extra arguments to ksvm to make sure one is produced. The resulting ab.create call is the one at the start of the Examples section.
We have added two extra things. First, we pass prob.model=TRUE to the ksvm model function, telling it to generate probabilities. We also added predict.args to AbstractModel, so when the predict function for ksvm is called, it will be passed type="probabilities", telling it to return a matrix of class probabilities.
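Before wrapping, you can confirm that kernlab returns such a matrix. This sketch calls ksvm and predict directly (not through AbstractModel), with the settings from the Examples section; with prob.model=TRUE, predict(..., type="probabilities") gives one row per case and one column per class.

```r
# Sketch: check kernlab's probability output directly (not via AbstractModel).
library(kernlab)

X <- as.matrix(iris[, 1:4])
Y <- iris[, 5]
samp <- sample(1:nrow(iris), nrow(iris) * 0.8)

mod <- ksvm(X[samp, ], Y[samp], type = "C-svc", C = 1, prob.model = TRUE)
probs <- predict(mod, X[-samp, ], type = "probabilities")

dim(probs)                         # rows = test cases, columns = classes
head(apply(probs, 1, which.max))   # index of the most probable class per row
```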
We now have an AbstractModelclass instance ready to use with gensemble.
Please see the documentation for AbstractModelclass for further examples and information.
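The need for a probability matrix suggests that the ensemble combines its members' class probabilities. The sketch below shows one common combination rule, averaging the matrices and taking the most probable class per row. Treat it as an assumption for illustration, not a description of gensemble's internals; average_probs and the two small probability matrices are hypothetical.

```r
# Illustrative sketch (assumed combination rule, not gensemble code):
# average class-probability matrices from several models.
average_probs <- function(prob_list) {
  Reduce(`+`, prob_list) / length(prob_list)
}

# two hypothetical models, two observations, three classes
classes <- c("setosa", "versicolor", "virginica")
p1 <- matrix(c(0.7, 0.2, 0.1,
               0.1, 0.3, 0.6), nrow = 2, byrow = TRUE,
             dimnames = list(NULL, classes))
p2 <- matrix(c(0.5, 0.4, 0.1,
               0.2, 0.5, 0.3), nrow = 2, byrow = TRUE,
             dimnames = list(NULL, classes))

avg <- average_probs(list(p1, p2))
colnames(avg)[apply(avg, 1, which.max)]   # "setosa" "virginica"
```

In the second row the two models disagree, and the averaged probabilities (0.15, 0.40, 0.45) settle the vote, which a plain class label from each model could not express as finely.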
Note
This is still relatively experimental code. In particular, I expect that at some point AbstractModel will prove not abstract enough and will fail to accommodate normal usage. We welcome bug reports and any other feedback.
Author(s)
Peter Werner and Eugene Dubossarsky <gensemble.r@gmail.com>
References
http://www.stat.berkeley.edu/~breiman/RandomForests/cc_home.htm
See Also
mksampsize, AbstractModelclass, predict.gensemble
Examples
## Not run:
library(gensemble)
# classification with kernlab
library(kernlab)
# make our abstract model object
abm <- ab.create(model.call="ksvm", model.args=list(prob.model=TRUE,
  type="C-svc", C=1, epsilon=0.1), predict.args=list(type="probabilities"),
  xtrans=as.matrix)
# the example data
X <- iris[,1:4]
Y <- iris[,5]
# sample 80% of the rows for training; the remaining rows form the test set
samp <- sample(1:nrow(iris), nrow(iris) * 0.8)
# train the model
gmod <- gensemble(abm, X[samp,], Y[samp], sampsize=0.8, sampsize_prop=TRUE)
# predict class probabilities for the test set
gpreds <- predict(gmod, X[-samp,])
# compare the most probable class with the true class (as factor codes)
cbind(apply(gpreds, 1, which.max), Y[-samp])
# regression with rpart
library(rpart)
abm <- ab.create(model.call="rpart", model.args=list(control=rpart.control(minsplit=2)))
X <- trees[,1:2]
Y <- trees[,3]
# sample 80% of the rows for training; the remaining rows form the test set
samp <- sample(1:nrow(trees), nrow(trees) * 0.8)
# build the model
gmod <- gensemble(abm, X[samp,], Y[samp])
# use it to predict with the test set
gpreds <- predict(gmod, X[-samp,])
# compare predictions with the true values
cbind(gpreds, Y[-samp])
## End(Not run)
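The examples print predictions next to the true values. To summarise such a comparison as a single number, helpers like the hypothetical accuracy_from_probs and rmse below can be used; they are not part of gensemble, and accuracy_from_probs assumes the probability columns follow the order of the response's factor levels.

```r
# Hypothetical scoring helpers (not part of gensemble).
accuracy_from_probs <- function(probs, truth) {
  truth <- as.factor(truth)
  # assumes column j of probs corresponds to levels(truth)[j]
  pred <- levels(truth)[apply(probs, 1, which.max)]
  mean(pred == as.character(truth))
}

rmse <- function(pred, truth) sqrt(mean((pred - truth)^2))

# toy inputs, only to show the call shape
probs <- matrix(c(0.8, 0.2,
                  0.3, 0.7,
                  0.6, 0.4), ncol = 2, byrow = TRUE)
truth <- factor(c("a", "b", "b"))
accuracy_from_probs(probs, truth)   # 2 of 3 rows correct
rmse(c(10, 12), c(11, 12))          # sqrt(0.5)
```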
