‘maboost’ fits a variety of stochastic boosting models for binary and multiclass responses, as described in "A Boosting Framework on Grounds of Online Learning" by T. Naghibi and B. Pfister (2014).
maboost(x, ...)
## Default S3 method:
maboost(x, y, test.x=NULL, test.y=NULL, breg=c("entrop","l2"),
        type=c("normal","maxmargin","smooth","sparse"), C50tree=FALSE, iter=100, nu=1,
        bag.frac=0.5, random.feature=TRUE, random.cost=TRUE, smoothfactor=1,
        sparsefactor=FALSE, verbose=FALSE, ..., na.action=na.rpart)
## S3 method for class 'formula'
maboost(formula, data, ..., subset, na.action=na.rpart)
x |
matrix of descriptors. |
y |
vector of responses (class labels). |
formula |
a symbolic description of the model to be fit. |
data |
dataframe containing variables and a column corresponding to class labels. |
test.x |
testing matrix of descriptors (optional) |
test.y |
vector of testing responses (optional) |
breg |
|
type |
determines the type of algorithm to use. By default the algorithm runs in normal mode. |
C50tree |
flag to use C5.0 as the weak classifier. It is recommended only for the multiclass setting, where rpart may be too weak to satisfy the boostability condition. If it is used, don't forget to set the |
iter |
number of boosting iterations to perform. Default = 100. |
nu |
shrinkage parameter for boosting; the default is 1. It multiplies eta and so controls its magnitude. Note that when using SparseBoost, nu can also be increased to enhance sparsity, at the expense of an increased risk of divergence. |
bag.frac |
sampling fraction for samples taken out-of-bag. This allows one to use random permutation which improves performance. |
random.feature |
flag to grow random-forest-type trees. If TRUE, a small set of features (num_feat^.5) is selected at each round to grow a tree. This generally speeds up convergence, especially for large data sets, and improves performance. |
random.cost |
flag to assign random costs to the selected features. Assigning random costs (see cost in rpart.control) to the selected features (if random.feature=TRUE) is intended to decorrelate the trees; in combination with random.feature it usually improves the generalization error. |
smoothfactor |
an integer between 1 and N (the number of examples in the data); it must be set when type="smooth". If smoothfactor=K, then the example weights are <=1/K and the final error is <K/N. |
sparsefactor |
When TRUE, an explicit l1-norm regularization term is used in the projection step of the algorithm (see [1]) to enhance sparsity. The default is FALSE, which guarantees convergence of the SparseBoost algorithm. This parameter can also be set to a numeric value, which directly multiplies the l1-norm regularization factor (see the definition of alpha in [1]). |
verbose |
flag to output more details about internal parameters at each round of boosting: error, num_zero, sum of weights, eta (classifier coefficient) and max of weights. Default is FALSE. |
subset |
an optional vector specifying a subset of observations to be used in the fitting process. |
na.action |
a function that indicates how to process ‘NA’ values. Default=na.rpart for rpart and na.pass for C5.0. |
... |
arguments passed to |
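The per-round feature subsetting described for random.feature can be illustrated with a minimal base R sketch. It does not call maboost and is not taken from the package source: the rounding rule (floor, with at least one feature) and the names `feature_names`, `k` and `subsets` are assumptions made for illustration.

```r
set.seed(1)  # only to make the sketch reproducible

# Descriptor names of the iris data, used here as a stand-in feature set
feature_names <- c("Sepal.Length", "Sepal.Width", "Petal.Length", "Petal.Width")
num_feat <- length(feature_names)

# Assumption: subset size is num_feat^.5 rounded down, never less than 1
k <- max(1, floor(num_feat^.5))

# Draw a fresh random subset for each of three hypothetical boosting rounds
subsets <- lapply(1:3, function(round) sample(feature_names, k))
subsets
```

With four descriptors, each round would grow its tree on two of them, which is why consecutive trees can differ even when the example weights change little.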
This function directly follows the algorithms listed in “A Boosting Framework on Grounds of Online Learning” [1].
When using the formula interface, ‘maboost(y~.)’, the data must be in a data frame. The response can have factor or numeric values (preferably factor). Missing values can be present in the descriptor data whenever na.action is set to any option other than na.pass.
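As a minimal sketch of the data preparation this paragraph implies, the following base R snippet converts a numeric response to a factor and counts missing values per column before the data frame would be handed to the formula interface. The data frame `df` and its column names are hypothetical, and nothing here calls maboost itself.

```r
# Hypothetical data frame: two descriptors and a numeric 0/1 response
df <- data.frame(
  x1    = c(0.2, 1.5, NA, 2.1),
  x2    = c(3.0, 2.2, 1.8, 0.4),
  label = c(0, 1, 0, 1)
)

# Convert the response to a factor, the form the text recommends
df$label <- factor(df$label)

# Count missing values per column to see whether na.action will matter
na_counts <- colSums(is.na(df))

levels(df$label)
na_counts
```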
After the model is fit, ‘maboost’ prints a summary of the function call, the method used for boosting, the number of iterations, the final confusion matrix (observed classification vs. predicted classification; class labels are the same as in the response), the error for the training set, and the testing, training, and kappa estimates of the appropriate number of iterations.
A summary of this information can also be obtained with the command ‘print(x)’.
Corresponding functions (Use help with summary.maboost, predict.maboost, ... varplot.maboost for additional information on these commands):
summary : function to print a summary of the original function call, method used for boosting, number of iterations, final confusion matrix, accuracy, and kappa statistic (a measure of agreement between the observed classification and predicted classification). ‘summary’ can be used for training, testing, or validation data.
predict : function to predict the response for any data set (train, test, or validation)
varplot.maboost : plot of variables ordered by the variable importance measure (based on improvement).
update : add more trees to the maboost object.
model |
The following items are the components created by the algorithms. trees: ensemble of rpart or C5.0 trees used to fit the model; alpha: the weights of the trees used in the final aggregate model; F: F[[1]] corresponds to the training sum, F[[2]], ... correspond to the testing sums; errs: matrix of errors (training, kappa, testing 1, kappa 1, ...); lw: last weights calculated, used by the update routine; num_zero: a vector of length iter containing the number of zeros in the weight vector at each round. |
fit |
The predicted classification for each observation in the original level of the response. |
call |
The function call. |
nu |
shrinkage parameter |
breg |
The type of maboost performed: ‘"l2"’, ‘"entrop"’. |
confusion |
The confusion matrix (True value vs. Predicted value) for the training data. |
iter |
The number of boosting iterations that were performed. |
actual |
The original response vector. |
(a) Choose type="normal" or "maxmargin" for multiclass classification. SmoothBoost does not work in the multiclass setting, and SparseBoost does not make sense for multiclass classification (where we have to deal with a weight matrix rather than a weight vector).
(b) cost variable in rpart.control is the only variable in rpart.control that CANNOT be set through maboost. It is reserved for random.cost.
Tofigh Naghibi, ETH Zurich
Special thanks to Dr. Mark Culp and his colleagues, who developed the 'ada' package. A large part of this package is built upon their code. In particular, the summary, print and varplot.maboost functions are imported from the 'ada' package with almost no changes. For further information about 'ada', which implements different variations of AnyBoost, see [2].
[1] Naghibi, T., Pfister, B. (2014). A Boosting Framework on Grounds of Online Learning. NIPS.
[2] Culp, M., Johnson, K., Michailidis, G. (2006). ada: an R Package for Stochastic Boosting. Journal of Statistical Software, 16.
print.maboost, summary.maboost, predict.maboost, update.maboost, varplot.maboost
## fit maboost model
data(iris)
## drop setosa
iris <- iris[iris$Species != "setosa", ]
## set up testing and training data (60% for training)
n <- dim(iris)[1]
trind <- sample(1:n, floor(.6*n), FALSE)
teind <- setdiff(1:n, trind)
## rebuild the response as a two-level factor (versicolor, virginica)
iris[,5] <- as.factor((levels(iris[,5])[2:3])[as.numeric(iris[,5])-1])
## fit trees with maxdepth=6 (an argument passed to rpart.control)
gdis <- maboost(Species~., data=iris[trind,], iter=50, nu=2,
                breg="l2", type="sparse", bag.frac=1, random.feature=FALSE,
                random.cost=FALSE, C50tree=FALSE, maxdepth=6, verbose=TRUE)
## average number of zeros in the weight vector over the 50 rounds of boosting
print(mean(gdis$model$num_zero))
## prediction
pred.gdis <- predict(gdis, iris, type="class")
## variable selection
varplot.maboost(gdis)
Loading required package: rpart
Loading required package: C50
[1] "err" "0.05" "eta" "0.03" "sum_w" "0.14"
[7] "num_zero" "57" "max_w*M" "2.8"
[1] "err" "0.05" "eta" "0.00466667" "sum_w"
[6] "0.252" "num_zero" "30" "max_w*M" "2.52"
[1] "err" "0.05" "eta" "0.00497778" "sum_w"
[6] "0.21715556" "num_zero" "46" "max_w*M" "2.2213"
[1] "err" "0.05" "eta" "0.00155556" "sum_w"
[6] "0.21404444" "num_zero" "46" "max_w*M" "2.3147"
[1] "err" "0.05" "eta" "0.00145185" "sum_w"
[6] "0.26195556" "num_zero" "11" "max_w*M" "2.2276"
[1] "err" "0.05" "eta" "0.00808198" "sum_w"
[6] "0.11993679" "num_zero" "50" "max_w*M" "1.7426"
[1] "err" "0.05" "eta" "0.00158966" "sum_w"
[6] "0.15808869" "num_zero" "30" "max_w*M" "1.6473"
[1] "err" "0.05" "eta" "0.00221671" "sum_w"
[6] "0.17380894" "num_zero" "28" "max_w*M" "1.7678"
[1] "err" "0.05" "eta" "0.00464676" "sum_w"
[6] "0.09281455" "num_zero" "53" "max_w*M" "1.489"
[1] "err" "0.03333333" "eta" "0.00109867" "sum_w"
[6] "0.11918268" "num_zero" "30" "max_w*M" "1.4231"
[1] "err" "0.03333333" "eta" "0.00148683" "sum_w"
[6] "0.13179451" "num_zero" "30" "max_w*M" "1.5123"
[1] "err" "0.03333333" "eta" "0.00269719" "sum_w"
[6] "0.11901732" "num_zero" "42" "max_w*M" "1.3505"
[1] "err" "0.03333333" "eta" "0.00220012" "sum_w"
[6] "0.14761894" "num_zero" "19" "max_w*M" "1.2579"
[1] "err" "0.01666667" "eta" "0.00313513" "sum_w"
[6] "0.07513717" "num_zero" "54" "max_w*M" "1.446"
[1] "err" "0.01666667" "eta" "0.0001222" "sum_w"
[6] "0.07806994" "num_zero" "30" "max_w*M" "1.4533"
[1] "err" "0.01666667" "eta" "0.00121347" "sum_w"
[6] "0.11737269" "num_zero" "18" "max_w*M" "1.3805"
[1] "err" "0.01666667" "eta" "0.00297698" "sum_w"
[6] "0.0608023" "num_zero" "54" "max_w*M" "1.2019"
[1] "err" "0.01666667" "eta" "0.00029622" "sum_w"
[6] "0.06791169" "num_zero" "30" "max_w*M" "1.2197"
[1] "err" "0" "eta" "0.00097592" "sum_w"
[6] "0.097033" "num_zero" "17" "max_w*M" "1.1611"
[1] "err" "0.01666667" "eta" "0.00218534" "sum_w"
[6] "0.05193266" "num_zero" "55" "max_w*M" "1.0753"
[1] "err" "0.01666667" "eta" "0.00045201" "sum_w"
[6] "0.06278081" "num_zero" "30" "max_w*M" "1.0571"
[1] "err" "0" "eta" "0.00095371" "sum_w"
[6] "0.0889784" "num_zero" "18" "max_w*M" "0.9999"
[1] "err" "0" "eta" "0.00223556" "sum_w"
[6] "0.07853953" "num_zero" "45" "max_w*M" "0.8658"
[1] "err" "0" "eta" "0.0014058" "sum_w"
[6] "0.06588736" "num_zero" "43" "max_w*M" "0.9501"
[1] "err" "0" "eta" "0.00060918" "sum_w"
[6] "0.06040476" "num_zero" "43" "max_w*M" "0.9866"
[1] "err" "0" "eta" "0.00055372" "sum_w"
[6] "0.07867762" "num_zero" "17" "max_w*M" "0.9534"
[1] "err" "0" "eta" "0.0018418" "sum_w"
[6] "0.04107249" "num_zero" "52" "max_w*M" "0.8429"
[1] "err" "0" "eta" "0.00011744" "sum_w"
[6] "0.04400853" "num_zero" "31" "max_w*M" "0.85"
[1] "err" "0" "eta" "0.00074683" "sum_w"
[6] "0.06728675" "num_zero" "20" "max_w*M" "0.8052"
[1] "err" "0" "eta" "0.00162349" "sum_w"
[6] "0.0751163" "num_zero" "35" "max_w*M" "0.7077"
[1] "err" "0" "eta" "0.00164281" "sum_w"
[6] "0.04411535" "num_zero" "44" "max_w*M" "0.8063"
[1] "err" "0" "eta" "0.00036964" "sum_w"
[6] "0.05705264" "num_zero" "13" "max_w*M" "0.7841"
[1] "err" "0" "eta" "0.00103049" "sum_w"
[6] "0.03251225" "num_zero" "46" "max_w*M" "0.846"
[1] "err" "0" "eta" "0.00053567" "sum_w"
[6] "0.05976456" "num_zero" "3" "max_w*M" "0.8138"
[1] "err" "0" "eta" "0.0014583" "sum_w"
[6] "0.04816562" "num_zero" "45" "max_w*M" "0.7263"
[1] "err" "0" "eta" "0.00059684" "sum_w"
[6] "0.04279408" "num_zero" "43" "max_w*M" "0.7621"
[1] "err" "0" "eta" "0.00045715" "sum_w"
[6] "0.05330854" "num_zero" "17" "max_w*M" "0.7347"
[1] "err" "0" "eta" "0.00102179" "sum_w"
[6] "0.03184613" "num_zero" "47" "max_w*M" "0.6734"
[1] "err" "0" "eta" "5.3e-05" "sum_w"
[6] "0.03290611" "num_zero" "26" "max_w*M" "0.6766"
[1] "err" "0" "eta" "0.00044985" "sum_w"
[6] "0.04735305" "num_zero" "18" "max_w*M" "0.6496"
[1] "err" "0" "eta" "0.00115122" "sum_w"
[6] "0.02312548" "num_zero" "53" "max_w*M" "0.5805"
[1] "err" "0" "eta" "8.619e-05" "sum_w"
[6] "0.02502973" "num_zero" "29" "max_w*M" "0.5857"
[1] "err" "0" "eta" "0.00047434" "sum_w"
[6] "0.03991721" "num_zero" "19" "max_w*M" "0.5572"
[1] "err" "0" "eta" "0.0009622" "sum_w"
[6] "0.04224655" "num_zero" "38" "max_w*M" "0.4995"
[1] "err" "0" "eta" "0.00077531" "sum_w"
[6] "0.02984154" "num_zero" "36" "max_w*M" "0.546"
[1] "err" "0" "eta" "0.00017943" "sum_w"
[6] "0.03289188" "num_zero" "11" "max_w*M" "0.5352"
[1] "err" "0" "eta" "0.00064768" "sum_w"
[6] "0.02757345" "num_zero" "44" "max_w*M" "0.4964"
[1] "err" "0" "eta" "0.00018629" "sum_w"
[6] "0.02589686" "num_zero" "43" "max_w*M" "0.5076"
[1] "err" "0" "eta" "0.0001683" "sum_w"
[6] "0.02976773" "num_zero" "17" "max_w*M" "0.4975"
[1] "err" "0" "eta" "0.00065597" "sum_w"
[6] "0.03252945" "num_zero" "39" "max_w*M" "0.4581"
[1] 33.9
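The example above sets aside test indices (teind) but never scores the predictions on them. A minimal base R sketch of that last step follows. The helper `holdout_error` is hypothetical and not part of maboost, and the label vectors are toy values for illustration only, not model output.

```r
# Hypothetical helper (not part of maboost): confusion matrix and error rate
# for a pair of observed / predicted label vectors of equal length.
holdout_error <- function(observed, predicted) {
  stopifnot(length(observed) == length(predicted))
  observed  <- as.character(observed)
  predicted <- as.character(predicted)
  # Shared levels keep the table square even if a class is never predicted
  lev <- sort(union(observed, predicted))
  list(
    confusion = table(observed  = factor(observed,  levels = lev),
                      predicted = factor(predicted, levels = lev)),
    error     = mean(observed != predicted)
  )
}

# Toy labels for illustration only; these are not maboost predictions
obs  <- c("versicolor", "versicolor", "virginica", "virginica")
pred <- c("versicolor", "virginica",  "virginica", "virginica")

res <- holdout_error(obs, pred)
res$confusion
res$error
```

With the objects from the example, and assuming predict returns one label per row of iris, the same helper could be called as holdout_error(iris$Species[teind], pred.gdis[teind]).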