maboost: Binary and Multiclass Boosting Algorithms

Description

‘maboost’ fits a variety of stochastic boosting models for binary and multiclass responses, as described in “A Boosting Framework on Grounds of Online Learning” by T. Naghibi and B. Pfister (2014).

Usage

maboost(x, ...)

## Default S3 method:
maboost(x, y, test.x = NULL, test.y = NULL, breg = c("entrop", "l2"),
        type = c("normal", "maxmargin", "smooth", "sparse"), C50tree = FALSE,
        iter = 100, nu = 1, bag.frac = 0.5, random.feature = TRUE,
        random.cost = TRUE, smoothfactor = 1, sparsefactor = FALSE,
        verbose = FALSE, ..., na.action = na.rpart)

## S3 method for class 'formula'
maboost(formula, data, ..., subset, na.action = na.rpart)

Arguments

`x`	matrix of descriptors.
`y`	vector of responses (class labels).
`formula`	a symbolic description of the model to be fit.
`data`	data frame containing the predictor variables and a column of class labels.
`test.x`	matrix of test descriptors (optional).
`test.y`	vector of test responses (optional).
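`breg`	`breg="l2"` (documented as the default) selects the quadratic Bregman divergence, and `breg="entrop"` uses the KL divergence, which results in an AdaBoost-like algorithm (with a different choice of eta). The usage line lists `"entrop"` first, so passing `breg` explicitly avoids any ambiguity about the default.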
`type`	determines the type of algorithm to run. The default, `type="normal"`, runs the algorithm in normal mode. `type="maxmargin"` guarantees that the margin of the final hypothesis converges to the maximum margin (at each round t, eta is divided by t^0.5). `type="sparse"` uses SparseBoost and only works with `breg="l2"`; it generates sparse weight vectors by projecting them onto R+. It can be used for multiclass problems, but this is of little use: the multiclass setting uses a weight matrix instead of a weight vector, and increasing the sparsity of this matrix does not make the weight vector (the sum over the columns of the weight matrix) sparse. `type="smooth"` runs SmoothBoost; it only works with `breg="l2"` and for binary classification. When `type="smooth"`, the `smoothfactor` argument should also be set accordingly.
`C50tree`	flag to use C5.0 as the weak classifier. It is only recommended for the multiclass setting, where rpart may be too weak to satisfy the boostability condition. If it is used, remember to set the `CF` and `minCases` parameters of C5.0Control properly.
`iter`	number of boosting iterations to perform. Default = 100.
`nu`	shrinkage parameter for boosting; default 1. It multiplies eta and thus controls its magnitude. When using SparseBoost, nu can also be increased to enhance sparsity, at the cost of a higher risk of divergence.
`bag.frac`	sampling fraction for the random subsample of training samples drawn at each round (the samples not drawn are out-of-bag). This random subsampling improves performance.
`random.feature`	flag to grow random-forest-type trees. If TRUE, at each round a small random subset of features (num_feat^0.5 of them) is selected to grow a tree. This generally speeds up convergence, especially for large data sets, and improves performance.
`random.cost`	flag to assign random costs to the selected features. By assigning random costs (see the `cost` argument of rpart) to the selected features (if random.feature=TRUE), it tries to decorrelate the trees; in combination with random.feature it usually improves the generalization error.
`smoothfactor`	an integer between 1 and N (the number of examples in the data); it must be set when `type="smooth"`. If smoothfactor=K, the example weights are at most 1/K and the final error is less than K/N.
`sparsefactor`	when TRUE, an explicit l1-norm regularization term is used in the projection step of the algorithm (see [1]) to enhance sparsity. The default is FALSE, which guarantees the convergence of the SparseBoost algorithm. This parameter can also be set to a numeric value, which is directly multiplied into the l1-norm regularization factor (see the definition of alpha in [1]).
`verbose`	flag to print internal parameters at each round of boosting: error, num_zero, sum of weights, eta (classifier coefficient) and maximum weight. Default is FALSE.
`subset`	an optional vector specifying a subset of observations to be used in the fitting process.
`na.action`	a function that indicates how to process ‘NA’ values. The default is na.rpart for rpart and na.pass for C5.0.
`...`	arguments passed to `rpart.control` and `C5.0Control`. For stumps, use `maxdepth=1,cp=-1,minsplit=0,xval=0`. `maxdepth` controls the depth of the trees and `cp` their complexity. For C5.0, use `CF` and `minCases` to control the complexity and size of the tree: the smaller `CF`, the less complex the tree, and the larger `minCases`, the smaller the C5.0 tree.
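The sketch below uses plain rpart to illustrate what a single weak learner might look like under the settings described above: a `bag.frac` row subsample, num_feat^0.5 randomly chosen features (random.feature), random per-feature costs passed through rpart's `cost` argument (random.cost), and the stump settings quoted for `...`. It is not maboost's internal code: boosting weights are omitted, and the uniform [1, 2] cost distribution and all variable names are assumptions made for illustration.

```r
library(rpart)

data(iris)
set.seed(1)

# Row subsample for this round (assumed to mirror bag.frac)
bag.frac <- 0.5
n <- nrow(iris)
rows <- sample(n, floor(bag.frac * n))

# random.feature: keep num_feat^0.5 randomly chosen features
features <- setdiff(names(iris), "Species")
k <- floor(sqrt(length(features)))
chosen <- sample(features, k)

# random.cost: hypothetical uniform costs on [1, 2], one per chosen feature;
# rpart divides a split's improvement by the cost of its variable
costs <- runif(k, min = 1, max = 2)

# Stump settings quoted in the `...` argument description
ctrl <- rpart.control(maxdepth = 1, cp = -1, minsplit = 0, xval = 0)

stump <- rpart(reformulate(chosen, response = "Species"),
               data = iris[rows, ], cost = costs, control = ctrl)
print(stump)
```

With four iris predictors, two features are kept per round, and the fitted stump has a single split on one of them.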

Details

This function directly follows the algorithms listed in “A Boosting Framework on Grounds of Online Learning” [1].
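Two elementary operations named in the `type` descriptions can be made concrete in a few lines: the SparseBoost projection of a weight vector onto R+ (for the Euclidean projection this simply zeroes the negative entries) and the maxmargin step size, which divides eta by t^0.5 at round t. This is a hedged sketch of those two operations only, not the full algorithm from [1]; the helper names `project_nonneg` and `eta_maxmargin` are hypothetical.

```r
# Hypothetical helper: Euclidean projection onto the nonnegative orthant R+,
# which sets negative entries to zero and leaves the rest unchanged
project_nonneg <- function(w) pmax(w, 0)

# Hypothetical helper: maxmargin step size at round t, eta / t^0.5
eta_maxmargin <- function(eta, t) eta / sqrt(t)

w <- c(0.4, -0.1, 0.25, -0.3, 0.05)
w_plus <- project_nonneg(w)
num_zero <- sum(w_plus == 0)   # analogous to the num_zero trace in verbose output

etas <- eta_maxmargin(eta = 0.1, t = 1:4)
print(w_plus)
print(num_zero)
print(etas)
```

Here two of the five weights become exactly zero, which is how projection onto R+ produces the sparse weight vectors that `num_zero` counts.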

When using the formula interface ‘maboost(y ~ ., data)’, the data must be in a data frame. The response can have factor or numeric values (factor is preferred). Missing values can be present in the descriptor data whenever na.action is set to any option other than na.pass.

After the model is fit, ‘maboost’ prints a summary of the function call, the method used for boosting, the number of iterations, the final confusion matrix (observed vs. predicted classification; class labels are the same as in the response), the training error, and estimates of the appropriate number of iterations based on the training error, testing error, and kappa.
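The confusion matrix and kappa statistic reported here can be reproduced in base R. This sketch assumes the standard Cohen's kappa (observed agreement corrected for chance agreement); the exact formula used internally by summary.maboost is not stated on this page, and `cohen_kappa` is a hypothetical helper.

```r
# Cohen's kappa from observed and predicted class labels
cohen_kappa <- function(observed, predicted) {
  lv <- union(levels(factor(observed)), levels(factor(predicted)))
  # square confusion matrix: rows = observed, columns = predicted
  tab <- table(factor(observed, levels = lv), factor(predicted, levels = lv))
  n <- sum(tab)
  p_o <- sum(diag(tab)) / n                        # observed agreement
  p_e <- sum(rowSums(tab) * colSums(tab)) / n^2    # agreement expected by chance
  (p_o - p_e) / (1 - p_e)
}

obs  <- c("a", "a", "b", "b")
pred <- c("a", "b", "b", "b")
print(table(obs, pred))
print(cohen_kappa(obs, pred))
```

In this toy case the observed agreement is 0.75 and the chance agreement is 0.5, so kappa is (0.75 - 0.5) / (1 - 0.5) = 0.5.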

A summary of this information can also be obtained with the command ‘print(x)’.

Corresponding functions (use help on summary.maboost, predict.maboost, ... varplot.maboost for additional information on these commands):

summary : function to print a summary of the original function call, method used for boosting, number of iterations, final confusion matrix, accuracy, and kappa statistic (a measure of agreement between the observed classification and predicted classification). ‘summary’ can be used for training, testing, or validation data.

predict : function to predict the response for any data set (train, test, or validation).

varplot.maboost : plot of variables ordered by the variable importance measure (based on improvement).

update : add more trees to the maboost object.

Value

`model`	The components created by the algorithm:
  trees: ensemble of rpart or C5.0 trees used to fit the model.
  alpha: the weights of the trees in the final aggregate model.
  F: F[[1]] corresponds to the training sum; F[[2]], ... correspond to the testing sums.
  errs: matrix of errors: training, kappa, testing 1, kappa 1, ...
  lw: last weights calculated, used by the update routine.
  num_zero: a vector of length iter containing the number of zeros in the weight vector at each round.
`fit`	The predicted classification for each observation in the original level of the response.
`call`	The function call.
`nu`	shrinkage parameter
`breg`	The type of maboost performed: ‘"l2"’ or ‘"entrop"’.
`confusion`	The confusion matrix (True value vs. Predicted value) for the training data.
`iter`	The number of boosting iterations that were performed.
`actual`	The original response vector.
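To illustrate how the tree weights `alpha` relate to the sums stored in `F`, the sketch below forms the weighted sum of weak-classifier outputs for a binary problem and classifies by its sign. The -1/+1 label coding, the sign decision rule, and all values are assumptions for illustration, not taken from the package; the sum is named `F_sum` to avoid masking R's `F` (FALSE).

```r
# Hypothetical outputs of three weak classifiers (columns) on four examples (rows), coded -1/+1
h <- matrix(c( 1,  1, -1,
              -1,  1, -1,
               1, -1,  1,
              -1, -1, -1), nrow = 4, byrow = TRUE)
alpha <- c(0.5, 0.3, 0.2)        # hypothetical tree weights

F_sum <- as.vector(h %*% alpha)  # weighted sum of tree outputs per example
pred <- ifelse(F_sum >= 0, 1, -1)
print(F_sum)
print(pred)
```

The weighted sums are 0.6, -0.4, 0.4 and -1, so the predicted labels are 1, -1, 1, -1.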

Warnings

(a) Choose type="normal" or "maxmargin" for multiclass classification. SmoothBoost does not work in the multiclass setting, and SparseBoost is not meaningful for multiclass classification (where we deal with a weight matrix rather than a weight vector).

(b) Unlike the rpart.control parameters, rpart's `cost` argument CANNOT be set through maboost; it is reserved for random.cost.
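Warning (a) and the `type="sparse"` description both rest on the fact that a sparse weight matrix need not yield a sparse weight vector. A tiny example, assuming rows are examples and columns are classes (a layout this page does not specify): two-thirds of the matrix entries are zero, yet every per-example weight, obtained by summing over the columns, is nonzero.

```r
# Hypothetical 4 x 3 weight matrix: rows = examples, columns = classes
W <- matrix(c(0.2, 0,   0,
              0,   0.1, 0,
              0,   0,   0.3,
              0.1, 0,   0), nrow = 4, byrow = TRUE)

w <- rowSums(W)   # per-example weight: sum over the columns

cat("zero entries in W:", sum(W == 0), "of", length(W), "\n")
cat("zero entries in w:", sum(w == 0), "of", length(w), "\n")
```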

Author(s)

Tofigh Naghibi, ETH Zurich

Special thanks to Dr. Mark Culp and his colleagues, who developed the 'ada' package. A large part of this package is built upon their code; in particular, the summary, print and varplot.maboost functions are imported from 'ada' with almost no changes. For further information about 'ada', which implements different variations of AnyBoost, see [2].

References

[1] Naghibi, T., Pfister, B. (2014). A Boosting Framework on Grounds of Online Learning. NIPS.

[2] Culp, M., Johnson, K., Michailidis, G. (2006). ada: an R Package for Stochastic Boosting. Journal of Statistical Software, 16.

See Also

print.maboost, summary.maboost, predict.maboost, update.maboost, varplot.maboost

Examples

## fit maboost model
library(maboost)
data(iris)
## drop setosa so that the problem is binary
iris <- iris[iris$Species != "setosa", ]
## drop the now-empty "setosa" factor level
iris$Species <- droplevels(iris$Species)
## set up testing and training data (60% for training)
n <- nrow(iris)
trind <- sample(1:n, floor(0.6 * n), FALSE)
teind <- setdiff(1:n, trind)
## fit trees with maxdepth = 6 (an argument passed to rpart.control)
gdis <- maboost(Species ~ ., data = iris[trind, ], iter = 50, nu = 2,
                breg = "l2", type = "sparse", bag.frac = 1,
                random.feature = FALSE, random.cost = FALSE,
                C50tree = FALSE, maxdepth = 6, verbose = TRUE)
## average number of zeros in the weight vectors over the 50 rounds of boosting
print(mean(gdis$model$num_zero))
## prediction on the held-out test data
pred.gdis <- predict(gdis, iris[teind, ], type = "class")
## variable importance plot
varplot.maboost(gdis)

Example output

Loading required package: rpart
Loading required package: C50
 [1] "err"      "0.05"     "eta"      "0.03"     "sum_w"    "0.14"    
 [7] "num_zero" "57"       "max_w*M"  "2.8"     
 [1] "err"        "0.05"       "eta"        "0.00466667" "sum_w"     
 [6] "0.252"      "num_zero"   "30"         "max_w*M"    "2.52"      
 [1] "err"        "0.05"       "eta"        "0.00497778" "sum_w"     
 [6] "0.21715556" "num_zero"   "46"         "max_w*M"    "2.2213"    
 [1] "err"        "0.05"       "eta"        "0.00155556" "sum_w"     
 [6] "0.21404444" "num_zero"   "46"         "max_w*M"    "2.3147"    
 [1] "err"        "0.05"       "eta"        "0.00145185" "sum_w"     
 [6] "0.26195556" "num_zero"   "11"         "max_w*M"    "2.2276"    
 [1] "err"        "0.05"       "eta"        "0.00808198" "sum_w"     
 [6] "0.11993679" "num_zero"   "50"         "max_w*M"    "1.7426"    
 [1] "err"        "0.05"       "eta"        "0.00158966" "sum_w"     
 [6] "0.15808869" "num_zero"   "30"         "max_w*M"    "1.6473"    
 [1] "err"        "0.05"       "eta"        "0.00221671" "sum_w"     
 [6] "0.17380894" "num_zero"   "28"         "max_w*M"    "1.7678"    
 [1] "err"        "0.05"       "eta"        "0.00464676" "sum_w"     
 [6] "0.09281455" "num_zero"   "53"         "max_w*M"    "1.489"     
 [1] "err"        "0.03333333" "eta"        "0.00109867" "sum_w"     
 [6] "0.11918268" "num_zero"   "30"         "max_w*M"    "1.4231"    
 [1] "err"        "0.03333333" "eta"        "0.00148683" "sum_w"     
 [6] "0.13179451" "num_zero"   "30"         "max_w*M"    "1.5123"    
 [1] "err"        "0.03333333" "eta"        "0.00269719" "sum_w"     
 [6] "0.11901732" "num_zero"   "42"         "max_w*M"    "1.3505"    
 [1] "err"        "0.03333333" "eta"        "0.00220012" "sum_w"     
 [6] "0.14761894" "num_zero"   "19"         "max_w*M"    "1.2579"    
 [1] "err"        "0.01666667" "eta"        "0.00313513" "sum_w"     
 [6] "0.07513717" "num_zero"   "54"         "max_w*M"    "1.446"     
 [1] "err"        "0.01666667" "eta"        "0.0001222"  "sum_w"     
 [6] "0.07806994" "num_zero"   "30"         "max_w*M"    "1.4533"    
 [1] "err"        "0.01666667" "eta"        "0.00121347" "sum_w"     
 [6] "0.11737269" "num_zero"   "18"         "max_w*M"    "1.3805"    
 [1] "err"        "0.01666667" "eta"        "0.00297698" "sum_w"     
 [6] "0.0608023"  "num_zero"   "54"         "max_w*M"    "1.2019"    
 [1] "err"        "0.01666667" "eta"        "0.00029622" "sum_w"     
 [6] "0.06791169" "num_zero"   "30"         "max_w*M"    "1.2197"    
 [1] "err"        "0"          "eta"        "0.00097592" "sum_w"     
 [6] "0.097033"   "num_zero"   "17"         "max_w*M"    "1.1611"    
 [1] "err"        "0.01666667" "eta"        "0.00218534" "sum_w"     
 [6] "0.05193266" "num_zero"   "55"         "max_w*M"    "1.0753"    
 [1] "err"        "0.01666667" "eta"        "0.00045201" "sum_w"     
 [6] "0.06278081" "num_zero"   "30"         "max_w*M"    "1.0571"    
 [1] "err"        "0"          "eta"        "0.00095371" "sum_w"     
 [6] "0.0889784"  "num_zero"   "18"         "max_w*M"    "0.9999"    
 [1] "err"        "0"          "eta"        "0.00223556" "sum_w"     
 [6] "0.07853953" "num_zero"   "45"         "max_w*M"    "0.8658"    
 [1] "err"        "0"          "eta"        "0.0014058"  "sum_w"     
 [6] "0.06588736" "num_zero"   "43"         "max_w*M"    "0.9501"    
 [1] "err"        "0"          "eta"        "0.00060918" "sum_w"     
 [6] "0.06040476" "num_zero"   "43"         "max_w*M"    "0.9866"    
 [1] "err"        "0"          "eta"        "0.00055372" "sum_w"     
 [6] "0.07867762" "num_zero"   "17"         "max_w*M"    "0.9534"    
 [1] "err"        "0"          "eta"        "0.0018418"  "sum_w"     
 [6] "0.04107249" "num_zero"   "52"         "max_w*M"    "0.8429"    
 [1] "err"        "0"          "eta"        "0.00011744" "sum_w"     
 [6] "0.04400853" "num_zero"   "31"         "max_w*M"    "0.85"      
 [1] "err"        "0"          "eta"        "0.00074683" "sum_w"     
 [6] "0.06728675" "num_zero"   "20"         "max_w*M"    "0.8052"    
 [1] "err"        "0"          "eta"        "0.00162349" "sum_w"     
 [6] "0.0751163"  "num_zero"   "35"         "max_w*M"    "0.7077"    
 [1] "err"        "0"          "eta"        "0.00164281" "sum_w"     
 [6] "0.04411535" "num_zero"   "44"         "max_w*M"    "0.8063"    
 [1] "err"        "0"          "eta"        "0.00036964" "sum_w"     
 [6] "0.05705264" "num_zero"   "13"         "max_w*M"    "0.7841"    
 [1] "err"        "0"          "eta"        "0.00103049" "sum_w"     
 [6] "0.03251225" "num_zero"   "46"         "max_w*M"    "0.846"     
 [1] "err"        "0"          "eta"        "0.00053567" "sum_w"     
 [6] "0.05976456" "num_zero"   "3"          "max_w*M"    "0.8138"    
 [1] "err"        "0"          "eta"        "0.0014583"  "sum_w"     
 [6] "0.04816562" "num_zero"   "45"         "max_w*M"    "0.7263"    
 [1] "err"        "0"          "eta"        "0.00059684" "sum_w"     
 [6] "0.04279408" "num_zero"   "43"         "max_w*M"    "0.7621"    
 [1] "err"        "0"          "eta"        "0.00045715" "sum_w"     
 [6] "0.05330854" "num_zero"   "17"         "max_w*M"    "0.7347"    
 [1] "err"        "0"          "eta"        "0.00102179" "sum_w"     
 [6] "0.03184613" "num_zero"   "47"         "max_w*M"    "0.6734"    
 [1] "err"        "0"          "eta"        "5.3e-05"    "sum_w"     
 [6] "0.03290611" "num_zero"   "26"         "max_w*M"    "0.6766"    
 [1] "err"        "0"          "eta"        "0.00044985" "sum_w"     
 [6] "0.04735305" "num_zero"   "18"         "max_w*M"    "0.6496"    
 [1] "err"        "0"          "eta"        "0.00115122" "sum_w"     
 [6] "0.02312548" "num_zero"   "53"         "max_w*M"    "0.5805"    
 [1] "err"        "0"          "eta"        "8.619e-05"  "sum_w"     
 [6] "0.02502973" "num_zero"   "29"         "max_w*M"    "0.5857"    
 [1] "err"        "0"          "eta"        "0.00047434" "sum_w"     
 [6] "0.03991721" "num_zero"   "19"         "max_w*M"    "0.5572"    
 [1] "err"        "0"          "eta"        "0.0009622"  "sum_w"     
 [6] "0.04224655" "num_zero"   "38"         "max_w*M"    "0.4995"    
 [1] "err"        "0"          "eta"        "0.00077531" "sum_w"     
 [6] "0.02984154" "num_zero"   "36"         "max_w*M"    "0.546"     
 [1] "err"        "0"          "eta"        "0.00017943" "sum_w"     
 [6] "0.03289188" "num_zero"   "11"         "max_w*M"    "0.5352"    
 [1] "err"        "0"          "eta"        "0.00064768" "sum_w"     
 [6] "0.02757345" "num_zero"   "44"         "max_w*M"    "0.4964"    
 [1] "err"        "0"          "eta"        "0.00018629" "sum_w"     
 [6] "0.02589686" "num_zero"   "43"         "max_w*M"    "0.5076"    
 [1] "err"        "0"          "eta"        "0.0001683"  "sum_w"     
 [6] "0.02976773" "num_zero"   "17"         "max_w*M"    "0.4975"    
 [1] "err"        "0"          "eta"        "0.00065597" "sum_w"     
 [6] "0.03252945" "num_zero"   "39"         "max_w*M"    "0.4581"    
[1] 33.9

maboost documentation built on May 2, 2019, 9:34 a.m.
