rbst: Robust Boosting for Robust Loss Functions

Description Usage Arguments Details Value Author(s) References See Also Examples

View source: R/rbst.R

Description

MM (majorization/minimization) algorithm based gradient boosting for optimizing nonconvex robust loss functions with componentwise linear, smoothing splines, tree models as base learners.

Usage

rbst(x, y, cost = 0.5, rfamily = c("tgaussian", "thuber","thinge", "tbinom", "binomd", 
"texpo", "tpoisson", "clossR", "closs", "gloss", "qloss"), ctrl=bst_control(), 
control.tree=list(maxdepth = 1), learner=c("ls","sm","tree"),del=1e-10)

Arguments

x

a data frame containing the variables in the model.

y

vector of responses. y must be in {1, -1} for classification.

cost

price to pay for false positive, 0 < cost < 1; price of false negative is 1-cost.

rfamily

robust loss function, see details.

ctrl

an object of class bst_control.

control.tree

control parameters of rpart.

learner

a character specifying the component-wise base learner to be used: ls linear models, sm smoothing splines, tree regression trees.

del

convergence criterion.

Details

An MM algorithm operates by creating a convex surrogate function that majorizes the nonconvex objective function. When the surrogate function is minimized with a gradient boosting algorithm, the desired objective function decreases. The MM algorithm includes the difference of convex (DC) algorithm for rfamily=c("tgaussian", "thuber", "thinge", "tbinom", "binomd", "texpo", "tpoisson") and the quadratic majorization boosting algorithm (QMBA) for rfamily=c("clossR", "closs", "gloss", "qloss").

rfamily = "tgaussian" for truncated square error loss, "thuber" for truncated Huber loss, "thinge" for truncated hinge loss, "tbinom" for truncated logistic loss, "binomd" for logistic difference loss, "texpo" for truncated exponential loss, "tpoisson" for truncated Poisson loss, "clossR" for C-loss in regression, "closs" for C-loss in classification, "gloss" for G-loss, "qloss" for Q-loss.

s must be a numeric value specified in bst_control. For rfamily="thinge", "tbinom", "texpo", s < 0. For rfamily="binomd", "tpoisson", "closs", "qloss", "clossR", s > 0, and for rfamily="gloss", s > 1. Some suggested s values: "thinge"= -1, "tbinom"= -log(3), "binomd"= log(4), "texpo"= log(0.5), "closs"=1, "gloss"=1.5, "qloss"=2, "clossR"=1.
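As a minimal sketch (simulated data and mstop chosen purely for illustration), s is simply passed through bst_control along with the other boosting controls:

## illustrative only: pass the robustness parameter s via bst_control()
x <- as.data.frame(matrix(rnorm(100 * 5), ncol = 5))
y <- sign(2 * x[, 1] + rnorm(100))          # responses coded in {1, -1}

## DC-type family: truncated hinge loss with the suggested s = -1
fit.thinge <- rbst(x, y, rfamily = "thinge", learner = "ls",
                   ctrl = bst_control(mstop = 50, s = -1))

## QMBA-type family: C-loss for classification with the suggested s = 1
fit.closs <- rbst(x, y, rfamily = "closs", learner = "ls",
                  ctrl = bst_control(mstop = 50, s = 1))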

Value

An object of class bst with print, coef, plot and predict methods available for linear models. For nonlinear models, the print and predict methods are available.

x, y, cost, rfamily, learner, control.tree, maxdepth

These are the input variables and parameters.

ctrl

the input ctrl, with fk possibly updated if family="tgaussian", "thingeDC", "tbinomDC", "binomdDC" or "tpoisson".

yhat

predicted function estimates

ens

a list of length mstop. Each element is a model fitted to the pseudo-residuals, defined as the negative gradient of the loss function at the current function estimate.

ml.fit

the last element of ens

ensemble

a vector of length mstop. Each element is the variable selected at that boosting step, when applicable.

xselect

variables selected within the mstop boosting iterations.

coef

estimated coefficients at iteration mstop.
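For instance, continuing a fit such as fit.thinge from the sketch above, these components and methods can be inspected as follows (component names as listed in this section):

print(fit.thinge)            # print method for class "bst"
coef(fit.thinge)             # estimated coefficients (componentwise linear learner)
head(predict(fit.thinge))    # fitted function values, also stored in fit.thinge$yhat
fit.thinge$xselect           # indices of the selected variables
fit.thinge$ensemble          # variable selected at each boosting step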

Author(s)

Zhu Wang

References

Zhu Wang (2018), Quadratic Majorization for Nonconvex Loss with Applications to the Boosting Algorithm, Journal of Computational and Graphical Statistics, 27(3), 491-502, https://doi.org/10.1080/10618600.2018.1424635

Zhu Wang (2018), Robust boosting with truncated loss functions, Electronic Journal of Statistics, 12(1), 599-650, https://doi.org/10.1214/18-EJS1404

See Also

cv.rbst for the cross-validated stopping iteration. See also bst_control.
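A hedged sketch of the cross-validation step, continuing the simulated data above (the fold argument K and the remaining defaults should be checked against the cv.rbst help page):

## choose the stopping iteration for the truncated hinge loss by cross-validation
cv.fit <- cv.rbst(x, y, K = 5, rfamily = "thinge", learner = "ls",
                  ctrl = bst_control(mstop = 100, s = -1))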

Examples

## simulate 100 observations with 5 predictors; the true signal is in x[, 1]
x <- matrix(rnorm(100*5), ncol=5)
c <- 2*x[,1]
p <- exp(c)/(exp(c)+exp(-c))
y <- rbinom(100, 1, p)
y[y != 1] <- -1                  # recode responses to {1, -1}
y[1:10] <- -y[1:10]              # flip the first 10 labels to mimic outliers
x <- as.data.frame(x)
## standard (convex) hinge-loss boosting with componentwise linear learners
dat.m <- bst(x, y, ctrl = bst_control(mstop=50), family = "hinge", learner = "ls")
predict(dat.m)
## twin boosting initialized from the first fit
dat.m1 <- bst(x, y, ctrl = bst_control(twinboost=TRUE,
    coefir=coef(dat.m), xselect.init = dat.m$xselect, mstop=50))
## robust boosting with the truncated hinge loss
dat.m2 <- rbst(x, y, ctrl = bst_control(mstop=50, s=0, trace=TRUE),
    rfamily = "thinge", learner = "ls")
predict(dat.m2)
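Twin boosting can also be combined with the robust loss families. The following is a hedged sketch (not reflected in the example output below) that reuses the convex hinge fit dat.m for initialization; f.init and xselect.init are assumed to be the corresponding bst_control arguments:

## sketch: robust twin boosting initialized from the convex hinge fit dat.m
dat.m3 <- rbst(x, y, rfamily = "thinge", learner = "ls",
               ctrl = bst_control(twinboost = TRUE, s = 0, mstop = 50,
                                  f.init = predict(dat.m),
                                  xselect.init = dat.m$xselect))
predict(dat.m3)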

Example output

           1            2            3            4            5            6 
 1.091123399 -0.847039352  0.187899882  1.752266972 -0.582761769  1.978726028 
[fitted values from predict(dat.m) for observations 7-100 omitted]

generate initial values

robust boosting ...

initial loss 0.4905202 

m= 10   risk =  0.3828281
m= 20   risk =  0.2853815
m= 30   risk =  0.2378649
m= 40   risk =  0.2097398
m= 50   risk =  0.1924934
iteration 1 : los[k] <= ellu2 0
iteration 1 : ellu2 <= ellu1 -0.2980268
iteration 1 : relative change of fk 1714.351 , robust loss value 0.1924934 
d1= 1714.351 , k= 1 , d1 > del && k <= iter:  TRUE 

m= 10   risk =  0.3828281
m= 20   risk =  0.2853815
m= 30   risk =  0.2378649
m= 40   risk =  0.2097398
m= 50   risk =  0.1924934
iteration 2 : los[k] <= ellu2 0
iteration 2 : ellu2 <= ellu1 0
iteration 2 : relative change of fk 0 , robust loss value 0.1924934 
d1= 0 , k= 2 , d1 > del && k <= iter:  FALSE 
           1            2            3            4            5            6 
 1.302212568 -1.010907923  0.224251068  2.091261240 -0.695503094  2.361531156 
[fitted values from predict(dat.m2) for observations 7-100 omitted]
