
fastAdaboost

fastAdaboost is a blazingly fast implementation of Adaboost for R. It uses C++ code in the backend, which makes it up to about 100 times faster than the native R libraries currently available; in the benchmark below, training is roughly 45-50 times faster. This is especially useful when your data are large. fastAdaboost currently supports only binary classification tasks. It implements Freund and Schapire's Adaboost.M1 and Zhu et al.'s SAMME.R (real Adaboost) algorithms.
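The two algorithms differ mainly in what each boosting stage contributes. As a rough illustration, here is a from-scratch base R sketch of the two-class case on a single numeric feature. It is written under simplifying assumptions (labels coded as -1/+1, decision stumps as the weak learner, a brute-force split search) and is not fastAdaboost's C++ code; the functions fit_stump, predict_stump, adaboost_m1_sketch, predict_m1_sketch and samme_r_stage are hypothetical names used only here. Adaboost.M1 gives each stump a vote weight log((1 - err) / err) and up-weights the observations it misclassifies; for SAMME.R only the two-class per-stage score is shown.

```r
# Illustrative sketch only: binary Adaboost.M1 on one numeric feature with
# decision stumps. Not fastAdaboost's implementation; all names are hypothetical.
# Labels y must be coded as -1/+1.

# Brute-force weighted decision stump: try every observed value as a split
# point and both orientations, keep the split with the lowest weighted error.
fit_stump <- function(x, y, w) {
  best <- list(err = Inf)
  for (thr in unique(x)) {
    for (sgn in c(-1, 1)) {
      pred <- ifelse(x > thr, sgn, -sgn)
      err <- sum(w[pred != y])
      if (err < best$err) best <- list(err = err, thr = thr, sgn = sgn)
    }
  }
  best
}

predict_stump <- function(stump, x) ifelse(x > stump$thr, stump$sgn, -stump$sgn)

adaboost_m1_sketch <- function(x, y, n_iter = 10) {
  n <- length(y)
  w <- rep(1 / n, n)                    # start from uniform observation weights
  stumps <- vector("list", n_iter)
  alpha <- numeric(n_iter)
  for (m in seq_len(n_iter)) {
    s <- fit_stump(x, y, w)
    err <- max(s$err / sum(w), 1e-10)   # weighted error, kept away from 0
    alpha[m] <- log((1 - err) / err)    # vote weight of this stump
    miss <- predict_stump(s, x) != y
    w <- w * exp(alpha[m] * miss)       # up-weight misclassified observations
    w <- w / sum(w)                     # renormalise so weights sum to 1
    stumps[[m]] <- s
  }
  list(stumps = stumps, alpha = alpha)
}

# Final classifier: sign of the alpha-weighted vote of all stumps
predict_m1_sketch <- function(model, x) {
  votes <- Map(function(s, a) a * predict_stump(s, x), model$stumps, model$alpha)
  ifelse(Reduce(`+`, votes) > 0, 1, -1)
}

# SAMME.R (real Adaboost), two-class case: instead of a hard -1/+1 vote, each
# stage adds h(x) = 0.5 * log(p / (1 - p)), where p is the weak learner's
# weighted estimate of P(y = +1 | x); observation weights are then updated as
# w * exp(-y * h(x)). p is clipped so that h stays finite.
samme_r_stage <- function(p, eps = 1e-10) {
  p <- pmin(pmax(p, eps), 1 - eps)
  0.5 * log(p / (1 - p))
}

# Usage on data shaped like the fakedata example (labels recoded to -1/+1)
set.seed(9999)
x <- c(rnorm(300, 0, 1), rnorm(300, 1.5, 1))
y <- rep(c(-1, 1), each = 300)
model <- adaboost_m1_sketch(x, y, n_iter = 10)
print(mean(predict_m1_sketch(model, x) != y))   # training error
```

The brute-force split search costs O(n^2) per stump, which is fine for a few hundred points; scanning the sorted feature with cumulative sums of the weights would bring it down to O(n log n).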

It has not been submitted to CRAN yet; install it from GitHub with devtools:

devtools::install_github("souravc83/fastAdaboost")

library("fastAdaboost")
set.seed(9999)

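num_each <- 1000
# two overlapping classes: X ~ N(0, 1) for Y = 0 and X ~ N(1.5, 1) for Y = 1
fakedata <- data.frame(
  X = c(rnorm(num_each, 0, 1), rnorm(num_each, 1.5, 1)),
  Y = c(rep(0, num_each), rep(1, num_each))
)
fakedata$Y <- factor(fakedata$Y)   # treat the 0/1 labels as classes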
# train Adaboost.M1 with 10 iterations
test_adaboost <- adaboost(Y ~ X, fakedata, 10)
# predict on the training data, so pred$error is the training error
pred <- predict(test_adaboost, newdata = fakedata)
print(paste("Adaboost Error on fakedata:",pred$error))
#> [1] "Adaboost Error on fakedata: 0.1225"
print(table(pred$class, fakedata$Y))   # rows: predicted class, columns: true class
#>    
#>       0   1
#>   0 848  93
#>   1 152 907

# train SAMME.R (real Adaboost) with 10 iterations and predict on the training data
test_real_adaboost <- real_adaboost(Y ~ X, fakedata, 10)
pred_real <- predict(test_real_adaboost, newdata = fakedata)
print(paste("Real Adaboost Error on fakedata:", pred_real$error))
#> [1] "Real Adaboost Error on fakedata: 0.1105"
print(table(pred_real$class,fakedata$Y))
#>    
#>       0   1
#>   0 906 127
#>   1  94 873
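In both examples the reported error is the share of misclassified observations, that is, the off-diagonal counts of the confusion table divided by the total; since each model is evaluated on its own training data, it is a training error. A quick base R check with the counts copied from the printed tables (rows are predicted classes, columns are true classes, following the argument order of table()); the helper error_rate is a hypothetical name used only for this check:

```r
# Confusion tables copied from the printed output above.
# matrix() fills column by column: column 1 is true class 0, column 2 is true class 1.
m1_tab <- matrix(c(848, 152, 93, 907), nrow = 2,
                 dimnames = list(predicted = c("0", "1"), true = c("0", "1")))
real_tab <- matrix(c(906, 94, 127, 873), nrow = 2,
                   dimnames = list(predicted = c("0", "1"), true = c("0", "1")))

# misclassification rate = 1 - (correct predictions on the diagonal / total)
error_rate <- function(tab) 1 - sum(diag(tab)) / sum(tab)

print(error_rate(m1_tab))    # [1] 0.1225
print(error_rate(real_tab))  # [1] 0.1105
```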

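How fast is fastAdaboost compared to native R implementations? I used the microbenchmark package to compare its running times with those of adabag, one of the most popular native R libraries implementing Adaboost. Both fit 10 boosting iterations on the bacteria dataset from MASS, and adabag is run with boos = F, so every observation is used with its weight instead of drawing a weighted bootstrap sample at each iteration. Each expression was timed 100 times. Training with fastAdaboost is about 45-50 times faster (median 59 ms versus 2809 ms), while prediction is only modestly faster (median 28 ms versus 37 ms). The training speed-up matters most when data sizes are large.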

library(microbenchmark)
library(adabag)
library(MASS)

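#using fastAdaboost
data(bacteria)
# fit once beforehand so predict() has a model even if microbenchmark's
# default random ordering evaluates it before the training expression
boost_obj <- adaboost(y ~ ., bacteria, 10)
print(
  microbenchmark(
    boost_obj <- adaboost(y ~ ., bacteria, 10),
    pred <- predict(boost_obj, bacteria)
  )
)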
#> Unit: milliseconds
#>                                        expr      min       lq    mean
#>  boost_obj <- adaboost(y ~ ., bacteria, 10) 58.01665 58.69384 60.6658
#>        pred <- predict(boost_obj, bacteria) 26.91593 27.41415 29.5689
#>    median       uq      max neval cld
#>  59.20298 60.13180 74.54155   100   b
#>  27.91902 32.50484 37.58375   100  a

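#using adabag
# fit once beforehand so predict() has a model even if microbenchmark's
# default random ordering evaluates it before the training expression
adabag_obj <- boosting(y ~ ., bacteria, boos = F, mfinal = 10)
print(
  microbenchmark(
    adabag_obj <- boosting(y ~ ., bacteria, boos = F, mfinal = 10),
    pred_adabag <- predict(adabag_obj, bacteria)
  )
)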
#> Unit: milliseconds
#>                                                            expr        min
#>  adabag_obj <- boosting(y ~ ., bacteria, boos = F, mfinal = 10) 2497.55208
#>                    pred_adabag <- predict(adabag_obj, bacteria)   34.50564
#>          lq       mean     median         uq       max neval cld
#>  2659.99737 2848.80065 2809.39769 2988.49017 3629.1527   100   b
#>    35.72336   45.21379   37.16913   42.22947  242.7932   100  a
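The speed-up can be read off the median times in the two benchmark tables. A short base R computation, with the medians copied by hand (the data frame name median_ms is just for illustration):

```r
# Median times in milliseconds, copied by hand from the two benchmark tables
median_ms <- data.frame(
  step         = c("train", "predict"),
  fastAdaboost = c(59.20298, 27.91902),
  adabag       = c(2809.39769, 37.16913)
)
# speed-up = how many times longer adabag takes than fastAdaboost
median_ms$speedup <- median_ms$adabag / median_ms$fastAdaboost
print(median_ms)   # speed-up is roughly 47 for training and 1.3 for prediction
```

Medians are used because they are less affected than means by occasional slow runs, such as the 242.8 ms maximum in adabag's prediction timings.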