
fastAdaboost

fastAdaboost is a blazingly fast implementation of adaboost for R. It uses C++ code in the backend and is about 100 times faster than the native R-based implementations currently available, which is especially useful when your data set is large. fastAdaboost presently works only for binary classification tasks. It implements Freund and Schapire's Adaboost.M1 and Zhu et al.'s SAMME.R (real adaboost) algorithms.

Install

fastAdaboost is not on CRAN yet, so install it from GitHub with devtools:

devtools::install_github("souravc83/fastAdaboost")

Quick Demo

library("fastAdaboost")
set.seed(9999)

num_each <- 1000
fakedata <- data.frame(X = c(rnorm(num_each, 0, 1), rnorm(num_each, 1.5, 1)),
                       Y = c(rep(0, num_each), rep(1, num_each)))
fakedata$Y <- factor(fakedata$Y)
#run adaboost
test_adaboost <- adaboost(Y~X, fakedata, 10)
#print(test_adaboost)
pred <- predict( test_adaboost, newdata=fakedata)
print(paste("Adaboost Error on fakedata:",pred$error))
print(table(pred$class,fakedata$Y))

test_real_adaboost <- real_adaboost(Y~X, fakedata, 10)
pred_real <- predict(test_real_adaboost,newdata=fakedata)
print(paste("Real Adaboost Error on fakedata:", pred_real$error))
print(table(pred_real$class,fakedata$Y))
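
Besides the overall error and the predicted classes, the list returned by predict() also carries per-class probabilities and the weighted votes of the individual trees. The snippet below is an illustrative sketch that assumes the $prob and $votes elements documented for this package.

#inspect the extra fields of the prediction object
#(assumes predict() returns a list with $prob and $votes, as documented)
head(pred$prob)   #predicted probability of each class for the first rows
head(pred$votes)  #weighted votes accumulated over the 10 boosting rounds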

Performance Benchmarking

How fast is fastAdaboost compared to native R implementations? I used the microbenchmark package to compare the running time of fastAdaboost with that of adabag, one of the most popular native R libraries implementing the adaboost algorithm.

library(microbenchmark)
library(adabag)
library(MASS)

#using fastAdaboost
data(bacteria)
print(
  microbenchmark(
    boost_obj <- adaboost(y ~ ., bacteria, 10),
    pred <- predict(boost_obj, bacteria)
  )
)

#using adabag
print(
  microbenchmark(
    adabag_obj <- boosting(y ~ ., bacteria, boos = FALSE, mfinal = 10),
    pred_adabag <- predict(adabag_obj, bacteria)
  )
)
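
Since the speed advantage matters most on larger data, a quick follow-up is to benchmark training alone on a bigger version of the synthetic data from the Quick Demo. The sketch below simply scales up the fakedata recipe; exact timings will depend on your machine.

#time training on a larger synthetic dataset (illustrative sketch:
#same fakedata recipe as the Quick Demo, with 10x the rows)
num_each <- 10000
bigdata <- data.frame(X = c(rnorm(num_each, 0, 1), rnorm(num_each, 1.5, 1)),
                      Y = factor(c(rep(0, num_each), rep(1, num_each))))
print(
  microbenchmark(
    big_boost <- adaboost(Y ~ X, bigdata, 10),
    times = 10
  )
)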

