AdaBoost is the most popular boosting algorithm. Boosting is often described as the best off-the-shelf classifier.
```r
library(rpart)
son_train <- read.csv("sonar_train.csv", header = FALSE)
son_test  <- read.csv("sonar_test.csv", header = FALSE)
y_train <- son_train[, 61]
x_train <- son_train[, 1:60]
y_test  <- son_test[, 61]
x_test  <- son_test[, 1:60]
```
Start by:

- Setting vectors to track training and test errors for plotting.
- Setting a vector f to zero for all x_i. At the end, if an observation's value in this vector is > 0, it is classified as positive; otherwise negative.
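The decision rule above can be sketched in a couple of lines (the `f_demo` values here are hypothetical accumulated votes, not output from the sonar data):

```r
## Hedged sketch: converting the additive score f into class labels.
f_demo <- c(1.3, -0.2, 0.05, -2.1)   ## hypothetical accumulated votes
pred <- ifelse(f_demo > 0, 1, -1)    ## the sign of f decides the class
```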
```r
## Track the training and test errors for each of 500 iterations.
train_error <- rep(0, 500)
test_error  <- rep(0, 500)
## Track "f", the weighted vote of the decision trees applied to the
## training and test data (130 training and 78 test observations).
f_train <- rep(0, 130)
f_test  <- rep(0, 78)
i <- 1  ## iteration counter
```
For each of the 500 iterations i, over the m training observations:
As rpart grows each tree it seeks to minimize the Gini index, and because each observation carries a weight, it effectively minimizes a weighted Gini index. (Pass the weights to rpart so it knows how to weight each observation.)
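A minimal sketch of passing per-observation weights to rpart, on hypothetical toy data rather than the sonar set (rpart is a recommended package shipped with R):

```r
library(rpart)
set.seed(1)
n <- 100
x <- data.frame(x1 = rnorm(n), x2 = rnorm(n))
y <- factor(ifelse(x$x1 + rnorm(n, sd = 0.5) > 0, 1, -1))
w <- rep(1 / n, n)       ## uniform weights, as in the first boosting round
w[1:10] <- 5 * w[1:10]   ## pretend these were misclassified earlier
w <- w / sum(w)          ## renormalize so the weights sum to 1
## Splits are now chosen by the weighted Gini index, so the up-weighted
## observations pull the split points toward themselves.
fit <- rpart(y ~ ., data = x, weights = w, method = "class")
```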
```r
while (i <= 500) {
  print(c("i =", i))
  ## Weights: w = exp(-y * f), where f = sum over prior iterations of
  ## alpha[m] * g[m]. On the first pass f is all zeros, so every weight is 1.
  w <- exp(-y_train * f_train)
  w <- w / sum(w)  ## Normalize the weights so they sum to 1 (1/N on the first pass).
  ## Fit a classifier (decision tree via rpart) to the training data using the weights w.
  fit <- rpart(y_train ~ ., x_train, w, method = "class")
  ## Compute g[m], flagging predictions as 1 or -1 for each x[i].
  ## Used later to up-weight or down-weight observations.
  g_train <- -1 + 2 * (predict(fit, x_train)[, 2] > 0.5)
  g_test  <- -1 + 2 * (predict(fit, x_test)[, 2] > 0.5)   ## Same for the test set.
  ## Weighted misclassification error e[m]: the sum of the weights of the
  ## observations classified incorrectly (y * g < 0). After all iterations,
  ## the trees that performed best overall get the largest say in the majority vote.
  e <- sum(w * (y_train * g_train < 0))
  print(c("e =", e))
  ## alpha: 1/2 the log odds of the weighted misclassification error e[m].
  ## This is the weight of the tree: a large weighted error yields a small
  ## tree weight, and vice versa.
  alpha <- 0.5 * log((1 - e) / e)
  print(c("alpha =", alpha))
  ## Add alpha * g[m] to the running vote f. This combines tree and observation
  ## weights, accumulating the weighted vote of all trees so far.
  f_train <- f_train + alpha * g_train
  f_test  <- f_test + alpha * g_test   ## Update the prediction on the test set.
  ## Training error for this iteration: the fraction of observations with
  ## f * y < 0, i.e., "which observations would I misclassify if I stopped now?"
  train_error[i] <- sum(f_train * y_train < 0) / 130
  print(c("train err =", train_error[i]))
  test_error[i] <- sum(f_test * y_test < 0) / 78   ## Same, for the test set.
  print(c("test error =", test_error[i]))
  i <- i + 1
  ## Back at the top, the previous weights w[m] are replaced with
  ## w[m] * exp(-alpha[m] * g[m](x[i]) * y[i]), which makes the next tree
  ## focus on the observations misclassified so far. Key points:
  ## 1. If y[i] = g[m](x[i]), the true class matches the predicted class and the
  ##    product g[m](x[i]) * y[i] = 1; for an incorrect classification it is -1.
  ## 2. So a correct prediction multiplies the weight by exp(-alpha[m]), and an
  ##    incorrect one multiplies it by exp(alpha[m]).
  ## 3. If the prediction is correct, exp(-alpha[m]) < 1 and the weight shrinks;
  ##    if incorrect, exp(alpha[m]) > 1 and the weight grows.
  ## A new classifier is then fit using the new weights.
}
```
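A small numeric illustration of the weight update `w * exp(-alpha * g * y)` (the values here are made up, not taken from the sonar run):

```r
## Four observations with equal weight; obs 2 and 4 are misclassified.
alpha <- 0.7
w <- rep(0.25, 4)        ## equal starting weights
g <- c(1, 1, -1, -1)     ## tree predictions
y <- c(1, -1, -1, 1)     ## true labels
w_new <- w * exp(-alpha * g * y)
w_new <- w_new / sum(w_new)
## Correctly classified obs shrink by exp(-0.7) (about 0.50);
## misclassified obs grow by exp(0.7) (about 2.01), before normalization.
```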
```r
plot(seq(1, 500), test_error, type = "l", ylim = c(0, 0.5),
     ylab = "Error Rate", xlab = "Iterations", lwd = 2, col = "black")
lines(train_error, lwd = 2, col = "purple")
legend(4, 0.5, c("Training Error", "Test Error"),
       col = c("purple", "black"), lwd = 2)
```