SwarmSVM

Introduction

This package contains several ensemble learning algorithms based on the following papers:

  1. Gu, Q., & Han, J. (2013). Clustered support vector machines. In Proceedings of the Sixteenth International Conference on Artificial Intelligence and Statistics (pp. 307-315).
  2. Hsieh, C. J., Si, S., & Dhillon, I. S. (2013). A divide-and-conquer solver for kernel support vector machines. arXiv preprint arXiv:1311.0914.
  3. Collobert, R., Bengio, S., & Bengio, Y. (2002). A parallel mixture of SVMs for very large scale problems. Neural computation, 14(5), 1105-1114.

The main ideas of these algorithms are:

  1. Reducing the scale of the data set results in a faster algorithm.
  2. If we divide a linearly inseparable problem into smaller sub-problems appropriately, then it becomes possible to solve it with linear SVMs.

These two ideas address efficiency and accuracy, respectively. Specifically, we usually use more than one SVM model to solve the whole problem, so this is also an ensemble learning framework.

Data

In this package, we choose a small data set to demonstrate the usage of our functions. The data set is svmguide1 from libsvm's official website. The data was collected from an astroparticle application by Jan Conrad of Uppsala University, Sweden.

We can load it by

require(SwarmSVM)
data(svmguide1)

It is a list object. Let's first take a look at the training data.

head(svmguide1[[1]])

The first column contains the classification target value; the other columns contain the features. It is a binary classification task. The second element of the list is the test set:

head(svmguide1[[2]])

We rename them with the following command:

svmguide1.t = svmguide1[[2]]
svmguide1 = svmguide1[[1]]

From now on, we have the training data set svmguide1 and the test data set svmguide1.t.
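
As a quick sanity check, we can look at the sizes of the two sets and the class balance (a small optional snippet; recall that the first column is the target):

dim(svmguide1)
dim(svmguide1.t)
table(svmguide1[, 1])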

Clustering Algorithm

In our package, two of the main algorithms, clusterSVM and dcSVM, require a clustering of the input data. Therefore we provide some clustering algorithms for users to choose from. Users can also implement their own functions and pass them to our algorithms.

Default Algorithms

We provide two algorithms that already exist in R: stats::kmeans and RcppMLPACK::mlKmeans.

In clusterSVM and dcSVM, we offer an argument cluster.method; you can choose one of the two algorithms and pass its name to this argument.
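
For example, selecting the base R implementation explicitly could look like the following sketch (assuming the name "kmeans" refers to the stats::kmeans option):

csvm.obj = clusterSVM(x = svmguide1[,-1], y = svmguide1[,1], centers = 8,
                      cluster.method = "kmeans")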

Customization

We also offer arguments for users to pass their own implementation of the clustering algorithm: cluster.fun and cluster.predict.

cluster.fun is the clustering training function. It should be a function taking the data and the number of centers as its two main arguments. The output of this function should be a list object describing the clustering result, with two fields:

  1. object$cluster as the clustering labels on the input data.
  2. object$centers as the matrix of cluster centers.

cluster.predict is the prediction function for the trained clustering object. It should be a function taking the data and the trained clustering object from cluster.fun as its two main arguments. The output of this function should simply be a vector of the clustering labels on the input data.
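
As a sketch of the expected interface (argument names here are illustrative; a full worked example appears later in the clusterSVM section):

cluster.fun = function(x, centers, ...) {
  # cluster x into `centers` groups; must return a list
  # with fields $cluster (labels) and $centers (center matrix)
}

cluster.predict = function(x, cluster.object) {
  # must return a vector of cluster labels for the rows of x
}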

Clustered Support Vector Machines

Algorithm

The algorithm is straightforward:

Training

  1. Cluster the data. The default setting is stats::kmeans.
  2. Transform the data according to Eq. (4) - Eq. (7) in the original paper.
  3. Solve the new problem with a linear SVM from LiblineaR::LiblineaR.

Test

  1. Assign a cluster label to each new data point, based on the clustering result from training.
  2. Transform the data according to Eq. (4) - Eq. (7) in the original paper.
  3. Make predictions with the trained model.

Basic usage

We demonstrate the usage of this function with the following code:

csvm.obj = clusterSVM(x = svmguide1[,-1], y = svmguide1[,1], type = 1,
                      valid.x = svmguide1.t[,-1], valid.y = svmguide1.t[,1], 
                      seed = 1, verbose = 1, centers = 8)
csvm.obj$valid.score

Here the parameters are grouped into four parts:

  1. x and y are the feature matrix and target vector of the training data. type specifies the type of the SVM.
  2. valid.x and valid.y are the feature matrix and target vector of the validation data.
  3. seed controls the random seed to make the result reproducible. verbose controls the verbosity of the output.
  4. centers is the parameter passed to the clustering algorithm.
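
Besides the validation score, predictions on new data are available from the fitted object. A minimal sketch, assuming the package's predict method for clusterSVM objects takes the new feature matrix as its second argument:

csvm.pred = predict(csvm.obj, svmguide1.t[,-1])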

Dense and sparse input

The sample data set is stored as a sparse matrix.

class(svmguide1)

The function accepts either a dense matrix or a sparse matrix as the input feature matrix, so the following code gives you the same result.

csvm.obj = clusterSVM(x = as.matrix(svmguide1[,-1]), y = svmguide1[,1], type = 1,
                      valid.x = as.matrix(svmguide1.t[,-1]), valid.y = svmguide1.t[,1], 
                      seed = 1, verbose = 1, centers = 8)
csvm.obj$valid.score

Self-defined clustering algorithm

In clusterSVM, the clustering is a very important step. Therefore we don't restrict users to the RcppMLPACK::mlKmeans algorithm; instead, we accept a user-defined clustering algorithm as an argument.

Note that we require the output of the clustering algorithm to contain two fields: centers and cluster. One example could be

cluster.fun = stats::kmeans

cluster.predict = function(x, cluster.object) {
  centers = cluster.object$centers
  # Squared Euclidean distance from every row of x to every center;
  # the result is an n-by-k matrix, one column per center.
  eucliDist = function(x, centers) apply(centers, 1, function(C) colSums( (t(x)-C)^2 ))
  euclidean.dist = eucliDist(x, centers)
  # Assign each point to its nearest center.
  result = max.col(-euclidean.dist)
  return(result)
}

Here we use the default kmeans and implement the prediction function ourselves. Once we have defined the algorithm, it is straightforward to pass it to clusterSVM:

csvm.obj = clusterSVM(x = svmguide1[,-1], y = svmguide1[,1], centers = 8, seed = 1,
                      cluster.fun = cluster.fun, cluster.predict = cluster.predict,
                      valid.x = svmguide1.t[,-1], valid.y = svmguide1.t[,1])
csvm.obj$valid.score
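
We can also sanity-check the customized pair on their own. A small sketch using dense input, since stats::kmeans expects a regular matrix:

km.obj = cluster.fun(as.matrix(svmguide1[,-1]), centers = 8)
head(cluster.predict(as.matrix(svmguide1[,-1]), km.obj))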

A Divide-and-Conquer Support Vector Machine

Algorithm

The algorithm can be described as follows:

Training

  1. We cluster the data in a recursive manner (see the sketch after this list):
    • Cluster the data into k groups.
    • Within each group, keep clustering the data into k finer groups.
    • Repeat max.levels times.
  2. At the finest level, we train SVM models on each cluster.
  3. At the j-th group on level l, we
    • Concatenate the alpha values (coefficients on the support vectors) from the subgroups of the j-th group.
    • Train the SVM model with these alpha values as the initialization.
  4. Refine the alpha values by training an SVM on all the support vectors of the whole data set.
  5. Train the final SVM model with these alpha values.
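
An illustrative sketch of step 1 (not the package internals): the recursive partitioning can be written with plain stats::kmeans, where each level refines the previous level's clusters.

recursive.cluster = function(x, k, max.levels) {
  labels = rep(1L, nrow(x))            # level 0: everything in one group
  for (l in seq_len(max.levels)) {
    new.labels = integer(nrow(x))
    for (g in unique(labels)) {
      idx = which(labels == g)
      # guard against groups smaller than k
      km = stats::kmeans(x[idx, , drop = FALSE], centers = min(k, length(idx)))
      # map sub-cluster labels to a range unique to group g
      new.labels[idx] = (g - 1L) * k + km$cluster
    }
    labels = new.labels
  }
  labels
}

# e.g. at most 4^2 = 16 clusters at the finest level:
labels = recursive.cluster(as.matrix(svmguide1[,-1]), k = 4, max.levels = 2)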

Test

There are two ways to do prediction.

  1. Early Prediction
    • In early prediction at level l, we predict the clustering label at level l for the new input data.
    • For data points belonging to each subgroup, we make predictions using the corresponding SVM model for that subgroup.
  2. Exact Prediction
    • We predict with the final SVM model on the new input data.

Basic usage

We demonstrate the usage of this function with the following code:

dcsvm.model = dcSVM(x = svmguide1[,-1], y = svmguide1[,1],
                    k = 4, max.levels = 4, seed = 0, cost = 32, gamma = 2,
                    kernel = 3, early = 0, m = 800,
                    valid.x = svmguide1.t[,-1], valid.y = svmguide1.t[,1])
dcsvm.model$valid.score

Here the parameters can be grouped into five parts.

  1. x and y are the feature matrix and target vector of the training data.
  2. valid.x and valid.y are the feature matrix and target vector of the validation data.
  3. seed controls the random seed to make the result reproducible.
  4. k and max.levels control the size of the subproblem tree: the data is split into at most k^max.levels clusters at the finest level.
  5. early specifies whether we use early prediction. If early = 0, we don't use the early prediction strategy; if early = l, we perform early prediction at level l.
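
As with clusterSVM, predictions on new data go through the fitted object. A minimal sketch, assuming the predict method for dcSVM objects takes the new feature matrix as its second argument:

dcsvm.pred = predict(dcsvm.model, svmguide1.t[,-1])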

Early Prediction

We can perform early prediction with the following command:

dcsvm.model = dcSVM(x = as.matrix(svmguide1[,-1]), y = svmguide1[,1], 
                    k = 10, max.levels = 1, 
                    early = 1, gamma = 2, cost = 32, tolerance = 1e-2, m = 800, 
                    valid.x = svmguide1.t[,-1], valid.y = svmguide1.t[,1])
dcsvm.model$valid.score
dcsvm.model$time$total.time

It is faster because we can stop at a given level and do not need to train SVMs on the larger data sets at higher levels.

Exact Prediction

To make the model more accurate, we can also perform the exact training and prediction by setting early = 0:

dcsvm.model = dcSVM(x = as.matrix(svmguide1[,-1]), y = svmguide1[,1], 
                    k = 10, max.levels = 1, 
                    early = 0, gamma = 2, cost = 32, tolerance = 1e-2, m = 800, 
                    valid.x = svmguide1.t[,-1], valid.y = svmguide1.t[,1])
dcsvm.model$valid.score
dcsvm.model$time$total.time

This is more accurate, but takes longer: it is a trade-off between accuracy and time complexity.

The Mixture of SVM Experts with a Gater Function

Algorithm

The algorithm can be described in the following iterative framework:

Training

  1. Split the data into $M$ random subsets of nearly $N/M$ examples each.
  2. Train $M$ SVM experts separately, one on each subset.
  3. Let each SVM expert predict on the entire training data set.
  4. Train a neural network model minimizing Eq. (7) in the original paper.
  5. Reconstruct the subsets (see the sketch after this list): for each data point,
    • sort the experts in descending order according to the weights from the neural network;
    • assign the example to the first expert in the list which has fewer than (N/M + c) data points.
  6. Go to step 2 until a termination criterion is fulfilled.
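
A minimal sketch of the reassignment in step 5 (illustrative only, not the package internals; W is assumed to be the N-by-M matrix of gater weights and cap the capacity N/M + c):

reassign = function(W, cap) {
  N = nrow(W); M = ncol(W)
  assignment = integer(N)
  counts = integer(M)                    # current size of each expert's subset
  for (i in seq_len(N)) {
    # experts sorted by descending gater weight for example i
    preference = order(W[i, ], decreasing = TRUE)
    # first expert in the list that still has room
    j = preference[which(counts[preference] < cap)[1]]
    assignment[i] = j
    counts[j] = counts[j] + 1L
  }
  assignment
}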

Test

  1. Make predictions on the test data with the $M$ experts.
  2. Feed the experts' outputs to the trained neural network model to obtain the final prediction.

Basic usage

gaterSVM.model = gaterSVM(x = svmguide1[,-1], y = svmguide1[,1], hidden = 10, seed = 0,
                          m = 10, max.iter = 3, learningrate = 0.01, threshold = 1, stepmax = 1000,
                          valid.x = svmguide1.t[,-1], valid.y = svmguide1.t[,1], verbose = TRUE)
gaterSVM.model$valid.score

The parameters can be categorized into the following groups:

  1. x and y are the feature matrix and target vector of the training data.
  2. valid.x and valid.y are the feature matrix and target vector of the validation data.
  3. seed controls the random seed to make the result reproducible. verbose controls the verbosity of the output.
  4. m is the number of SVM experts, and max.iter controls the number of iterations of the "experts-gater" process.
  5. hidden, learningrate, threshold and stepmax are parameters for the neural network model.
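
Predictions from the mixture follow the same pattern as the other models. A minimal sketch, assuming a predict method for gaterSVM objects:

gater.pred = predict(gaterSVM.model, svmguide1.t[,-1])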

Benchmarking

We offer benchmark code to compare the performance and efficiency of our implementations. You can find the code under inst/benchmark.

For the experiments that run in a reasonable time, we have already generated some results. These results are subject to change with the machine, system environment and implementation.


