Clustered Support Vector Machine

Description

An implementation of the clustered support vector machine (CSVM) from Gu, Quanquan, and Jiawei Han. "Clustered support vector machines."

Usage

clusterSVM(x, y, centers = NULL, cluster.object = NULL, lambda = 1,
  sparse = TRUE, valid.x = NULL, valid.y = NULL, valid.metric = NULL,
  type = 1, cost = 1, epsilon = NULL, bias = TRUE, wi = NULL,
  verbose = 1, seed = NULL, cluster.method = "kmeans",
  cluster.fun = NULL, cluster.predict = NULL, ...)

Arguments

x

the n x p training data matrix. It can be a dense matrix or a sparse matrix object.

y

a response vector with one value for each of the n rows of x. For classification, the values are class labels and can be given as a 1 x n matrix, a simple vector or a factor. For regression, the values are the targets to predict and can be given as a 1 x n matrix or a simple vector.

centers

an integer indicating the number of centers in clustering.

cluster.object

an object generated by cluster.fun that can be passed to cluster.predict.

lambda

the weight on the squared l2-norm of the global (shared) weight vector in the CSVM objective.
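In Gu and Han's formulation, each cluster k gets a local weight vector w + v_k and the objective is (lambda/2)||w||^2 + (1/2) sum_k ||v_k||^2 + C sum_i xi_i. The sketch below shows, under the assumption that cluster labels run from 1 to K, how augmenting each sample as [x/sqrt(lambda), 0, ..., x (in its cluster's block), ..., 0] lets a plain linear SVM solve that problem. It follows the paper's construction; the helper augment_csvm is hypothetical and is not the package's internal code.

```r
# Hypothetical helper: CSVM feature augmentation (after Gu and Han).
# ASSUMPTION: cluster labels are integers 1..K.
augment_csvm <- function(x, cluster, K, lambda) {
  n <- nrow(x)
  p <- ncol(x)
  tilde <- matrix(0, nrow = n, ncol = p * (K + 1))
  # Shared (global) block, scaled by 1/sqrt(lambda)
  tilde[, seq_len(p)] <- x / sqrt(lambda)
  for (i in seq_len(n)) {
    # Local block for sample i's cluster: columns p*k + 1 .. p*(k + 1)
    cols <- p * cluster[i] + seq_len(p)
    tilde[i, cols] <- x[i, ]
  }
  tilde
}

set.seed(1)
p <- 3; K <- 2; lambda <- 4
x <- matrix(rnorm(4 * p), nrow = 4)        # 4 samples, 3 features
cluster <- c(1, 2, 2, 1)
w <- rnorm(p)                               # global weight w
v <- matrix(rnorm(K * p), nrow = K)         # row k holds local offset v_k
w_tilde <- c(sqrt(lambda) * w, t(v))        # stacked [sqrt(lambda) w, v_1, v_2]
tilde <- augment_csvm(x, cluster, K, lambda)

# Linear score in the augmented space equals (w + v_k)' x for sample's cluster k
score_aug <- drop(tilde %*% w_tilde)
score_local <- drop(x %*% w) + rowSums(x * v[cluster, ])
```

Since (1/2)||w_tilde||^2 = (lambda/2)||w||^2 + (1/2) sum_k ||v_k||^2, a standard linear SVM trained on the augmented data optimizes the CSVM objective. A larger lambda penalizes the shared component more, so the per-cluster models are freer to differ.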

sparse

whether the transformed (augmented) data matrix is stored as a sparse matrix.
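With K clusters, each augmented row has nonzeros only in the shared block and in its own cluster's block, so roughly 2/(K + 1) of the entries are nonzero. A minimal sketch of building that layout directly as a sparse matrix with Matrix::sparseMatrix, assuming the same block layout as Gu and Han's construction and cluster labels 1..K; the variable names are illustrative and this is not the package's own code.

```r
library(Matrix)

# Toy data: 3 samples, 2 features, K = 2 clusters (labels 1..K)
x <- matrix(c(1, 2, 3, 4, 5, 6), nrow = 3)
cluster <- c(1, 2, 1)
K <- 2; lambda <- 1
n <- nrow(x); p <- ncol(x)

rows <- rep(seq_len(n), times = p)   # row index of each entry of as.vector(x)
feat <- rep(seq_len(p), each = n)    # feature index of each entry
glob_j <- feat                       # shared block: columns 1..p
loc_j <- p * cluster[rows] + feat    # block of each sample's cluster

tilde <- sparseMatrix(i = c(rows, rows), j = c(glob_j, loc_j),
                      x = c(as.vector(x) / sqrt(lambda), as.vector(x)),
                      dims = c(n, p * (K + 1)))
```

Here only 12 of the 18 entries are stored; the saving grows with the number of clusters.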

valid.x

the m x p validation data matrix.

valid.y

the responses for valid.x. If provided, they are used to compute the validation score with valid.metric.

valid.metric

the metric function used to score the validation predictions. By default it is accuracy for classification and RMSE for regression. A custom metric function can also be supplied.
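A custom metric is an ordinary R function. The sketch below computes balanced accuracy (the mean of per-class recall), which is less sensitive to class imbalance than plain accuracy. The calling convention is an assumption, not confirmed by this page: the function is taken to receive the predictions and the true labels and to return a list with score and name fields. Check how your SwarmSVM version calls valid.metric before relying on it; the name balanced_accuracy is hypothetical.

```r
# Hypothetical custom metric: balanced accuracy (mean per-class recall).
# ASSUMPTION: called as valid.metric(pred, truth); returns list(score, name).
balanced_accuracy <- function(pred, truth) {
  pred <- as.character(pred)
  truth <- as.character(truth)
  # Recall for each true class: fraction of that class predicted correctly
  recall <- vapply(unique(truth),
                   function(cls) mean(pred[truth == cls] == cls),
                   numeric(1))
  list(score = mean(recall), name = "BalancedAccuracy")
}

res <- balanced_accuracy(pred = c(1, 1, -1, 1), truth = c(1, 1, -1, -1))
```

In this toy call, class 1 is fully recovered and class -1 half of the time, so the score is 0.75 (plain accuracy would also be 0.75 here, but the two diverge as classes become unbalanced).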

type

the type of model to fit, passed to LiblineaR (see the type argument of LiblineaR).

cost

cost of constraint violation (default: 1). Controls the trade-off between regularization and correct classification of the training data. It can be seen as the inverse of a regularization constant. See LiblineaR for details.

epsilon

sets the tolerance of the termination criterion for optimization. If NULL, the LIBLINEAR defaults are used; these depend on type (see the epsilon argument of LiblineaR for the per-type values).

bias

if TRUE (default), each instance x is augmented with a constant 1, i.e. x becomes [x; 1], so the model includes an intercept term.
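A minimal base-R illustration of what the bias augmentation does to a data matrix. LiblineaR performs this step internally; the snippet only mirrors the idea, and the weights are made-up values.

```r
# Toy data: 3 instances (rows), 2 features
x <- matrix(c(1, 2, 3, 4, 5, 6), nrow = 3)
# Append a constant column of 1s; its weight acts as the intercept
x_bias <- cbind(x, 1)
w <- c(0.5, -1, 2)             # illustrative weights; last entry is the intercept b
scores <- drop(x_bias %*% w)   # same as x %*% w[1:2] + b
```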

wi

a named vector of weights for the different classes, used for asymmetric class sizes. Not all factor levels have to be supplied (default weight: 1). All components have to be named according to the corresponding class label. Not used in regression mode.
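One common heuristic (not something this package prescribes) is to weight each class inversely to its frequency. A sketch of building such a named vector in base R, with names taken from the class labels as required above; the toy labels are made up.

```r
# Toy imbalanced labels: six of class 1, two of class -1
y <- factor(c(1, 1, 1, -1, 1, 1, -1, 1))
counts <- table(y)
# Inverse-frequency weights, scaled so the largest class gets weight 1;
# names must match the class labels in y
wi <- setNames(as.numeric(max(counts) / counts), names(counts))
```

The result gives class -1 a weight of 3 and class 1 a weight of 1, and can be passed as wi = wi.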

verbose

if set to 0, no information is printed. If set to 1 (default), the running time and the validation score (if applicable) are printed. If set to 2, the running time, the validation score (if applicable) and the LiblineaR information are printed.

seed

the random seed. If NULL, no fixed seed is set, so results may vary between runs.

cluster.method

The clustering algorithm to use. Possible choices are

  • "kmeans" Algorithm from stats::kmeans

  • "mlKmeans" Algorithm from RcppMLPACK::mlKmeans

  • "kernkmeans" Algorithm from kernlab::kkmeans

If cluster.fun and cluster.predict are provided, cluster.method is ignored.

cluster.fun

The function that computes cluster labels for the training data given the number of centers. A custom function can be supplied, as long as it returns a list with two fields named cluster and centers.

cluster.predict

The function that predicts cluster labels for new data from the trained clustering object. A custom function can be supplied, as long as it returns a list with two fields named cluster and centers.
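A sketch of a custom cluster.fun / cluster.predict pair that wraps stats::kmeans and assigns new points to the nearest center. Both return a list with cluster and centers fields, as required above. The argument lists are assumptions (cluster.fun(x, centers, ...) and cluster.predict(x, cluster.object)); verify them against your SwarmSVM version. The function names are hypothetical.

```r
# Hypothetical cluster.fun: ASSUMPTION about signature (x, centers, ...)
my_cluster_fun <- function(x, centers, ...) {
  fit <- stats::kmeans(x, centers = centers, ...)
  list(cluster = fit$cluster, centers = fit$centers)
}

# Hypothetical cluster.predict: ASSUMPTION about signature (x, cluster.object)
my_cluster_predict <- function(x, cluster.object) {
  x <- as.matrix(x)
  ctr <- cluster.object$centers
  # Squared Euclidean distance from every row of x to every center (n x K)
  d <- matrix(vapply(seq_len(nrow(ctr)),
                     function(k) rowSums(sweep(x, 2, ctr[k, ])^2),
                     numeric(nrow(x))),
              nrow = nrow(x))
  # Nearest center for each row
  list(cluster = max.col(-d, ties.method = "first"), centers = ctr)
}

set.seed(42)
x <- rbind(matrix(rnorm(20, mean = 0), ncol = 2),
           matrix(rnorm(20, mean = 10), ncol = 2))
obj <- my_cluster_fun(x, centers = 2)
pred <- my_cluster_predict(x, obj)
```

The pair would then be passed as cluster.fun = my_cluster_fun and cluster.predict = my_cluster_predict, in which case cluster.method is ignored.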

...

additional parameters passed to cluster.fun.

Value

  • svm the trained model object returned by LiblineaR

  • lambda the value of lambda used

  • sparse whether the transformed data is stored as a sparse matrix

  • label the cluster labels of the training data

  • centers the cluster centers computed from the training data

  • cluster.fun the function used for clustering

  • cluster.object the clustering object, either the one passed in as cluster.object or the one produced by cluster.fun

  • cluster.predict the function used to assign new data to clusters based on cluster.object

  • valid.pred the predictions on the validation data

  • valid.score the validation score

  • valid.metric the validation metric

  • time a list recording the time spent in each step

Examples

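library(SwarmSVM)

# svmguide1 is a list holding a training part ([[1]]) and a test part ([[2]]);
# the first column of each holds the class label
data(svmguide1)
svmguide1.t = svmguide1[[2]]
svmguide1 = svmguide1[[1]]

# Train with 8 clusters and compute predictions on the test part as validation data
csvm.obj = clusterSVM(x = svmguide1[,-1], y = svmguide1[,1], lambda = 1,
                      centers = 8, seed = 512, verbose = 0,
                      valid.x = svmguide1.t[,-1], valid.y = svmguide1.t[,1])
csvm.pred = csvm.obj$valid.pred

# Or predict on new data with the fitted model
pred = predict(csvm.obj, svmguide1.t[,-1])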