parallelSVM: Parallel-voting version of Support-Vector-Machine

Description

parallelSVM draws samples from your data, fits a Support-Vector-Machine on each sample in parallel on your own machine, and lets the resulting models vote on each prediction. Because every model is trained on only part of the data, this is much faster than the regular Support-Vector-Machine and possibly even more accurate.
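The sample, fit, and vote idea can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the package's actual implementation: it uses svm from e1071 and parLapply from the parallel package, and the names n_models, frac, and majority are hypothetical. Sampling without replacement and simple majority voting are also assumptions.

```r
library(e1071)     # svm()
library(parallel)  # makeCluster(), parLapply(), stopCluster()

set.seed(42)
n_models <- 3    # hypothetical; plays the role of numberCores
frac     <- 0.5  # hypothetical; plays the role of samplingSize

# Draw one random subsample of row indices per model (assumed: no replacement)
samples <- lapply(seq_len(n_models), function(i) {
  sample(nrow(iris), size = floor(frac * nrow(iris)))
})

# Fit one SVM per subsample, each on its own worker process
cl <- makeCluster(n_models)
models <- parLapply(cl, samples, function(idx, data) {
  e1071::svm(Species ~ ., data = data[idx, ])
}, data = iris)
stopCluster(cl)

# Every model predicts every row; the most frequent label per row wins
votes    <- sapply(models, function(m) as.character(predict(m, iris)))
majority <- apply(votes, 1, function(v) names(which.max(table(v))))

table(majority, iris$Species)
```

With an even number of models, ties are possible; this sketch breaks them by taking the first label in alphabetical order, which is an arbitrary choice.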

Usage

## S3 method for class 'formula'
parallelSVM(formula, data= NULL, numberCores = detectCores(),
			samplingSize = 0.2, ..., 
			subset, na.action = na.omit, scale = TRUE)
## Default S3 method:
parallelSVM(x, y = NULL, numberCores = detectCores(), 
			samplingSize = 0.2, scale = TRUE, type = NULL, 
			kernel = "radial", degree = 3, 
			gamma = if (is.vector(x)) 1 else 1/ncol(x), 
			coef0 = 0, cost = 1, nu = 0.5, class.weights = NULL, 
			cachesize = 40, tolerance = 0.001, epsilon = 0.1, 
			shrinking = TRUE, cross = 0, probability = FALSE, 
			fitted = TRUE, seed = 1L, ..., subset, na.action = na.omit)

Arguments

formula

a symbolic description of the model to be fit

data

An optional data frame containing the variables in the model. By default the variables are taken from the environment which 'parallelSVM' is called from.

x

A data matrix, a vector or a sparse matrix.

y

A response vector with one label for each row/component of x. Can be either a factor (for classification tasks) or a numeric vector (for regression).

numberCores

Number of cores of your machine you want to use. This is also the number of samples taken, and therefore the number of models built.

samplingSize

Size of each sample taken from data or from x (default: 0.2).
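How numberCores and samplingSize interact can be made concrete with a little arithmetic. The sketch below assumes that samplingSize is read as a fraction of the rows, as the default of 0.2 suggests; the row count and the variable names rows_per_model and samples are hypothetical.

```r
n_rows       <- 1000  # hypothetical number of training rows
samplingSize <- 0.2
numberCores  <- 4     # fixed for illustration; the usage default is detectCores()

# Rows seen by each individual model (assumed: fraction of the total)
rows_per_model <- floor(samplingSize * n_rows)

# One independent sample of row indices per core, hence one model per core
set.seed(1)
samples <- lapply(seq_len(numberCores), function(i) sample(n_rows, rows_per_model))

length(samples)   # number of samples, equal to numberCores
lengths(samples)  # rows in each sample
```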

scale

A logical vector indicating the variables to be scaled. If scale is of length 1, the value is recycled as many times as needed. By default, data are scaled internally (both x and y variables) to zero mean and unit variance. The center and scale values are returned and used for later predictions.
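The reason the center and scale values are kept is that new data must be transformed with the training statistics, not with its own. A minimal base R sketch of that idea follows; it uses scale() directly and does not show how the package stores these values internally. The split of iris into train and new rows is arbitrary.

```r
train <- as.matrix(iris[1:100, 1:4])
new   <- as.matrix(iris[101:150, 1:4])

# Scale the training data to zero mean and unit variance
train_scaled <- scale(train)

# scale() records the statistics it used as attributes
center <- attr(train_scaled, "scaled:center")
spread <- attr(train_scaled, "scaled:scale")

# Apply the *training* statistics to the new data
new_scaled <- scale(new, center = center, scale = spread)

colMeans(train_scaled)      # numerically zero
apply(train_scaled, 2, sd)  # all equal to one
```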

type

Support-Vector-Machine can be used as a classification machine, as a regression machine, or for novelty detection. Depending on whether y is a factor or not, the default setting for type is C-classification or eps-regression, respectively, but may be overridden by setting an explicit value. Valid options are:
- C-classification
- nu-classification
- one-classification (for novelty detection)
- eps-regression
- nu-regression
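The default rule for type stated above can be written as a one-line helper. The function name default_svm_type is hypothetical and only mirrors the description; it is not part of the package.

```r
# Hypothetical helper mirroring the documented default:
# factor response -> classification, anything else -> regression
default_svm_type <- function(y) {
  if (is.factor(y)) "C-classification" else "eps-regression"
}

default_svm_type(iris$Species)       # factor response
default_svm_type(iris$Sepal.Length)  # numeric response
```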

kernel

the kernel used in training and predicting. You might consider changing some of the following parameters, depending on the kernel type.
- linear
- polynomial
- radial basis
- sigmoid

degree

parameter needed for kernel of type polynomial (default: 3)

gamma

parameter needed for all kernels except linear (default: 1/(data dimension))

coef0

parameter needed for kernels of type polynomial and sigmoid (default: 0)
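To see where degree, gamma, and coef0 enter, the four kernels can be written out as plain R functions, following the formulas given in the e1071 documentation for svm. The function names are hypothetical and this is only an illustration for two numeric vectors u and v, not the libsvm implementation.

```r
# u and v are numeric vectors of equal length
linear_kernel  <- function(u, v) sum(u * v)
poly_kernel    <- function(u, v, gamma, coef0 = 0, degree = 3) {
  (gamma * sum(u * v) + coef0)^degree
}
radial_kernel  <- function(u, v, gamma) exp(-gamma * sum((u - v)^2))
sigmoid_kernel <- function(u, v, gamma, coef0 = 0) tanh(gamma * sum(u * v) + coef0)

u <- c(1, 2)
v <- c(3, 4)
linear_kernel(u, v)
poly_kernel(u, v, gamma = 0.5, coef0 = 1, degree = 2)
radial_kernel(u, v, gamma = 0.5)
sigmoid_kernel(u, v, gamma = 0.5)
```

The linear kernel takes none of the three parameters, which matches the note that gamma is needed for all kernels except linear.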

cost

cost of constraints violation (default: 1); it is the ‘C’-constant of the regularization term in the Lagrange formulation.

nu

parameter needed for nu-classification, nu-regression, and one-classification

class.weights

a named vector of weights for the different classes, used for asymmetric class sizes. Not all factor levels have to be supplied (default weight: 1). All components have to be named.
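One common way to build such a named vector is to weight each class inversely to its frequency. This is only one possible choice, shown here on made-up labels; the variable names are hypothetical.

```r
# Hypothetical imbalanced response: 90 cases of "a", 10 of "b"
y <- factor(c(rep("a", 90), rep("b", 10)))

counts  <- table(y)
weights <- as.numeric(max(counts) / counts)  # rarer class gets a larger weight
names(weights) <- names(counts)              # all components must be named

weights
# The result could then be passed on as class.weights = weights
```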

cachesize

cache memory in MB (default 40)

tolerance

tolerance of termination criterion (default: 0.001)

epsilon

epsilon in the insensitive-loss function (default: 0.1)

shrinking

option whether to use the shrinking-heuristics (default: TRUE)

cross

if an integer value k>0 is specified, a k-fold cross validation on the training data is performed to assess the quality of the model: the accuracy rate for classification and the Mean Squared Error for regression.
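As a small illustration of the cross argument, the sketch below calls the regular svm from e1071 rather than parallelSVM; how the cross-validation results are exposed on the list of models returned by parallelSVM is not covered here.

```r
library(e1071)

set.seed(1)
# 5-fold cross-validation on the training data
model <- svm(Species ~ ., data = iris, cross = 5)

model$accuracies    # one accuracy value per fold
model$tot.accuracy  # overall accuracy across folds
```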

probability

logical indicating whether the model should allow for probability predictions.
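A model has to be fitted with probability = TRUE before class probabilities can be requested at prediction time. The sketch below shows this with the regular svm from e1071; how parallelSVM combines probabilities across its models is not shown and should not be inferred from it.

```r
library(e1071)

# Fit with probability support enabled
model <- svm(Species ~ ., data = iris, probability = TRUE)

# Ask for probabilities when predicting
pred  <- predict(model, iris[1:5, ], probability = TRUE)
probs <- attr(pred, "probabilities")  # one row per case, one column per class

probs
```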

fitted

logical indicating whether the fitted values should be computed and included in the model or not (default: TRUE)

seed

integer seed for libsvm (used for cross-validation and probability prediction models).

...

additional parameters for the low level fitting function svm.default

subset

An index vector specifying the cases to be used in the training sample. (NOTE: If given, this argument must be named.)

na.action

A function to specify the action to be taken if NAs are found. The default action is na.omit, which leads to rejection of cases with missing values on any required variable. An alternative is na.fail, which causes an error if NA cases are found. (NOTE: If given, this argument must be named.)
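The difference between the two actions is easy to see on a small data frame with one missing value. This base R sketch uses made-up data and only demonstrates na.omit and na.fail themselves.

```r
df <- data.frame(x = c(1, NA, 3), y = factor(c("a", "b", "a")))

# na.omit drops every row that has a missing value
clean <- na.omit(df)
nrow(clean)

# na.fail stops with an error as soon as a missing value is present
result <- tryCatch(na.fail(df), error = function(e) "error")
result
```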

Value

A list of numberCores Support Vector Machine models.

Note

Usage is just like svm. The only differences are numberCores, the number of cores you want to use (equal to the number of models you build), and samplingSize, the size of the sample used to create each model.

Author(s)

Wannes Rosiers

See Also

This package can be regarded as a parallel extension of svm from the e1071 package.

Examples

## Not run: 
# Load the normal svm function and the parallel version
library(e1071)
library(parallelSVM)

# Example with formula
# load trainData and testData
data(magicData)

# Calculate the model
# Here we use it on bigger data
system.time(serialSvm   <- svm(V11 ~ ., trainData[,-1], 
						probability=TRUE, cost=10, gamma=0.1))
system.time(parallelSvm <- parallelSVM(V11 ~ ., data = trainData[,-1],
						numberCores = 8, samplingSize = 0.2, 
						probability = TRUE, gamma=0.1, cost = 10))
                                       
# Calculate predictions
system.time(serialPredictions <- predict(serialSvm, testData))
system.time(parallelPredictions <- predict(parallelSvm, testData))

# Check for quality
table(serialPredictions, testData[,"V11"])
table(parallelPredictions, testData[,"V11"])

# Example without formula
# load data
data(iris)
x <- subset(iris, select = -Species)
y <- iris$Species

# estimate model and predict input values
system.time(model       <- parallelSVM(x, y))
system.time(serialmodel <- svm(x, y))

fitted(model)
fitted(serialmodel)

# Calculate predictions
system.time(serialPredictions <- predict(serialmodel, x))
system.time(parallelPredictions <- predict(model, x))

# Check for quality
table(serialPredictions, y)
table(parallelPredictions, y)

## End(Not run)

parallelSVM documentation built on May 2, 2019, 9:32 a.m.