Description
This is a modification of the svm function in package e1071 that can
deal with observation weights.
Usage

wsvm(x, ...)

## S3 method for class 'formula'
wsvm(formula, data = NULL, case.weights = rep(1, nrow(data)), ...,
     subset, na.action = na.omit, scale = TRUE)

## Default S3 method:
wsvm(x, y = NULL, scale = TRUE, type = NULL, kernel = "radial",
     degree = 3, gamma = if (is.vector(x)) 1 else 1/ncol(x),
     coef0 = 0, cost = 1, nu = 0.5, class.weights = NULL,
     case.weights = rep(1, nrow(x)), cachesize = 40, tolerance = 0.001,
     epsilon = 0.1, shrinking = TRUE, cross = 0, probability = FALSE,
     fitted = TRUE, seed = 1L, ..., subset = NULL, na.action = na.omit)
Arguments

x: (Required if no formula interface is provided.) A data matrix, a
  vector, or a sparse matrix (object of class matrix.csr as provided by
  the package SparseM).

formula: A symbolic description of the model to be fit.

data: An optional data frame containing the variables in the model. By
  default the variables are taken from the environment which wsvm is
  called from.

case.weights: A vector of observation weights (default: a vector of 1s).

subset: An index vector specifying the cases to be used in the training
  sample. (NOTE: If given, this argument must be named.)

na.action: A function to specify the action to be taken if NAs are
  found. The default action is na.omit, which leads to rejection of
  cases with missing values on any required variable. An alternative is
  na.fail, which causes an error if NA cases are found. (NOTE: If given,
  this argument must be named.)

scale: A logical vector indicating the variables to be scaled. If scale
  is of length 1, the value is recycled as many times as needed. Per
  default, data are scaled internally (both x and y variables) to zero
  mean and unit variance. The center and scale values are returned and
  used for later predictions.

y: (Only if no formula interface is used.) A response vector with one
  label for each row/component of x. Can be either a factor (for
  classification tasks) or a numeric vector (for regression).

type: wsvm can be used as a classification machine, as a regression
  machine, or for novelty detection. Depending on whether y is a factor
  or not, the default setting for type is C-classification or
  eps-regression, respectively, but it may be overwritten by setting an
  explicit value. Valid options are: C-classification,
  nu-classification, one-classification (for novelty detection),
  eps-regression, and nu-regression.

kernel: The kernel used in training and predicting. You might consider
  changing some of the following parameters, depending on the kernel
  type. Valid options are: linear (u'v), polynomial
  ((gamma*u'v + coef0)^degree), radial basis (exp(-gamma*|u-v|^2)), and
  sigmoid (tanh(gamma*u'v + coef0)).

degree: Parameter needed for kernel of type polynomial (default: 3).

gamma: Parameter needed for all kernels except linear (default:
  1/(data dimension)).

coef0: Parameter needed for kernels of type polynomial and sigmoid
  (default: 0).

cost: Cost of constraints violation (default: 1); this is the
  'C'-constant of the regularization term in the Lagrange formulation.

nu: Parameter needed for nu-classification, nu-regression, and
  one-classification.

class.weights: A named vector of weights for the different classes,
  used for asymmetric class sizes. Not all factor levels have to be
  supplied (default weight: 1). All components have to be named.

cachesize: Cache memory in MB (default: 40).

tolerance: Tolerance of termination criterion (default: 0.001).

epsilon: Epsilon in the insensitive-loss function (default: 0.1).

shrinking: Option whether to use the shrinking heuristics (default:
  TRUE).

cross: If an integer value k > 0 is specified, a k-fold cross
  validation on the training data is performed to assess the quality of
  the model: the accuracy rate for classification and the mean squared
  error for regression.

probability: Logical indicating whether the model should allow for
  probability predictions (default: FALSE).

fitted: Logical indicating whether the fitted values should be computed
  and included in the model or not (default: TRUE).

seed: Integer seed for libsvm (used for cross-validation and
  probability prediction models).

...: Additional parameters for the low level fitting function
  wsvm.default.
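As a concrete illustration of two of the defaults described above (the gamma default of 1/ncol(x) and the internal scaling), here is a base-R sketch; the matrix x is made up for illustration and no SVM is fitted:

```r
## Base-R illustration of two defaults described above; the data are
## made up and no SVM is fitted.
x <- matrix(c(1, 2, 3, 4, 5, 6), nrow = 3)   # 3 observations, 2 predictors

## default gamma: 1 for a vector, 1/ncol(x) otherwise
gamma_default <- if (is.vector(x)) 1 else 1/ncol(x)
gamma_default                                # 0.5 for two predictors

## scale = TRUE: each column is centered to mean 0 and scaled to sd 1
xs <- scale(x)
colMeans(xs)                                 # approximately c(0, 0)
apply(xs, 2, sd)                             # c(1, 1)
```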
Details

wsvm is used to train a support vector machine with case weights. It
can be used to carry out general regression and classification (of nu
and epsilon-type), as well as density estimation. A formula interface
is provided.

This function is a modification of the svm function in package e1071
written by David Meyer (based on C/C++ code by Chih-Chung Chang and
Chih-Jen Lin). An extension of LIBSVM that can deal with case weights,
written by Ming-Wei Chang, Hsuan-Tien Lin, Ming-Hen Tsai, Chia-Hua Ho
and Hsiang-Fu Yu, is used. It is available at
http://www.csie.ntu.edu.tw/~cjlin/libsvmtools/#weights_for_data_instances.

For multiclass classification with k levels, k > 2, libsvm uses the
'one-against-one' approach, in which k(k-1)/2 binary classifiers are
trained; the appropriate class is found by a voting scheme.

libsvm internally uses a sparse data representation, for which
high-level support is also provided by the package SparseM.

If the predictor variables include factors, the formula interface must
be used to get a correct model matrix.

plot.svm allows a simple graphical visualization of classification
models.

The probability model for classification fits a logistic distribution
using maximum likelihood to the decision values of all binary
classifiers, and computes the a-posteriori class probabilities for the
multi-class problem using quadratic optimization. The probabilistic
regression model assumes (zero-mean) Laplace-distributed errors for the
predictions, and estimates the scale parameter using maximum
likelihood.

Data are scaled internally, usually yielding better results. Parameters
of SVM models usually must be tuned to yield sensible results!
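The kernel functions listed under the kernel argument, and the number of binary classifiers trained by the 'one-against-one' approach, can be written out directly in base R. This is only a sketch of the formulas as documented, not of libsvm's internal implementation:

```r
## The four kernel functions, evaluated on two toy vectors; gamma,
## coef0 and degree are the tuning parameters from the Arguments section.
u <- c(1, 0); v <- c(0.5, 0.5)
gamma <- 0.5; coef0 <- 0; degree <- 3

k_linear     <- sum(u * v)                           # u'v
k_polynomial <- (gamma * sum(u * v) + coef0)^degree  # (gamma*u'v + coef0)^degree
k_radial     <- exp(-gamma * sum((u - v)^2))         # exp(-gamma*|u-v|^2)
k_sigmoid    <- tanh(gamma * sum(u * v) + coef0)     # tanh(gamma*u'v + coef0)

## 'one-against-one': for k classes, k*(k-1)/2 binary classifiers
k <- 3
n_classifiers <- k * (k - 1) / 2   # 3 for a 3-class problem such as iris
```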
Value

An object of class "wsvm", inheriting from "svm", containing the
fitted model, including:

SV: The resulting support vectors (possibly scaled).

index: The index of the resulting support vectors in the data matrix.
  Note that this index refers to the preprocessed data (after the
  possible effect of na.omit and subset).

coefs: The corresponding coefficients times the training labels.

rho: The negative intercept.

obj: The value(s) of the objective function.

sigma: In case of a probabilistic regression model, the scale parameter
  of the hypothesized (zero-mean) Laplace distribution estimated by
  maximum likelihood.

probA, probB: Numeric vectors of length k(k-1)/2, k number of classes,
  containing the parameters of the logistic distributions fitted to the
  decision values of the binary classifiers (1 / (1 + exp(a x + b))).
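The role of probA and probB can be sketched in base R: each binary classifier's decision value d is pushed through the fitted sigmoid 1 / (1 + exp(a*d + b)). The a and b values below are hypothetical stand-ins for one entry of probA and probB, not the output of an actual fit:

```r
## Hypothetical sigmoid parameters; in a fitted model these would be
## one entry each of probA and probB.
a <- -1.7
b <- 0.05
d <- c(-2, 0, 2)               # example decision values of one classifier

p <- 1 / (1 + exp(a * d + b))  # pairwise class probabilities
round(p, 3)
```

With a negative slope a, larger decision values map to probabilities closer to 1, which is the usual orientation of the fitted sigmoid.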
Note

This modification is not well-tested.
References

Chang, Chih-Chung and Lin, Chih-Jen:
LIBSVM: a library for Support Vector Machines
http://www.csie.ntu.edu.tw/~cjlin/libsvm

Exact formulations of models, algorithms, etc. can be found in the
document:
Chang, Chih-Chung and Lin, Chih-Jen:
LIBSVM: a library for Support Vector Machines
http://www.csie.ntu.edu.tw/~cjlin/papers/libsvm.ps.gz

More implementation details and speed benchmarks can be found in:
Rong-En Fan, Pai-Hsuen Chen and Chih-Jen Lin:
Working Set Selection Using the Second Order Information for Training
SVM
http://www.csie.ntu.edu.tw/~cjlin/papers/quadworkset.pdf
See Also

predict.wsvm, plot.svm in package e1071, and matrix.csr (in package
SparseM).

Other svm: predict.wsvm
Examples

data(iris)
attach(iris)

## classification mode
# default with factor response:
model <- wsvm(Species ~ ., data = iris)

# alternatively the traditional interface:
x <- subset(iris, select = -Species)
y <- Species
model <- wsvm(x, y)

print(model)
summary(model)

# test with train data
pred <- predict(model, x)
# (same as:)
pred <- fitted(model)

# Check accuracy:
table(pred, y)

# compute decision values and probabilities:
pred <- predict(model, x, decision.values = TRUE)
attr(pred, "decision.values")[1:4,]

## visualize (classes by color, SV by crosses):
plot(cmdscale(dist(iris[,-5])),
     col = as.integer(iris[,5]),
     pch = c("o","+")[1:150 %in% model$index + 1])

## density-estimation
# create 2-dim. normal with rho=0:
X <- data.frame(a = rnorm(1000), b = rnorm(1000))
attach(X)

# traditional way:
m <- wsvm(X, gamma = 0.1)
# formula interface:
m <- wsvm(~., data = X, gamma = 0.1)

# test:
newdata <- data.frame(a = c(0, 4), b = c(0, 4))
predict(m, newdata)

## visualize:
plot(X, col = 1:1000 %in% m$index + 1, xlim = c(-5,5), ylim = c(-5,5))
points(newdata, pch = "+", col = 2, cex = 5)

## weights: (example not particularly sensible)
i2 <- iris
levels(i2$Species)[3] <- "versicolor"
summary(i2$Species)
wts <- 100 / table(i2$Species)
wts
m <- wsvm(Species ~ ., data = i2, class.weights = wts)

## case.weights:
fit <- wsvm(Species ~ ., data = iris, case.weights = rep(c(0.5, 1), 75))
pred <- predict(fit)
mean(pred != iris$Species)