hardCompareQP: Fit a hard margin linear kernel SVM comparison model


Description

Fit a sparse linear kernel hard margin SVM comparison model to linearly separable data. We first normalize the data using pairs2svmData, resulting in a scaled n x p feature difference matrix X and a new vector of comparisons y in c(-1,1). We then define the linear kernel matrix K = XX' and solve the dual quadratic program (QP) of the SVM:

min_{α ∈ R^n} α'Kα/2 - y'α, subject to y_i α_i ≥ 0 for all i.

The learned function in the scaled binary SVM space is f(x) = b + ∑_{i ∈ sv} α_i k(d_i, x), where sv is the set of support vectors and the bias b is the average of y_i - ∑_{j ∈ sv} α_j k(d_j, d_i) over all support vectors i. The learned ranking function in the original space is r(x) = ∑_{i ∈ sv} -(α_i/b) k(d_i, Sx), where S is the diagonal scaling matrix of the input features. Since k is the linear kernel, this function can also be written as r(x) = w'x with the weight vector w = -(S/b) ∑_{i ∈ sv} α_i d_i.
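
For reference, the dual QP above can be solved directly with the quadprog package. The following is only a minimal sketch of that computation, not the package's actual implementation; X and y stand for the scaled feature difference matrix and comparison vector produced by pairs2svmData, and the constants match the default arguments below.

library(quadprog)
## X: scaled n x p feature difference matrix, y: comparisons in c(-1,1)
## (both assumed to come from pairs2svmData).
n <- nrow(X)
K <- X %*% t(X)                  # linear kernel matrix
D <- K + diag(1e-10, n)          # add.to.diag keeps D positive definite
## solve.QP minimizes b'Db/2 - d'b subject to t(Amat) %*% b >= bvec,
## so Amat=diag(y), bvec=0 encodes the constraint y_i alpha_i >= 0.
qp <- solve.QP(Dmat=D, dvec=y, Amat=diag(y), bvec=rep(0, n))
alpha <- qp$solution
sv <- abs(alpha) > 0.001         # sv.threshold
## bias: average of y_i minus the un-biased function value at d_i,
## taken over the support vectors.
b <- mean(y[sv] - (K %*% alpha)[sv])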

Usage

hardCompareQP(Pairs, add.to.diag = 1e-10, sv.threshold = 0.001)

Arguments

Pairs

See check.pairs.

add.to.diag

This value is added to the diagonal of the kernel matrix to ensure that it is positive definite.

sv.threshold

Optimal coefficients α_i with absolute value greater than this threshold are considered support vectors.

Value

Comparison model fit. Use fit$rank(X) to get m numeric ranks for the rows of the m x p numeric matrix X. For two feature vectors xi and xip, we predict no significant difference if their absolute rank difference is less than 1. Use fit$predict(Xi, Xip) to get m predicted comparisons in c(-1,0,1) for m x p numeric matrices Xi and Xip. Also, fit$sigma gives the scales of the input features, fit$sv gives the support vectors (in the scaled space), fit$weight is the optimal weight vector (in the original space), and if fit$margin is positive then the data are separable.
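
Since r(x) = w'x with the linear kernel, the ranks from fit$rank should agree with a direct matrix product against fit$weight. A quick sanity check, assuming the separable data set used in the Examples below:

library(rankSVMcompare)
data(separable, envir=environment())
fit <- hardCompareQP(separable)
## two ways to compute the ranking function r(x) = w'x:
r1 <- fit$rank(separable$Xi)
r2 <- separable$Xi %*% fit$weight
stopifnot(isTRUE(all.equal(as.numeric(r1), as.numeric(r2))))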

Author(s)

Toby Dylan Hocking

Examples

library(rankSVMcompare)
data(separable, envir=environment())
sol <- hardCompareQP(separable)
## check to make sure we have perfect prediction.
y.hat <- with(separable, sol$predict(Xi, Xip))
stopifnot(separable$yi == y.hat)
## This should also be the same:
fxdiff <- with(separable, sol$rank(Xip)-sol$rank(Xi))
y.hat2 <- ifelse(fxdiff < -1, -1L,
                 ifelse(fxdiff > 1, 1L, 0L))
stopifnot(y.hat == y.hat2)
## difference vectors and support vectors to plot.
point.df <- with(separable, data.frame(Xip-Xi, yi))
## map support vectors from the scaled space back to the original space.
sv.df <- with(sol$sv, data.frame(t(t(X)*sol$sigma)))
## calc svm decision boundary and margin.
mu <- sol$margin
arange <- range(point.df$angle)
## endpoints of the line w'x = v: solve for distance at the two
## extreme angle values.
seg <- function(v, line){
  d <- (v-sol$weight[2]*arange)/sol$weight[1]
  data.frame(t(c(distance=d, angle=arange)), line)
}
seg.df <- rbind(seg(1-mu,"margin"),
                seg(1+mu,"margin"),
                seg(-1-mu,"margin"),
                seg(-1+mu,"margin"),
                seg(1,"decision"),
                seg(-1,"decision"))
library(ggplot2)
svplot <- ggplot()+
  geom_point(aes(distance, angle, colour=factor(yi)), data=point.df,
             size=3)+
  geom_point(aes(distance, angle), data=sv.df, size=1.5)+
  geom_segment(aes(distance1, angle1, xend=distance2, yend=angle2,
                   linetype=line), data=seg.df)+
  scale_linetype_manual(values=c(decision="solid", margin="dotted"))+
  ggtitle(paste("Hard margin linear kernel comparison model",
                "support vectors in black", sep="\n"))
print(svplot)
