Basic theory of RKSIR

Mercer kernel and reproducing kernel Hilbert space

Given predictors $X \in \mathcal{X} \subseteq \mathbb{R}^p$, a Mercer kernel is a continuous, symmetric, positive semi-definite function $k(\cdot,\cdot): \mathcal{X} \times \mathcal{X} \rightarrow \mathbb{R}$. By Mercer's theorem it admits the eigen-decomposition $$k(x,z) = \sum_j \lambda_j \phi_j(x) \phi_j(z),$$ where $\{\phi_j\}$ are the eigenfunctions and $\{\lambda_j\}$ the corresponding eigenvalues.
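As a finite-sample illustration, the eigendecomposition of a Gram matrix plays the role of the Mercer expansion. The sketch below uses simulated data and a Gaussian kernel with bandwidth 1; both are illustrative choices, not part of the package API.

set.seed(1)
x <- matrix(rnorm(100), nrow = 50)   # 50 illustrative points in R^2
gauss <- function(a, b, sigma) exp(-sum((a - b)^2) / (2 * sigma^2))
K <- outer(1:50, 1:50, Vectorize(function(i, j) gauss(x[i, ], x[j, ], 1)))
e <- eigen(K, symmetric = TRUE)
# the Gram matrix is recovered from its eigenpairs, the finite-sample
# counterpart of k(x, z) = sum_j lambda_j phi_j(x) phi_j(z)
max(abs(K - e$vectors %*% diag(e$values) %*% t(e$vectors)))   # ~ 0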

Each Mercer kernel $k(\cdot,\cdot)$ corresponds to a reproducing kernel Hilbert space $$\mathcal{H} = \left\{ f \,\middle|\, f(x) = \sum_{j \in \Lambda} a_j \phi_j(x), \ \sum_{j \in \Lambda} \frac{a_j^2}{\lambda_j} < \infty \right\},$$ where $\Lambda$ is the index set.

Given a Mercer kernel, we obtain an embedding $\phi$ which maps $\mathcal{X}$ into a feature Hilbert space: $$\phi(x) = (\sqrt{\lambda_1}\phi_1(x), \sqrt{\lambda_2}\phi_2(x), \dots, \sqrt{\lambda_{|\Lambda|}}\phi_{|\Lambda|}(x)),$$ whose inner product satisfies $k(x,z) = \left\langle \phi(x), \phi(z) \right\rangle$.
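For a kernel with a finite-dimensional feature map this correspondence can be checked directly. A minimal sketch for the homogeneous polynomial kernel $k(x,z) = (x^T z)^2$ on $\mathbb{R}^2$, chosen only because its feature map is explicit:

# for k(x,z) = (x'z)^2 on R^2 the feature map is explicit:
# phi(x) = (x1^2, x2^2, sqrt(2) x1 x2)
phi <- function(x) c(x[1]^2, x[2]^2, sqrt(2) * x[1] * x[2])
x <- c(1, 2); z <- c(3, -1)
sum(x * z)^2           # kernel evaluation: (x'z)^2 = 1
sum(phi(x) * phi(z))   # inner product in feature space, same value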

Sliced Inverse Regression (SIR) in the feature space

Consider the inverse regression model in the feature space: $$y = f(\beta_1^T \phi(x), \beta_2^T \phi(x), \dots, \beta_d^T \phi(x), \epsilon).$$ The $\{\beta_j\}_{j=1}^{d}$ are called the effective dimension reduction (e.d.r.) directions.

Assume the data satisfy the linear design condition: for any $f \in \mathcal{H}$, there exist $b_0 \in \mathbb{R}$ and $b \in \mathbb{R}^d$ such that $$E(f^T \phi(x) \mid S(X)) = b_0 + b^T S(X),$$ where $S(X) = (\beta_1^T \phi(x), \dots, \beta_d^T \phi(x))^T$.

This yields the following theorem:

Under the linear design condition, the centered inverse regression curve $E(\phi(x) \mid y) - E(\phi(x))$ lies in the linear subspace spanned by $\Sigma \beta_j,\ j = 1, \dots, d$, where $\Sigma$ is the covariance operator of $\phi(x)$.

Thus we need to solve the eigen system $$\Sigma_{E(\phi(x)|y)}\, \nu = \lambda \Sigma \nu,$$ where $\Sigma_{E(\phi(x)|y)}$ is the covariance operator of the inverse regression curve. Its solutions have the form $\nu = \sum_{i=1}^{n} \alpha_i \phi(x_i)$; however, $\phi(x)$ is not available in explicit form.

Let $K$ be the Gram matrix of the data $X$ under the kernel $k(\cdot,\cdot)$, i.e. $K_{ij} = \left\langle \phi(x_i), \phi(x_j) \right\rangle = k(x_i, x_j)$ for $i, j = 1, \dots, n$. One can show that the eigen system of SIR in the feature space is equivalent to $$\frac{1}{n} K J K \alpha = \lambda \frac{1}{n} K K \alpha,$$ where $J$ is a slice matrix built from the response.
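For a categorical response the slice matrix has a standard closed form, $J = W \operatorname{diag}(1/n_h) W^T$ with $W$ the $n \times H$ matrix of slice indicators, so that $\frac{1}{n} K J K$ estimates the between-slice covariance in the feature space. The sketch below is an assumption about what Jy constructs; the package's actual weighting may differ.

# assumed form of the slice matrix for categorical y:
# J_ij = 1/n_h if y_i and y_j fall in the same slice h, else 0
make_J <- function(y) {
  W <- outer(y, sort(unique(y)), "==") * 1   # n x H slice-indicator matrix
  W %*% diag(1 / colSums(W)) %*% t(W)
}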

Introducing a Tikhonov regularizer into this eigen system gives the Regularized Kernel SIR (RKSIR): $$\frac{1}{n} K J K \alpha = \lambda \frac{1}{n} (K^2 + n^2 s I) \alpha,$$ where $s > 0$ is the regularization parameter.
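The regularized system can be solved directly with base R. Below is a minimal sketch of the eigen analysis that a function like Ei presumably performs; the centering and scaling conventions in the actual implementation may differ.

# sketch of the RKSIR eigen solve (assumed behaviour of Ei)
rksir_eigen <- function(K, J, s) {
  n <- nrow(K)
  A <- K %*% J %*% K / n                  # left-hand operator
  B <- (K %*% K + n^2 * s * diag(n)) / n  # regularized right-hand operator
  ev <- eigen(solve(B, A))                # generalized problem A alpha = lambda B alpha
  # tiny imaginary parts can appear numerically; drop them
  list(values = Re(ev$values), vectors = Re(ev$vectors))
}

The leading columns of vectors then contain the coefficient vectors $\alpha$ defining $\nu = \sum_{i=1}^{n} \alpha_i \phi(x_i)$.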

How to use SC19017

Load the wine data from SC19017 and standardize the predictors.

library(SC19017)
data(wine)
head(wine)
y <- wine[, 1]                                       # response: wine class labels
X <- scale(wine[, -1], center = TRUE, scale = TRUE)  # standardized predictors
n <- nrow(X)

Use the split function to divide the data into a training set and a testing set.

ns <- c(17, 20, 13)             # training sample sizes per class
train <- SC19017::split(y, ns)  # indices of the training set
test <- (1:n)[-train]           # remaining indices form the testing set
Xtrain <- X[train, ]
Xtest <- X[test, ]
ytrain <- y[train]
ytest <- y[test]
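Here split is assumed to draw ns[h] training indices from the h-th class (stratified sampling). Under that assumption, an equivalent draw in base R could look like the following sketch; the actual sampling scheme and ordering inside SC19017::split may differ.

# hypothetical base-R equivalent of SC19017::split, assuming it draws
# ns[h] training indices from the h-th class
set.seed(42)                              # illustrative seed
classes <- sort(unique(y))
train_alt <- unlist(lapply(seq_along(classes), function(h) {
  sample(which(y == classes[h]), ns[h])   # ns[h] indices from class h
}))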

Select the bandwidth as the median of the pairwise distances between all data points.

# pairwise Euclidean distances between each point and all others
distance <- matrix(NA, nrow = n, ncol = n - 1)
for (i in 1:n) {
  distance[i, ] <- apply(X[-i, ], 1, function(o) {
    sqrt(sum((o - X[i, ])^2))
  })
}
sigma <- median(distance)   # median heuristic bandwidth
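Since each pairwise distance appears twice in the matrix above and duplicating values leaves the median unchanged, the same bandwidth can be computed in one line with base R's dist:

# equivalent one-liner: dist() computes each pairwise distance once
sigma_alt <- median(as.vector(dist(X)))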

Estimate the e.d.r. directions: use the KX function to compute the Gram matrix of the training data; use the Jy function to build the slice matrix of the response; use the Ei function to carry out the RKSIR eigen analysis; then use Vtr and Vte to compute the e.d.r. directions on the training and testing sets.

Kx <- KX(Xtrain, "AG", sigma, center = TRUE)              # Gram matrix of the training data
J <- Jy(ytrain, numeric = FALSE)                          # slice matrix for a categorical response
ei2 <- Ei(Kx, J, s = 0.01)                                # RKSIR eigen analysis
vtrain <- Vtr(Xtrain, Xtrain, 2, ei2, "AG", sigma)        # first 2 e.d.r. directions, training set
vtest <- Vte(Xtest, Xtrain, Xtrain, 2, ei2, "AG", sigma)  # first 2 e.d.r. directions, testing set
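For intuition, the coordinate of a new point $x$ along the $j$-th direction is the projection $\langle \nu_j, \phi(x) \rangle = \sum_i \alpha_{ij} k(x_i, x)$. The sketch below assumes the "AG" kernel is Gaussian with bandwidth sigma and that the coefficients live in a field ei2$vectors (a hypothetical name); it also ignores any centering KX and Vte apply, so its values may differ from vtest by an affine transformation.

# hypothetical reconstruction of the projection behind Vte; assumes a
# Gaussian "AG" kernel and an ei2$vectors coefficient field, and ignores
# any centering the package applies
gauss <- function(a, b, sigma) exp(-sum((a - b)^2) / (2 * sigma^2))
Kte <- outer(seq_len(nrow(Xtest)), seq_len(nrow(Xtrain)),
             Vectorize(function(t, i) gauss(Xtest[t, ], Xtrain[i, ], sigma)))
vtest_manual <- Kte %*% Re(ei2$vectors[, 1:2])   # coordinates on first 2 directions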

Further analysis: apply the KNN or SVM method to the e.d.r. directions to obtain a classification result.

# knn
library(class)
knn.pred <- knn(vtrain, vtest, ytrain, k = 5)
knn.tab <- table(knn.pred, ytest)   # confusion matrix
knn.result <- rep(NA, 3)
for (i in 1:3) {
  # per-class misclassification rate: off-diagonal mass in column i
  knn.result[i] <- sum(knn.tab[-i, i]) / sum(knn.tab[, i])
}
mean(knn.result)                    # average per-class error rate
knn.tab
# svm
library(e1071)
svm.model <- svm(vtrain, factor(ytrain))
svm.pred <- predict(svm.model, vtest)
svm.tab <- table(svm.pred, ytest)   # confusion matrix
svm.result <- rep(NA, 3)
for (i in 1:3) {
  # per-class misclassification rate: off-diagonal mass in column i
  svm.result[i] <- sum(svm.tab[-i, i]) / sum(svm.tab[, i])
}
mean(svm.result)                    # average per-class error rate
svm.tab

Scatter plot of the first two e.d.r. directions on the testing set, colored by class.

# plot figure
library(ggplot2)
Test <- data.frame(v1 = vtest[, 1], v2 = vtest[, 2], classes = as.factor(ytest))
ggplot(Test, aes(x = v1, y = v2, col = classes)) + 
  geom_point() + 
  xlab("first e.d.r. direction") + 
  ylab("second e.d.r. direction")

