bkpc: Bayesian Kernel Projection Classifier


Description

Function bkpc is used to train a Bayesian kernel projection classifier. This is a nonlinear multicategory classifier that performs classification of the projections of the data onto the principal axes of the feature space. A Gibbs sampler is implemented to find the posterior distributions of the parameters, so probability distributions of the predictions can be obtained for new observations.

Usage

## Default S3 method:
bkpc(x, y, theta = NULL, n.kpc = NULL, thin = 100, n.iter = 1e+05, std = 10, 
g1 = 0.001, g2 = 0.001, g3 = 1, g4 = 1, initSigmasq = NULL, initBeta = NULL,
initTau = NULL, intercept = TRUE, rotate = TRUE, ...)


## S3 method for class 'kern'
bkpc(x, y, n.kpc = NULL, thin = 100, n.iter = 1e+05, std = 10, 
g1 = 0.001, g2 = 0.001, g3 = 1, g4 = 1, initSigmasq = NULL, initBeta = NULL, 
initTau = NULL, intercept = TRUE, rotate = TRUE, ...)


## S3 method for class 'kernelMatrix'
bkpc(x, y, n.kpc = NULL, thin = 100, n.iter = 1e+05, std = 10, 
g1 = 0.001, g2 = 0.001, g3 = 1, g4 = 1, initSigmasq = NULL, initBeta = NULL, 
initTau = NULL, intercept = TRUE, rotate = TRUE, ...)

Arguments

x

either: a data matrix, a kernel matrix of class "kernelMatrix" or a kernel matrix of class "kern".

y

a response vector with one label for each row of x. Should be a factor.

theta

the inverse kernel bandwidth parameter.

n.kpc

number of kernel principal components to use.

n.iter

number of iterations for the MCMC algorithm.

thin

thinning interval.

std

standard deviation parameter for the random walk proposal.

g1

γ_1 hyper-parameter of the prior inverse gamma distribution for the σ^2 parameter in the BKPC model.

g2

γ_2 hyper-parameter of the prior inverse gamma distribution for the σ^2 parameter in the BKPC model.

g3

γ_3 hyper-parameter of the prior gamma distribution for the τ parameter in the BKPC model.

g4

γ_4 hyper-parameter of the prior gamma distribution for the τ parameter in the BKPC model.

initSigmasq

optional specification of initial value for the σ^2 parameter in the BKPC model.

initBeta

optional specification of initial values for the β parameters in the BKPC model.

initTau

optional specification of initial values for the τ parameters in the BKPC model.

intercept

if intercept = TRUE (the default), an intercept is included in the model.

rotate

if rotate = TRUE (the default), the BKPC model is run; otherwise the BKMC model is run.

...

Currently not used.

Details

Initial values for a BKPC model can be supplied; otherwise they are generated using the runif function.
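
For example, initial values can be supplied explicitly (a sketch, assuming K is a precomputed kernel matrix for the training data and y the corresponding factor of class labels; the initial values below are arbitrary placeholders, not recommendations):

n.kpc <- 4
n.class <- length(levels(y)) - 1   # number of classes - 1
fit <- bkpc(K, y = y, n.kpc = n.kpc, n.iter = 1000, thin = 10,
            initSigmasq = 0.01,
            initBeta = matrix(0, n.kpc * n.class, 1),
            initTau = matrix(1, n.kpc * n.class, 1),
            intercept = FALSE)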

The data can be passed to the bkpc function as a matrix, in which case the Gaussian kernel computed by the gaussKern function is used for training and prediction. The bandwidth parameter theta can be supplied to the gaussKern function; otherwise a default value is used.
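
For instance (a sketch on the iris data; the value of theta is illustrative, not a tuned choice):

data(iris)
x <- as.matrix(iris[, -5])
y <- iris[, 5]

# default bandwidth chosen by gaussKern:
fit1 <- bkpc(x, y = y, n.kpc = 2, n.iter = 1000, thin = 10)

# explicit inverse kernel bandwidth (illustrative value):
fit2 <- bkpc(x, y = y, theta = 0.1, n.kpc = 2, n.iter = 1000, thin = 10)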

In addition, bkpc also supports input in the form of a kernel matrix of class "kern" or "kernelMatrix". The latter allows for a range of kernel functions as well as user-specified ones.
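
For example, a kernel matrix computed with kernlab can be passed directly (a sketch, reusing x and y from the sketch above; rbfdot and its sigma value are one arbitrary choice among the kernels kernlab offers):

library(kernlab)
kfunc <- rbfdot(sigma = 0.1)
K <- kernelMatrix(kfunc, x)
fit <- bkpc(K, y = y, n.kpc = 2, n.iter = 1000, thin = 10)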

If rotate = TRUE (the default), the BKPC is trained. This algorithm performs classification of the projections of the data onto the principal axes of the feature space. Otherwise, the Bayesian kernel multicategory classifier (BKMC) is trained, where the data are mapped to the feature space via the kernel matrix but not projected (rotated) onto the principal axes. The hierarchical prior structure is the same for the two models, but the BKMC model is not sparse.
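
The two models are selected by the rotate argument alone (a sketch, reusing K and y from above; note that n.kpc is only relevant when the data are projected):

fitBKPC <- bkpc(K, y = y, n.kpc = 2, n.iter = 1000, thin = 10, rotate = TRUE)
fitBKMC <- bkpc(K, y = y, n.iter = 1000, thin = 10, rotate = FALSE)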

Value

An object of class "bkpc" including:

beta

realizations of the β parameters from the joint posterior distribution in the BKPC model.

tau

realizations of the τ parameters from the joint posterior distribution in the BKPC model.

z

realizations of the latent variables z from the joint posterior distribution in the BKPC model.

sigmasq

realizations of the σ^2 parameter from the joint posterior distribution in the BKPC model.

n.class

number of independent classes of the response variable, i.e. the number of classes minus 1.

n.kpc

number of kernel principal components used.

n.iter

number of iterations of the MCMC algorithm.

thin

thinning interval.

intercept

if TRUE, an intercept was included in the model.

rotate

if TRUE, the sparse BKPC model was fitted; otherwise the BKMC model was fitted.

kPCA

if rotate = TRUE, an object of class "kPCA"; otherwise NULL.

x

the supplied data matrix or kernel matrix.

theta

if a data matrix was supplied, as opposed to a kernel matrix, this is the inverse kernel bandwidth parameter used in obtaining the Gaussian kernel; otherwise NULL.
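
The stored draws can be examined directly, e.g. with a trace plot of the σ^2 realizations (a sketch, assuming fit is a fitted "bkpc" object):

plot(as.numeric(fit$sigmasq), type = "l",
     xlab = "stored iteration", ylab = expression(sigma^2))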

Note

If a data matrix is supplied, the data are not scaled internally. If rotate = TRUE, the mapping is centered internally by the kPCA function.
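
Any scaling must therefore be applied beforehand, e.g. (a sketch on the iris data):

x.scaled <- scale(as.matrix(iris[, -5]))   # center and scale columns first
fit <- bkpc(x.scaled, y = iris[, 5], n.kpc = 2, n.iter = 1000, thin = 10)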

Author(s)

K. Domijan

References

Domijan, K. and Wilson, S. P.: Bayesian kernel projections for classification of high dimensional data. Statistics and Computing, 2011, Volume 21, Issue 2, pp. 203-216.

See Also

kPCA, gaussKern, predict.bkpc, plot.bkpc, summary.bkpc, kernelMatrix (in package kernlab)

Examples

set.seed(-88106935)

data(microarray)

# consider only four tumour classes (NOTE: "NORM" is not a class of tumour)
y <- microarray[, 2309]
train <- as.matrix(microarray[y != "NORM", -2309])
wtr <- factor(microarray[y != "NORM", 2309], levels = c("BL", "EWS", "NB", "RMS"))

n.kpc <- 6
n.class <- length(levels(wtr)) - 1

K <- gaussKern(train)$K

# supply starting values for the parameters
# use Gaussian kernel as input

result <- bkpc(K, y = wtr, n.iter = 1000, thin = 10, n.kpc = n.kpc,
initSigmasq = 0.001, initBeta = matrix(10, n.kpc * n.class, 1),
initTau = matrix(10, n.kpc * n.class, 1), intercept = FALSE, rotate = TRUE)

# predict

out <- predict(result, n.burnin = 10) 

table(out$class, as.numeric(wtr))

# plot the data projection on the kernel principal components

pairs(result$kPCA$KPCs[, 1:n.kpc], col = as.numeric(wtr),
main = paste("symbol = predicted class", "\n", "color = true class"),
pch = out$class, upper.panel = NULL)
par(xpd = TRUE)
legend("topright", levels(wtr), pch = unique(out$class),
text.col = as.numeric(unique(wtr)), bty = "n")




# Another example: Iris data

data(iris)
testset <- sample(1:150, 50)

train <- as.matrix(iris[-testset, -5])
test <- as.matrix(iris[testset, -5])

wtr <- iris[-testset, 5]
wte <- iris[testset, 5]

# use default starting values for parameters in the model.

result <- bkpc(train, y = wtr,  n.iter = 1000,  thin = 10, n.kpc = 2, 
intercept = FALSE, rotate = TRUE)

# predict
out <- predict(result, test, n.burnin = 10) 

# classification rate
sum(out$class == as.numeric(wte))/dim(test)[1]

table(out$class, as.numeric(wte))

## Not run: 
# Another example: synthetic data from MASS library

library(MASS)

train <- as.matrix(synth.tr[, -3])
test <- as.matrix(synth.te[, -3])

wtr <- as.factor(synth.tr[, 3])
wte <- as.factor(synth.te[, 3])


#  make training set kernel using kernelMatrix from kernlab library

library(kernlab)

kfunc <- laplacedot(sigma = 1)
Ktrain <- kernelMatrix(kfunc, train)

#  make testing set kernel using kernelMatrix {kernlab}

Ktest <- kernelMatrix(kfunc, test, train)

result <- bkpc(Ktrain, y = wtr, n.iter = 1000,  thin = 10,  n.kpc = 3, 
intercept = FALSE, rotate = TRUE)

# predict

out <- predict(result, Ktest, n.burnin = 10) 

# classification rate

sum(out$class == as.numeric(wte))/dim(test)[1]
table(out$class, as.numeric(wte))


# embed data from the testing set on the new space:

KPCtest <- predict(result$kPCA, Ktest)

# new data is linearly separable in the new feature space where classification takes place
library(rgl)
plot3d(KPCtest[, 1:3], col = as.numeric(wte))


# another model:  do not project the data to the principal axes of the feature space. 
# NOTE: Slow
# use Gaussian kernel with the default bandwidth parameter

Ktrain <- gaussKern(train)$K

Ktest <- gaussKern(train, test, theta = gaussKern(train)$theta)$K

resultBKMC <- bkpc(Ktrain, y = wtr, n.iter = 1000,  thin = 10,  
intercept = FALSE, rotate = FALSE)

# predict
outBKMC <- predict(resultBKMC, Ktest, n.burnin = 10)

# to compare with previous model
table(outBKMC$class, as.numeric(wte))


# another example: wine data from gclus library

library(gclus)
data(wine)

testset <- sample(1 : 178, 90)
train <- as.matrix(wine[-testset, -1])
test <- as.matrix(wine[testset, -1])

wtr <- as.factor(wine[-testset, 1])
wte <- as.factor(wine[testset, 1])

#  make training set kernel using kernelMatrix from kernlab library

kfunc <- anovadot(sigma = 1, degree = 1)
Ktrain <- kernelMatrix(kfunc, train)

#  make testing set kernel using kernelMatrix {kernlab}
Ktest <- kernelMatrix(kfunc, test, train)

result <- bkpc(Ktrain, y = wtr, n.iter = 1000,  thin = 10,  n.kpc = 3, 
intercept = FALSE, rotate = TRUE)

out <- predict(result, Ktest, n.burnin = 10) 

# classification rate in the test set
sum(out$class == as.numeric(wte))/dim(test)[1]


# embed data from the testing set on the new space:
KPCtest <- predict(result$kPCA, Ktest)

# new data is linearly separable in the new feature space where classification takes place


pairs(KPCtest[, 1:3], col = as.numeric(wte),
main = paste("symbol = predicted class", "\n", "color = true class"),
pch = out$class, upper.panel = NULL)

par(xpd = TRUE)

legend("topright", levels(wte), pch = unique(out$class),
text.col = as.numeric(unique(wte)), bty = "n")



## End(Not run)
