hsicCCA: Canonical Correlation Analysis based on the Hilbert-Schmidt...
In hsicCCA: Canonical Correlation Analysis based on Kernel Independence Measures


Given two multi-dimensional data sets, find pairs of canonical projections that maximize the HSIC criterion.
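
For orientation, the criterion being maximized can be sketched with the biased empirical HSIC estimate from Gretton et al. (2005), trace(K H L H) / (n - 1)^2. This is an illustrative sketch, not the package's internal code: the helper names `gauss_kernel` and `hsic_estimate` are hypothetical, and the Gaussian kernel form exp(-d^2 / (2 sigma^2)) is an assumption about the parameterization.

```r
# ASSUMPTION: Gaussian kernel exp(-d^2 / (2 * sigma^2)); the package's exact
# parameterization is not stated on this page.
gauss_kernel <- function(u, sigma) {
  d2 <- as.matrix(dist(u))^2               # squared Euclidean distances between rows
  exp(-d2 / (2 * sigma^2))
}

# Biased empirical HSIC estimate (Gretton et al., 2005).
hsic_estimate <- function(u, v, sigma_u, sigma_v) {
  n <- NROW(u)
  H <- diag(n) - matrix(1 / n, n, n)       # centering matrix
  K <- gauss_kernel(u, sigma_u)
  L <- gauss_kernel(v, sigma_v)
  sum((H %*% K %*% H) * L) / (n - 1)^2     # equals trace(K H L H) / (n - 1)^2
}

set.seed(1)
z <- runif(500, -pi, pi)
u <- sin(z)
v <- cos(z)                                # depends on u, yet nearly uncorrelated with it
w <- rnorm(500)                            # independent of u
hsic_dep <- hsic_estimate(u, v, 1, 1)
hsic_ind <- hsic_estimate(u, w, 1, 1)
```

On data like this, the estimate for the dependent pair should come out clearly larger than for the independent pair, which is what makes HSIC usable as a projection-pursuit objective.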

hsicCCA(x, y, M, sigmax = NULL, sigmay = NULL, numrepeat = 5, numiter = 100, reltolstop = 1e-04)

`x`	The x-variable data matrix. One row per observation.
`y`	The y-variable data matrix. One row per observation.
`M`	Number of canonical projection pairs to extract.
`sigmax`	The bandwidth parameter for the Gaussian kernel on the x-variable set. A positive value. The smaller the smoother. If NULL, it is set to median(dist(x)) and updated automatically as each successive pair of canonical projections is extracted.
`sigmay`	The bandwidth parameter for the Gaussian kernel on the y-variable set. A positive value. The smaller the smoother. If NULL, it is set to median(dist(y)) and updated automatically as each successive pair of canonical projections is extracted.
`numrepeat`	Number of random restarts.
`numiter`	Maximum number of iterations for extracting each pair of canonical projections.
`reltolstop`	Convergence threshold. The algorithm stops when the relative change in cost between consecutive iterations falls below this threshold, then moves on to find the next pair of canonical vectors.
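
The default bandwidth described for `sigmax` and `sigmay` is the median pairwise distance between observations. A minimal sketch of that computation follows; the helper name `median_bandwidth` is hypothetical and not part of the package.

```r
# Median pairwise Euclidean distance between the rows of m,
# i.e. the same quantity as median(dist(m)).
median_bandwidth <- function(m) median(as.vector(dist(m)))

set.seed(1)
x <- scale(matrix(rnorm(100 * 3), 100, 3))
sigmax <- median_bandwidth(x)
```

A value computed this way could be rescaled and passed explicitly through `sigmax` or `sigmay` to experiment with other bandwidths.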

Optimization is done by gradient descent, with Nelder-Mead used for step-size selection. Nelder-Mead may at times fail to increase the cost (for example, when stuck at a local optimum). Consider restarting the algorithm when this happens.
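
One way to act on the restart advice is to rerun the fit under several seeds and keep the best result. The sketch below is generic and hypothetical: `best_of_restarts`, `fit_fun` and `score_fun` are names invented for illustration, and the stand-in fit and score only exist so the code runs without the package.

```r
# Hypothetical helper (not part of the package): rerun a fitting function
# under several seeds and keep the fit with the highest score.
best_of_restarts <- function(fit_fun, score_fun, seeds) {
  best <- NULL
  best_score <- -Inf
  for (s in seeds) {
    set.seed(s)
    fit <- fit_fun()
    score <- score_fun(fit)
    if (score > best_score) {
      best <- fit
      best_score <- score
    }
  }
  list(fit = best, score = best_score)
}

# Stand-in fit and score so the sketch runs on its own.
toy <- best_of_restarts(
  fit_fun = function() list(w = rnorm(1)),
  score_fun = function(fit) -abs(fit$w),
  seeds = 1:5
)
```

With the package loaded, `fit_fun` might be `function() hsicCCA(x, y, 1)` and `score_fun` some dependence measure computed on the projected data `x %*% fit$Wx` and `y %*% fit$Wy`. Increasing `numrepeat` is the built-in alternative.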

A list containing:

`Wx`	The M canonical projection vectors for the x-variable set. Each column corresponds to a projection vector.
`Wy`	The M canonical projection vectors for the y-variable set. Each column corresponds to a projection vector.

The current implementation is slow and memory-intensive for large samples. Sample sizes above 2000 are not recommended.

Billy Chang

Chang et al. (2013) Canonical Correlation Analysis based on Hilbert-Schmidt Independence Criterion and Centered Kernel Target Alignment. ICML 2013.

Gretton et al. (2005) Measuring Statistical Dependence with Hilbert-Schmidt Norms. In Algorithmic Learning Theory 2005.

ktaCCA, hsicCCAfunc

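set.seed(1)
numData <- 100
numDim <- 3
# Start from independent Gaussian noise in both data sets
x <- matrix(rnorm(numData * numDim), numData, numDim)
y <- matrix(rnorm(numData * numDim), numData, numDim)
# First coordinates: nonlinear dependence (noisy points on a circle)
z <- runif(numData, -pi, pi)
y[, 1] <- cos(z) + rnorm(numData, sd = 0.1)
x[, 1] <- sin(z) + rnorm(numData, sd = 0.1)
# Second coordinates: noisy linear dependence
y[, 2] <- x[, 2] + rnorm(numData, sd = 0.5)
x <- scale(x)
y <- scale(y)

# Extract two pairs of canonical projections and plot each projected pair
fit <- hsicCCA(x, y, 2, numrepeat = 2, numiter = 10)
par(mfrow = c(1, 2))
for (K in 1:2) plot(x %*% fit$Wx[, K], y %*% fit$Wy[, K])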