Canonical Correlation Analysis based on the Hilbert-Schmidt Independence Criterion.

Description

Given two multi-dimensional data sets, find a pair of canonical projection vectors that maximizes the HSIC criterion. This function is called by hsicCCA and is intended for internal use, but users may call it directly for finer control.
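
To make the criterion concrete, the following is a minimal sketch of the biased empirical HSIC estimator from Gretton et al. (2005), applied to two already-projected samples. The helper names gauss_kernel and empirical_hsic are hypothetical and not part of the package, and the kernel form exp(-sigma * d^2) is an assumption chosen to match the "smaller the smoother" remark on the bandwidth arguments; the package may parameterize its kernel differently.

```r
# Hypothetical helpers for illustration; not part of the hsicCCA package.

# Gaussian kernel matrix on the rows of u.
# ASSUMPTION: kernel form exp(-sigma * d^2); the package may differ.
gauss_kernel <- function(u, sigma) {
  d2 <- as.matrix(dist(as.matrix(u)))^2  # squared Euclidean distances
  exp(-sigma * d2)
}

# Biased empirical HSIC: trace(K H L H) / (n - 1)^2
empirical_hsic <- function(u, v, sigmax, sigmay) {
  n <- NROW(u)
  K <- gauss_kernel(u, sigmax)
  L <- gauss_kernel(v, sigmay)
  H <- diag(n) - matrix(1 / n, n, n)     # centering matrix
  sum(diag(K %*% H %*% L %*% H)) / (n - 1)^2
}

set.seed(1)
z <- runif(100, -pi, pi)
u <- sin(z) + rnorm(100, sd = 0.1)       # dependent on v through z
v <- cos(z) + rnorm(100, sd = 0.1)
w <- rnorm(100)                          # independent of u

hsic_dep <- empirical_hsic(u, v, 1, 1)
hsic_ind <- empirical_hsic(u, w, 1, 1)
```

Comparing hsic_dep with hsic_ind gives a feel for how the criterion responds to nonlinear dependence that ordinary correlation would miss.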

Usage

hsicCCAfunc(x, y, Wx = NULL, Wy = NULL, sigmax, sigmay, numiter = 20, reltolstop = 1e-04)

Arguments

x

The x-variable data set. One row per observation.

y

The y-variable data set. One row per observation.

Wx

Initial projection vector for the x data set. Randomly set if NULL.

Wy

Initial projection vector for the y data set. Randomly set if NULL.

sigmax

The bandwidth parameter for the Gaussian kernel on the x-variable set. A positive value. The smaller the smoother.

sigmay

The bandwidth parameter for the Gaussian kernel on the y-variable set. A positive value. The smaller the smoother.

numiter

Maximum number of iterations.

reltolstop

Convergence threshold. The algorithm stops when the relative change in cost between consecutive iterations is less than the threshold.
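
As a rough illustration of such a stopping rule, the sketch below tests the relative change between two consecutive cost values. The helper has_converged is hypothetical, and the exact formula used inside the function is an assumption.

```r
# Hypothetical helper; the package's exact formula is an assumption.
has_converged <- function(cost_prev, cost_curr, reltolstop = 1e-04) {
  abs(cost_curr - cost_prev) / abs(cost_prev) < reltolstop
}

has_converged(-0.05, -0.050001)  # relative change is 2e-05
has_converged(-0.04, -0.05)      # relative change is 0.25
```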

Details

Optimization is done by gradient descent, with Nelder-Mead used for step-size selection. Nelder-Mead may at times fail to improve the cost (for example, when stuck at a local optimum). Users may consider restarting the algorithm when this happens.
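
The general idea of choosing gradient step sizes with Nelder-Mead can be sketched on a toy problem. Everything here is an assumption made for illustration: the quadratic objective f stands in for the real cost, one step size is used per projection vector, and the search starts from 0.1. None of this is taken from the package source.

```r
# Toy objective standing in for the real cost (an assumption, not HSIC).
f <- function(wx, wy) {
  sum((wx - c(1, 2))^2) + 3 * sum((wy - c(-1, 0.5))^2) + 1
}
grad <- function(wx, wy) {
  list(gx = 2 * (wx - c(1, 2)), gy = 6 * (wy - c(-1, 0.5)))
}

wx <- c(0, 0)
wy <- c(0, 0)
cost <- f(wx, wy)

for (iter in 1:5) {
  g <- grad(wx, wy)
  # Cost as a function of the two step sizes along the gradient directions
  line_cost <- function(s) f(wx - s[1] * g$gx, wy - s[2] * g$gy)
  # Nelder-Mead search over the pair of step sizes
  s <- optim(c(0.1, 0.1), line_cost, method = "Nelder-Mead")$par
  wx <- wx - s[1] * g$gx
  wy <- wy - s[2] * g$gy
  cost <- c(cost, f(wx, wy))
}
```

Because Nelder-Mead is derivative-free and only searches locally, the step it returns depends on its starting point, which is one reason a restart from different initial vectors can help.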

Value

A list containing:

Wx

The canonical projection vector for the x-variable set.

Wy

The canonical projection vector for the y-variable set.

cost

A vector of (negative) cost values at each iteration.

Note

The current implementation is slow and memory-intensive for large samples. Sample sizes above 2000 are not recommended.
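
A back-of-the-envelope calculation suggests where the storage goes, assuming the cost is computed from dense n-by-n kernel matrices stored as doubles (an assumption about the implementation). The helper kernel_matrix_mb is hypothetical.

```r
# Hypothetical helper: size in MiB of one dense n-by-n double matrix.
kernel_matrix_mb <- function(n) n^2 * 8 / 1024^2

kernel_matrix_mb(c(100, 2000, 10000))
```

Under that assumption a single matrix at n = 2000 takes about 30.5 MiB, and several such matrices (two kernels plus centered copies and intermediate products) are likely to be held at once.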

Author(s)

Billy Chang

References

Chang et al. (2013) Canonical Correlation Analysis based on Hilbert-Schmidt Independence Criterion and Centered Kernel Target Alignment. ICML 2013.

Gretton et al. (2005) Measuring statistical dependence with Hilbert-Schmidt norms. In Algorithmic Learning Theory 2005.

See Also

hsicCCA

Examples

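library(hsicCCA)

set.seed(1)
numData <- 100
numDim <- 2
x <- matrix(rnorm(numData * numDim), numData, numDim)
y <- matrix(rnorm(numData * numDim), numData, numDim)

# Make the first columns nonlinearly dependent: (x[, 1], y[, 1]) lie near a circle
z <- runif(numData, -pi, pi)
y[, 1] <- cos(z) + rnorm(numData, sd = 0.1)
x[, 1] <- sin(z) + rnorm(numData, sd = 0.1)

# Standardize each column
x <- scale(x)
y <- scale(y)

# Fit one pair of projection vectors and plot the projected data
fit <- hsicCCAfunc(x, y, sigmax = 1, sigmay = 1)
plot(x %*% fit$Wx, y %*% fit$Wy)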