RCCA: Canonical Correlation Analysis with L2 penalty


View source: R/RCCA.R

Description

The RCCA function performs Canonical Correlation Analysis (CCA) with L2 regularization, which makes it possible to conduct CCA in high dimensions. For a pair of random vectors

x = (x_1, ..., x_p) and y = (y_1, ..., y_q)

it seeks vectors

α = (α_1, ..., α_p) and β = (β_1, ..., β_q)

that satisfy L2 constraints

||α|| <= t_1 and ||β|| <= t_2

and that maximize the correlation cor(u, v) between the linear combinations

u = <x , α> and v = <y , β>.

Here <a , b> refers to the inner product between two vectors. The optimal values for α and β are called canonical coefficients and the resulting linear combinations u and v are called canonical variates. It is actually possible to continue the process and find a sequence of canonical coefficients

α[1], ..., α[k] and β[1], ..., β[k]

that satisfy the L2 constraints and such that linear combinations

u[i] = <x , α[i]> and v[i] = <y , β[i]>

form two sets

{u[1], ..., u[k]} and {v[1], ..., v[k]}

of mutually uncorrelated random variables. The maximum possible number of such canonical variates is k = min(p, q). Note that the above optimization problem is equivalent to maximizing the modified correlation coefficient

cov(<x , α>, <y , β>) / [ ( var(<x , α>) + λ_1 ||α||^2 )^(1/2) ( var(<y , β>) + λ_2 ||β||^2 )^(1/2) ],

where

λ_1 and λ_2

control the amount of shrinkage applied to the canonical coefficients.
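The modified correlation above can be maximized in closed form through a singular value decomposition. The following is a minimal base-R sketch under stated assumptions, not the package's implementation: the helper name rcca_sketch is hypothetical, and the way the penalty is scaled relative to the sample covariance (here, lambda times the identity added to cov()) is an assumption that may differ from what RCCA does internally.

```r
# Hypothetical helper: a base-R sketch of L2-regularized CCA, not the package's code.
rcca_sketch <- function(X, Y, lambda1 = 0, lambda2 = 0) {
  # Penalized covariance blocks (assumption: penalty enters as lambda * identity)
  Sxx <- cov(X) + lambda1 * diag(ncol(X))
  Syy <- cov(Y) + lambda2 * diag(ncol(Y))
  Sxy <- cov(X, Y)

  # Inverse symmetric square root S^(-1/2) via eigendecomposition
  inv_sqrt <- function(S) {
    e <- eigen(S, symmetric = TRUE)
    e$vectors %*% (t(e$vectors) / sqrt(e$values))
  }
  Rx <- inv_sqrt(Sxx)
  Ry <- inv_sqrt(Syy)

  # Singular values of the whitened cross-covariance are the modified correlations
  s <- svd(Rx %*% Sxy %*% Ry)
  list(mod.cors = s$d, x.coefs = Rx %*% s$u, y.coefs = Ry %*% s$v)
}

# Simulated data (illustrative only): the first column of Y depends on X
set.seed(1)
n <- 200
X <- matrix(rnorm(n * 4), n, 4)
Y <- cbind(X[, 1] + rnorm(n), matrix(rnorm(n * 2), n, 2))

fit0 <- rcca_sketch(X, Y)               # no penalty: ordinary CCA
fit1 <- rcca_sketch(X, Y, lambda1 = 1)  # penalty on the X side only

# Without a penalty the values agree with stats::cancor
all.equal(fit0$mod.cors, cancor(X, Y)$cor, tolerance = 1e-6)
# A larger penalty cannot increase the leading modified correlation
fit1$mod.cors[1] <= fit0$mod.cors[1]
```

With both penalties set to zero the sketch reduces to classical CCA, which is why its output can be checked against stats::cancor. With a positive penalty the denominator of the modified correlation grows for every choice of α and β, so the leading value can only stay the same or decrease.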

Usage

RCCA(X, Y, lambda1 = 0, lambda2 = 0)

Arguments

X

a rectangular n x p matrix containing n observations of random vector x.

Y

a rectangular n x q matrix containing n observations of random vector y.

lambda1

a non-negative penalty factor used for regularizing the X side coefficients. By default lambda1 = 0, i.e. no regularization is imposed. Increasing lambda1 shrinks the resulting X side canonical coefficients more strongly.

lambda2

a non-negative penalty factor used for regularizing the Y side coefficients. By default lambda2 = 0, i.e. no regularization is imposed. Increasing lambda2 shrinks the resulting Y side canonical coefficients more strongly.
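To illustrate why a positive penalty matters in high dimensions, the sketch below uses simulated data (not the package's data sets) with more features than observations. The sample covariance of X is then rank deficient, so the unpenalized problem is ill-posed, while adding lambda1 times the identity makes it invertible. The names n, p and the value of lambda1 are illustrative assumptions.

```r
# Simulated high-dimensional design: more features (p) than observations (n)
set.seed(1)
n <- 20
p <- 50
X <- matrix(rnorm(n * p), n, p)

Sxx <- cov(X)
rank_S <- qr(Sxx)$rank
rank_S < p   # TRUE: the rank is at most n - 1, so Sxx cannot be inverted

# Adding an L2 penalty lifts every eigenvalue by lambda1
lambda1 <- 0.1
min_eig <- min(eigen(Sxx + lambda1 * diag(p), symmetric = TRUE)$values)
min_eig >= lambda1 - 1e-8   # TRUE: the penalized matrix is positive definite
```

The same reasoning applies to Y and lambda2 when q exceeds n.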

Value

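A list containing the RCCA problem solution. The components used in the example below are n.comp (the number of canonical components), cors (the canonical correlations), mod.cors (the modified canonical correlations), and x.coefs and y.coefs (the canonical coefficients for X and Y, with columns named can.comp1 and so on).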

Examples

library(RCCA)
data(X)
data(Y)
# run RCCA with an L2 penalty on the X side only
rcca = RCCA(X, Y, lambda1 = 10, lambda2 = 0)
# check the modified canonical correlations
plot(1:rcca$n.comp, rcca$mod.cors, pch = 16, xlab = 'component', ylab = 'correlation', ylim = c(0, 1))
# check the canonical correlations
points(1:rcca$n.comp, rcca$cors, pch = 16, col = 'purple')
# compare them to cor(X %*% alpha, Y %*% beta)
points(1:rcca$n.comp, diag(cor(X %*% rcca$x.coefs, Y %*% rcca$y.coefs)), col = 'cyan', pch = 16, cex = 0.7)
# check the canonical coefficients for the first canonical variates
barplot(rcca$x.coefs[, 'can.comp1'], col = 'orange', xlab = 'X feature', ylab = 'value')
barplot(rcca$y.coefs[, 'can.comp1'], col = 'darkgreen', xlab = 'Y feature', ylab = 'value')

ElenaTuzhilina/RCCA documentation built on July 11, 2021, 6:09 p.m.