PRCCA: Canonical Correlation Analysis with partial L2 penalty


View source: R/PRCCA.R

Description

The PRCCA function performs Canonical Correlation Analysis with partial L2 regularization, which makes it possible to conduct Canonical Correlation Analysis in high dimensions. It imposes an L2 penalty only on a subset of the α and β coefficients. Specifically, if

x = (x_1, ..., x_p) and y = (y_1, ..., y_q)

are random vectors and

I = {i_1, ..., i_m} is a subset of {1, ..., p} and J = {j_1, ..., j_r} is a subset of {1, ..., q}

then PRCCA seeks vectors

α = (α_1, ..., α_p) and β = (β_1, ..., β_q)

that satisfy partial L2 constraints

||α_I|| <= t_1 and ||β_J|| <= t_2

and that maximize the correlation cor(u, v) between the linear combinations

u = <x , α> and v = <y , β>.

Here <a , b> refers to the inner product between two vectors and

α_I and β_J

are the subvectors of α and β with indices belonging to I and J, respectively. The above optimization problem is equivalent to maximizing the modified correlation coefficient

cov(<x , α>, <y , β>) / [ ( var(<x , α>) + λ_1 ||α_I||^2 )^(1/2) ( var(<y , β>) + λ_2 ||β_J||^2 )^(1/2) ],

where

λ_1 and λ_2

control the amount of shrinkage applied to the canonical coefficients within the

α_I and β_J

parts of the coefficient vectors.
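
The modified correlation coefficient above can be evaluated directly for any candidate pair of coefficient vectors. The following is a minimal base R sketch, not code from the package: the helper name modified_cor and the simulated data are assumptions made for illustration, and sample variances and covariances stand in for the population quantities.

```r
# Hypothetical helper (not part of the RCCA package): evaluates the modified
# correlation coefficient for given coefficient vectors alpha and beta.
modified_cor <- function(X, Y, alpha, beta,
                         index1 = seq_along(alpha), index2 = seq_along(beta),
                         lambda1 = 0, lambda2 = 0) {
  u <- drop(X %*% alpha)                    # canonical variate on the X side
  v <- drop(Y %*% beta)                     # canonical variate on the Y side
  pen1 <- lambda1 * sum(alpha[index1]^2)    # lambda_1 * ||alpha_I||^2
  pen2 <- lambda2 * sum(beta[index2]^2)     # lambda_2 * ||beta_J||^2
  cov(u, v) / sqrt((var(u) + pen1) * (var(v) + pen2))
}

# Simulated data, for illustration only
set.seed(42)
n <- 100
X <- matrix(rnorm(n * 4), n, 4)
Y <- X[, 1:3] + matrix(rnorm(n * 3), n, 3)
alpha <- c(1, -1, 0.5, 0)
beta <- c(1, 0, 1)

# With lambda1 = lambda2 = 0 this is the ordinary correlation cor(u, v)
plain <- modified_cor(X, Y, alpha, beta)
# Penalizing only the first two alpha coefficients shrinks the value
penalized <- modified_cor(X, Y, alpha, beta, index1 = 1:2, lambda1 = 5)
# Penalizing an index whose coefficient is zero changes nothing
unchanged <- modified_cor(X, Y, alpha, beta, index1 = 4, lambda1 = 5)
```

Only the coefficients whose indices fall in index1 or index2 contribute to the penalty terms, which is what distinguishes the partial penalty from a penalty on the full vectors.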

Usage

PRCCA(X, Y, index1 = 1:ncol(X), index2 = 1:ncol(Y), lambda1 = 0, lambda2 = 0)

Arguments

X

a rectangular n x p matrix containing n observations of random vector x.

Y

a rectangular n x q matrix containing n observations of random vector y.

index1

a subset of indices the penalty is imposed on while regularizing the X side. By default index1 = 1:ncol(X), i.e. all α coefficients are included in the regularization term.

index2

a subset of indices the penalty is imposed on while regularizing the Y side. By default index2 = 1:ncol(Y), i.e. all β coefficients are included in the regularization term.

lambda1

a non-negative penalty factor used for regularizing the X side coefficients α. By default lambda1 = 0, i.e. no regularization is imposed. Increasing lambda1 shrinks the penalized canonical coefficients more strongly.

lambda2

a non-negative penalty factor used for regularizing the Y side coefficients β. By default lambda2 = 0, i.e. no regularization is imposed. Increasing lambda2 shrinks the penalized canonical coefficients more strongly.
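
To illustrate how index1, index2, lambda1 and lambda2 enter the problem, here is a rough base R sketch that solves the partially penalized problem directly in a low-dimensional setting. It is not the package's implementation, which may work quite differently in high dimensions. The function name partial_ridge_cca is hypothetical, the output names merely mirror those used in the Examples section, and the approach (Cholesky whitening followed by an SVD) assumes the penalized covariance matrices are positive definite.

```r
# Hypothetical helper (not part of the RCCA package): a direct solver for the
# partially penalized problem, usable only when the penalized covariance
# matrices are positive definite (e.g. p, q < n).
partial_ridge_cca <- function(X, Y, index1 = 1:ncol(X), index2 = 1:ncol(Y),
                              lambda1 = 0, lambda2 = 0) {
  # Diagonal penalties: lambda on the penalized indices, 0 elsewhere
  d1 <- numeric(ncol(X)); d1[index1] <- lambda1
  d2 <- numeric(ncol(Y)); d2[index2] <- lambda2
  A <- cov(X) + diag(d1, nrow = ncol(X))   # var(<x, alpha>) + lambda_1 ||alpha_I||^2
  B <- cov(Y) + diag(d2, nrow = ncol(Y))   # var(<y, beta>) + lambda_2 ||beta_J||^2
  Sxy <- cov(X, Y)
  Ra <- chol(A)                            # A = t(Ra) %*% Ra
  Rb <- chol(B)                            # B = t(Rb) %*% Rb
  K <- solve(t(Ra), Sxy) %*% solve(Rb)     # whitened cross-covariance
  s <- svd(K)
  list(mod.cors = s$d,                     # modified canonical correlations
       x.coefs = solve(Ra, s$u),           # alpha vectors, one per column
       y.coefs = solve(Rb, s$v))           # beta vectors, one per column
}

# Simulated data, for illustration only
set.seed(1)
n <- 200
X <- matrix(rnorm(n * 5), n, 5)
Y <- matrix(rnorm(n * 3), n, 3) + X[, 1:3]

fit0 <- partial_ridge_cca(X, Y)                              # no penalty
fit1 <- partial_ridge_cca(X, Y, index1 = 1:3, lambda1 = 10)  # penalize alpha_1..alpha_3
```

With both penalty factors at zero the sketch reduces to classical CCA, so fit0$mod.cors can be compared against stats::cancor(X, Y)$cor as a sanity check.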

Value

A list containing the PCMS problem solution:

Examples

library(RCCA)  # assumes the RCCA package is installed
# load the example data sets
data(X)
data(Y)
# run PRCCA, penalizing all X coefficients except the last 10
prcca <- PRCCA(X, Y, lambda1 = 100, lambda2 = 0, index1 = 1:(ncol(X) - 10))
# check the modified canonical correlations
plot(1:prcca$n.comp, prcca$mod.cors, pch = 16, xlab = 'component', ylab = 'correlation', ylim = c(0, 1))
# check the canonical correlations
points(1:prcca$n.comp, prcca$cors, pch = 16, col = 'purple')
# compare them to cor(X %*% alpha, Y %*% beta)
points(1:prcca$n.comp, diag(cor(X %*% prcca$x.coefs, Y %*% prcca$y.coefs)), col = 'cyan', pch = 16, cex = 0.7)
# check the canonical coefficients for the first canonical variates
barplot(prcca$x.coefs[, 'can.comp1'], col = 'orange', xlab = 'X feature', ylab = 'value')
barplot(prcca$y.coefs[, 'can.comp1'], col = 'darkgreen', xlab = 'Y feature', ylab = 'value')

ElenaTuzhilina/RCCA documentation built on July 11, 2021, 6:09 p.m.