GRCCA: Canonical Correlation Analysis with group L2 penalty


View source: R/GRCCA.R

Description

The GRCCA function performs Canonical Correlation Analysis with group L2 regularization, which makes Canonical Correlation Analysis feasible in high dimensions. It imposes a group L2 penalty on the canonical coefficient vectors α and β. Specifically, if

x = (x_1, ..., x_p) and y = (y_1, ..., y_q)

are random vectors and

I_1, ..., I_K is a partition of {1, ..., p} and J_1, ..., J_L is a partition of {1, ..., q}

then GRCCA seeks vectors

α = (α_1, ..., α_p) and β = (β_1, ..., β_q)

that satisfy two within-group constraints

||α||_w = var(α_I_1) + ... + var(α_I_K) <= t_1

and

||β||_w = var(β_J_1) + ... + var(β_J_L) <= t_2

as well as two between-group constraints

||α||_b = |I_1| mean(α_I_1)^2 + ... + |I_K| mean(α_I_K)^2 <= s_1

and

||β||_b = |J_1| mean(β_J_1)^2 + ... + |J_L| mean(β_J_L)^2 <= s_2

and that maximize the correlation cor(u, v) between the linear combinations

u = <x , α> and v = <y , β>.

Here <a , b> refers to the inner product between two vectors;

α_I_k and β_J_l

are the corresponding subvectors of α and β with indices belonging to I_k and J_l, respectively, and |A| refers to the cardinality of a set A. The above optimization problem is equivalent to maximizing the modified correlation coefficient

cov(<x , α>, <y , β>) / [ ( var(<x , α>) + λ_1 ||α||_w + μ_1 ||α||_b )^(1/2) ( var(<y , β>) + λ_2 ||β||_w + μ_2 ||β||_b )^(1/2) ],

where

λ_1 and λ_2

control the resulting within-group variation of the coefficients and

μ_1 and μ_2

control the group-level sparsity of the canonical coefficients α and β.
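The two penalties and the modified correlation coefficient can be evaluated directly for fixed coefficient vectors. The base-R sketch below does this under stated assumptions: the helper names within_penalty, between_penalty and modified_cor are hypothetical (not part of the RCCA package), the population var and cov are replaced by their sample versions, and a singleton group is assumed to contribute 0 to the within-group term because R's var() returns NA for a single value. It only evaluates the objective for given α and β; it does not search for the maximizing vectors the way GRCCA does.

```r
# Hypothetical helpers for illustration; not part of the RCCA package.

# Within-group penalty ||coef||_w: sum over groups of var(coef in group).
# Assumption: R's sample variance (n - 1 denominator) is used, and a
# singleton group contributes 0 (var() would return NA for one value).
within_penalty <- function(coef, group) {
  sum(tapply(coef, group, function(z) if (length(z) > 1) var(z) else 0))
}

# Between-group penalty ||coef||_b: sum over groups of |group| * mean(group)^2.
between_penalty <- function(coef, group) {
  sizes <- tapply(coef, group, length)
  means <- tapply(coef, group, mean)
  sum(sizes * means^2)
}

# Modified correlation between u = X alpha and v = Y beta, using sample
# var() and cov() in place of the population quantities (an assumption).
modified_cor <- function(X, Y, alpha, beta, group1, group2,
                         lambda1 = 0, lambda2 = 0, mu1 = 0, mu2 = 0) {
  u <- drop(X %*% alpha)
  v <- drop(Y %*% beta)
  den_x <- var(u) + lambda1 * within_penalty(alpha, group1) +
    mu1 * between_penalty(alpha, group1)
  den_y <- var(v) + lambda2 * within_penalty(beta, group2) +
    mu2 * between_penalty(beta, group2)
  cov(u, v) / (sqrt(den_x) * sqrt(den_y))
}

# Small synthetic example (random data, for illustration only)
set.seed(1)
X <- matrix(rnorm(50 * 6), nrow = 50, ncol = 6)
Y <- matrix(rnorm(50 * 4), nrow = 50, ncol = 4)
alpha <- c(1, 2, 3, 10, 10, 0)
beta <- c(1, -1, 0.5, 2)
group1 <- c(1, 1, 1, 2, 2, 3)
group2 <- c(1, 1, 2, 2)

within_penalty(alpha, group1)   # 1   (var(c(1, 2, 3)) = 1, var(c(10, 10)) = 0)
between_penalty(alpha, group1)  # 212 (3 * 2^2 + 2 * 10^2 + 1 * 0^2)
modified_cor(X, Y, alpha, beta, group1, group2)  # equals cor(X %*% alpha, Y %*% beta)
modified_cor(X, Y, alpha, beta, group1, group2,
             lambda1 = 10, lambda2 = 10, mu1 = 1, mu2 = 1)
```

With all penalty factors at 0, modified_cor() reduces to the ordinary sample correlation of u and v. Both penalties are non-negative, so any positive λ or μ can only enlarge the denominator and shrink the modified correlation in absolute value.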

Usage

GRCCA(
  X,
  Y,
  group1 = rep(1, ncol(X)),
  group2 = rep(1, ncol(Y)),
  lambda1 = 0,
  lambda2 = 0,
  mu1 = 0,
  mu2 = 0
)

Arguments

X

a rectangular n x p matrix containing n observations of the random vector x.

Y

a rectangular n x q matrix containing n observations of the random vector y.

group1

an integer-valued vector of length ncol(X) giving the group assignment of each α coefficient. By default group1 = rep(1, ncol(X)), i.e. all α coefficients belong to a single group.

group2

an integer-valued vector of length ncol(Y) giving the group assignment of each β coefficient. By default group2 = rep(1, ncol(Y)), i.e. all β coefficients belong to a single group.
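Both group vectors are ordinary integer vectors with one label per column. The Examples build contiguous, equally sized groups with rep(); the helper below, make_contiguous_groups, is hypothetical (not an RCCA function) and simply wraps that construction while checking that the number of columns is divisible by the number of groups, which the Examples implicitly assume.

```r
# Hypothetical helper (not part of RCCA): split p consecutive columns into
# n.groups contiguous groups of equal size, as in the Examples section.
make_contiguous_groups <- function(p, n.groups) {
  if (p %% n.groups != 0) {
    stop("the number of columns p must be divisible by n.groups")
  }
  rep(seq_len(n.groups), each = p / n.groups)
}

group1 <- make_contiguous_groups(10, 5)
print(group1)
# [1] 1 1 2 2 3 3 4 4 5 5
stopifnot(length(group1) == 10)  # one group label per column of X
```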

lambda1

a non-negative penalty factor controlling the within-group variation of the α coefficients. By default lambda1 = 0, i.e. no regularization is imposed. Increasing lambda1 shrinks each coefficient toward its group mean.

lambda2

a non-negative penalty factor controlling the within-group variation of the β coefficients. By default lambda2 = 0, i.e. no regularization is imposed. Increasing lambda2 shrinks each coefficient toward its group mean.

mu1

a non-negative penalty factor controlling the between-group variation of the α coefficients. By default mu1 = 0, i.e. no regularization is imposed. Increasing mu1 shrinks each group mean toward zero.

mu2

a non-negative penalty factor controlling the between-group variation of the β coefficients. By default mu2 = 0, i.e. no regularization is imposed. Increasing mu2 shrinks each group mean toward zero.
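To see qualitatively what lambda and mu do to the coefficients, consider a toy problem that is not the GRCCA fit itself: approximate a fixed vector a by α while paying λ times the within-group penalty and μ times the between-group penalty from the Description (the within-group term written as a sum of group variances, as there). Writing each α_i as its group mean plus a deviation separates the problem, and the minimizer scales every within-group deviation by 1 / (1 + λ / (|I_k| - 1)) and every group mean by 1 / (1 + μ). The function name shrink_toy is hypothetical and the setup is an illustration only.

```r
# Hypothetical toy illustration; this is NOT how GRCCA computes its solution.
# Closed-form minimizer of
#   sum((alpha - a)^2) + lambda * sum_k var(alpha[I_k]) + mu * sum_k |I_k| * mean(alpha[I_k])^2
shrink_toy <- function(a, group, lambda = 0, mu = 0) {
  out <- numeric(length(a))
  for (k in unique(group)) {
    idx <- which(group == k)
    n.k <- length(idx)
    m <- mean(a[idx])
    dev <- a[idx] - m
    # deviations from the group mean shrink by 1 / (1 + lambda / (n.k - 1));
    # a singleton group has no deviation, so its factor is irrelevant
    dev.factor <- if (n.k > 1) 1 / (1 + lambda / (n.k - 1)) else 1
    # the group mean shrinks toward zero by 1 / (1 + mu)
    out[idx] <- m / (1 + mu) + dev.factor * dev
  }
  out
}

a <- c(1, 2, 3, 10, 12)
g <- c(1, 1, 1, 2, 2)
shrink_toy(a, g, lambda = 2, mu = 1)  # approx. 0.5, 1, 1.5, 5.167, 5.833
shrink_toy(a, g, lambda = 1e6)        # close to 2, 2, 2, 11, 11 (pulled to group means)
shrink_toy(a, g, mu = 1e6)            # close to -1, 0, 1, -1, 1 (group means pulled to 0)
```

Large lambda collapses each group onto its mean, while large mu removes the group means and keeps only within-group deviations, matching the argument descriptions above.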

Value

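A list containing the solution of the GRCCA problem. The examples below use the components n.comp (the number of canonical components), mod.cors (the modified canonical correlations) and x.coefs (the canonical coefficients for X, one column per component, e.g. can.comp1).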

Examples

# load the RCCA package (ElenaTuzhilina/RCCA) and its example data
library(RCCA)
data(X)
data(Y)
# run GRCCA with no sparsity on a group level (mu1 = 0)
n.groups <- 5
group1 <- rep(1:n.groups, each = ncol(X) / n.groups)
grcca <- GRCCA(X, Y, group1, group2 = NULL, lambda1 = 1000, lambda2 = 0, mu1 = 0, mu2 = 0)
# check the modified canonical correlations
plot(1:grcca$n.comp, grcca$mod.cors, pch = 16, xlab = 'component', ylab = 'correlation', ylim = c(0, 1))
# check the canonical coefficients for the first canonical variates
barplot(grcca$x.coefs[, 'can.comp1'], col = 'orange', xlab = 'X feature', ylab = 'value')
# run GRCCA with sparsity on a group level (mu1 > 0)
n.groups <- 50
group1 <- rep(1:n.groups, each = ncol(X) / n.groups)
grcca <- GRCCA(X, Y, group1, group2 = NULL, lambda1 = 10000, lambda2 = 0, mu1 = 100, mu2 = 0)
# check the modified canonical correlations
plot(1:grcca$n.comp, grcca$mod.cors, pch = 16, xlab = 'component', ylab = 'correlation', ylim = c(0, 1))
# check the canonical coefficients for the first canonical variates
barplot(grcca$x.coefs[, 'can.comp1'], col = 'orange', xlab = 'X feature', ylab = 'value')

ElenaTuzhilina/RCCA documentation built on July 11, 2021, 6:09 p.m.