Description
The PRCCA function performs Canonical Correlation Analysis with partial L2 regularization, which allows one to conduct Canonical Correlation Analysis in high dimensions. It imposes an L2 penalty only on a subset of the α and β coefficients. Specifically, if
x = (x_1, ..., x_p) and y = (y_1, ..., y_q)
are random vectors, I = {i_1, ..., i_m} is a subset of {1, ..., p} and J = {j_1, ..., j_r} is a subset of {1, ..., q}, then PRCCA seeks vectors
α = (α_1, ..., α_p) and β = (β_1, ..., β_q)
that satisfy the partial L2 constraints
||α_I|| <= t_1 and ||β_J|| <= t_2
and that maximize the correlation cor(u, v) between the linear combinations
u = <x , α> and v = <y , β>.
Here <a , b> denotes the inner product of two vectors, and α_I and β_J are the subvectors of α and β whose indices belong to I and J, respectively. The above optimization problem is equivalent to maximizing the modified correlation coefficient
cov(<x , α>, <y , β>) / [ (var(<x , α>) + λ_1 ||α_I||^2)^(1/2) (var(<y , β>) + λ_2 ||β_J||^2)^(1/2) ],
where the non-negative penalty factors λ_1 and λ_2 control how strongly the α_I and β_J parts of the coefficient vectors are shrunk; the remaining coefficients are not penalized.
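To make the modified correlation coefficient concrete, here is a minimal base-R sketch that evaluates it for fixed α and β, estimating the variances and covariance from sample data. The function name `modified_cor` and its arguments are hypothetical, chosen only for illustration; they are not part of the package.

```r
# Hypothetical helper: modified correlation coefficient for fixed alpha and beta.
# index1 / index2 play the role of the penalized index sets I and J.
modified_cor <- function(X, Y, alpha, beta, lambda1 = 0, lambda2 = 0,
                         index1 = seq_len(ncol(X)), index2 = seq_len(ncol(Y))) {
  u <- drop(X %*% alpha)  # u = <x, alpha> for every observation
  v <- drop(Y %*% beta)   # v = <y, beta> for every observation
  num <- cov(u, v)
  den_x <- sqrt(var(u) + lambda1 * sum(alpha[index1]^2))  # var + lambda_1 ||alpha_I||^2
  den_y <- sqrt(var(v) + lambda2 * sum(beta[index2]^2))   # var + lambda_2 ||beta_J||^2
  num / (den_x * den_y)
}

# Usage on simulated data
set.seed(1)
X <- matrix(rnorm(50 * 4), 50, 4)
Y <- matrix(rnorm(50 * 3), 50, 3)
alpha <- c(1, -0.5, 0.2, 0)
beta <- c(0.3, 1, -1)
plain <- modified_cor(X, Y, alpha, beta)  # no penalty: ordinary correlation
penal <- modified_cor(X, Y, alpha, beta, lambda1 = 1, lambda2 = 1,
                      index1 = 1:2, index2 = 3)
```

With both penalty factors set to zero the coefficient reduces to cor(u, v); any positive penalty on nonzero coefficients only enlarges the denominator, so the modified value is never larger in absolute value.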
Arguments
X
– a rectangular n x p matrix containing n observations of random vector x.
Y
– a rectangular n x q matrix containing n observations of random vector y.
index1
– a subset of indices the penalty is imposed on while regularizing the X side. By default
index2
– a subset of indices the penalty is imposed on while regularizing the Y side. By default
lambda1
– a non-negative penalty factor used for regularizing X side coefficients α. By default
lambda2
– a non-negative penalty factor used for regularizing Y side coefficients β. By default
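One way to picture how `index1` and `lambda1` act is as a penalty added only to those diagonal entries of the X covariance matrix that belong to I. The sketch below uses a hypothetical helper `penalized_cov` (not a package function) to build that matrix and checks that its quadratic form reproduces var(<x , α>) + λ_1 ||α_I||^2, the term under the first square root of the modified correlation.

```r
# Hypothetical helper: penalized covariance S + lambda * D_I, where D_I is
# diagonal with ones at the penalized indices and zeros elsewhere.
penalized_cov <- function(X, index = seq_len(ncol(X)), lambda = 0) {
  stopifnot(lambda >= 0)  # penalty factors must be non-negative
  d <- numeric(ncol(X))
  d[index] <- 1
  cov(X) + lambda * diag(d, nrow = ncol(X))
}

set.seed(2)
X <- matrix(rnorm(30 * 5), 30, 5)
alpha <- c(0.5, -1, 2, 0.3, 1)
I <- 1:3  # penalize only the first three coefficients
C <- penalized_cov(X, index = I, lambda = 10)

# alpha' C alpha should equal var(<x, alpha>) + lambda * ||alpha_I||^2
lhs <- drop(t(alpha) %*% C %*% alpha)
rhs <- var(drop(X %*% alpha)) + 10 * sum(alpha[I]^2)
```

Setting the penalty factor to zero, or passing an empty index set, leaves the ordinary sample covariance unchanged.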
Value
A list containing the PRCCA problem solution:
n.comp
– the number of computed canonical components, i.e. k = min(p, q).
cors
– the resulting k canonical correlations.
mod.cors
– the resulting k values of modified canonical correlation.
x.coefs
– p x k matrix representing k canonical coefficient vectors α[1], ..., α[k].
x.vars
– n x k matrix representing k canonical variates u[1], ..., u[k].
y.coefs
– q x k matrix representing k canonical coefficient vectors β[1], ..., β[k].
y.vars
– n x k matrix representing k canonical variates v[1], ..., v[k].
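For intuition about how these components fit together, the following base-R sketch solves the partially regularized problem directly: it adds the partial penalties to the sample covariance matrices, whitens both sides, and takes an SVD whose singular values are the modified canonical correlations. This is an illustrative assumption, not the package's implementation; the name `prcca_sketch`, its defaults and the algorithm are hypothetical, and because it forms p x p and q x q matrices it is impractical for very large p or q.

```r
# Hypothetical helper: inverse square root of a symmetric positive definite matrix
inv_sqrt <- function(M) {
  e <- eigen(M, symmetric = TRUE)
  e$vectors %*% diag(1 / sqrt(e$values), nrow = length(e$values)) %*% t(e$vectors)
}

# Hypothetical sketch of partially regularized CCA (not the package's PRCCA)
prcca_sketch <- function(X, Y, index1 = seq_len(ncol(X)), index2 = seq_len(ncol(Y)),
                         lambda1 = 0, lambda2 = 0) {
  stopifnot(lambda1 >= 0, lambda2 >= 0)
  X <- scale(X, center = TRUE, scale = FALSE)
  Y <- scale(Y, center = TRUE, scale = FALSE)
  n <- nrow(X)
  # Penalties enter only on the diagonal entries belonging to I and J
  d1 <- numeric(ncol(X)); d1[index1] <- lambda1
  d2 <- numeric(ncol(Y)); d2[index2] <- lambda2
  Cxx <- crossprod(X) / (n - 1) + diag(d1, nrow = ncol(X))
  Cyy <- crossprod(Y) / (n - 1) + diag(d2, nrow = ncol(Y))
  Cxy <- crossprod(X, Y) / (n - 1)
  Rx <- inv_sqrt(Cxx)
  Ry <- inv_sqrt(Cyy)
  k <- min(ncol(X), ncol(Y))
  # Singular values of the whitened cross-covariance = modified correlations
  s <- svd(Rx %*% Cxy %*% Ry, nu = k, nv = k)
  x.coefs <- Rx %*% s$u
  y.coefs <- Ry %*% s$v
  x.vars <- X %*% x.coefs
  y.vars <- Y %*% y.coefs
  list(n.comp = k,
       cors = diag(cor(x.vars, y.vars)),
       mod.cors = s$d[seq_len(k)],
       x.coefs = x.coefs, x.vars = x.vars,
       y.coefs = y.coefs, y.vars = y.vars)
}

# Usage on simulated data
set.seed(3)
X <- matrix(rnorm(200 * 3), 200, 3)
Y <- cbind(X[, 1] + rnorm(200), matrix(rnorm(200 * 3), 200, 3))
fit0 <- prcca_sketch(X, Y)                            # no penalty: ordinary CCA
fit1 <- prcca_sketch(X, Y, index1 = 1:2, lambda1 = 5) # penalize two X coefficients
```

With lambda1 = lambda2 = 0 the sketch reduces to ordinary CCA, so its modified correlations can be checked against `stats::cancor`; with a positive penalty each modified correlation is bounded above by the ordinary correlation of the same pair of variates.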
Examples
library(RCCA)
data(X)
data(Y)
# run PRCCA, penalizing all but the last 10 X features
prcca = PRCCA(X, Y, lambda1 = 100, lambda2 = 0, index1 = 1:(ncol(X) - 10))
# check the modified canonical correlations
plot(1:prcca$n.comp, prcca$mod.cors, pch = 16, xlab = 'component', ylab = 'correlation', ylim = c(0, 1))
# check the canonical correlations
points(1:prcca$n.comp, prcca$cors, pch = 16, col = 'purple')
# compare them to cor(X %*% alpha, Y %*% beta)
points(1:prcca$n.comp, diag(cor(X %*% prcca$x.coefs, Y %*% prcca$y.coefs)), col = 'cyan', pch = 16, cex = 0.7)
# check the canonical coefficients for the first canonical variates
barplot(prcca$x.coefs[, 'can.comp1'], col = 'orange', xlab = 'X feature', ylab = 'value')
barplot(prcca$y.coefs[, 'can.comp1'], col = 'darkgreen', xlab = 'Y feature', ylab = 'value')