# rcc: Regularized Canonical Correlation Analysis

In mixOmics: Omics Data Integration Project

## Description

The function performs the regularized extension of the Canonical Correlation Analysis to seek correlations between two data matrices.

## Usage

    rcc(
      X,
      Y,
      ncomp = 2,
      method = c("ridge", "shrinkage"),
      lambda1 = 0,
      lambda2 = 0
    )

## Arguments

| Argument | Description |
|---|---|
| X | numeric matrix or data frame (n × p), the observations on the X variables. NAs are allowed. |
| Y | numeric matrix or data frame (n × q), the observations on the Y variables. NAs are allowed. |
| ncomp | the number of components to include in the model. Defaults to 2. |
| method | One of "ridge" or "shrinkage". If "ridge", lambda1 and lambda2 need to be supplied (see also our function tune.rcc); if "shrinkage", parameters are directly estimated with Strimmer's formula, see below and reference. |
| lambda1, lambda2 | non-negative real numbers. The regularization parameters for the X and Y data. Defaults to lambda1=lambda2=0. Only used if method="ridge". |

## Details

The main purpose of Canonical Correlation Analysis (CCA) is to explore the sample correlations between two sets of variables X and Y observed on the same individuals (experimental units), where the two sets play strictly symmetric roles in the analysis.

The cancor function performs the core of the computations, but additional tools are required to deal with highly correlated (nearly collinear) data sets or data sets with more variables than units, for example.

The rcc function, the regularized version of CCA, is one way to deal with this problem by including a regularization step in the computations of CCA. Such a regularization in this context was first proposed by Vinod (1976), then developed by Leurgans et al. (1993). It consists in regularizing the empirical covariance matrices of X and Y by adding a multiple of the identity matrix, that is, Cov(X) + λ_1 I and Cov(Y) + λ_2 I.
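The regularization step can be illustrated with a minimal base R sketch. The helper name `ridge_cca` and its internals are hypothetical and are not the mixOmics implementation (which also handles missing values and may differ in detail). The sketch assumes complete data, adds λ_1 I and λ_2 I to the sample covariance matrices, and solves the resulting CCA problem with a Cholesky whitening followed by an SVD. The `LifeCycleSavings` data set from base R is used only as a convenient example.

```r
# Hypothetical helper: ridge-regularized CCA in base R (not the mixOmics code).
ridge_cca <- function(X, Y, lambda1 = 0, lambda2 = 0, ncomp = 2) {
  # Center the columns; complete data are assumed (no NA handling here)
  X <- scale(as.matrix(X), center = TRUE, scale = FALSE)
  Y <- scale(as.matrix(Y), center = TRUE, scale = FALSE)

  # Regularized covariance matrices: Cov(X) + lambda1 * I and Cov(Y) + lambda2 * I
  Cxx <- cov(X) + lambda1 * diag(ncol(X))
  Cyy <- cov(Y) + lambda2 * diag(ncol(Y))
  Cxy <- cov(X, Y)

  # Cholesky factors: Cxx = t(Rx) %*% Rx and Cyy = t(Ry) %*% Ry
  Rx <- chol(Cxx)
  Ry <- chol(Cyy)

  # Whitened cross-covariance K = Rx^{-T} Cxy Ry^{-1}
  K <- backsolve(Rx, Cxy, transpose = TRUE)
  K <- t(backsolve(Ry, t(K), transpose = TRUE))

  # Singular values of K are the (regularized) canonical correlations
  s <- svd(K, nu = ncomp, nv = ncomp)

  # Map the singular vectors back to coefficients on the original variables
  A <- backsolve(Rx, s$u)
  B <- backsolve(Ry, s$v)

  list(
    cor      = s$d[seq_len(ncomp)],
    loadings = list(X = A, Y = B),
    variates = list(X = X %*% A, Y = Y %*% B)
  )
}

# Example data from base R
pop <- LifeCycleSavings[, 2:3]
oec <- LifeCycleSavings[, -(2:3)]

fit0      <- ridge_cca(pop, oec)                               # no regularization
fit_ridge <- ridge_cca(pop, oec, lambda1 = 0.5, lambda2 = 0.5) # ridge penalties

fit0$cor
fit_ridge$cor
```

With both penalties at zero the sketch reduces to classical CCA, so `fit0$cor` can be compared with `cancor(pop, oec)$cor`. Positive penalties enlarge the denominators of the correlation criterion, so the leading regularized correlation cannot exceed the unregularized one.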

When lambda1=0 and lambda2=0, rcc performs a classical CCA, if possible (i.e. when n > p+q).
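To see why classical CCA breaks down when variables outnumber units, the following base R sketch uses simulated data (all object names are illustrative). The sample covariance matrix of n observations has rank at most n - 1, so it cannot be inverted when p ≥ n, whereas adding a multiple of the identity pushes every eigenvalue up by that multiple.

```r
set.seed(1)
n <- 10   # number of units
p <- 20   # number of variables (more than units)
X <- matrix(rnorm(n * p), nrow = n)

S <- cov(X)
rank_S <- qr(S)$rank          # numerical rank, at most n - 1, so S is singular

lambda1 <- 0.1
S_reg <- S + lambda1 * diag(p)

# Every eigenvalue of S_reg is at least lambda1, so S_reg is invertible
min_eig <- min(eigen(S_reg, symmetric = TRUE, only.values = TRUE)$values)

rank_S
min_eig
```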

The shrinkage method (method = "shrinkage") can be used to bypass tune.rcc when choosing the regularization parameters, since tuning can be long and costly to compute with very large data sets. Note that the two approaches, tune.rcc (which uses cross-validation) and the shrinkage estimates (which use the formula from Schäfer and Strimmer, see estimate.lambda in the corpcor package), may give different results.
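The two ways of choosing the parameters can be compared side by side. The sketch below assumes that the mixOmics package is installed and that tune.rcc accepts the grid1, grid2, validation, folds and plot arguments and returns opt.lambda1 and opt.lambda2, as in the package documentation at the time of writing. The grid values and the number of folds are arbitrary choices for illustration, not recommendations.

```r
# Runs only if mixOmics is available; grid values below are illustrative
if (requireNamespace("mixOmics", quietly = TRUE)) {
  data(nutrimouse, package = "mixOmics")
  X <- nutrimouse$lipid
  Y <- nutrimouse$gene

  set.seed(42)  # M-fold partitions are drawn at random

  # Cross-validated choice of lambda1 and lambda2 on a small 3 x 3 grid
  tuned <- mixOmics::tune.rcc(
    X, Y,
    grid1 = seq(0.01, 0.2, length = 3),
    grid2 = seq(0.01, 0.2, length = 3),
    validation = "Mfold",
    folds = 5,
    plot = FALSE
  )

  # Ridge fit using the tuned parameters
  fit_cv <- mixOmics::rcc(
    X, Y,
    ncomp = 3,
    lambda1 = tuned$opt.lambda1,
    lambda2 = tuned$opt.lambda2
  )

  # Shrinkage fit: parameters are estimated analytically, no grid search
  fit_shrink <- mixOmics::rcc(X, Y, ncomp = 3, method = "shrinkage")

  print(fit_cv$lambda)
  print(fit_shrink$lambda)
}
```

The two printed parameter vectors will generally differ, since one comes from a cross-validation score over a finite grid and the other from an analytic estimate.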

Note: when method = "shrinkage" the parameters are estimated using estimate.lambda from the corpcor package. Data are then centered to calculate the regularised variance-covariance matrices in rcc.

Missing values are handled in the function, except when using method = "shrinkage". In that case the estimation of the missing values can be performed by the reconstitution of the data matrix using the nipals function.

## Value

rcc returns an object of class "rcc", a list that contains the following components:

| Component | Description |
|---|---|
| X | the original X data. |
| Y | the original Y data. |
| cor | a vector containing the canonical correlations. |
| lambda | a vector containing the regularization parameters, either as supplied (ridge method) or as directly estimated (shrinkage method). |
| loadings | list containing the estimated coefficients used to calculate the canonical variates in X and Y. |
| variates | list containing the canonical variates. |
| names | list containing the names to be used for individuals and variables. |

## Author(s)

Sébastien Déjean, Ignacio González, Francois Bartolo, Kim-Anh Lê Cao, Florian Rohart, Al J Abadi

## References

González, I., Déjean, S., Martin, P. G., and Baccini, A. (2008). CCA: An R package to extend canonical correlation analysis. Journal of Statistical Software, 23(12), 1-14.

González, I., Déjean, S., Martin, P., Goncalves, O., Besse, P., and Baccini, A. (2009). Highlighting relationships between heterogeneous biological data through graphical displays based on regularized canonical correlation analysis. Journal of Biological Systems, 17(02), 173-199.

Leurgans, S. E., Moyeed, R. A. and Silverman, B. W. (1993). Canonical correlation analysis when the data are curves. Journal of the Royal Statistical Society. Series B 55, 725-740.

Vinod, H. D. (1976). Canonical ridge and econometrics of joint production. Journal of Econometrics 6, 129-137.

Opgen-Rhein, R., and Strimmer, K. (2007). Accurate ranking of differentially expressed genes by a distribution-free shrinkage approach. Statist. Appl. Genet. Mol. Biol. 6:9. (http://www.bepress.com/sagmb/vol6/iss1/art9/)

Schäfer, J., and Strimmer, K. (2005). A shrinkage approach to large-scale covariance estimation and implications for functional genomics. Statist. Appl. Genet. Mol. Biol. 4:32. (http://www.bepress.com/sagmb/vol4/iss1/art32/)

## See Also

summary, tune.rcc, plot.rcc, plotIndiv, plotVar, cim, network and http://www.mixOmics.org for more details.

## Examples

    library(mixOmics)

    ## Classic CCA
    data(linnerud)
    X <- linnerud$exercise
    Y <- linnerud$physiological
    linn.res <- rcc(X, Y)

    ## Not run:
    ## Regularized CCA
    data(nutrimouse)
    X <- nutrimouse$lipid
    Y <- nutrimouse$gene
    nutri.res1 <- rcc(X, Y, ncomp = 3, lambda1 = 0.064, lambda2 = 0.008)

    ## using shrinkage parameters
    nutri.res2 <- rcc(X, Y, ncomp = 3, method = 'shrinkage')
    nutri.res2$lambda # the shrinkage parameters

    ## End(Not run)