# rgcca: Regularized Generalized Canonical Correlation Analysis (RGCCA)

From the RGCCA package: Regularized and Sparse Generalized Canonical Correlation Analysis for Multiblock Data.

## Description

Regularized Generalized Canonical Correlation Analysis (RGCCA) is a generalization of regularized canonical correlation analysis to three or more sets of variables. Consider J matrices X_1, X_2, ..., X_J that represent J sets of variables observed on the same set of n individuals. The matrices must have the same number of rows but may (and usually will) have different numbers of columns. The aim of RGCCA is to study the relationships between these J blocks of variables.

RGCCA constitutes a general framework for many multiblock data analysis methods. It combines the power of multiblock data analysis methods (maximization of well-identified criteria) with the flexibility of PLS path modeling (the researcher decides which blocks are connected and which are not). Hence, using RGCCA requires the user to specify a design matrix C that characterizes the connections between blocks. The elements of the symmetric design matrix C = (c_{jk}) are equal to 1 if block j and block k are connected, and 0 otherwise.

The function rgcca() implements a monotonically convergent algorithm (i.e., the bounded criterion to be maximized increases at each step of the iterative procedure) that is very similar to the PLS algorithm proposed by Herman Wold, and finds at convergence a stationary point of the RGCCA optimization problem. Depending on the dimensionality of each block X_j, j = 1, ..., J, either the primal algorithm (when n > p_j, where p_j is the number of variables in X_j) or the dual algorithm (when n < p_j) is used (see Tenenhaus et al. 2015). Moreover, through a deflation strategy, rgcca() can compute several RGCCA block components (specified by ncomp) for each block. Within each block, the deflation procedure guarantees that the block components are orthogonal. This implementation uses the so-called symmetric deflation, i.e., each block is deflated with respect to its own component(s).
It should be noted that the numbers of components per block can differ from one block to another.
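
The symmetric deflation described above can be illustrated in base R. This is a minimal sketch, not the package's internal code: it assumes that a block component is y = X %*% a for a centred block X, and deflates X by regressing each of its columns on y and keeping the residuals. The helper `deflate_block()` and both weight vectors are hypothetical, chosen only to show that a component computed from the deflated block is orthogonal to the first one.

```r
# Sketch (assumption): deflate a block with respect to its own component by
# removing the projection of every column of X onto span(y).
deflate_block <- function(X, y) {
  y <- as.matrix(y)
  # X - y (y'y)^{-1} y'X
  X - y %*% solve(crossprod(y), crossprod(y, X))
}

set.seed(1)
X  <- scale(matrix(rnorm(50 * 4), 50, 4))  # one centred block, n = 50, p = 4
a1 <- c(0.5, 0.5, 0.5, 0.5)                # hypothetical first weight vector
y1 <- X %*% a1                             # first block component
X1 <- deflate_block(X, y1)                 # residual matrix for component 2
a2 <- c(1, -1, 0, 0)                       # hypothetical second weight vector
y2 <- X1 %*% a2                            # second block component
crossprod(y1, y2)                          # numerically zero
```

Because the residual matrix satisfies t(y1) %*% X1 = 0, orthogonality holds whatever weight vector is used for the next component, so the guarantee does not depend on how that vector is found.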

## Usage

```
rgcca(A, C = 1 - diag(length(A)), tau = rep(1, length(A)),
      ncomp = rep(1, length(A)), scheme = "centroid", scale = TRUE,
      init = "svd", bias = TRUE, tol = 1e-08, verbose = TRUE)
```
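
The default `C = 1 - diag(length(A))` is the complete design, in which every pair of blocks is connected. The base-R sketch below builds that default next to the two designs used in the Examples section and checks the properties stated in the Description (symmetric, entries 0 or 1). The helper `is_valid_design()` is hypothetical, and its zero-diagonal check is an assumption based on the examples rather than a documented rule of the package.

```r
# Sketch: design matrices for rgcca(C = ...)
J <- 3
C_complete <- 1 - diag(J)  # default: every pair of blocks is connected

# Example 1 design: X_agric and X_ind are each connected to X_polit only
C_ex1 <- matrix(c(0, 0, 1,
                  0, 0, 1,
                  1, 1, 0), 3, 3)

# Example 2 design: blocks 1 to 3 are connected only to a 4th "superblock"
C_super <- matrix(0, 4, 4)
C_super[1:3, 4] <- 1
C_super[4, 1:3] <- 1

# Hypothetical check: symmetric, 0/1 entries, no block connected to itself
is_valid_design <- function(C) {
  isSymmetric(C) && all(C %in% c(0, 1)) && all(diag(C) == 0)
}

sapply(list(C_complete, C_ex1, C_super), is_valid_design)
```

Filling `C_super` by index gives the same matrix as the literal `matrix(c(0, 0, 0, 1, ...), 4, 4)` in Example 2, but makes the "everything connects to the superblock" structure explicit.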

## Arguments

- `A`: A list that contains the J blocks of variables X_1, X_2, ..., X_J.
- `C`: A design matrix that describes the relationships between blocks (default: complete design, `1 - diag(length(A))`).
- `tau`: Either a 1 * J vector or a max(ncomp) * J matrix containing the values of the shrinkage parameters (default: tau = 1 for each block and each dimension). If tau = "optimal", the shrinkage parameters are estimated for each block and each dimension using the Schafer and Strimmer (2005) analytical formula. If tau is a 1 * J numeric vector, tau[j] is identical across the dimensions of block X_j. If tau is a matrix, tau[k, j] is associated with X_{jk} (the kth residual matrix for block j).
- `ncomp`: A 1 * J vector that contains the number of components for each block (default: rep(1, length(A)), i.e., one component per block).
- `scheme`: "horst", "factorial", "centroid", or any differentiable convex scheme function g supplied by the user (default: "centroid").
- `scale`: If scale = TRUE, each block is standardized to zero means and unit variances and then divided by the square root of its number of variables (default: TRUE).
- `init`: The initialization mode of the RGCCA algorithm: singular value decomposition ("svd") or random ("random") (default: "svd").
- `bias`: A logical value selecting the biased (TRUE) or unbiased (FALSE) estimator of the variance/covariance (default: TRUE).
- `tol`: The stopping value for convergence (default: 1e-08).
- `verbose`: If verbose = TRUE, progress is reported during the computation (default: TRUE).
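
The three named schemes correspond to g(x) = x (Horst), g(x) = x^2 (factorial) and g(x) = |x| (centroid), following Tenenhaus and Tenenhaus (2011); a user-supplied function such as `function(x) x^4` in Example 2 plays the same role. The sketch below evaluates the criterion for given components under the assumption that it is the sum of C[j, k] * g(cov(y_j, y_k)) over connected pairs j != k, using the divide-by-n covariance when `bias = TRUE`. `rgcca_criterion()` and `cov_b()` are hypothetical helpers, and their values are not claimed to match the `Fit` column printed by `verbose = TRUE`.

```r
# Scheme functions g named in the 'scheme' argument
schemes <- list(
  horst     = function(x) x,
  factorial = function(x) x^2,
  centroid  = function(x) abs(x)
)

# Covariance with divisor n (bias = TRUE) or n - 1 (bias = FALSE)
cov_b <- function(x, y, bias = TRUE) {
  n <- length(x)
  factor <- if (bias) (n - 1) / n else 1
  cov(x, y) * factor
}

# Sketch (assumption): sum over j != k of C[j, k] * g(cov(y_j, y_k))
rgcca_criterion <- function(Y, C, g, bias = TRUE) {
  total <- 0
  for (j in seq_along(Y)) for (k in seq_along(Y)) {
    if (j != k && C[j, k] != 0)
      total <- total + C[j, k] * g(cov_b(Y[[j]], Y[[k]], bias))
  }
  total
}

set.seed(2)
y1 <- rnorm(20)
y2 <- -y1 + rnorm(20, sd = 0.1)   # strongly negatively related components
Y <- list(y1, y2)
C <- 1 - diag(2)
crit <- sapply(schemes, function(g) rgcca_criterion(Y, C, g))
crit
```

With negatively related components the Horst value is negative, while the factorial and centroid values are positive: those two schemes reward the strength of a link regardless of its sign.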

## Value

- `Y`: A list of J elements. Each element of Y is a matrix that contains the RGCCA components for the corresponding block.
- `a`: A list of J elements. Each element of a is a matrix that contains the outer weight vectors for the corresponding block.
- `astar`: A list of J elements. Each element of astar is a matrix defined by Y[[j]][, h] = A[[j]] %*% astar[[j]][, h].
- `C`: The design matrix that describes the relationships between blocks (user specified).
- `tau`: A vector or matrix that contains the values of the shrinkage parameters applied to each block and each dimension (user specified).
- `scheme`: The scheme chosen by the user (user specified).
- `ncomp`: A 1 * J vector that contains the number of components for each block (user specified).
- `crit`: A vector that contains the values of the criterion across iterations.
- `primal_dual`: A 1 * J vector that contains the formulation ("primal" or "dual") applied to each of the J blocks within the RGCCA algorithm.
- `AVE`: Indicators of model quality based on the Average Variance Explained (AVE): AVE (for each block), AVE (outer model), and AVE (inner model).
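
The `AVE` element is not spelled out above. A common definition in the RGCCA literature, assumed here, takes the AVE of block j (for one component) as the mean squared correlation between the variables of X_j and the component y_j, and the outer-model AVE as the average of the block AVEs weighted by the number of variables in each block. `ave_block()` and `ave_outer()` are hypothetical helpers; compare their output with the `AVE` element returned by rgcca() before relying on them.

```r
# Sketch (assumption): AVE of one block = mean squared correlation between
# the block's variables and its component
ave_block <- function(X, y) mean(cor(X, y)^2)

# Sketch (assumption): outer-model AVE = block AVEs weighted by block size
ave_outer <- function(A, Y) {
  p <- sapply(A, NCOL)
  sum(p * mapply(ave_block, A, Y)) / sum(p)
}

set.seed(3)
X1 <- matrix(rnorm(30 * 3), 30, 3)
X2 <- matrix(rnorm(30 * 2), 30, 2)
A <- list(X1, X2)
Y <- list(X1 %*% c(1, 1, 1), X2[, 1])  # hypothetical block components
ave_block(X1, Y[[1]])
ave_outer(A, Y)
```

Both quantities lie between 0 and 1, and a block whose component is exactly one of its standardized variables gets the largest possible contribution from that variable.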

## References

Tenenhaus M., Tenenhaus A. and Groenen P.J.F. (2017). Regularized generalized canonical correlation analysis: A framework for sequential multiblock component methods. Psychometrika, in press.

Tenenhaus A., Philippe C. and Frouin V. (2015). Kernel generalized canonical correlation analysis. Computational Statistics and Data Analysis, 90, 114-131.

Tenenhaus A. and Tenenhaus M. (2011). Regularized generalized canonical correlation analysis. Psychometrika, 76(2), 257-284.

Schafer J. and Strimmer K. (2005). A shrinkage approach to large-scale covariance matrix estimation and implications for functional genomics. Statistical Applications in Genetics and Molecular Biology, 4, Article 32.

## Examples

```
library(RGCCA)

#############
# Example 1 #
#############
data(Russett)
X_agric = as.matrix(Russett[, c("gini", "farm", "rent")])
X_ind = as.matrix(Russett[, c("gnpr", "labo")])
X_polit = as.matrix(Russett[, c("demostab", "dictator")])
A = list(X_agric, X_ind, X_polit)
# Define the design matrix (output = C): X_agric and X_ind are each
# connected to X_polit only
C = matrix(c(0, 0, 1, 0, 0, 1, 1, 1, 0), 3, 3)
result.rgcca = rgcca(A, C, tau = c(1, 1, 1), scheme = "factorial",
                     scale = TRUE)
# Colour each country by the column of Russett[, 9:11] with the largest value
lab = as.vector(apply(Russett[, 9:11], 1, which.max))
plot(result.rgcca$Y[[1]], result.rgcca$Y[[2]], col = "white",
     xlab = "Y1 (Agric. inequality)", ylab = "Y2 (Industrial Development)")
text(result.rgcca$Y[[1]], result.rgcca$Y[[2]], rownames(Russett),
     col = lab, cex = .7)

#############
# Example 2 #
#############
data(Russett)
X_agric = as.matrix(Russett[, c("gini", "farm", "rent")])
X_ind = as.matrix(Russett[, c("gnpr", "labo")])
X_polit = as.matrix(Russett[, c("inst", "ecks", "death",
                                "demostab", "dictator")])
# The 4th block is the superblock: all variables concatenated
A = list(X_agric, X_ind, X_polit, cbind(X_agric, X_ind, X_polit))
# Define the design matrix (output = C): each block is connected
# to the superblock only
C = matrix(c(0, 0, 0, 1, 0, 0, 0, 1, 0, 0, 0, 1, 1, 1, 1, 0), 4, 4)
result.rgcca = rgcca(A, C, tau = c(1, 1, 1, 0), ncomp = rep(2, 4),
                     scheme = function(x) x^4, scale = TRUE) # HPCA
lab = as.vector(apply(Russett[, 9:11], 1, which.max))
# Plot the first two components of the superblock
plot(result.rgcca$Y[[4]][, 1], result.rgcca$Y[[4]][, 2], col = "white",
     xlab = "Global Component 1", ylab = "Global Component 2")
text(result.rgcca$Y[[4]][, 1], result.rgcca$Y[[4]][, 2], rownames(Russett),
     col = lab, cex = .7)

## Not run:
######################################
# Example 3: RGCCA and leave one out #
######################################
Ytest = matrix(0, nrow(Russett), 3)
X_agric = as.matrix(Russett[, c("gini", "farm", "rent")])
X_ind = as.matrix(Russett[, c("gnpr", "labo")])
X_polit = as.matrix(Russett[, c("demostab", "dictator")])
A = list(X_agric, X_ind, X_polit)
# Define the design matrix (output = C)
C = matrix(c(0, 0, 1, 0, 0, 1, 1, 1, 0), 3, 3)
result.rgcca = rgcca(A, C, tau = rep(1, 3), ncomp = rep(1, 3),
                     scheme = "factorial", verbose = TRUE)

for (i in 1:nrow(Russett)) {
  # Fit RGCCA without individual i; blocks are scaled beforehand
  B = lapply(A, function(x) x[-i, ])
  B = lapply(B, scale2)
  resB = rgcca(B, C, tau = rep(1, 3), scheme = "factorial",
               scale = FALSE, verbose = FALSE)
  # Look for potential conflicting signs among components within the LOO loop
  for (k in 1:length(B)) {
    if (cor(result.rgcca$a[[k]], resB$a[[k]]) < 0)
      resB$a[[k]] = -resB$a[[k]]
  }
  # Scale individual i with the training parameters, then project it
  Btest = lapply(A, function(x) x[i, ])
  for (j in 1:length(B)) {
    Btest[[j]] = (Btest[[j]] - attr(B[[j]], "scaled:center")) /
      attr(B[[j]], "scaled:scale") / sqrt(NCOL(B[[j]]))
    Ytest[i, j] = Btest[[j]] %*% resB$a[[j]]
  }
}
lab = apply(Russett[, 9:11], 1, which.max)
plot(result.rgcca$Y[[1]], result.rgcca$Y[[2]], col = "white",
     xlab = "Y1 (Agric. inequality)", ylab = "Y2 (Ind. Development)")
text(result.rgcca$Y[[1]], result.rgcca$Y[[2]], rownames(Russett),
     col = lab, cex = .7)
# Overlay the leave-one-out projections (first letter of each country)
text(Ytest[, 1], Ytest[, 2], substr(rownames(Russett), 1, 1),
     col = lab, cex = .7)
## End(Not run)
```
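
The leave-one-out loop in Example 3 scales each held-out individual with the centre and scale of the training block and divides by the square root of the number of variables, mirroring what `scale = TRUE` does. A base-R sketch of that step is shown below, with hypothetical helpers `fit_block_scaling()` and `apply_block_scaling()` and random data standing in for a 47-row block. It uses `sd()` (divisor n - 1), which is an assumption: the package's `scale2()` may use a different variance estimator depending on `bias`.

```r
# Hypothetical helper: store the parameters that scale = TRUE would apply
fit_block_scaling <- function(X) {
  list(center = colMeans(X), scale = apply(X, 2, sd), p = NCOL(X))
}

# Hypothetical helper: centre, standardize and divide by sqrt(p) using the
# stored training parameters; x may be one row (a vector) or a matrix
apply_block_scaling <- function(x, s) {
  x <- matrix(x, ncol = s$p)
  sweep(sweep(x, 2, s$center), 2, s$scale, "/") / sqrt(s$p)
}

set.seed(4)
X <- matrix(rnorm(47 * 3), 47, 3)
train <- X[-1, ]                    # leave individual 1 out
s <- fit_block_scaling(train)
a <- c(0.6, 0.6, 0.5)               # hypothetical outer weight vector
Xs <- apply_block_scaling(train, s)
y_train <- Xs %*% a                 # training block component
y_test <- apply_block_scaling(X[1, ], s) %*% a  # score of the left-out row
```

Applying the stored parameters to a training row reproduces that row's training score, which is a quick consistency check for any out-of-sample projection.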

### Example output

```
Computation of the RGCCA block components based on the factorial scheme
Shrinkage intensity paramaters are chosen manually
Iter:    1  Fit: 1.01185901  Dif:  0.05582112
Iter:    2  Fit: 1.01187179  Dif:  0.00001278
Iter:    3  Fit: 1.01187179  Dif:  0.00000000
The RGCCA algorithm converged to a stationary point after 2 iterations
Computation of the RGCCA block components based on the g scheme
Shrinkage intensity paramaters are chosen manually
Computation of the RGCCA block components #1 is under progress...
Iter:    1  Fit: 1.80671549  Dif:  0.34314614
Iter:    2  Fit: 1.89149450  Dif:  0.08477900
Iter:    3  Fit: 1.90180194  Dif:  0.01030744
Iter:    4  Fit: 1.90263166  Dif:  0.00082972
Iter:    5  Fit: 1.90268926  Dif:  0.00005761
Iter:    6  Fit: 1.90269315  Dif:  0.00000389
Iter:    7  Fit: 1.90269342  Dif:  0.00000026
Iter:    8  Fit: 1.90269344  Dif:  0.00000002
Iter:    9  Fit: 1.90269344  Dif:  0.00000000
The RGCCA algorithm converged to a stationary point after 8 iterations
Computation of the RGCCA block components #2 is under progress ...
Iter:    1  Fit: 0.14950453  Dif:  0.14089543
Iter:    2  Fit: 0.15375967  Dif:  0.00425515
Iter:    3  Fit: 0.15384852  Dif:  0.00008885
Iter:    4  Fit: 0.15384939  Dif:  0.00000087
Iter:    5  Fit: 0.15384940  Dif:  0.00000001
The RGCCA algorithm converged to a stationary point after 4 iterations
Computation of the RGCCA block components based on the factorial scheme
Shrinkage intensity paramaters are chosen manually
Iter:    1  Fit: 1.01185901  Dif:  0.05582112
Iter:    2  Fit: 1.01187179  Dif:  0.00001278
Iter:    3  Fit: 1.01187179  Dif:  0.00000000
The RGCCA algorithm converged to a stationary point after 2 iterations
```
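
The log illustrates the monotone convergence stated in the Description: the `Fit` values never decrease, and `Dif` is the change in `Fit` between consecutive iterations (the first `Dif` is relative to a starting value that is not printed). The sketch below checks this on the values printed for block components #1 of Example 2. It assumes, without confirmation from the text above, that iterations stop once `Dif` drops below `tol` (default 1e-08).

```r
# Fit and Dif values copied from the log above (Example 2, components #1)
fit <- c(1.80671549, 1.89149450, 1.90180194, 1.90263166, 1.90268926,
         1.90269315, 1.90269342, 1.90269344, 1.90269344)
dif <- c(0.34314614, 0.08477900, 0.01030744, 0.00082972, 0.00005761,
         0.00000389, 0.00000026, 0.00000002, 0.00000000)

all(diff(fit) >= 0)              # TRUE: the criterion never decreases
max(abs(diff(fit) - dif[-1]))    # below 1e-07; gaps come from 8-decimal rounding
which(dif < 1e-08)[1]            # 9: first printed iteration with Dif below tol
```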

RGCCA documentation built on May 2, 2019, 3:39 p.m.