wrapper.rgcca: mixOmics wrapper for Regularised Generalised Canonical...
In mixOmics: Omics Data Integration Project


Wrapper function to perform Regularised Generalised Canonical Correlation Analysis (rGCCA), a generalised approach for the integration of multiple datasets. For more details, see help(rgcca) in the RGCCA package.

wrapper.rgcca(
  X,
  design = 1 - diag(length(X)),
  tau = rep(1, length(X)),
  ncomp = 1,
  keepX,
  scheme = "horst",
  scale = TRUE,
  init = "svd.single",
  tol = .Machine$double.eps,
  max.iter = 1000,
  near.zero.var = FALSE,
  all.outputs = TRUE
)

`X`	a list of data sets (called 'blocks') matching on the same samples. Data in the list should be arranged in samples x variables. `NA`s are not allowed.
`design`	numeric matrix of size (number of blocks in X) x (number of blocks in X) with values between 0 and 1. Each value indicates the strength of the relationship to be modelled between two blocks using rGCCA; a value of 0 indicates no relationship, 1 is the maximum value. If `Y` is provided instead of `indY`, the `design` matrix is changed to include relationships to `Y`.
`tau`	numeric vector whose length is the number of blocks in `X`. Each entry is the regularisation parameter applied to the corresponding block and takes a value between 0 (no regularisation) and 1. If tau = "optimal", the shrinkage parameters are estimated for each block and each dimension using the Schafer and Strimmer (2005) analytical formula.
`ncomp`	the number of components to include in the model. Defaults to 1.
`keepX`	A vector of same length as X. Each entry keepX[i] is the number of X[[i]]-variables kept in the model.
`scheme`	Either "horst", "factorial" or "centroid" (Default: "horst").
`scale`	Logical. If scale = TRUE, each block is standardised to zero mean and unit variance (default: TRUE).
`init`	Mode of initialisation used in the algorithm, either by Singular Value Decomposition of the product of each block of X with Y ("svd") or of each block independently ("svd.single"). Defaults to "svd.single".
`tol`	Convergence stopping value.
`max.iter`	integer, the maximum number of iterations.
`near.zero.var`	Logical, see the internal `nearZeroVar` function (should be set to TRUE in particular for data with many zero values). Setting this argument to FALSE (when appropriate) will speed up the computations. Default value is FALSE.
`all.outputs`	Logical. Computation can be faster when some specific (and non-essential) outputs are not calculated. Default = `TRUE`.
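As an illustration of the `design` and `tau` arguments described above, the following base R sketch builds two design matrices for three blocks and a matching `tau` vector. The number of blocks and the block order (two data blocks followed by an outcome block) are assumptions made for the example only.

```r
# Assumed for illustration: three blocks, ordered as (block 1, block 2, outcome)
n_blocks <- 3

# Same construction as the default: every block linked to every other block,
# with zeros on the diagonal (no block is linked to itself)
design_full <- 1 - diag(n_blocks)

# Alternative: blocks 1 and 2 are linked to the outcome block only
design_outcome_only <- matrix(0, nrow = n_blocks, ncol = n_blocks)
design_outcome_only[1:2, 3] <- 1
design_outcome_only[3, 1:2] <- 1

# One regularisation value per block, each between 0 and 1
tau <- c(1, 1, 0)

# Basic sanity checks on the shapes and values
stopifnot(
  isSymmetric(design_full),
  isSymmetric(design_outcome_only),
  all(diag(design_full) == 0),
  length(tau) == n_blocks,
  all(tau >= 0 & tau <= 1)
)
```

Keeping the design symmetric with a zero diagonal mirrors the default `1 - diag(length(X))`; intermediate values between 0 and 1 could be used in the same way to weaken a link.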

This wrapper function performs rGCCA (see RGCCA) with 1, …, ncomp components on each block data set. A supervised or unsupervised model can be run. For a supervised model, the outcome should be converted to an indicator matrix with the unmap function and supplied as one of the input blocks. More details can be found in the RGCCA package documentation.
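To make the supervised case more concrete, here is a base R sketch of the kind of indicator (dummy) matrix that is supplied as an extra block. It is not the implementation of unmap; the factor `diet_example` and its values are made up for illustration, and the assumption is that unmap produces a comparable one-column-per-class matrix.

```r
# Hypothetical outcome factor with three classes (illustrative values only)
diet_example <- factor(c("coc", "fish", "coc", "lin", "fish"))

# One column per factor level; entry is 1 when the sample belongs to that level
Y_indicator <- outer(as.character(diet_example), levels(diet_example), "==") * 1
colnames(Y_indicator) <- levels(diet_example)

# Each sample belongs to exactly one class
stopifnot(all(rowSums(Y_indicator) == 1))
```

A matrix of this shape has one row per sample, so it matches the other blocks on samples and can sit alongside them in the list passed as `X`.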

wrapper.rgcca returns an object of class "rgcca", a list that contains the following components:

`data`	the input data set (as a list).
`design`	the input design.
`variates`	the rGCCA components.
`loadings`	the loadings for each block data set (outer weight vector).
`loadings.star`	the loadings, standardised.
`tau`	the input tau parameter.
`scheme`	the input scheme.
`ncomp`	the number of components included in the model for each block.
`crit`	the convergence criterion.
`AVE`	Indicators of model quality based on the Average Variance Explained (AVE): AVE (for one block), AVE (outer model), AVE (inner model).
`names`	list containing the names to be used for individuals and variables.

More details can be found in the references.

Arthur Tenenhaus, Vincent Guillemot, Kim-Anh Lê Cao, Florian Rohart, Benoit Gautier

Tenenhaus A. and Tenenhaus M., (2011), Regularized Generalized Canonical Correlation Analysis, Psychometrika, Vol. 76, Nr 2, pp 257-284.

Schafer J. and Strimmer K., (2005), A shrinkage approach to large-scale covariance matrix estimation and implications for functional genomics. Statist. Appl. Genet. Mol. Biol. 4:32.

wrapper.rgcca, plotIndiv, plotVar, wrapper.sgcca and http://www.mixOmics.org for more details.

data(nutrimouse)
# need to unmap the Y factor diet
Y = unmap(nutrimouse$diet)
data = list(gene = nutrimouse$gene, lipid = nutrimouse$lipid, Y = Y)
# with this design, gene expression and lipids are connected to the diet factor
# design = matrix(c(0,0,1,
#                   0,0,1,
#                   1,1,0), ncol = 3, nrow = 3, byrow = TRUE)

# with this design, gene expression and lipids are connected to the diet factor
# and gene expression and lipids are also connected
design = matrix(c(0,1,1,
                  1,0,1,
                  1,1,0), ncol = 3, nrow = 3, byrow = TRUE)
# note: the tau parameter is the regularization parameter
wrap.result.rgcca = wrapper.rgcca(X = data, design = design, tau = c(1, 1, 0),
                                  ncomp = 2,
                                  scheme = "centroid")
#wrap.result.rgcca