We consider J data matrices X1 ,..., XJ. Each n × pj data matrix Xj = [ xj1, ..., xjpj ] is called a block and represents a set of pj variables observed on n individuals. The number and the nature of the variables may differ from one block to another, but the individuals must be the same across blocks. We assume that all variables are centered. The objective of RGCCA is to find, for each block, a weighted composite of variables (called block component) yj = Xj . aj, j = 1 ,..., J (where aj is a column-vector with pj elements) summarizing the relevant information between and within the blocks. The block components are obtained such that (i) block components explain well their own block and/or (ii) block components that are assumed to be connected are highly correlated. In addition, RGCCA integrates a variable selection procedure, called SGCCA, allowing the identification of the most relevant features.
# Load functions source("../R/parsing.R") source("../R/select.type.R") source("../R/plot.R") source("../R/network.R") loadLibraries(c("RGCCA", "ggplot2", "optparse", "scales", "igraph")) # Load the data data("Russett") #Creates the blocks agriculture = Russett[, 1:3] industry = Russett[, 4:5] politic = Russett[, 6:11] # Creates optional files response = factor( apply(Russett[, 9:11], 1, which.max), labels = colnames(Russett)[9:11] ) connection = matrix(c(0, 0, 0, 1, 0, 0, 0, 1, 0, 0, 0, 1, 1, 1, 1, 0), 4, 4) # Save the Russett files files = c("agriculture", "industry", "politic", "response", "connection") # Creates the files (in .tsv format for example) if(!file.exists("data")) dir.create("data") # Without row and column names for connection files sapply(1:length(files), function (x) { bool = ( files[x] != "connection") write.table( x = get(files[x]), file = paste("data/", files[x], ".tsv", sep=""), row.names = bool, col.names = bool, sep = "\t") }) knitr::opts_chunk$set( collapse = TRUE, comment = "#>", fig.path = "images/" )
The blocks are loaded with the function setBlocks
. The first argument of this function (superblock
) required a bolean giving the presence (TRUE) / absence (FALSE) of a superblock. The second one corresponds to a character giving the list of the file path separated by a comma (argument file
). By default, the name of the blocks corresponds to those of the files (names
argument) and could be set. By default, the tabulation is used as a column separator (sep
argument) and the first row is considered as a header (header
parameter).
# A boolean giving the presence (TRUE) / absence (FALSE) of a superblock blocks = setBlocks(file = "data/agriculture.tsv,data/industry.tsv,data/politic.tsv") blocks[["Superblock"]] = Reduce(cbind, blocks)
The connection between the blocks will be used by the RGCCA and must be set by setConnection
function. A group of samples will be used to color them in the samples plot and must be set by setResponse
function. For both functions, the blocks
parameter, set at the previous step, is required. The other parameters are optional. The user could import a file containing either (file
parameter) : (i) a symmetric matrix with 1 giving a connection between two blocs, or 0 otherwise; (ii) a univariate vector (qualitative or quantitative) or a disjunctive table for the response. By default, the column separator is the tabulation and could be set (sep
argument). For the setResponse
, a header could be specified (header
parameter).
# Optional parameters RESPONSE = "data/response.tsv" CONNECTION = "data/connection.tsv" SUPERBLOCK = TRUE # Uncomment the parameters below to try without default settings # RESPONSE <- CONNECTION <- NULL response = setResponse(blocks = blocks, file = RESPONSE) connection = setConnection(blocks = blocks, file = CONNECTION)
for (x in 1:length(files)) { tab = get(files[x]) if (x > 3) colnames(tab) = rep(NULL, NCOL(tab)) pander::pandoc.table(head(tab), caption = files[x]) }
SGCCA is run from the RGCCA package by using two components for a biplot visualization. The S/RGCCA function doesn't names the blocks in their outputs. This step is required to generate biplots.
# Use two components in RGCCA to plot in a bidimensionnal space NB_COMP = c(2, 2, 3, 3) sgcca.res = rgcca.analyze(blocks = blocks, connection = connection, ncomp = NB_COMP) # Renames the elements of the list according to the block names names(sgcca.res$a) = names(blocks)
Both the samples and the variables could be visualized by using biplots functions (respectively plotSamplesSpace
and plotVariablesSpace
). Histograms are used to visualized in decreasing order the variables with the higher weights and the blocks with the higher Average Variance Explained (AVE).
These functions take the results of a sgcca or a rgcca (rgcca
parameter) and the components to visualize : either comp_x
and comp_y
for biplots or comp
for histograms. By default, comp_x
= comp
= 1 and comp_y
= 2. The presence or the absence of a superblock among the analysis could be specified for plotVariablesSpace
and plotFingerprint
to color the variables according to their blocks. By default, the last block is plotted, corresponding to the superblock if selected (i_block
parameter). plotVariablesSpace
, which is a corcircle plot, required the blocks
for the correlation with the selected component. plotVariablesSpace
could use the response variable to color the samples by groups. By default, the first 100th higher weights are used for the plotFingerprint
and could be set by using the n_mark
argument.
COMP1 = 2 COMP2 = 3 NB_MARK = 100
plotSamplesSpace(rgcca = sgcca.res, resp = response)
plotVariablesSpace(rgcca = sgcca.res, block = blocks, superblock = SUPERBLOCK)
plotFingerprint(rgcca = sgcca.res, superblock = SUPERBLOCK, n_mark = NB_MARK, type = "weight")
plotAVE(rgcca = sgcca.res)
plotSamplesSpace(rgcca = sgcca.res, resp = response, comp_x = COMP1, comp_y = COMP2, i_block = 3)
plotVariablesSpace(rgcca = sgcca.res, block = blocks, comp_x = COMP1, comp_y = COMP2, i_block = 3)
plotFingerprint(rgcca = sgcca.res, comp = COMP1, n_mark = NB_MARK, i_block = 3, type = "weight")
# Remove the temp/ folder unlink("data", recursive = TRUE)
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.