network: Relevance Network for (r)CCA and (s)PLS regression

View source: R/network.R

network    R Documentation

Relevance Network for (r)CCA and (s)PLS regression

Description

Display relevance association networks for (regularized) canonical correlation analysis and (sparse) PLS regression. The function avoids the intensive computation of Pearson correlation matrices on large data sets by instead calculating a pairwise similarity matrix obtained directly from the latent components of our integrative approaches (CCA, PLS, block.pls methods). The similarity value between a pair of variables is obtained by summing, over the selected components, the products of the correlations between each of the two original variables and that latent component. The values in the similarity matrix can be seen as a robust approximation of the Pearson correlation (see González et al. 2012 for a mathematical demonstration and exact formula). The advantage of relevance networks is their ability to represent positive and negative correlations simultaneously, which are missed by methods based on Euclidean distances or mutual information. These networks are bipartite: an edge can only link two variables of different types. The network can be saved in .gml format by extracting the output object$gR and passing it to the write.graph function of the igraph package (see Details).

Usage

network(
  mat,
  comp = NULL,
  blocks = c(1, 2),
  cutoff = 0,
  row.names = TRUE,
  col.names = TRUE,
  block.var.names = TRUE,
  graph.scale = 0.5,
  size.node = 0.5,
  color.node = NULL,
  shape.node = NULL,
  alpha.node = 0.85,
  cex.node.name = NULL,
  color.edge = color.GreenRed(100),
  lty.edge = "solid",
  lwd.edge = 1,
  show.edge.labels = FALSE,
  cex.edge.label = 1,
  show.color.key = TRUE,
  symkey = TRUE,
  keysize = c(1, 1),
  keysize.label = 1,
  breaks,
  interactive = FALSE,
  layout.fun = NULL,
  save = NULL,
  name.save = NULL,
  plot.graph = TRUE
)

Arguments

mat

numeric matrix of values to be represented. Alternatively, an object from one of the following models: mix_pls, plsda, mixo_spls, splsda, rcc, sgcca, rgcca, sgccda.

comp

a single positive integer or a vector of positive integers: the components used to account for the data association. Defaults to comp = 1.

blocks

a vector indicating the block variables to display.

cutoff

numeric value between 0 and 1. The tuning threshold for the relevant associations network (see Details).

row.names, col.names

character vector containing the names of X- and Y-variables.

block.var.names

either a list of vectors giving the variable names in each block, or FALSE for no names. If TRUE, the column names of the blocks are used as names.

graph.scale

Numeric between 0 and 1 which alters the scale of the entire plot. Increasing the value decreases the size of nodes and increases their distance from one another. Defaults to 0.5.

size.node

Numeric between 0 and 1 which determines the relative size of nodes. Defaults to 0.5.

color.node

vector of length two, the colors of the X and Y nodes (see Details).

shape.node

character vector of length two, the shape of the X and Y nodes (see Details).

alpha.node

Numeric between 0 and 1 which determines the opacity of nodes. Only used in block objects.

cex.node.name

the font size for the node labels.

color.edge

vector of colors, or character string specifying the color function used to color the edges. Defaults to color.GreenRed(100), but other palettes can be chosen (see Details and Examples).

lty.edge

character vector of length two, the line type for the edges (see Details).

lwd.edge

vector of length two, the line width of the edges (see Details).

show.edge.labels

logical. If TRUE, plot association values as edge labels (defaults to FALSE).

cex.edge.label

the font size for the edge labels.

show.color.key

Logical. If TRUE, a color key is plotted.

symkey

Logical indicating whether the color key should be made symmetric about 0. Defaults to TRUE.

keysize

numeric value indicating the size of the color key.

keysize.label

vector of length 1, indicating the size of the labels and title of the color key.

breaks

(optional) either a numeric vector indicating the splitting points for binning mat into colors, or an integer number of break points to be used, in which case the break points are spaced equally between min(mat) and max(mat).

interactive

logical. If TRUE, a scrollbar is created to change the cutoff value interactively (defaults to FALSE). See Details.

layout.fun

a function. It specifies how the vertices will be placed on the graph. See help(layout) in the igraph package. Defaults to layout.fruchterman.reingold.

save

should the plot be saved? If so, set this argument to 'jpeg', 'tiff', 'png' or 'pdf'.

name.save

character string giving the name of the saved file.

plot.graph

logical. If TRUE (default), the plotting window is filled with the network. If FALSE, no graph is plotted, although the function returns exactly the same value.

Details

network can be used to infer large-scale association networks between the X and Y data sets in rcc or spls. The output is a graph in which each X- and Y-variable corresponds to a node and the edges portray associations between them.

In rcc, to identify X-Y pairs showing relevant associations, network calculates a pairwise similarity measure between X and Y variables: the scalar product between each pair of vectors of dimension length(comp) representing the X and Y variables on the axes defined by Z_i, with i in comp, where Z_i is the equiangular vector (bisector) between the i-th X and Y canonical variates.
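The computation described above can be sketched in base R. This is an illustrative sketch, not the mixOmics implementation: the helper similarity_cca() is hypothetical, and the unregularized stats::cancor() stands in for rcc. It assumes that adding the standardized i-th X and Y variates gives a vector along their bisector Z_i (the length of Z_i does not matter, since only correlations with it are used).

```r
# Hypothetical helper (not part of mixOmics): similarity from canonical variates.
# U, V: n x k matrices of X and Y canonical variates for the selected components.
similarity_cca <- function(X, Y, U, V) {
  # Equal-variance variates: their sum points along the bisector Z_i
  Z <- scale(U) + scale(V)
  # Coordinates of each variable on the Z_i axes are correlations;
  # the p x q similarity matrix holds their scalar products
  cor(X, Z) %*% t(cor(Y, Z))
}

# Simulated data: x1, y1 follow a shared latent signal, x3 follows its opposite
set.seed(1)
n <- 50
latent <- rnorm(n)
X <- cbind(x1 = latent + rnorm(n, sd = 0.3), x2 = rnorm(n),
           x3 = -latent + rnorm(n, sd = 0.3))
Y <- cbind(y1 = latent + rnorm(n, sd = 0.3), y2 = rnorm(n))

# Unregularized CCA as a stand-in for rcc; keep both canonical components
cc <- cancor(X, Y)
U <- scale(X, center = cc$xcenter, scale = FALSE) %*% cc$xcoef[, 1:2]
V <- scale(Y, center = cc$ycenter, scale = FALSE) %*% cc$ycoef[, 1:2]

S <- similarity_cca(X, Y, U, V)
round(S, 2)
```

With these simulated data, S["x1", "y1"] should be strongly positive and S["x3", "y1"] strongly negative, matching the signs of the corresponding Pearson correlations in cor(X, Y).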

In spls, if object$mode is regression, the similarity measure between X and Y variables is the scalar product between each pair of vectors of dimension length(comp) representing the X and Y variables on the axes defined by U_i, with i in comp, where U_i is the i-th X variate. If object$mode is canonical, X and Y are represented on the axes defined by U_i and V_i respectively, where V_i is the i-th Y variate.
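The difference between the two modes can be sketched in base R under stated assumptions: similarity_pls() is a hypothetical helper, and a single PLS component, taken from the leading singular vectors of crossprod(X, Y) on standardized data, stands in for a fitted spls model. In regression mode both X and Y are correlated with the X variates U; in canonical mode Y is correlated with the Y variates V instead.

```r
# Hypothetical helper (not part of mixOmics): PLS-type similarity matrix.
# U, V: n x k matrices of X and Y variates for the selected components.
similarity_pls <- function(X, Y, U, V, mode = c("regression", "canonical")) {
  mode <- match.arg(mode)
  cor.X <- cor(X, U)
  # Regression mode projects Y on the X variates; canonical mode on the Y variates
  cor.Y <- if (mode == "regression") cor(Y, U) else cor(Y, V)
  cor.X %*% t(cor.Y)
}

set.seed(2)
n <- 40
latent <- rnorm(n)
X <- cbind(g1 = latent + rnorm(n, sd = 0.5), g2 = rnorm(n),
           g3 = -latent + rnorm(n, sd = 0.5))
Y <- cbind(c1 = latent + rnorm(n, sd = 0.5), c2 = rnorm(n))

# First PLS component only (stand-in for spls): leading singular vectors of X'Y
Xs <- scale(X)
Ys <- scale(Y)
s <- svd(crossprod(Xs, Ys))
U <- Xs %*% s$u[, 1, drop = FALSE]
V <- Ys %*% s$v[, 1, drop = FALSE]

S.reg <- similarity_pls(X, Y, U, V, "regression")
S.can <- similarity_pls(X, Y, U, V, "canonical")
```

Because the singular vectors are paired consistently, flipping the sign of a component does not change either matrix.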

Variable pairs with a high similarity measure (in absolute value) are considered relevant. By changing the cut-off, one can tune the relevance of the associations to include or exclude relationships in the network.
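The effect of the cut-off can be illustrated with a hypothetical helper, relevance_edges(), which keeps the X-Y pairs whose similarity is at least cutoff in absolute value and returns them as a bipartite edge list. Whether network() uses >= or > at the boundary is an assumption here.

```r
# Hypothetical helper (not part of mixOmics): threshold a p x q similarity
# matrix into an X-Y edge list, keeping pairs with |similarity| >= cutoff
relevance_edges <- function(S, cutoff) {
  idx <- which(abs(S) >= cutoff, arr.ind = TRUE)  # (row, col) index pairs
  data.frame(from = rownames(S)[idx[, "row"]],
             to = colnames(S)[idx[, "col"]],
             weight = S[idx],
             stringsAsFactors = FALSE)
}

S <- matrix(c(0.9, -0.7, 0.1, 0.2, -0.65, 0.3), nrow = 3,
            dimnames = list(c("x1", "x2", "x3"), c("y1", "y2")))
relevance_edges(S, cutoff = 0.6)
```

Raising cutoff shrinks the edge list, and lowering it to 0 keeps every X-Y pair. Both positive and negative associations survive the threshold because it is applied to absolute values.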

interactive = TRUE opens two devices, one for the association network and one for the scrollbar, and starts an interactive process: click either end (- or +) of the scrollbar or its middle portion to change the cutoff. The position of the slider indicates the ‘cutoff’ value associated with the displayed network.

The network can be saved in .gml format by extracting the output object$gR and passing it to the write.graph function of the igraph package.
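A minimal sketch of saving the network, assuming the mixOmics and igraph packages are installed. It reuses the rcc model from the Examples, suppresses plotting with plot.graph = FALSE, and writes object$gR with igraph's write_graph() (the current name of write.graph). The output file name is arbitrary and chosen for illustration.

```r
# Assumes mixOmics and igraph are installed
library(mixOmics)

data(nutrimouse)
nutri.res <- rcc(nutrimouse$lipid, nutrimouse$gene, ncomp = 3,
                 lambda1 = 0.064, lambda2 = 0.008)

# Build the network without drawing it; the graph object is returned in $gR
net <- mixOmics::network(nutri.res, comp = 1:3, cutoff = 0.6,
                         plot.graph = FALSE)

# Write the graph in GML format, e.g. for import into Cytoscape
gml.file <- file.path(tempdir(), "nutrimouse-network.gml")
igraph::write_graph(net$gR, file = gml.file, format = "gml")
```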

The interactive process is terminated by clicking the second button and selecting Stop from the menu, or from the Stop menu on the graphics window.

color.node is a vector of length two containing any of the three kinds of R colors: a color name (an element of colors()), a hexadecimal string of the form "#rrggbb", or an integer i meaning palette()[i]. color.node[1] and color.node[2] give the fill color of the X- and Y-variable nodes respectively. Defaults to c("white", "white").

color.edge gives the edges colors corresponding to the values in mat. Defaults to color.GreenRed(100), which shows negative correlations in green and positive correlations in red. Other color palettes are also provided, such as color.jet and color.spectral (see the help on those functions and the Examples below). Other palettes, for instance those of the grDevices package, can be used too.
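To illustrate how edge colors can follow the values in mat, here is a base-R sketch that bins values into a palette using breaks symmetric about 0 (as with symkey = TRUE). The helper edge_colors() is hypothetical and its binning rule is an assumption, not necessarily what network() does internally; colorRampPalette() comes from the grDevices package.

```r
# Hypothetical helper (not part of mixOmics): map values to palette colors
# using equally spaced breaks that are symmetric about 0
edge_colors <- function(values, palette) {
  lim <- max(abs(values))
  breaks <- seq(-lim, lim, length.out = length(palette) + 1)
  # all.inside = TRUE keeps the extreme values in the first and last bins
  bins <- findInterval(values, breaks, all.inside = TRUE)
  palette[bins]
}

# Green for negative, black near 0, red for positive
pal <- colorRampPalette(c("green", "black", "red"))(100)
cols <- edge_colors(c(-0.9, 0.1, 0.9), pal)
cols
```

A symmetric key means that a similarity of 0 always falls in the middle of the palette, so the sign of an association can be read from the hue alone.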

shape.node[1] and shape.node[2] give the shape of the nodes associated with the X- and Y-variables respectively. Currently accepted values are "circle" and "rectangle". Defaults to c("circle", "rectangle").

lty.edge[1] and lty.edge[2] give the line type of edges with positive and negative weights respectively. Can be one of "solid", "dashed", "dotted", "dotdash", "longdash" and "twodash". Defaults to c("solid", "solid").

lwd.edge[1] and lwd.edge[2] give the line width of edges with positive and negative weights respectively. This attribute is of type double, with a default of c(1, 1).

Value

network returns a list containing the following components:

M

the correlation matrix used by network.

gR

a graph object that can be saved for use in Cytoscape (requires the igraph package to be loaded).

Warning

If the number of variables is high, generating the network can take some time.

Author(s)

Ignacio González, Kim-Anh Lê Cao, AL J Abadi

References

Mathematical definition: González I., Lê Cao K-A., Davis M.J. and Déjean S. (2012). Visualising associations between paired omics data sets. BioData Mining 5:19. http://www.biodatamining.org/content/5/1/19/abstract

Examples and illustrations:

Rohart F., Gautier B., Singh A. and Lê Cao K-A. (2017). mixOmics: an R package for 'omics feature selection and multiple data integration. PLoS Computational Biology 13(11): e1005752.

Relevance networks:

Butte, A. J., Tamayo, P., Slonim, D., Golub, T. R. and Kohane, I. S. (2000). Discovering functional relationships between RNA expression and chemotherapeutic susceptibility using relevance networks. Proceedings of the National Academy of Sciences of the USA 97, 12182-12186.

Moriyama, M., Hoshida, Y., Otsuka, M., Nishimura, S., Kato, N., Goto, T., Taniguchi, H., Shiratori, Y., Seki, N. and Omata, M. (2003). Relevance Network between Chemosensitivity and Transcriptome in Human Hepatoma Cells. Molecular Cancer Therapeutics 2, 199-205.

See Also

plotVar, cim, color.GreenRed, color.jet, color.spectral and http://www.mixOmics.org for more details.

Examples

## network representation for objects of class 'rcc'
library(mixOmics)
data(nutrimouse)
X <- nutrimouse$lipid
Y <- nutrimouse$gene
nutri.res <- rcc(X, Y, ncomp = 3, lambda1 = 0.064, lambda2 = 0.008)

## Not run: 
# may not work on Linux; in that case, use Windows instead.
# It may also fail in RStudio because of margin issues;
# in that case, save the plot as an image
jpeg('example1-network.jpeg', res = 600, width = 4000, height = 4000)
network(nutri.res, comp = 1:3, cutoff = 0.6)
dev.off()

## Changing the attributes of the network

# may fail in RStudio because of margin issues;
# in that case, save the plot as an image
jpeg('example2-network.jpeg')
network(nutri.res, comp = 1:3, cutoff = 0.45,
        color.node = c("mistyrose", "lightcyan"),
        shape.node = c("circle", "rectangle"),
        color.edge = color.jet(100),
        lty.edge = "solid", lwd.edge = 2,
        show.edge.labels = FALSE)
dev.off()


## interactive 'cutoff': select the 'cutoff' and "see" the new network
## only run this during an interactive session
if (interactive()) {
    network(nutri.res, comp = 1:3, cutoff = 0.55, interactive = TRUE)
    # close the device only if one was opened
    dev.off()
}

## network representation for objects of class 'spls'
data(liver.toxicity)
X <- liver.toxicity$gene
Y <- liver.toxicity$clinic
toxicity.spls <- spls(X, Y, ncomp = 3, keepX = c(50, 50, 50),
                      keepY = c(10, 10, 10))

# may fail in RStudio because of margin issues;
# in that case, save the plot as an image
jpeg('example3-network.jpeg')
network(toxicity.spls, comp = 1:3, cutoff = 0.8,
        color.node = c("mistyrose", "lightcyan"),
        shape.node = c("rectangle", "circle"),
        color.edge = color.spectral(100),
        lty.edge = "solid", lwd.edge = 1,
        show.edge.labels = FALSE, interactive = FALSE)
dev.off()

## End(Not run)

mixOmicsTeam/mixOmics documentation built on Dec. 3, 2024, 11:15 p.m.