findMST2: Union of the First and Second Minimum Spanning Trees


View source: R/findMST2.R

Description

Find the union of the first and second minimum spanning trees.

Usage

findMST2(object, cor.method="pearson", min.sd=1e-3, return.MST2only=TRUE)

Arguments

object

a numeric matrix with columns and rows respectively corresponding to samples and features.

cor.method

a character string indicating which correlation coefficient is to be computed. Possible values are “pearson” (default), “spearman” and “kendall”.

min.sd

the minimum allowed standard deviation for any feature. If any feature has a standard deviation smaller than min.sd the execution stops and an error message is returned.

return.MST2only

logical. If TRUE (default), an object of class igraph containing the MST2 is returned. If FALSE, a list of length three containing objects of class igraph is returned: the MST2 (the union of the first and second MSTs), the first MST, and the second MST, as described under Value.

Details

This function produces the union of the first and second minimum spanning trees (MSTs) as an object of class igraph (see package igraph for details). It can also return the first and second minimum spanning trees when return.MST2only is FALSE (the default is TRUE). It starts by calculating the correlation (coexpression) matrix and uses it to obtain a weighting matrix for a complete graph through the equation w_{ij} = 1 - |r_{ij}|, where r_{ij} is the correlation between features i and j and w_{ij} is the weight of the link between vertices (nodes) i and j in the graph G(V,E).
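
The weighting step can be sketched in base R as follows. This is a minimal illustration of the equation w_{ij} = 1 - |r_{ij}| on simulated data, not the package's internal code; the object names x, r and w are hypothetical.

```r
## Toy data: 5 features (rows) by 10 samples (columns); names are illustrative
set.seed(1)
x <- matrix(rnorm(50), nrow = 5, ncol = 10)

## cor() correlates columns, so transpose to put the features in columns
r <- cor(t(x), method = "pearson")

## Edge weights w_ij = 1 - |r_ij|: strongly correlated features get short links
w <- 1 - abs(r)
```

The result is a symmetric matrix with values between 0 and 1 and a zero diagonal, so it can serve as a weighted adjacency matrix for a complete graph.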

For the graph G(V,E) where V is the set of vertices and E is the set of edges, the first MST is defined as the acyclic subset T_{1} \subseteq E that connects all vertices in V and whose total length ∑_{(i,j) \in T_{1}} d(v_{i},v_{j}) is minimal (Rahmatallah et al. 2014). Here the length d(v_{i},v_{j}) of a link is its weight w_{ij}. The second MST is defined as the MST of the reduced graph G(V,E-T_{1}). The union of the first and second MSTs is denoted as MST2.
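
The definitions above can be sketched with igraph as shown below. This is an illustration under stated assumptions, not necessarily how findMST2 is implemented: the data are simulated, the object names (x, w, g, t1, g2, t2, mst2) are hypothetical, and the reduced graph is built by zeroing the first-MST entries of the weight matrix.

```r
library(igraph)

## Toy weight matrix for 6 features, built as w_ij = 1 - |r_ij|
set.seed(1)
x <- matrix(rnorm(6 * 15), nrow = 6, dimnames = list(paste0("f", 1:6), NULL))
w <- 1 - abs(cor(t(x)))

## Complete weighted graph; zero entries and the diagonal produce no edge
g <- graph_from_adjacency_matrix(w, mode = "undirected", weighted = TRUE,
                                 diag = FALSE)

## First MST: spanning tree of minimal total weight
t1 <- mst(g)

## Reduced graph G(V, E - T1): blank out the links used by the first MST
w2 <- w
w2[as_adjacency_matrix(t1, sparse = FALSE) > 0] <- 0
g2 <- graph_from_adjacency_matrix(w2, mode = "undirected", weighted = TRUE,
                                  diag = FALSE)

## Second MST and the union of both trees (MST2)
t2 <- mst(g2)
mst2 <- igraph::union(t1, t2)
```

Because the two trees share no links, the number of edges in mst2 equals ecount(t1) + ecount(t2). If the first MST happens to be a star, its center is isolated in the reduced graph and mst() returns a spanning forest for g2 rather than a single tree.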

Rahmatallah et al. (2014) showed that MST2 can be used as a graphical visualization tool to highlight the most highly correlated genes in the correlation network. A gene that is highly correlated with all the other genes tends to occupy a central position and has a relatively high degree in the MST2 because the shortest paths connecting the vertices of the first and second MSTs tend to pass through the vertex corresponding to this gene. In contrast, a gene with low intergene correlations most likely occupies a non-central position in the MST2 and has a degree of 2.
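
One way to inspect this property is to rank the vertices of an igraph object by degree. The sketch below uses a small hand-built graph as a stand-in for an MST2 (it is not the output of findMST2), and rank_by_degree is a hypothetical helper, not part of GSAR.

```r
library(igraph)

## Hypothetical helper: vertex degrees sorted from highest to lowest
rank_by_degree <- function(graph) sort(degree(graph), decreasing = TRUE)

## Toy stand-in for an MST2: a 5-vertex ring (every degree is 2) with two
## extra links at vertex 1, mimicking a hub gene
g <- add_edges(make_ring(5), c(1, 3, 1, 4))
V(g)$name <- paste0("gene", 1:5)

deg <- rank_by_degree(g)
```

Here gene1 comes first with degree 4, while the peripheral vertices gene2 and gene5 keep degree 2. The same helper could be applied to the igraph object returned by findMST2.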

In rare cases, a feature may have a constant or nearly constant level across the samples. This results in a zero or tiny standard deviation. Such a case breaks the computation of the correlations between features by the command cor (a zero standard deviation yields NA values). To avoid this situation, standard deviations are checked in advance and if any is found below the minimum limit min.sd (default is 1e-3), the execution stops and an error message is returned indicating the number of features causing the problem (if there is only one, the index of that feature is given too).
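
A check of this kind can be sketched in base R as follows. The helper check_min_sd and its error messages are hypothetical and only mirror the behavior described above; they are not the package's actual code or wording.

```r
## Hypothetical helper mirroring the described check (not the package's code)
check_min_sd <- function(object, min.sd = 1e-3) {
  sds <- apply(object, 1, sd)   # one standard deviation per feature (row)
  low <- which(sds < min.sd)    # indices of the offending features
  if (length(low) == 1) {
    stop("feature ", low, " has a standard deviation below min.sd")
  } else if (length(low) > 1) {
    stop(length(low), " features have a standard deviation below min.sd")
  }
  invisible(TRUE)
}

## Toy data: 4 features by 10 samples, with feature 2 made constant
set.seed(1)
x <- matrix(rnorm(40), nrow = 4)
x[2, ] <- 5                     # constant feature, so its sd is exactly 0

## Capture the error message instead of stopping the session
result <- tryCatch(check_min_sd(x), error = function(e) conditionMessage(e))
```

Running the check before calling cor gives a clear message about the offending feature rather than NA correlations further downstream.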

Value

When return.MST2only=TRUE (default), function findMST2 returns an object of class igraph representing the MST2. If return.MST2only=FALSE, function findMST2 returns a list of length 3 with the following components:

MST2

an object of class igraph containing the union of the first and second MSTs.

first.mst

an object of class igraph containing the first MST.

second.mst

an object of class igraph containing the second MST.

Author(s)

Yasir Rahmatallah and Galina Glazko

References

Rahmatallah Y., Emmert-Streib F. and Glazko G. (2014) Gene sets net correlations analysis (GSNCA): a multivariate differential coexpression test for gene sets. Bioinformatics 30, 360–368.

See Also

GSNCAtest, plotMST2.pathway.

Examples

## generate a dataset of 20 features and 20 samples
## use multivariate normal distribution with different covariance matrices
library(MASS)
ngenes <- 20
nsamples <- 20
zero_vector <- array(0,c(1,ngenes))
## create a covariance matrix with high off-diagonal elements
## for the first 5 features and low for the remaining 15 features
cov_mtrx <- diag(ngenes)
cov_mtrx[!diag(ngenes)] <- 0.1
mask <- diag(ngenes/4)
mask[!diag(ngenes/4)] <- 0.6
cov_mtrx[1:(ngenes/4),1:(ngenes/4)] <- mask
gp <- mvrnorm(nsamples, zero_vector, cov_mtrx)
dataset <- aperm(gp, c(2,1))
## with return.MST2only=FALSE, findMST2 returns a list of length 3
## trees[[1]] (also trees$MST2) is an object of class igraph containing the MST2
trees <- findMST2(dataset, return.MST2only=FALSE)

GSAR documentation built on Nov. 8, 2020, 7:16 p.m.