This is a wrapper function which uses the function findMST2 to find the union
of the first and second minimum spanning trees (or MST2) of the correlation
network for a feature set (pathway) under two conditions. It plots the MST2 of
the correlation network of the feature set under both conditions side-by-side
and highlights hub nodes to facilitate a visual comparison.
plotMST2.pathway(object, group, name=NULL, cor.method="pearson",
    min.sd=1e-3, legend.size=1, leg.x=-0.8, leg.y=1.5, return.weights=FALSE,
    group1.name="Group 1", group2.name="Group 2", label.size=1,
    label.color="black", label.dist=0.5, vertex.size=8, vertex.label.font=1,
    edge.width=1)
object | a numeric matrix with columns and rows respectively corresponding to samples and features. Gene names are provided to this function as the rownames of this matrix.
group | a numeric vector indicating group associations for samples. Possible values are 1 and 2.
name | an optional character string giving the name of the feature set (gene set). If given, the name will be displayed at the top of the plot.
cor.method | a character string indicating which correlation coefficient is to be computed. Possible values are “pearson” (default), “spearman” and “kendall”.
min.sd | a numeric value indicating the minimum allowed standard deviation for any feature. If any feature has a standard deviation smaller than min.sd, the execution stops and an error message is returned.
legend.size | an optional numeric value controlling the font size of the legend relative to the default font size. Default is 1.
leg.x | a numeric value indicating the amount of horizontal shift of the legend box to allow better positioning in the plot. Default value is -0.8.
leg.y | a numeric value indicating the amount of vertical shift of the legend box to allow better positioning in the plot. Default value is 1.5.
return.weights | logical. Default value is FALSE. If the weight factors assigned to the genes by the GSNCA method are desired, setting this parameter to TRUE returns the weight factors in a matrix with 2 columns (for class 1 and class 2) and number of rows equal to the number of genes in the gene set. If the ...
group1.name | an optional character string to be presented as the given name for class 1 in the plot. Default value is “Group 1”.
group2.name | an optional character string to be presented as the given name for class 2 in the plot. Default value is “Group 2”.
label.size | a numeric value passed to argument vertex.label.cex in command plot.igraph to specify the vertex label size. Default value is 1.
label.color | a character string specifying the color of vertex labels. Default value is “black”.
label.dist | a numeric value passed to argument vertex.label.dist in command plot.igraph to specify the distance between vertex labels and the centers of vertices. Default value is 0.5.
vertex.size | a numeric value passed to argument vertex.size in command plot.igraph to specify the vertex size. Default value is 8.
vertex.label.font | a numeric value passed to argument vertex.label.font in command plot.igraph to specify the font type used. Default value is 1.
edge.width | a numeric value passed to argument edge.width in command plot.igraph to specify the edge width in the plot. Default value is 1.
This is a wrapper plotting function for the convenience of users. It uses the
function findMST2 to find the union of the first and second minimum spanning
trees (or MST2) of the correlation network for a feature set (pathway) under
two conditions and plots them side-by-side. It also lists the hub nodes and
their weight factors (w) under each condition (see Rahmatallah et al. 2014 for
details). The range in which weight factors fall is indicated by the node
colors defined in the legend. Weight factors have values mostly ranging
between 0.5 (low coexpression) and 1.5 (high coexpression). To allow users
more control over plotting parameters and to present different feature sets
appropriately, two optional arguments were introduced: legend.size and
label.size. Node labels will be the names of the features in the set, i.e.
rownames(object). If the rownames attribute is not set for object, node labels
will be set to as.character(c(1:nrow(object))).
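The label fallback described above can be illustrated with a few lines of base R. This is only a sketch of the documented behavior, not the package's internal code; the variable name labels and the gene names used at the end are hypothetical.

```r
# Toy matrix with 3 features (rows) and 4 samples (columns); no rownames set.
object <- matrix(rnorm(3 * 4), nrow = 3)

# Documented fallback: use row indices as labels when rownames are missing.
labels <- if (is.null(rownames(object))) {
  as.character(seq_len(nrow(object)))
} else {
  rownames(object)
}
labels

# To get meaningful node labels, set rownames before plotting
# (hypothetical gene names).
rownames(object) <- c("TP53", "BRCA1", "MYC")
```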
The weight factors, inferred from the Gene Sets Net Correlations Analysis
(GSNCA) method (see GSNCAtest), correlate to some extent with gene
centralities in the MST2: genes with large weights are placed near the center
of the MST2, and genes with small weights are placed on the periphery
(Rahmatallah et al. 2014). Adopting network terminology, a gene with the
largest weight is a hub gene, coexpressed with most of the other genes in a
pathway (see findMST2). Therefore, MST2 is a convenient graphical
visualization tool to examine the pathways tested by the GSNCA method (see
GSNCAtest).
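To make the MST2 idea concrete, here is a minimal base-R sketch that builds the union of the first and second minimum spanning trees from a symmetric matrix of edge weights. It is not the code of findMST2; it assumes Prim's algorithm as the spanning-tree routine, and the helper names prim_msf and mst2 are hypothetical. The second tree is computed after removing the edges of the first, and it may be a forest if that removal disconnects the graph.

```r
# Prim's algorithm on a dense symmetric weight matrix (Inf = missing edge).
# Returns a two-column matrix of edges given as vertex indices. If the graph
# is disconnected, a minimum spanning forest is returned instead of a tree.
prim_msf <- function(w) {
  n <- nrow(w)
  visited <- rep(FALSE, n)
  visited[1] <- TRUE
  edges <- matrix(0L, nrow = 0, ncol = 2)
  while (!all(visited)) {
    sub <- w[visited, !visited, drop = FALSE]
    if (!any(is.finite(sub))) {
      # No edge leaves the visited set: start a new component.
      visited[which(!visited)[1]] <- TRUE
      next
    }
    # Cheapest edge crossing from visited to unvisited vertices.
    k <- which(sub == min(sub), arr.ind = TRUE)[1, ]
    from <- which(visited)[k[1]]
    to <- which(!visited)[k[2]]
    edges <- rbind(edges, c(from, to))
    visited[to] <- TRUE
  }
  edges
}

# MST2: the first MST plus the MST of the graph with first-MST edges removed.
mst2 <- function(w) {
  first <- prim_msf(w)
  w2 <- w
  w2[first] <- Inf                        # remove edge (i, j)
  w2[first[, 2:1, drop = FALSE]] <- Inf   # and its mirror (j, i)
  second <- prim_msf(w2)
  rbind(first, second)
}

# Toy example: 6 features, 15 samples, weights 1 - |r|.
set.seed(3)
x <- matrix(rnorm(6 * 15), nrow = 6)
w <- 1 - abs(cor(t(x)))
e <- mst2(w)
e
```

Each row of e is one edge of the MST2. Because the two trees share no edges, a hub feature shows up as a vertex index that appears in many rows.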
The correlation (coexpression) network is obtained using the weight matrix W
with elements w_{ij} = 1 - |r_{ij}|, where r_{ij} is the correlation between
features i and j and w_{ij} is the weight of the link between vertices (nodes)
i and j in the network. The correlation coefficient used is indicated by the
argument cor.method with three possible values: “pearson” (default),
“spearman” and “kendall”.
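As an illustration of the formula above, the weight matrix can be computed in base R as follows. This is a sketch on toy data rather than the package's own code; the object names x, r and w are chosen here for the example.

```r
# Toy expression matrix: 5 features (rows) x 12 samples (columns).
set.seed(1)
x <- matrix(rnorm(5 * 12), nrow = 5,
            dimnames = list(paste0("gene", 1:5), NULL))

# cor() correlates columns, so transpose to get feature-by-feature correlations.
# method may be "pearson", "spearman" or "kendall".
r <- cor(t(x), method = "pearson")

# Edge weights w_ij = 1 - |r_ij|: strongly correlated features get small weights.
w <- 1 - abs(r)
round(w, 2)
```

Since a minimum spanning tree favors small weights, strongly coexpressed features (large |r_{ij}|) tend to be joined directly in the tree.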
In some cases (especially for RNA-Seq count data), one or more features may
have a constant or nearly constant level across the samples in one or both
conditions, resulting in a zero or tiny standard deviation. Such a case
produces an error in the command cor used to compute the correlation
coefficients between features. To avoid this situation, standard deviations
are checked in advance, and if any is found below the minimum limit min.sd
(default is 1e-3), execution stops and an error message is returned indicating
the number of features causing the problem (if there is only one, the index of
that feature is given too).
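Users who want to screen their data before calling the function could run a similar check themselves. The sketch below assumes the check is done per group on the rows of object; the helper name check_min_sd and the wording of its error message are hypothetical and do not reproduce the package's actual message.

```r
# Hypothetical helper: stop if any feature is (nearly) constant within a group.
check_min_sd <- function(object, group, min.sd = 1e-3) {
  for (g in c(1, 2)) {
    # Standard deviation of every feature (row) across the samples of group g.
    sds <- apply(object[, group == g, drop = FALSE], 1, sd)
    low <- which(sds < min.sd)
    if (length(low) > 0) {
      stop(sprintf("%d feature(s) in group %d have sd below %g (row index: %s)",
                   length(low), g, min.sd, paste(low, collapse = ", ")))
    }
  }
  invisible(TRUE)
}

set.seed(2)
x <- matrix(rnorm(4 * 10), nrow = 4)
grp <- rep(c(1, 2), each = 5)
check_min_sd(x, grp)        # passes silently

x[3, grp == 2] <- 7         # make feature 3 constant in group 2
res <- tryCatch(check_min_sd(x, grp), error = function(e) conditionMessage(e))
res
```

Features flagged this way can be filtered out of the matrix before plotting.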
This function is suitable for a feature set of roughly 80 features or fewer. It works for feature sets with a larger number of features, but the nodes and their labels will be too crowded in the plot for a useful visual presentation.
Yasir Rahmatallah and Galina Glazko
Rahmatallah Y., Emmert-Streib F. and Glazko G. (2014) Gene sets net correlations analysis (GSNCA): a multivariate differential coexpression test for gene sets. Bioinformatics 30, 360–368.
## generate a feature set of 20 features under two conditions
## each condition has 20 samples
## use multivariate normal distribution with different covariance matrices
library(MASS)   # provides mvrnorm
library(GSAR)   # provides plotMST2.pathway
ngenes <- 20
nsamples <- 40
zero_vector <- array(0,c(1,ngenes))
## create a covariance matrix with low off-diagonal elements
cov_mtrx1 <- diag(ngenes)
cov_mtrx1[!diag(ngenes)] <- 0.1
## create a covariance matrix with high off-diagonal elements
## for the first 5 features and low for the remaining 15 features
cov_mtrx2 <- diag(ngenes)
cov_mtrx2[!diag(ngenes)] <- 0.1
mask <- diag(ngenes/4)
mask[!diag(ngenes/4)] <- 0.6
cov_mtrx2[1:(ngenes/4),1:(ngenes/4)] <- mask
## draw 20 samples per condition (rows are samples at this point)
gp1 <- mvrnorm((nsamples/2), zero_vector, cov_mtrx1)
gp2 <- mvrnorm((nsamples/2), zero_vector, cov_mtrx2)
gp <- rbind(gp1,gp2)
## transpose so that rows are features and columns are samples
dataset <- aperm(gp, c(2,1))
## first 20 samples belong to group 1
## second 20 samples belong to group 2
## since rownames(object) is NULL, node labels will be automatically
## set to as.character(c(1:nrow(object)))
plotMST2.pathway(object=dataset, group=c(rep(1,20),rep(2,20)),
    name="Example Pathway")