library(MetaboDiff)
The following workflow was adapted from the weighted gene co-expression analysis (WGCNA) proposed by Langfelder and Horvarth[^1] and makes use of the functions of the corresponding WGCNA R package[^2].
As for the MetaboDiff tutorial, the implementation is demonstrated using the example data from a study by Priolo and colleagues in which the authors used the service of Metabolon® to compare the tissue metabolome of 40 prostate cancers with 16 normal prostate specimens[^3]. The MetaboDiff packages contains this data within the object met_example
.
In the MetaboDiff tutorial, all steps for WGCNA were performed within a set of functions connected by pipes:
diss_matrix
- construction of dissimilarity matrixidentify_modules
- identification of metabolic correlation modulesname_modules
- name metabolic correlation modulescalculate_MS
- calculation of module signficance to relate sample traits to modulesmet_example <- met_example %>% diss_matrix %>% identify_modules(min_module_size=5) %>% name_modules(pathway_annotation="SUB_PATHWAY") %>% calculate_MS(group_factors=c("tumor_normal","random_gender"))
The individual steps will be explained as follows.
The first step in constructing a metabolic correlation network is the creation of a dissimilarity matrix. Biweight midcorrelation was used as a similiarity measure as it is more robust to outliers than the absolute correlation coefficient[^4]. This choice is important, as we do not expect metabolites to be correlated in all patients.
The core concept of the so called "weighted" correlation analysis by Langfelder and Horvarth is that instead of defining a "hard" threshold (e.g. an absolute correlation coefficient > 0.8) to decide whether a node to connected to another, the adjacency a
is defined by raising the similarity s
to a power beta
("soft" threshold):
\begin{equation} a_{ij} = s_{ij}^\beta \end{equation}
Lastly, the dissimilarity measure w
is defined by
\begin{equation}
w_{ij} = 1 - a_{ij}
\end{equation}
For detailed rationale of this approach, please see Zhang and Horvath[^1]. For metabolic networks, we identified that a beta value of 3 was the lowest power for which the scale-free topology of the topology was met_example.
The function diss_matrix
creates the dissimilarity measure for the met_example
objects and saves it in the metadata slot
met_example <- diss_matrix(met_example)
To identify metabolic correlation modules, metabolites are next clustered based on the dissimilarity measure where branches of the dendrogram correspond to modules. Ultimately, modules are detected by applying a branch cutting method with a minimal module size of 5 metabolites. We employed the dynamic branch cut method developed by Langfelder and colleagues[^5], as constant height cutoffs exhibit suboptimal performance on complicated dendrograms. Figure \@ref(fig:WGCNA) shows the hierarchical clustering and corresponding modules after branch cutting.
met_example <- identify_modules(met_example, min_module_size=5)
WGCNA::plotDendroAndColors(metadata(met_example)$tree, metadata(met_example)$module_color_vector, 'Module colors', dendroLabels = FALSE, hang = 0.03, addGuide = TRUE, guideHang = 0.05, main='')
The relation between the identified metabolic correlation modules can be visualized by a dendrogram of their eigenmetabolite (figure \@ref(fig:eigengenes)). The module eigenmetabolite is defined as the first principal component of the metabolic measurements in the respective module.
par(mar=c(2,2,2,2)) ape::plot.phylo(ape::as.phylo(metadata(met_example)$METree), type = 'fan', show.tip.label = FALSE, main='') ape::tiplabels(frame = 'circle', col='black', text=rep('',length(unique(metadata(met_example)$modules))), bg = WGCNA::labels2colors(0:21))
# number of metabolites per module table(metadata(met_example)$modules)
To enable a better interpretation of metabolic correlation modules, modules are named according to the most abundant pathway annotation in a module (figure \@ref(fig:naming)).
# calculate module significance met_example <- name_modules(met_example, pathway_annotation = "SUB_PATHWAY") # plot phylogram with names ape::plot.phylo(ape::as.phylo(metadata(met_example)$METree), cex=0.9)
An advantage of correlation network analysis is the possibility to integrate external information. At the lowest hierarchical level, metabolite significance (MetS) measures can be defined as the statistical significance (i.e. p-value, $p_i$) between the $i$-th node profile (metabolite) $x_i$ and the sample trait $T$
\begin{equation} MetS_i = -log~p_i \end{equation}
Module significance (MS) in turn can be determined as the average absolute metabolite significance measure. This conceptual framework can be adapted to any research question.
met_example <- calculate_MS(met_example, group_factors = c("tumor_normal","random_gender"))
Figure \@ref(fig:MSplot) shows that metabolic correlation module 2 (Creatine metabolism / Glutathione metabolism) was significantly associated with the tumor vs. normal comparison in the example data.
MS_plot(met_example, group_factor="tumor_normal", p_value_cutoff=0.05, p_adjust=FALSE)
In line with the volcano plots, the random grouping did not result in a significant association (figure \@ref(fig:MSplot2))
MS_plot(met_example, group_factor="random_gender", p_value_cutoff=0.05, p_adjust=FALSE)
Assessing the module significance for different sample traits facilitates an understanding of individual metabolic correlation modules. As for metabolomics, we are next interested in the role of the individual metabolite within module. To this end, Langfelder and Horvath suggest a 'fuzzy' measure of module membership defined as
\begin{equation} K^q = |cor(x_i,E^q)| \end{equation}
where $x_i$ is the profile of metabolite $i$ and $E^q$ is the eigenmetabolite of module $q$. Based on this definition, $K$ describes how closely related metabolite $i$ is to module $q$. A meaningful visualization is consequently plotting the module membership over the p-value of the respective $MetS$ measure (MOI_plot
). As a third dimension, the color is scaled according to the effect size (i.e. difference in means).
MOI_plot(met_example, group_factor="tumor_normal", MOI = 2, label_colors=c("darkseagreen","dodgerblue"), p_adjust = FALSE) + xlim(c(-1,8))
For more information about the application and interpretation of metabolic correlation networks, please refer to the package vignette 'Case_study'.
sessionInfo()
[^1]: Zhang, B., & Horvath, S. (2005). A general framework for weighted gene co-expression network analysis. Statistical Applications in Genetics and Molecular Biology, 4(1), Article17. http://doi.org/10.2202/1544-6115.1128
[^2]: Langfelder, P., & Horvath, S. (2008). WGCNA: an R package for weighted correlation network analysis. BMC Bioinformatics, 9, 559–559. http://doi.org/10.1186/1471-2105-9-559
[^3]: Priolo, C., Pyne, S., Rose, J., Regan, E. R., Zadra, G., Photopoulos, C., et al. (2014). AKT1 and MYC Induce Distinctive Metabolic Fingerprints in Human Prostate Cancer. Cancer Research, 74(24), 7198–7204. http://doi.org/10.1158/0008-5472.CAN-14-1490
[^4]: Zheng, C.-H., Yuan, L., Sha, W., & Sun, Z.-L. (2014). Gene differential coexpression analysis based on biweight correlation and maximum clique. BMC Bioinformatics, 15 Suppl 15(Suppl 15), S3. http://doi.org/10.1186/1471-2105-15-S15-S3
[^5]: Langfelder, P., Zhang, B., & Horvath, S. (2008). Defining clusters from a hierarchical cluster tree: the Dynamic Tree Cut package for R. Bioinformatics, 24(5), 719–720. http://doi.org/10.1093/bioinformatics/btm563.
[^6]: Horvath, S., & Dong, J. (2008). Geometric Interpretation of Gene Coexpression Network Analysis. PLoS Computational Biology (PLOSCB) 4(8), 4(8), e1000117–e1000117. http://doi.org/10.1371/journal.pcbi.1000117
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.