In andreasmock/MetaboDiff: An R package for differential metabolomic analysis

library(MetaboDiff)

Part I: Implementation of WGNCA for metabolomics

The following workflow was adapted from the weighted gene co-expression analysis (WGCNA) proposed by Langfelder and Horvarth[^1] and makes use of the functions of the corresponding WGCNA R package[^2].

As for the MetaboDiff tutorial, the implementation is demonstrated using the example data from a study by Priolo and colleagues in which the authors used the service of Metabolon® to compare the tissue metabolome of 40 prostate cancers with 16 normal prostate specimens[^3]. The MetaboDiff packages contains this data within the object met_example.

In the MetaboDiff tutorial, all steps for WGCNA were performed within a set of functions connected by pipes:

diss_matrix - construction of dissimilarity matrix
identify_modules - identification of metabolic correlation modules
name_modules - name metabolic correlation modules
calculate_MS - calculation of module signficance to relate sample traits to modules

met_example <- met_example %>%
    diss_matrix %>%
    identify_modules(min_module_size=5) %>%
    name_modules(pathway_annotation="SUB_PATHWAY") %>%
    calculate_MS(group_factors=c("tumor_normal","random_gender"))

The individual steps will be explained as follows.

Construction of dissimilarity matrix

The first step in constructing a metabolic correlation network is the creation of a dissimilarity matrix. Biweight midcorrelation was used as a similiarity measure as it is more robust to outliers than the absolute correlation coefficient[^4]. This choice is important, as we do not expect metabolites to be correlated in all patients.

The core concept of the so called "weighted" correlation analysis by Langfelder and Horvarth is that instead of defining a "hard" threshold (e.g. an absolute correlation coefficient > 0.8) to decide whether a node to connected to another, the adjacency a is defined by raising the similarity s to a power beta ("soft" threshold):

\begin{equation} a_{ij} = s_{ij}^\beta \end{equation}

Lastly, the dissimilarity measure w is defined by

\begin{equation} w_{ij} = 1 - a_{ij}
\end{equation}

For detailed rationale of this approach, please see Zhang and Horvath[^1]. For metabolic networks, we identified that a beta value of 3 was the lowest power for which the scale-free topology of the topology was met_example.

The function diss_matrix creates the dissimilarity measure for the met_example objects and saves it in the metadata slot

met_example <- diss_matrix(met_example)

Identification of metabolic correlation modules

To identify metabolic correlation modules, metabolites are next clustered based on the dissimilarity measure where branches of the dendrogram correspond to modules. Ultimately, modules are detected by applying a branch cutting method with a minimal module size of 5 metabolites. We employed the dynamic branch cut method developed by Langfelder and colleagues[^5], as constant height cutoffs exhibit suboptimal performance on complicated dendrograms. Figure \@ref(fig:WGCNA) shows the hierarchical clustering and corresponding modules after branch cutting.

met_example <- identify_modules(met_example, 
                       min_module_size=5)

WGCNA::plotDendroAndColors(metadata(met_example)$tree, 
                    metadata(met_example)$module_color_vector, 
                    'Module colors', 
                    dendroLabels = FALSE, 
                    hang = 0.03,
                    addGuide = TRUE, 
                    guideHang = 0.05, main='')

The relation between the identified metabolic correlation modules can be visualized by a dendrogram of their eigenmetabolite (figure \@ref(fig:eigengenes)). The module eigenmetabolite is defined as the first principal component of the metabolic measurements in the respective module.

par(mar=c(2,2,2,2))
ape::plot.phylo(ape::as.phylo(metadata(met_example)$METree),
           type = 'fan',
           show.tip.label = FALSE, 
           main='')
ape::tiplabels(frame = 'circle',
          col='black', 
          text=rep('',length(unique(metadata(met_example)$modules))), 
          bg = WGCNA::labels2colors(0:21))

# number of metabolites per module
table(metadata(met_example)$modules)

Name metabolic correlation modules

To enable a better interpretation of metabolic correlation modules, modules are named according to the most abundant pathway annotation in a module (figure \@ref(fig:naming)).

# calculate module significance
met_example <- name_modules(met_example,
                   pathway_annotation = "SUB_PATHWAY")

# plot phylogram with names
ape::plot.phylo(ape::as.phylo(metadata(met_example)$METree), cex=0.9)

Calculation of module signficance to relate sample traits to modules

An advantage of correlation network analysis is the possibility to integrate external information. At the lowest hierarchical level, metabolite significance (MetS) measures can be defined as the statistical significance (i.e. p-value, $p_i$) between the $i$-th node profile (metabolite) $x_i$ and the sample trait $T$

\begin{equation} MetS_i = -log~p_i \end{equation}

Module significance (MS) in turn can be determined as the average absolute metabolite significance measure. This conceptual framework can be adapted to any research question.

met_example <- calculate_MS(met_example,
                   group_factors = c("tumor_normal","random_gender"))

Figure \@ref(fig:MSplot) shows that metabolic correlation module 2 (Creatine metabolism / Glutathione metabolism) was significantly associated with the tumor vs. normal comparison in the example data.

MS_plot(met_example,
        group_factor="tumor_normal",
        p_value_cutoff=0.05,
        p_adjust=FALSE)

In line with the volcano plots, the random grouping did not result in a significant association (figure \@ref(fig:MSplot2))

MS_plot(met_example,
        group_factor="random_gender",
        p_value_cutoff=0.05,
        p_adjust=FALSE)

Exploration of individual metabolites within correlation module

Assessing the module significance for different sample traits facilitates an understanding of individual metabolic correlation modules. As for metabolomics, we are next interested in the role of the individual metabolite within module. To this end, Langfelder and Horvath suggest a 'fuzzy' measure of module membership defined as

\begin{equation} K^q = |cor(x_i,E^q)| \end{equation}

where $x_i$ is the profile of metabolite $i$ and $E^q$ is the eigenmetabolite of module $q$. Based on this definition, $K$ describes how closely related metabolite $i$ is to module $q$. A meaningful visualization is consequently plotting the module membership over the p-value of the respective $MetS$ measure (MOI_plot). As a third dimension, the color is scaled according to the effect size (i.e. difference in means).

MOI_plot(met_example,
         group_factor="tumor_normal",
         MOI = 2,
         label_colors=c("darkseagreen","dodgerblue"),
         p_adjust = FALSE) + xlim(c(-1,8))

For more information about the application and interpretation of metabolic correlation networks, please refer to the package vignette 'Case_study'.

Session information {.unnumbered}

sessionInfo()

[^1]: Zhang, B., & Horvath, S. (2005). A general framework for weighted gene co-expression network analysis. Statistical Applications in Genetics and Molecular Biology, 4(1), Article17. http://doi.org/10.2202/1544-6115.1128

[^2]: Langfelder, P., & Horvath, S. (2008). WGCNA: an R package for weighted correlation network analysis. BMC Bioinformatics, 9, 559–559. http://doi.org/10.1186/1471-2105-9-559

[^3]: Priolo, C., Pyne, S., Rose, J., Regan, E. R., Zadra, G., Photopoulos, C., et al. (2014). AKT1 and MYC Induce Distinctive Metabolic Fingerprints in Human Prostate Cancer. Cancer Research, 74(24), 7198–7204. http://doi.org/10.1158/0008-5472.CAN-14-1490

[^4]: Zheng, C.-H., Yuan, L., Sha, W., & Sun, Z.-L. (2014). Gene differential coexpression analysis based on biweight correlation and maximum clique. BMC Bioinformatics, 15 Suppl 15(Suppl 15), S3. http://doi.org/10.1186/1471-2105-15-S15-S3

[^5]: Langfelder, P., Zhang, B., & Horvath, S. (2008). Defining clusters from a hierarchical cluster tree: the Dynamic Tree Cut package for R. Bioinformatics, 24(5), 719–720. http://doi.org/10.1093/bioinformatics/btm563.

[^6]: Horvath, S., & Dong, J. (2008). Geometric Interpretation of Gene Coexpression Network Analysis. PLoS Computational Biology (PLOSCB) 4(8), 4(8), e1000117–e1000117. http://doi.org/10.1371/journal.pcbi.1000117

andreasmock/MetaboDiff documentation built on Oct. 31, 2020, 8:18 a.m.

rdrr.io home R language documentation Run R code online

CRAN packages Bioconductor packages R-Forge packages GitHub packages

Note that we can't provide technical support on individual packages. You should contact the package authors for that.

Tweet to @rdrrHQ

GitHub issue tracker

ian@mutexlabs.com