Introduction

One of the ultimate goals in systems biology is the elucidation of the genotype-phenotype relationship in cellular systems [1]. Integrated ‘ omics’ approaches have received particular attention. Most omics analyses monitor the level of target molecules like transcripts and metabolites. In this context, enrichment-analysis approaches such as Gene Set Enrichment Analysis (GSEA) [2-3] can be combined with pathway analysis to evaluate whether a particular molecular group is significantly over-represented. In metabolomics, one of promising approaches is Metabolite Set Enrichment Analysis (MSEA) [4]. MSEAp is an R package for performing MSEA using both plant and animal metabolic pathways and metabolite sets. The package can import pre-defined pathways like SMPDB and KEGG and user-defined metabolite sets in gmt format. Their metabolite sets are integrated and used for MSEA calculation. Given a metabolite-set or a pathway annotation of metabolites, the significance of the over-representation of metabolites in the pathways is determined with the Fisher exact test. MSEAp package provides various visualization functions of MSEA results like barplot from MetaboAnalyst [5], dotplot, and netplot. The package also contains pre-defined and extracted PlantCyc data that are publicly available and our curated metabolite sets from MeKO [6], Arabidopsis flavonoid and glucosinolate (Tokimatsu, Nishida et al., in preparation), and the plant metabolomics literature (AtMetExpress, in preparation).

Installation

# If you are using Debian or Ubuntu, please uncomment the next two lines
#system("sudo apt-get update")
#system("sudo apt-get install -y zlib1g-dev libxml2-dev libpng-dev")

install.packages(c("devtools", "webshot", "knitr", "rmarkdown"))

if (!requireNamespace("BiocManager", quietly = TRUE))
    install.packages("BiocManager")
BiocManager::install(c("BiocStyle", "paxtoolsr", "RCy3"))

devtools::install_github("afukushima/MSEAp", build_vignettes = TRUE)
devtools::install_github("afukushima/MSEApdata", build_vignettes = TRUE)

Getting started

Below is an example of MSEA. mset_SMPDB_format_KEGG is a metabolite set that contains the metabolites for each pathway from the SMPDB [7] database with the metabolite ID from the KEGG [8] database.

kusano is a metabolite list with KEGG metabolite IDs for MESA. This list corresponds with significantly accumulated metabolites between UV-B (one-day treatment)- and control conditions [9]. Typically, a metabolite list is prepared by testing two hypotheses, e.g., the t-test and modified t-statistic (e.g., limma). msea performs MSEA and identifies biologically meaningful accumulations in metabolomic data.

res is the MSEA result and the plot function visualizes the top 20 over-represented pathways.

library(MSEApdata)
library(MSEAp)
data(mset_SMPDB_format_KEGG)
data(kusano)
res <- msea(mset_SMPDB_format_KEGG, kusano)
plot(res)
##
head(kusano)
mset_SMPDB_format_KEGG[[1]]

Appetizer

Importing user-defined metabolite sets

MSEAp can import user-defined metabolite-sets in gmt format using read.gmt() function:

file <- system.file("extdata", "sample_SMPDB.gmt", package = "MSEAp")
smp <- read.gmt(file)
length(smp)
smp[[1]]
smp[[5]]

Importing Wikipathways GPML file

The function read.gpml() can extract compound sets from Wikipathways GPML file to the metabolite-sets:

file <- system.file("extdata", 
                    "At_AtMetExpress_overview_WP3622_89229.gpml", 
                    package = "MSEAp")
wikip <- read.gpml(file)
print(wikip)

Metabolite sets

You can see the currently available datasets by using MSEApdata::supported.msets():

MSEApdata::supported.msets()

Below is a table of the number of our metabolite-sets.

| Data source | KEGG | HMDB | KNApSAcK | | :--- | ---: | ---: | ---: | ------------- | | SMPDB | 717 | 717 | NA | | AraCyc | 268 | 350 | NA | | PlantCyc | 606 | 714 | NA | | EcoCyc | 222 | 254 | NA | | MetaCyc | 1861 | 2059 | NA | | HumanCyc | 89 | NA | NA | | MouseCyc | 71 | NA | NA | | FlyCyc | 90 | NA | NA | | Reactome (Arabidopsis thaliana) | 211 | 137 | NA | | Reactome (Bos taurus) | 426 | 310 | NA | | Reactome (Caenorhabditis elegans) | 276 | 205 | NA | | Reactome (Canis familiaris) | 415 | 293 | NA | | Reactome (Dictyostelium discoideum) | 209 | 135 | NA | | Reactome (Drosophila melanogaster) | 318 | 224 | NA | | Reactome (Danio rerio) | 393 | 287 | NA | | Reactome (Gallus gallus) | 405 | 295 | NA | | Reactome (Homo sapiens) | 542 | 388 | NA | | Reactome (Mus musculus) | 432 | 308 | NA | | Reactome (Mycobacterium tuberculosis) | 9 | 7 | NA | | Reactome (Oryza sativa) | 203 | 137 | NA | | Reactome (Plasmodium falciparum) | 86 | 54 | NA | | Reactome (Rattus norvegicus) | 424 | 308 | NA | | Reactome (Saccharomyces cerevisiae) | 196 | 130 | NA | | Reactome (Schizosaccharomyces pombe) | 149 | 94| NA | | Reactome (Sus scrofa) | 424 | 301 | NA | | Reactome (Taeniopygia guttata) | 364 | 266 | NA | | Reactome (Xenopus tropicalis) | 406 | 297 | NA | | AtMetExpress (Stress)| 9 | NA | NA | | AtMetExpress (2nd Metabolism) | NA | NA | 4 |

For plant scientists

You can perform MSEA using AraCyc pathways and our curated metabolite sets (flavonoid pathways and some literatures). MSEAp is designed to easily extend metabolite sets, and its extension can be realized only by adding a metabolite set to the R list. The process is:

data(mset_AraCyc_format_KEGG)
data(mset_AtMetExpress_Stress_format_KEGG)
data(mset_AtMetExpress_Flavonoids_format_KNApSAcK)
mset <- c(
    mset_AraCyc_format_KEGG,
    mset_AtMetExpress_Stress_format_KEGG,
    mset_AtMetExpress_Flavonoids_format_KNApSAcK
)
length(mset) ## 281 metabolite-sets

Here, fukushima17_INC is a metabolite list that contains 37 significantly increased metabolites including primary and secondary metabolites, mainly flavonols and anthocyanins that are obtained under mild oxidative stress- and control conditions [10]. Both KEGG [8] and KNApSAcK [11] IDs can be used for MSEA.

data(fukushima17_INC)
head(fukushima17_INC)
res <- msea(mset, fukushima17_INC$kegg_knap)
head(res)
plot(res)

Visualization of MSEA results

To help interpreting the MSEA results we also implemented barplot, dotplot, and netplot for visualization.

barplot plots the p-value as a color and the number of metabolite queries contained in the metabolite group as the height of the bar.

dotplot plots dots on the expected value of MSEA in each group, with the p-value as a color (like barplot) and the number of metabolite queries contained in the metabolite group as the size of the dot.

netplot receives the MSEA result, the metabolite set and the shared.metabolite argument and plots the p-value in the MSEA result as the node color on the network of the metabolite group. The size of the node indicates the number of metabolite queries contained in the metabolite group. The shared.metabolite argument decides how to connect the metabolite group with the edge. In the example below, if there are more than 20 metabolites common among groups, we connect these groups with the edge.

By default, netplot performs visualization using the visNetwork package, but you can add the sendto argument and make the output destination Cytoscape [12]. In the default plot, it is possible to use interactive functions by, for example, mousing over the popup of the MSEA information.

res <- msea(mset_SMPDB_format_KEGG, kusano)
## You may have to zoom this plot in RStudio and other R working environments
dotplot(res)  
barplot(res)  ## You can see the same plot by plot()

netplot(res, mset_SMPDB_format_KEGG, shared.metabolite = 20)

Acknowledgments

We thank Dr. Toshiaki Tokimatsu in NIG for manual curation of the flavonoid biosynthetic pathway in Arabidopsis thaliana and Ms. Ursula Petralia for editorial assistance.

References

  1. Fukushima A, Kanaya S, Nishida K: Integrated network analysis and effective tools in plant systems biology. Front Plant Sci. 2014 Nov 4;5:598. doi: 10.3389/fpls.2014.00598.
  2. Mootha VK, Lindgren CM, Eriksson KF, Subramanian A, Sihag S, Lehar J, Puigserver P, Carlsson E, Ridderstråle M, Laurila E, Houstis N, Daly MJ, Patterson N, Mesirov JP, Golub TR, Tamayo P, Spiegelman B, Lander ES, Hirschhorn JN, Altshuler D, Groop LC: PGC-1alpha-responsive genes involved in oxidative phosphorylation are coordinately downregulated in human diabetes. Nat Genet. 2003 Jul;34(3):267-273.
  3. Subramanian A, Tamayo P, Mootha VK, Mukherjee S, Ebert BL, Gillette MA, Paulovich A, Pomeroy SL, Golub TR, Lander ES, Mesirov JP: Gene set enrichment analysis: a knowledge-based approach for interpreting genome-wide expression profiles. PNAS 2005 Oct 25;102(43):15545-15550].
  4. Xia J and Wishart DS: MSEA: a web-based tool to identify biologically meaningful patterns in quantitative metabolomic data. Nucleic Acids Res. 2010 Jul;38(Web Server issue):W71-77. doi:10.1093/nar/gkq329.
  5. Xia J, Sinelnikov IV, Han B, Wishart DS: MetaboAnalyst 3.0--making metabolomics more meaningful. Nucleic Acids Res. 2015 Jul 1;43(W1):W251-257. doi: 10.1093/nar/gkv380.
  6. Fukushima A, Kusano M, Mejia RF, Iwasa M, Kobayashi M, Hayashi N, Watanabe-Takahashi A, Narisawa T, Tohge T, Hur M, Wurtele ES, Nikolau BJ, Saito K: Metabolomic Characterization of Knockout Mutants in Arabidopsis: Development of a Metabolite Profiling Database for Knockout Mutants in Arabidopsis. Plant Physiol. 2014 Jul;165(3):948-961.
  7. Jewison T, Su Y, Disfany FM, Liang Y, Knox C, Maciejewski A, Poelzer J, Huynh J, Zhou Y, Arndt D, Djoumbou Y: SMPDB 2.0: big improvements to the Small Molecule Pathway Database. Nucleic acids research. 2013 Nov 6;42(D1):D478-84.
  8. Kanehisa M, Furumichi M, Tanabe M, Sato Y, Morishima K: KEGG: new perspectives on genomes, pathways, diseases and drugs. Nucleic acids research. 2017 Jan 4;45(D1):D353-61.
  9. Kusano M, Tohge T, Fukushima A, Kobayashi M, Hayashi N, Otsuki H, Kondou Y, Goto H, Kawashima M, Matsuda F, Niida R, Matsui M, Saito K, Fernie AR: Metabolomics reveals comprehensive reprogramming involving two independent metabolic responses of Arabidopsis to UV-B light. Plant J. 2011 67(2):354-69.
  10. Fukushima A, Iwasa M, Nakabayashi R, Kobayashi M, Nishizawa T, Okazaki Y, Saito K, Kusano M: Effects of Combined Low Glutathione with Mild Oxidative and Low Phosphorus Stress on the Metabolism of Arabidopsis thaliana. Front Plant Sci. 2017 Aug 28;8:1464.
  11. Afendi FM, Okada T, Yamazaki M, Hirai-Morita A, Nakamura Y, Nakamura K, Ikeda S, Takahashi H, Altaf-Ul-Amin M, Darusman LK, Saito K, Kanaya S: KNApSAcK family databases: integrated metabolite-plant species databases for multifaceted plant research. Plant Cell Physiol. 2012 Feb;53(2):e1. doi: 10.1093/pcp/pcr165.
  12. Shannon P, Markiel A, Ozier O, Baliga NS, Wang JT, Ramage D, Amin N, Schwikowski B, Ideker T: Cytoscape: a software environment for integrated models of biomolecular interaction networks. Genome Research. 2003 Nov;13(11):2498-504.

Session information {.unnumbered}

Here is the output of sessionInfo() on the system on which this document was compiled:

sessionInfo()


afukushima/MSEAp documentation built on Sept. 18, 2019, 7:12 p.m.