Previous versions of MetaboAnalyst encompassed two modules for functional analysis, metabolic pathway analysis (MetPA) (Xia et al. 2010, https://academic.oup.com/bioinformatics/article/26/18/2342/208464) and metabolite set enrichment analysis (MSEA) (Xia et al. 2010, https://academic.oup.com/nar/article/38/suppl_2/W71/1101310). However, these modules require metabolite identifications prior to use, which remains an important challenge in untargeted metabolomics. In comparison, the mummichog algorithm (Li et al. 2013, http://journals.plos.org/ploscompbiol/article?id=10.1371/journal.pcbi.1003123) bypasses the bottleneck of metabolite identification prior to pathway analysis, leveraging a priori pathway and network knowledge to directly infer biological activity based on mass peaks. We have therefore implemented the mummichog algorithm in R in a new module named “MS Peaks to Pathways”. The knowledge-base for this module consists of five genome-scale metabolic models from the original Python implementation which have either been manually curated or downloaded from BioCyc, as well as an expanded library of 21 organisms derived from KEGG metabolic pathways.
To use this module, users must upload a table (.txt) using the Read.PeakListData function containing either:
1) Three columns - m/z features, p-values, and t-scores or fold-change values. 2) Two columns containing m/z features and either p-values or t-scores 3) One column ranked by either p-values or t-scores. 4) Four columns - m/z features, p-values, t-scores or fold-change values, and the mode (positive or negative).
All inputted files must be in .txt format. If the input is a three column table, both the mummichog and GSEA algorithms (and their combination) can be applied. If only p-values (or ranked by p-values) are provided, then only the mummichog algorithm will be applied. If only t-scores (or ranked by t-scores) are provided, then only the GSEA algorithm will be applied.
If p-values have not yet been calculated, users can use the "Statistical Analysis" module to upload their raw peak tables, process the data, perform t-tests or fold-change analysis, and then upload these results into the module. With the table, users also need to specify the type of MS instrument, the ion mode (positive or negative) using the UpdateInstrumentParameters function. Currently, MetaboAnalystR only supports the handling of peaks obtained from high-resolution MS instruments such as Orbitrap, or Fourier Transform (FT)-MS instruments as recommended by the original mummichog implementation. Following data upload, users much select the organism’s library from which to perform untargeted pathway analysis using the SetMass.PathLib function. Users can then perform the mummichog algorithm on their data using PerformPSEA (details regarding available algorithms found in MetaboAnalystR 2.0 vignette Section 4). First, users will set algorithm to be used to mummichog using SetPeakEnrichMethod. Second, users will set the p-value cutoff to delineate between significantly enriched and non-significantly enriched m/z features using the SetMummichogPval function. Finally, use the PerformPSEA to calcaulte pathway activity.
The output of this module first consists of a table of results identifying the top-pathways that are enriched in the user-uploaded data, which can be found in your working directory named "mummichog_pathway_enrichment.csv". The table consists of the total number of hits, the raw p-value (Fisher’s or Hypergeometric), the EASE score, and the adjusted p-value (for permutations) per pathway. A second table can also be found in your working directory named "mummichog_matched_compound_all.csv", that contains all matched metabolites from the user’s uploaded list of m/z features.
For this tutorial we will first directly use the an example peak list data obtained from untargeted metabolomics of pediatric IBD (mass accuracy 5.0, negative ion mode). This is an example of the three-column table containing the m.z, p.value and t.score.
# Create objects for storing processed data from the MS peaks to pathways module mSet <- InitDataObjects("mass_all", "mummichog", FALSE) # Set the format of the peak list, contains m.z, p.value, and t.score SetPeakFormat("mpt") # Set parameters for analysis, in this case the mass accuracy is set to 5 ppm, the mode of the MS instrument is negative mSet <- UpdateInstrumentParameters(mSet, "5", "negative"); # Read in peak-list data mSet <- Read.PeakListData(mSet, "https://www.metaboanalyst.ca/MetaboAnalyst/resources/data/mummichog_ibd.txt"); # Sanity check of the uploaded data mSet <- SanityCheckMummichogData(mSet) # Perform the mummichog algorithm, in this case the pathway library is from the human MFN model. First set the # algorithm to be used to mummichog, then set the p-value cutoff. # This function may take sometime for processing, and will output the pathway-results and the compound matching tables in your working directory mSet<-SetPeakEnrichMethod(mSet, "mum") mSet<-SetMummichogPval(mSet, 0.2) mSet<-PerformPSEA(mSet, "hsa_mfn", "current", permNum = 1000) mSet<-PlotPeaks2Paths(mSet, "peaks_to_paths_0_", "png", 72, width=NA) # To view the results of the pathway analysis in R, use mSet$mummi.resmat # Use this function to view a table of the significant / non-significant compound hits in a selected pathway mSet <- GetMummichogPathSetDetails(mSet, "Vitamin E metabolism") # Use this function to view a table of matching details (i.e. adducts, t-scores, m.z) for a compound mSet <- GetCompoundDetails(mSet, "CE5843")
Raw MS peaks contain a significant amount of adducts specific to their MS instrument and analytical mode. A comprehensive adduct list is shown below in the "Available" panel. Use this to customize the adduct list used by the Mummichog/GSEA algorithms to match m/z peaks to potential compounds hits. The list of available adducts to choose from can be found in here: positive; here: negative ; here:mixed.
For the negative ion mode, the adducts used are: M-H[-], M-2H[2-], M(C13)-H[-], M(S34)-H[-], M(Cl37)-H[-], M+Na-2H[-], M+K-2H[-], M-H2O-H[-], M+Cl[-], M+Cl37[-], M+Br[-], M+Br81[-], M+ACN-H[-], M+HCOO[-], M+CH3COO[-], and M-H+O[-].
For the positive ion mode, the adducts used are: M[1+], M+H[1+], M+2H[2+], M+3H[3+], M(C13)+H[1+], M(C13)+2H[2+], M(C13)+3H[3+], M(S34)+H[1+], M(Cl37)+H[1+], M+Na[1+], M+H+Na[2+], M+K[1+], M+H2O+H[1+], M-H2O+H[1+], M-H4O2+H[1+], M-NH3+H[1+], M-CO+H[1+], M-CO2+H[1+], M-HCOOH+H[1+], M+HCOONa[1+], M-HCOONa+H[1+], M+NaCl[1+], M-C3H4O2+H[1+], M+HCOOK[1+], and M-HCOOK+H[1+].
Currency metabolites are abundant substances such as water and carbon dioxide known to occur in normal functioning cells and participate in a large number of metabolic reactions (Huss and Holme, 2007). Because of their ubiquitous nature, removing them can greatly improve pathway analysis and visualization. The list of available currency metabolites can be found here.
By default, the MS Peaks to Paths module considers these metabolites as currency: 'C00001', 'C00080', 'C00007', 'C00006', 'C00005', 'C00003', 'C00004', 'C00002', 'C00013', 'C00008', 'C00009', 'C00011', 'G11113', 'H2O', 'H+', 'Oxygen', 'NADP+', 'NADPH', 'NAD+', 'NADH', 'ATP', 'Pyrophosphate', 'ADP', 'Orthophosphate', and 'CO2'.
Now we will use another example peak list data obtained from untargeted metabolomics of mice alveolar macrophages in lungs, using an Orbitrap LC-MS (C18 negative ion mode and HILIC positive ion mode). This is an example of the four-column table containing the m.z, mode, p.value, and t.score. We will perform both currency and adduct customization.
# Create objects for storing processed data from the MS peaks to pathways module mSet<-InitDataObjects("mass_all", "mummichog", FALSE) SetPeakFormat("mpt") mSet<-UpdateInstrumentParameters(mSet, "5.0", "mixed"); mSet<-Read.PeakListData(mSet, "https://www.metaboanalyst.ca/MetaboAnalyst/resources/data/mummichog_mixed.txt"); mSet<-SanityCheckMummichogData(mSet) # Customize currency curr.vec <- c("Carbon dioxide (C00011)","Acetoacetyl CoA (C00332)","Proton (C00080)","Hydrogen (C00282)","Water (C00001)","Hydrogen peroxide (C00027)","NADH (C00004)","NAD (C00003)","Oxygen (C00007)") # Map selected currency to user's data mSet<-Setup.MapData(mSet, curr.vec); mSet<-PerformCurrencyMapping(mSet) # Now customize adducts add.vec <- c("M [1+]","M+H [1+]","M+Na [1+]","M+K [1+]","M-H2O+H [1+]","M-CO2+H [1+]","M-H [1-]","M-H2O-H [1-]","M+Cl [1-]","M+Br [1-]") # Set up the selected adducts mSet<-Setup.AdductData(mSet, add.vec); mSet<-PerformAdductMapping(mSet, "mixed") # Perform mummichog algorithm using selected currency mSet<-SetPeakEnrichMethod(mSet, "mum") mSet<-SetMummichogPval(mSet, 1.0E-5) mSet<-PerformPSEA(mSet, "hsa_mfn", "current", 100) mSet<-PlotPeaks2Paths(mSet, "peaks_to_paths_0_", "png", 72, width=NA)
Following analysis, a comprehensive report can be generated which contains a detailed description of each step performed in the R package, embedded with graphical and tabular outputs. To prepare the sweave report, please use the PreparePDFReport function. You must ensure that you have the nexessary Latex libraries to generate the report (i.e. pdflatex, LaTexiT). The object created must be named mSet, and specify the user name in quotation marks.
PreparePDFReport(mSet, "My Name")
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.