1. Introduction

Previous versions of MetaboAnalyst encompassed two modules for functional analysis, metabolic pathway analysis (MetPA) (Xia et al. 2010, https://academic.oup.com/bioinformatics/article/26/18/2342/208464) and metabolite set enrichment analysis (MSEA) (Xia et al. 2010, https://academic.oup.com/nar/article/38/suppl_2/W71/1101310). However, these modules require metabolite identifications prior to use, which remains an important challenge in untargeted metabolomics. In comparison, the mummichog algorithm (Li et al. 2013, http://journals.plos.org/ploscompbiol/article?id=10.1371/journal.pcbi.1003123) bypasses the bottleneck of metabolite identification prior to pathway analysis, leveraging a priori pathway and network knowledge to directly infer biological activity based on mass peaks. We have therefore implemented the mummichog algorithm in R in a new module named “MS Peaks to Pathways”. The knowledge-base for this module consists of five genome-scale metabolic models from the original Python implementation which have either been manually curated or downloaded from BioCyc, as well as an expanded library of 21 organisms derived from KEGG metabolic pathways.

2. MS Peaks to Pathways

To use this module, users must upload a table (.txt) using the Read.PeakListData function containing either:

1) Three columns - m/z features, p-values, and t-scores or fold-change values. 2) Two columns containing m/z features and either p-values or t-scores 3) One column ranked by either p-values or t-scores. 4) Four columns - m/z features, p-values, t-scores or fold-change values, and the mode (positive or negative).

All inputted files must be in .txt format. If the input is a three column table, both the mummichog and GSEA algorithms (and their combination) can be applied. If only p-values (or ranked by p-values) are provided, then only the mummichog algorithm will be applied. If only t-scores (or ranked by t-scores) are provided, then only the GSEA algorithm will be applied.

If p-values have not yet been calculated, users can use the "Statistical Analysis" module to upload their raw peak tables, process the data, perform t-tests or fold-change analysis, and then upload these results into the module. With the table, users also need to specify the type of MS instrument, the ion mode (positive or negative) using the UpdateInstrumentParameters function. Currently, MetaboAnalystR only supports the handling of peaks obtained from high-resolution MS instruments such as Orbitrap, or Fourier Transform (FT)-MS instruments as recommended by the original mummichog implementation. Following data upload, users much select the organism’s library from which to perform untargeted pathway analysis using the SetMass.PathLib function. Users can then perform the mummichog algorithm on their data using PerformPSEA (details regarding available algorithms found in MetaboAnalystR 2.0 vignette Section 4). First, users will set algorithm to be used to mummichog using SetPeakEnrichMethod. Second, users will set the p-value cutoff to delineate between significantly enriched and non-significantly enriched m/z features using the SetMummichogPval function. Finally, use the PerformPSEA to calcaulte pathway activity.

The output of this module first consists of a table of results identifying the top-pathways that are enriched in the user-uploaded data, which can be found in your working directory named "mummichog_pathway_enrichment.csv". The table consists of the total number of hits, the raw p-value (Fisher’s or Hypergeometric), the EASE score, and the adjusted p-value (for permutations) per pathway. A second table can also be found in your working directory named "mummichog_matched_compound_all.csv", that contains all matched metabolites from the user’s uploaded list of m/z features.

For this tutorial we will first directly use the an example peak list data obtained from untargeted metabolomics of pediatric IBD (mass accuracy 5.0, negative ion mode). This is an example of the three-column table containing the m.z, p.value and t.score.

library(MetaboAnalystR)

# Create objects for storing processed data from the MS peaks to pathways module
mSet <- InitDataObjects("mass_all", "mummichog", FALSE)

# Set the format of the peak list, contains m.z, p.value, and t.score
SetPeakFormat("mpt")

# Set parameters for analysis, in this case the mass accuracy is set to 5 ppm, the mode of the MS instrument is negative
mSet <- UpdateInstrumentParameters(mSet, 5, "negative");

# Read in peak-list data
mSet <- Read.PeakListData(mSet, "https://www.metaboanalyst.ca/MetaboAnalyst/resources/data/mummichog_ibd.txt");

# Sanity check of the uploaded data
mSet <- SanityCheckMummichogData(mSet)

# Perform the mummichog algorithm, in this case the pathway library is from the human MFN model. First set the 
# algorithm to be used to mummichog, then set the p-value cutoff. We're setting the version to "v1"
# as this data does not contain retention time information.
# This function may take sometime for processing, and will output the pathway-results and the compound matching tables in your working directory
mSet<-SetPeakEnrichMethod(mSet, "mum", "v1")
mSet<-SetMummichogPval(mSet, 0.2)
mSet<-PerformPSEA(mSet, "hsa_mfn", "current", permNum = 1000)
mSet<-PlotPeaks2Paths(mSet, "peaks_to_paths_0_", "png", 72, width=NA)

# To view the results of the pathway analysis in R, use mSet$mummi.resmat

# Use this function to view a table of the significant / non-significant compound hits and m/z matching details in a selected pathway
mSet <- GetMummichogPathSetDetails(mSet, "Vitamin E metabolism")

# Use this function to view a table of matching details (i.e. adducts, t-scores, m.z) for a compound
mSet <- GetCompoundDetails(mSet, "CE5843")

2.1 Adduct Customization

Raw MS peaks contain a significant amount of adducts specific to their MS instrument and analytical mode. A comprehensive adduct list is shown below in the "Available" panel. Use this to customize the adduct list used by the Mummichog/GSEA algorithms to match m/z peaks to potential compounds hits. The list of available adducts to choose from can be found in here: positive; here: negative ; here:mixed.

For the negative ion mode, the adducts used are: M-H[-], M-2H[2-], M(C13)-H[-], M(S34)-H[-], M(Cl37)-H[-], M+Na-2H[-], M+K-2H[-], M-H2O-H[-], M+Cl[-], M+Cl37[-], M+Br[-], M+Br81[-], M+ACN-H[-], M+HCOO[-], M+CH3COO[-], and M-H+O[-].

For the positive ion mode, the adducts used are: M[1+], M+H[1+], M+2H[2+], M+3H[3+], M(C13)+H[1+], M(C13)+2H[2+], M(C13)+3H[3+], M(S34)+H[1+], M(Cl37)+H[1+], M+Na[1+], M+H+Na[2+], M+K[1+], M+H2O+H[1+], M-H2O+H[1+], M-H4O2+H[1+], M-NH3+H[1+], M-CO+H[1+], M-CO2+H[1+], M-HCOOH+H[1+], M+HCOONa[1+], M-HCOONa+H[1+], M+NaCl[1+], M-C3H4O2+H[1+], M+HCOOK[1+], and M-HCOOK+H[1+].

2.2 Currency Customization

Currency metabolites are abundant substances such as water and carbon dioxide known to occur in normal functioning cells and participate in a large number of metabolic reactions (Huss and Holme, 2007). Because of their ubiquitous nature, removing them can greatly improve pathway analysis and visualization. The list of available currency metabolites can be found here.

By default, the MS Peaks to Paths module considers these metabolites as currency: 'C00001', 'C00080', 'C00007', 'C00006', 'C00005', 'C00003', 'C00004', 'C00002', 'C00013', 'C00008', 'C00009', 'C00011', 'G11113', 'H2O', 'H+', 'Oxygen', 'NADP+', 'NADPH', 'NAD+', 'NADH', 'ATP', 'Pyrophosphate', 'ADP', 'Orthophosphate', and 'CO2'.

Now we will use another example peak list data obtained from untargeted metabolomics of mice alveolar macrophages in lungs, using an Orbitrap LC-MS (C18 negative ion mode and HILIC positive ion mode). This is an example of the four-column table containing the m.z, mode, p.value, and t.score. We will perform both currency and adduct customization.

# Create objects for storing processed data from the MS peaks to pathways module
mSet<-InitDataObjects("mass_all", "mummichog", FALSE)

# Set peak formart - contains m/z features, p-values and t-scores
SetPeakFormat("mpt")

mSet<-UpdateInstrumentParameters(mSet, 5.0, "mixed");

mSet<-Read.PeakListData(mSet, "https://www.metaboanalyst.ca/MetaboAnalyst/resources/data/mummichog_mixed.txt");

mSet<-SanityCheckMummichogData(mSet)

# Customize currency
curr.vec <- c("Water (C00001)","Proton (C00080)","Oxygen (C00007)","NADPH (C00005)","NADP (C00006)","NADH (C00004)","NAD (C00003)","Adenosine triphosphate (C00002)","Pyrophosphate (C00013)","Phosphate (C00009)","Carbon dioxide (C00011)","Hydrogen (C00282)","Hydrogen peroxide (C00027)","Sodium (C01330)")

# Map selected currency to user's data
mSet<-Setup.MapData(mSet, curr.vec);
mSet<-PerformCurrencyMapping(mSet)

# Now customize adducts
add.vec <- c("M [1+]","M+H [1+]","M+2H [2+]","M+3H [3+]","M+Na [1+]","M+H+Na [2+]","M+K [1+]","M+H2O+H [1+]","M-H2O+H [1+]","M-H4O2+H [1+]","M(C13)+H [1+]","M(C13)+2H [2+]","M(C13)+3H [3+]","M(Cl37)+H [1+]","M-NH3+H [1+]","M-CO+H [1+]","M-CO2+H [1+]","M-HCOOH+H [1+]","M+HCOONa [1+]","M-HCOONa+H [1+]","M+NaCl [1+]","M-C3H4O2+H [1+]","M+HCOOK [1+]","M-HCOOK+H [1+]","M-H [1-]","M-2H [2-]","M-H2O-H [1-]","M-H+O [1-]","M+K-2H [1-]","M+Na-2H [1- ]","M+Cl [1-]","M+Cl37 [1-]","M+HCOO [1-]","M+CH3COO [1-]")

# Set up the selected adducts
mSet<-Setup.AdductData(mSet, add.vec);
mSet<-PerformAdductMapping(mSet, "mixed")

# Perform mummichog algorithm using selected currency and adducts, using Version1 of the mummichog algorithm
mSet<-SetPeakEnrichMethod(mSet, "mum", "v1")

mSet<-SetMummichogPval(mSet, 1.0E-5)

mSet<-PerformPSEA(mSet, "hsa_mfn", "current", 100)

mSet<-PlotPeaks2Paths(mSet, "peaks_to_paths_0_", "png", 72, width=NA)

2.3 Retention Time Integration

The SetPeakEnrichMethod now has a parameter to let users select whether to use Version 1 or Version 2 of the MS Peaks to Paths algorithms. With Version 2, users can now upload retention time information to perform pathway analysis. Note that Version 2 should only be used if user's data contains a retention time ("rt" or "r.t") column. Retention time is used to move pathway analysis from the "Compound" space to "Empirical Compound" space (details below in "How are Empirical Compounds calculated?"). The inclusion of retention time will increase the confidence and robustness of the potential compound matches. Another difference is that currency compounds are removed directly from the user's selected pathway library, versus removed from potential compound hits during the permutations.

2.3.1 How are Empirical Compounds calculated?

Empirical Compounds are intermediaries between m/z features and compounds. The steps for how they are formed are as follows:

  1. As in version 1, all m/z features are matched to potential compounds considering different adducts. Then, per compound, all matching m/z features are split into Empirical Compounds based on whether they match within an expected retention time window. The retention time window (in seconds) is calculated as the maximum retention time * 0.02. This results in the initial Empirical Compounds list.

  2. Next, Empirical Compounds are merged if they have the same m/z, matched form/ion, and retention time. This results in the merged Empirical Compounds list.

  3. Finally, if primary ions are enforced, only Empirical Compounds containing at least 1 primary ion are kept. Primary ions considered are 'M+H[1+]', 'M+Na[1+]', 'M-H2O+H[1+]', 'M-H[-]', 'M-2H[2-]', 'M-H2O-H[-]', 'M+H [1+]', 'M+Na [1+]', 'M-H2O+H [1+]', 'M-H [1-]', 'M-2H [2-]', and 'M-H2O-H [1-]'. This results in the final Empirical Compounds list.

  4. Next, pathway libraries are converted from "Compound" space to "Empirical Compound" space. This is done by converting all compounds in each pathway to all Empirical Compound matches. Then the mummichog/GSEA algorithms work as before to calculate pathway enrichment.

For this tutorial we will use the same example peak list data obtained from untargeted metabolomics of pediatric IBD (mass accuracy 5.0, negative ion mode). This time however, retention time information is included for a four-column table with the following headers: "m.z", "p.value", "t.score" and "r.t".

mSet<-InitDataObjects("mass_all", "mummichog", FALSE)
SetPeakFormat("mprt")
mSet<-UpdateInstrumentParameters(mSet, 5.0, "negative");
mSet<-Read.PeakListData(mSet, "https://www.metaboanalyst.ca/MetaboAnalyst/resources/data/mummichog_rt.txt");
mSet<-SanityCheckMummichogData(mSet)
mSet<-SetPeakEnrichMethod(mSet, "mum", "v2")
mSet<-SetMummichogPval(mSet, 0.2)
mSet<-PerformPSEA(mSet, "hsa_mfn", "current", 100)
mSet<-PlotPeaks2Paths(mSet, "peaks_to_paths_0_", "png", 300, width=NA)

2.3.2 Customizing Empirical Compound Formation

By default, V2 of the MS Peaks to Pathways algorithms enforces primary ions to be present when creating Empirical Compounds (step 3 above). Users can disable this in the UpdateInstrumentParameters function by setting the "force_primary_ion" parameter to "no". Additionally, by default the retention-time window used to split Empirical Compounds is calculated as the maximum retention time in the user's data multiplied by the retention time fraction (default is 0.02). Users can either change the retention time fraction (rt_frac) or set the retention time-window (rt_tol - in seconds). Code details are shown below.

# Disable force primary ion
mSet <- UpdateInstrumentParameters(mSet, instrumentOpt, msModeOpt, 
                                   force_primary_ion = "no", rt_frac = 0.02, 
                                   rt_tol = NA)

# Change retention time fraction when calculating the retention time window
mSet <- UpdateInstrumentParameters(mSet, instrumentOpt, msModeOpt, 
                                   force_primary_ion = "yes", rt_frac = 0.025, 
                                   rt_tol = NA)

# Set the retention time window (in seconds)
mSet <- UpdateInstrumentParameters(mSet, instrumentOpt, msModeOpt, 
                                   force_primary_ion = "yes", rt_frac = 0.02, 
                                   rt_tol = 25)

3. Sweave Report

Following analysis, a comprehensive report can be generated which contains a detailed description of each step performed in the R package, embedded with graphical and tabular outputs. To prepare the sweave report, please use the PreparePDFReport function. You must ensure that you have the nexessary Latex libraries to generate the report (i.e. pdflatex, LaTexiT). The object created must be named mSet, and specify the user name in quotation marks.

PreparePDFReport(mSet, "My Name")


xia-lab/MetaboAnalystR3.0 documentation built on May 6, 2020, 11:03 p.m.