MetEx is an R package for extracting and annotating metabolites from liquid chromatography–mass spectrometry data.
Liquid chromatography–high resolution mass spectrometry (LC-HRMS) is the most popular platform for untargeted metabolomics, but annotating LC-HRMS data remains a challenge because of the limits of metabolome databases and annotation strategies. In this work, a new LC-HRMS database, MetExDB, was developed. It contains retention time (tR), MS1 and MS2 information for 24,674 compounds, with missing information supplied by machine learning predictions. In parallel, an untargeted LC-HRMS data annotation method based on the database, called MetEx, was proposed for targeted extraction and identification of compounds, using information entropy to assist real signal recognition. The number of true positive compounds annotated by MetEx is 2.1 to 2.6 times that of software packages that use the traditional peak detection-based annotation method. MetEx achieves a false discovery rate below 0.7% using orthogonal information (tR and MS) when validated with mixed standard solutions. In addition, MetEx supports user-defined databases to suit more application scenarios and is provided as an open-source R package (https://github.com/zhengfj1994/MetEx).
Install the R package "devtools" and the other required packages, then install MetEx using the code below.
install.packages(c("devtools","BiocManager"))
BiocManager::install(c("xcms","KEGGREST"),update = TRUE, ask = FALSE)
devtools::install_github('zhengfj1994/MetEx')
It will take a few minutes to download the packages.
4. If the third step fails, users can download the project and install it offline, as shown in Figures 2-4:
Finally, choose "Install from: Package Archive File (.zip; .tar.gz)", select MetEx-master.zip, and click Install.
library(MetEx)
MetEx depends on the following packages. If the installation fails with a message that one of them is missing, please install the missing packages manually: openxlsx, tcltk, doSNOW, stringr, xcms, do, KEGGREST, XML, progress, shinydashboard, shinycssloaders, shinyjs, ggrepel, DT, dplyr, foreach, jsonlite, snow, tidyr, BiocManager, knitr, shiny, ggplot2, RColorBrewer
The database is stored in an .xlsx file whose first row contains the column names. The column names are fixed, and a database with nonstandard column names will not be recognized. The database should contain these columns:
Name: the compound name.
m/z: the accurate mass in LC-MS.
tr: the retention time (in seconds).
ionMode: the ion mode of LC-MS; positive ion mode is P and negative ion mode is N.
CE: collision energy.
MSMS: the MS/MS spectrum. Each ion and its intensity are separated by a space (" "), and consecutive ions are separated by a semicolon (";"). For example, 428.036305927272 0.0201115093588212;524.057195188813 0.0699256604274525;542.065116124274 0.148347272003186;664.112740221342 1 is the compact form of the MS/MS spectrum below:
| m/z | intensity |
| :--------------: | :----------------: |
| 428.036305927272 | 0.0201115093588212 |
| 524.057195188813 | 0.0699256604274525 |
| 542.065116124274 | 0.148347272003186 |
| 664.112740221342 | 1 |
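To make the MSMS column format concrete, the following minimal sketch splits such a string back into an m/z and intensity table using base R only. The helper name `parse_msms_string` is hypothetical and is not a MetEx function; it only illustrates the delimiter convention described above.

```R
# Hypothetical helper (not part of MetEx): parse "mz intensity;mz intensity;..."
parse_msms_string <- function(msms) {
  # Peaks are separated by semicolons
  peaks <- strsplit(trimws(msms), ";", fixed = TRUE)[[1]]
  peaks <- peaks[nzchar(trimws(peaks))]
  # Within a peak, m/z and intensity are separated by a space
  fields <- strsplit(trimws(peaks), " +")
  data.frame(
    mz = as.numeric(vapply(fields, `[`, character(1), 1)),
    intensity = as.numeric(vapply(fields, `[`, character(1), 2))
  )
}

msms <- "428.036305927272 0.0201115093588212;524.057195188813 0.0699256604274525;542.065116124274 0.148347272003186;664.112740221342 1"
spectrum <- parse_msms_string(msms)
print(spectrum)
```

The resulting data frame has one row per fragment ion, matching the four rows of the table above.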
OSI-SMMS: our in-house database, containing ~2000 metabolites in positive or negative ion mode. It is not yet openly accessible.
MSMLS: acquired from the Mass Spectrometry Metabolite Library (MSMLS) supplied by IROA Technologies, containing ~300 metabolites.
MoNA: downloaded from MoNA and converted to the format used by MetEx. The retention times are predicted.
We provide a method to convert the MoNA database (saved as MSP) to the format used by MetEx:
library(MetEx)
library(openxlsx)
mspData <- readMsp(file = "D:/MoNA-export-All_Spectra.msp") # Change the file path to your own.
mspDataframe <- exactMsp(mspData)
write.xlsx(mspDataframe, file = "D:/MoNA used for MetEx.xlsx") # Change the file path to your own.
The retention time prediction method is described in the Retention time prediction part.
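For users who assemble the MSMS column themselves, a rough sketch of the reverse direction may help: building the "m/z intensity;m/z intensity" string from two numeric vectors. The helper `format_msms_string` is hypothetical (not a MetEx function), and scaling intensities so that the base peak equals 1 is an assumption taken from the example spectrum shown earlier, not a documented requirement.

```R
# Hypothetical helper (not part of MetEx): build the MSMS string from peak vectors
format_msms_string <- function(mz, intensity) {
  stopifnot(length(mz) == length(intensity), length(mz) > 0)
  # Assumption: intensities are stored relative to the base peak (maximum = 1)
  rel <- intensity / max(intensity)
  # paste() uses R's default number formatting (up to 15 significant digits)
  paste(paste(mz, rel), collapse = ";")
}

format_msms_string(mz = c(100.5, 200.25), intensity = c(50, 200))
```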
KEGG: downloaded from KEGG and converted to the format used by MetEx. The retention times and MS/MS spectra are predicted.
We provide a method to download KEGG and convert it to the format used by MetEx:
library(MetEx)
library(openxlsx) # provides write.xlsx
kegg.compound.database.df <- KEGGdownloader()
write.xlsx(kegg.compound.database.df, file = "D:/KEGG used for MetEx.xlsx") # Change the file path to your own.
The retention time prediction method is described in the Retention time prediction part.
The MS/MS spectrum prediction method is described in the MS/MS spectrum prediction part.
Other databases (continuously updated)
User-defined database: users can define their own database following our database format. We will add more databases to MetEx.
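Because a database with unexpected column names is not recognized, it can be useful to check a user-defined table before saving it as .xlsx. The sketch below is one possible check in base R. The function `check_database_columns` is hypothetical (not part of MetEx), the required names are copied from the column list above, and the single row of values is made up purely for illustration.

```R
# Column names copied from the database format described above
required_columns <- c("Name", "m/z", "tr", "ionMode", "CE", "MSMS")

# Hypothetical helper (not part of MetEx): stop if any required column is missing
check_database_columns <- function(db) {
  missing <- setdiff(required_columns, colnames(db))
  if (length(missing) > 0) {
    stop("Database is missing required column(s): ", paste(missing, collapse = ", "))
  }
  invisible(TRUE)
}

# Made-up single-row database; check.names = FALSE keeps the "m/z" name intact
db <- data.frame(
  Name = "Example compound",
  `m/z` = 664.1127,
  tr = 312,
  ionMode = "P",
  CE = 30,
  MSMS = "428.0363 0.0201;664.1127 1",
  check.names = FALSE
)
check_database_columns(db)
```

Note that `data.frame()` would silently rename "m/z" to "m.z" without `check.names = FALSE`, which is an easy way to end up with an unrecognized column name.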
We provide a Shiny app; a screenshot is shown in Figure 6.
Please confirm that you have installed the MetEx package in R.
Open RStudio.
Enter the following line of code:
shiny::runApp(system.file("extdata/shinyApp", "app.R", package = "MetEx"))
A new visualization window is opened.
There are four main tabs on the left. The first tab is the introduction, the second tab is the annotation workflow of a single file by MetEx, the third tab is the annotation workflow of multiple files by MetEx, and the fourth tab is the annotation workflow based on a peak detection result. The parameters of these modules are described separately in the next section.
5.2 The second tab, MetEx (Single file), annotation workflow of a single file by MetEx:
Database: A database file that meets MetEx's formatting requirements.
Ion mode: The ion mode used for annotation, "positive" or "negative".
CE: Collision energy used for MS/MS acquisition: "all", "low", "medium" or "high". The "low", "medium" and "high" options can be used only when the low, medium and high collision energies in the database are 15, 30 and 45 eV; otherwise, please use the "all" option.
Whether to perform tR calibration: "Yes" means tR calibration will be performed and you should provide an xlsx file for tR calibration. "No" means tR calibration will not be performed.
tR of internal standards: an xlsx file that contains the retention times of the internal standards in the database and in the experiment. It should look like Figure 7.
Users can also open the file named "IS-for-tR-calibration.xlsx" (inst/extdata/trCalibration) to see the format of the file, but please do not change it.
mzXML file: The mzXML file converted from the raw LC-MS data (using MSConvert in ProteoWizard).
mgf file: The mgf file converted from the raw LC-MS data (using MSConvert in ProteoWizard).
Delta m/z of MS1: The m/z tolerance of MS1 between database and experiment (0.01 Da is recommended for Q-TOF and 0.005 Da for Orbitrap).
Delta tR of MS1: the tolerance of retention time between database and experiment. The unit is seconds.
Entropy threshold: The information entropy threshold, 1.75 - 2 is recommended.
Intensity threshold: The peak height threshold. 270-600 is recommended for Q-TOF.
Delta m/z of MS1 and MS2: The tolerance between MS1 and MS2 in experiment.
Delta m/z of MS2: The tolerance of MS2 between database and experiment.
Delta tR of MS1 and MS2: The tolerance of tR between MS1 and MS2.
MS2 score threshold: The MS2 score threshold (0-1).
Result (csv file): The result file path and name (.csv file).
Result (xlsx file): The result file path and name (.xlsx file).
Number of cores for parallel computing: The number of CPU cores used for parallel computing. It depends on your computer's CPU and RAM. Users can refer to the following rules:
The number of cores for parallel computing < the number of CPU cores of your computer, and
The number of cores for parallel computing × 4 GB < the RAM of your computer.
MS2 S/N threshold: The MS2 S/N threshold.
MS2 noise intensity: The MS2 noise intensity, "minimum" or a number.
MS2 missing value padding method: The MS2 missing value padding method. Two options are available, "half" and "minimal". "half" follows MS-DIAL, and "minimal" is closer to the actual situation. We currently recommend "minimal".
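The two rules given for the number of cores can be turned into a small calculation. The sketch below is only an illustration of those two inequalities; `recommend_cores` is a hypothetical helper (not a MetEx function), and the RAM size must be supplied by the user because base R has no portable way to query it. The CPU count can be obtained with `parallel::detectCores()`.

```R
# Hypothetical helper (not part of MetEx): largest core count satisfying
#   cores < cpu_cores   and   cores * gb_per_core < ram_gb
recommend_cores <- function(cpu_cores, ram_gb, gb_per_core = 4) {
  by_cpu <- cpu_cores - 1                      # strictly fewer than the CPU cores
  by_ram <- ceiling(ram_gb / gb_per_core) - 1  # largest n with n * gb_per_core < ram_gb
  max(1, min(by_cpu, by_ram))                  # always use at least one core
}

# Example: an 8-core machine with 16 GB of RAM
recommend_cores(cpu_cores = 8, ram_gb = 16)
```

On such a machine the RAM rule is the limiting one, since 4 cores × 4 GB would equal rather than stay below 16 GB.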
5.3 The third tab, MetEx (Multiple files), annotation workflow of multiple files by MetEx:
5.4 The fourth tab, Annotation from peak table, annotation workflow based on a peak detection result:
dbFile: the path of the database (xlsx file).
ionMode: the ion mode of the LC-MS data. Only two values are supported: "P" for positive ion mode and "N" for negative ion mode.
CE: the collision energy of the MS/MS spectrum. It depends on the experimental MS/MS conditions and the CE values in the database. The default is "all".
is.tR.file: the xlsx file of IS retention times in database and in your experiment.
database.df: the imported database data frame.
MetEx provides two approaches to annotate metabolites. The first approach is a peak-detection-independent method and the second is a peak-detection-dependent method. The first approach is newly developed and avoids the peak loss that occurs in conventional peak detection methods.
Peak-detection-independent metabolite annotation method without retention time calibration (single LC-MS data file):
Convert the LC-MS raw data to .mzXML and .mgf files using MSConvert (http://proteowizard.sourceforge.net/tools.shtml, provided by ProteoWizard).
We use the built-in data files as examples to show how to do the annotation. The database is:
system.file("extdata/database", "example_database.xlsx", package = "MetEx")
The mzXML file is:
system.file("extdata/mzXML", "example.mzXML", package = "MetEx")
The mgf file is:
system.file("extdata/mgf", "example.mgf", package = "MetEx")
The code that uses the example data above for annotation is shown below:
library(MetEx)
dbData <- dbImporter(dbFile = system.file("extdata/database", "example_database.xlsx", package = "MetEx"), ionMode = 'P', CE = "all") # If you want to use other database, just change the dbFile to your own database such as "D:/MyCompoundDatabase.xlsx"
targExtracRes <- targetExtraction.parallel(msRawData = system.file("extdata/mzXML", "example.mzXML", package = "MetEx"), dbData, deltaMZ=0.01, deltaTR=60, cores = 1) # If you want to use your own data, just change the msRawData to your own data such as "D:/My-mzXML-data.mzXML"
ms1Info <- extracResFilter(targExtracRes, entroThre = 2, intThre = 270)
mgfList <- importMgf(mgfFile=system.file("extdata/mgf", "example.mgf", package = "MetEx")) # If you want to use your own data, just change the mgfFile to your own data such as "D:/My-mgf-data.mgf"
batchMS2ScoreResult <- batchMS2Score.parallel(ms1Info, ms1DeltaMZ = 0.01, ms2DeltaMZ = 0.02, deltaTR = 12, mgfMatrix = mgfList$mgfMatrix, mgfData = mgfList$mgfData, scoreMode = "average", cores = 1)
write.table(batchMS2ScoreResult, file = "D:/Example-result.csv", col.names = NA, sep = ",", dec = ".", qmethod = "double")
identifiedResFilter(csvFile="D:/Example-result.csv", resFile="D:/Example-result.xlsx", MS2score=0.6)
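The deltaMZ and deltaTR arguments used in the workflow above are tolerances between database values and experimental values. As a rough mental model only, they can be read as a window filter around each database entry. The sketch below shows that idea in base R; `match_in_window` is a hypothetical helper, the numbers are made up, and the actual extraction logic inside MetEx (including whether the window bounds are inclusive) may differ.

```R
# Hypothetical helper (not part of MetEx): indices of observed signals that fall
# inside the m/z and tR windows around one database entry
match_in_window <- function(obs_mz, obs_tr, db_mz, db_tr, deltaMZ = 0.01, deltaTR = 60) {
  which(abs(obs_mz - db_mz) <= deltaMZ & abs(obs_tr - db_tr) <= deltaTR)
}

# Made-up observations (m/z in Da, tR in seconds)
obs_mz <- c(664.1100, 664.1190, 664.1300, 524.0572)
obs_tr <- c(300, 350, 310, 305)

# Database entry at m/z 664.1127 and tR 312 s
hits <- match_in_window(obs_mz, obs_tr, db_mz = 664.1127, db_tr = 312)
print(hits)
```

The third observation is rejected on m/z (its deviation exceeds 0.01 Da) even though its retention time is close, which is why both tolerances have to be chosen to suit the instrument.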
We also provide a single-function method for targeted metabolite extraction and annotation.
library(MetEx)
MetExAnnotation(dbFile = system.file("extdata/database", "example_database.xlsx", package = "MetEx"),
ionMode = "P",
msRawData = system.file("extdata/mzXML", "example.mzXML", package = "MetEx"),
MS1deltaMZ = 0.01,
MS1deltaTR = 120,
entroThre = 2,
intThre = 270,
mgfFile = system.file("extdata/mgf", "example.mgf", package = "MetEx"),
MS1MS2DeltaMZ = 0.01,
MS2DeltaMZ = 0.02,
MS1MS2DeltaTR = 12,
csvFile = "D:/Example-result.csv",
xlsxFile = "D:/Example-result.xlsx",
MS2scoreFilter = 0.6)
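The entroThre argument is an information entropy threshold. This README does not spell out how MetEx computes the entropy, so the sketch below only illustrates the general idea with Shannon entropy of a normalized intensity profile: intensity concentrated in a few points gives a low value, while evenly spread intensity gives a high value. The function `shannon_entropy`, the use of log base 2, and the example profiles are all assumptions made for illustration.

```R
# Hypothetical helper (not part of MetEx): Shannon entropy (in bits) of an
# intensity profile after normalizing it to sum to 1
shannon_entropy <- function(intensity) {
  intensity <- intensity[intensity > 0]   # zero-intensity points contribute nothing
  if (length(intensity) == 0) return(NA_real_)
  p <- intensity / sum(intensity)
  -sum(p * log2(p))
}

flat_profile   <- rep(10, 16)                      # evenly spread, noise-like
peaked_profile <- c(1, 2, 5, 50, 400, 60, 4, 2, 1) # intensity concentrated in a peak

shannon_entropy(flat_profile)
shannon_entropy(peaked_profile)
```

With these made-up profiles the flat one yields the larger entropy, which is the kind of contrast an entropy threshold can exploit to separate peak-like signals from background.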
Peak-detection-independent metabolite annotation method with retention time calibration (single LC-MS data file):
Convert the LC-MS raw data to .mzXML and .mgf files using MSConvert (http://proteowizard.sourceforge.net/tools.shtml, provided by ProteoWizard).
We use the built-in data files as examples to show how to do the annotation.
The retention time of IS used for retention time calibration is:
system.file("extdata/trCalibration", "IS-for-tR-calibration.xlsx", package = "MetEx")
If you want to calibrate retention times, you should measure the experimental retention times of the internal standards that are included in "IS for retention time calibration.xlsx" and mentioned in our published paper, and write them in the xlsx file.
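To give an intuition for what calibration with internal standards achieves, the sketch below maps database retention times onto the experimental time scale by linear interpolation between internal standards, using `approx()` from base R. This is an assumed model chosen for illustration; the model used by `retentionTimeCalibration` is not described here and may differ. The helper `calibrate_tr` and all retention times are hypothetical.

```R
# Hypothetical helper (not part of MetEx): piecewise-linear tR calibration.
# is_db_tr  : internal standard retention times recorded in the database
# is_exp_tr : retention times of the same standards measured in your experiment
calibrate_tr <- function(db_tr, is_db_tr, is_exp_tr) {
  # rule = 2 holds the nearest standard's value outside the covered range
  approx(x = is_db_tr, y = is_exp_tr, xout = db_tr, rule = 2)$y
}

# Made-up internal standards (seconds)
is_db_tr  <- c(60, 300, 600)
is_exp_tr <- c(66, 312, 630)

# Made-up database retention times to be calibrated
calibrated <- calibrate_tr(c(180, 450, 700), is_db_tr, is_exp_tr)
print(calibrated)
```

Compounds eluting outside the range covered by the internal standards cannot be interpolated, so in this sketch they simply take the value of the nearest standard.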
The code that uses the example data above for annotation is shown below:
library(MetEx)
dbData <- dbImporter(dbFile = system.file("extdata/database", "example_database.xlsx", package = "MetEx"), ionMode = 'P', CE = "all") # If you want to use other database, just change the dbFile to your own database such as "D:/MyCompoundDatabase.xlsx"
dbData <- retentionTimeCalibration(is.tR.file = system.file("extdata/trCalibration", "IS-for-tR-calibration.xlsx", package = "MetEx"), database.df = dbData) # The xlsx file is just an example, if you want to calibrate the retention time, please change the file to yours such as "D:/MyCompoundDatabase.xlsx"
targExtracRes <- targetExtraction.parallel(msRawData=system.file("extdata/mzXML", "example.mzXML", package = "MetEx"), dbData, deltaMZ=0.01, deltaTR=60, cores = 1) # If you want to use your own data, just change the msRawData to your own data such as "D:/My-mzXML-data.mzXML"
ms1Info <- extracResFilter(targExtracRes, entroThre = 2, intThre = 270)
mgfList <- importMgf(mgfFile=system.file("extdata/mgf", "example.mgf", package = "MetEx")) # If you want to use your own data, just change the mgfFile to your own data such as "D:/My-mgf-data.mgf"
batchMS2ScoreResult <- batchMS2Score.parallel(ms1Info, ms1DeltaMZ = 0.01, ms2DeltaMZ = 0.02, deltaTR = 12, mgfMatrix = mgfList$mgfMatrix, mgfData = mgfList$mgfData, scoreMode = "average", cores = 1)
write.table(batchMS2ScoreResult, file = "D:/Example-result.csv", col.names = NA, sep = ",", dec = ".", qmethod = "double")
identifiedResFilter(csvFile="D:/Example-result.csv", resFile="D:/Example-result.xlsx", MS2score=0.6)
We also provide a single-function method for targeted metabolite extraction and annotation.
library(MetEx)
MetExAnnotation(dbFile = system.file("extdata/database", "example_database.xlsx", package = "MetEx"),
ionMode = "P",
tRCalibration = TRUE,
is.tR.file = system.file("extdata/trCalibration", "IS-for-tR-calibration.xlsx", package = "MetEx"),
msRawData = system.file("extdata/mzXML", "example.mzXML", package = "MetEx"),
MS1deltaMZ = 0.01,
MS1deltaTR = 120,
entroThre = 2,
intThre = 270,
mgfFile = system.file("extdata/mgf", "example.mgf", package = "MetEx"),
MS1MS2DeltaMZ = 0.01,
MS2DeltaMZ = 0.02,
MS1MS2DeltaTR = 12,
csvFile = "D:/Example-result.csv",
xlsxFile = "D:/Example-result.xlsx",
MS2scoreFilter = 0.6)
Peak-detection-independent metabolite annotation without retention time calibration (multiple LC-MS data):
Convert the LC-MS raw data to .mzXML and .mgf files using MSConvert (http://proteowizard.sourceforge.net/tools.shtml, provided by ProteoWizard).
Create a new folder, such as "Data for MetEx", then create three new subfolders named "mzXML", "mgf" and "result" under the folder.
The code for annotating all files in this folder is shown below:
```R
library(MetEx)

path <- "E:/Data for MetEx"
mzXML.files <- dir(paste0(path, "/mzXML"))
# fixed = TRUE matches ".mzXML" literally instead of treating "." as a regex wildcard
mgf.files <- gsub(".mzXML", ".mgf", mzXML.files, fixed = TRUE)
index.files <- gsub(".mzXML", "", mzXML.files, fixed = TRUE)

# Annotate each file in turn; results are written to the "result" subfolder
for (i in seq_along(mzXML.files)) {
  print(index.files[i])
  MetExAnnotation(dbFile = system.file("extdata/database", "example_database.xlsx", package = "MetEx"),
                  ionMode = "P",
                  msRawData = paste0(path, "/mzXML/", mzXML.files[i]),
                  MS1deltaMZ = 0.01,
                  MS1deltaTR = 120,
                  entroThre = 2,
                  intThre = 270,
                  mgfFile = paste0(path, "/mgf/", mgf.files[i]),
                  MS1MS2DeltaMZ = 0.01,
                  MS2DeltaMZ = 0.02,
                  MS1MS2DeltaTR = 12,
                  csvFile = paste0(path, "/result/", index.files[i], ".csv"),
                  xlsxFile = paste0(path, "/result/", index.files[i], ".xlsx"),
                  MS2scoreFilter = 0.6)
}
```
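Before starting a long batch run, it may be worth checking that every .mzXML file has a matching .mgf file. The following sketch uses only base R and a temporary folder with empty placeholder files, so it runs without any real data; the sample names are made up, and the folder layout follows the "mzXML", "mgf" and "result" subfolders described above.

```R
# Build the folder layout in a temporary directory (placeholders, not real data)
path <- file.path(tempdir(), "Data for MetEx")
for (sub in c("mzXML", "mgf", "result")) {
  dir.create(file.path(path, sub), recursive = TRUE, showWarnings = FALSE)
}
file.create(file.path(path, "mzXML", c("sampleA.mzXML", "sampleB.mzXML")))
file.create(file.path(path, "mgf", "sampleA.mgf"))  # sampleB.mgf is deliberately missing

# Pair each .mzXML file with its expected .mgf file and result files
mzXML.files <- list.files(file.path(path, "mzXML"), pattern = "\\.mzXML$")
samples <- tools::file_path_sans_ext(mzXML.files)
jobs <- data.frame(
  sample = samples,
  mzXML  = file.path(path, "mzXML", mzXML.files),
  mgf    = file.path(path, "mgf", paste0(samples, ".mgf")),
  csv    = file.path(path, "result", paste0(samples, ".csv")),
  xlsx   = file.path(path, "result", paste0(samples, ".xlsx"))
)

# Flag samples whose .mgf file is missing before running the annotation loop
jobs$mgf_exists <- file.exists(jobs$mgf)
print(jobs[, c("sample", "mgf_exists")])
```

Rows with `mgf_exists` equal to FALSE can then be skipped or fixed before they cause an error partway through the loop.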
Peak-detection-independent metabolite annotation method with retention time calibration (multiple LC-MS data):
Convert the LC-MS raw data to .mzXML and .mgf files using MSConvert (http://proteowizard.sourceforge.net/tools.shtml, provided by ProteoWizard).
Create a new folder, such as "Data for MetEx", then create three new subfolders named "mzXML", "mgf" and "result" under the folder.
The code for annotating all files in this folder is shown below:
```R
library(MetEx)

path <- "E:/Data for MetEx"
mzXML.files <- dir(paste0(path, "/mzXML"))
# fixed = TRUE matches ".mzXML" literally instead of treating "." as a regex wildcard
mgf.files <- gsub(".mzXML", ".mgf", mzXML.files, fixed = TRUE)
index.files <- gsub(".mzXML", "", mzXML.files, fixed = TRUE)

# Annotate each file in turn with tR calibration; results go to the "result" subfolder
for (i in seq_along(mzXML.files)) {
  print(index.files[i])
  MetExAnnotation(dbFile = system.file("extdata/database", "example_database.xlsx", package = "MetEx"),
                  ionMode = "P",
                  tRCalibration = TRUE,
                  is.tR.file = system.file("extdata/trCalibration", "IS-for-tR-calibration.xlsx", package = "MetEx"),
                  msRawData = paste0(path, "/mzXML/", mzXML.files[i]),
                  MS1deltaMZ = 0.01,
                  MS1deltaTR = 120,
                  entroThre = 2,
                  intThre = 270,
                  mgfFile = paste0(path, "/mgf/", mgf.files[i]),
                  MS1MS2DeltaMZ = 0.01,
                  MS2DeltaMZ = 0.02,
                  MS1MS2DeltaTR = 12,
                  csvFile = paste0(path, "/result/", index.files[i], ".csv"),
                  xlsxFile = paste0(path, "/result/", index.files[i], ".xlsx"),
                  MS2scoreFilter = 0.6)
}
```
Peak-detection-dependent method:
MetEx focuses on targeted extraction and annotation without peak detection. However, because annotation based on peak detection results is still used by many researchers, we also provide an annotation method based on a peak table.
# Without tR calibration
library(MetEx)
annotationFromPeakTableRes <- annotationFromPeakTable(
peakTable = system.file("extdata/peakTable","example.csv", package = "MetEx"),
mgfFile = system.file("extdata/mgf","example.mgf", package = "MetEx"),
database = system.file("extdata/database","example_database.xlsx", package = "MetEx"),
ionMode = "P",
MS1DeltaMZ = 0.01,
MS1DeltaTR = 120,
MS1MS2DeltaTR = 5,
MS1MS2DeltaMZ = 0.01,
MS2DeltaMZ = 0.02,
result.file = "D:/Example-result.xlsx")
# With tR calibration
library(MetEx)
annotationFromPeakTableRes <- annotationFromPeakTable(
peakTable = system.file("extdata/peakTable","example.csv", package = "MetEx"),
mgfFile = system.file("extdata/mgf","example.mgf", package = "MetEx"),
database = system.file("extdata/database","example_database.xlsx", package = "MetEx"),
ionMode = "P",
tRCalibration = TRUE,
is.tR.file = system.file("extdata/trCalibration", "IS-for-tR-calibration.xlsx", package = "MetEx"),
MS1DeltaMZ = 0.01,
MS1DeltaTR = 120,
MS1MS2DeltaTR = 5,
MS1MS2DeltaMZ = 0.01,
MS2DeltaMZ = 0.02,
result.file = "D:/Example-result.xlsx")
Fujian Zheng zhengfj@dicp.ac.cn or 2472700387@qq.com
v1.0
The first version