MS Raw Data Files

library_generator

R Documentation

Generating spectral library from raw LC-MS/MS chromatograms

Description

The function picks up targeted MS1/MS2 scans and merges them into a spectral library (new or existing). The raw LC-MS/MS files must be centroid-mode mzML, mzXML or mzData. Ions are selected based on the m/z (and retention time) specified in the metadata (recommended) or by automatic peak picking with the XCMS package.
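
The targeted selection described above can be pictured as a pair of tolerance windows around each metadata entry. The base R sketch below is not MergeION's own code: the helper name in_search_window is hypothetical, scan retention times are assumed to be in seconds, and an unknown target RT (NA) is assumed to switch the retention time filter off.

```r
# Hypothetical helper (not part of MergeION): flags scans whose precursor m/z
# and retention time fall inside the search windows around one target.
in_search_window <- function(scan_mz, scan_rt_sec, target_mz, target_rt_min,
                             ppm_search = 20, rt_search = 12) {
  # Mass error in parts per million, relative to the target m/z
  ppm_error <- abs(scan_mz - target_mz) / target_mz * 1e6
  mz_ok <- ppm_error <= ppm_search
  # Metadata RT is given in minutes, rt_search in seconds.
  # Assumption: an unknown RT (NA) disables the retention time filter.
  if (is.na(target_rt_min)) {
    rt_ok <- rep(TRUE, length(scan_rt_sec))
  } else {
    rt_ok <- abs(scan_rt_sec - target_rt_min * 60) <= rt_search
  }
  mz_ok & rt_ok
}

# Three made-up scans: precursor m/z and retention time (seconds)
scans <- data.frame(mz = c(478.0960, 478.1100, 478.0955),
                    rt = c(301, 302, 420))
hits <- in_search_window(scans$mz, scans$rt,
                         target_mz = 478.096, target_rt_min = 5.0)
print(hits)  # TRUE FALSE FALSE: only the first scan is inside both windows
```

The second scan is about 29 ppm away from the target and the third elutes two minutes late, so both are rejected with the default tolerances.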

Usage

library_generator(raw_data_files, metadata_file, mslevel = c(1, 2),
  MS2_type = "DDA", isomers = TRUE, adduct_type = "M+H",
  max.charge = 1, rt_search = 12, ppm_search = 20, baseline = 1000,
  relative = 5, normalized = T, user = "", write_files = TRUE,
  input_library = "", output_library = "")

Arguments

`raw_data_files`	A character vector of file names of chromatograms from which scans are extracted. All files must be in centroid mode with an mzML or mzXML extension.
`metadata_file`	A single character string or NULL (not recommended). If provided, it must be the metadata file name with csv extension. The first six columns of the metadata must be (in order): "PEPMASS" (precursor masses to be found in the chromatograms), "RT" (retention time of the metabolic features to be found, in minutes; set it to N/A if unknown), "IONMODE" (must be "Positive" or "Negative"), "ADDUCT" (precursor ion adduct type, must be one of "M+H","M+Na","M+K","M-H" and "M+Cl"), "CHARGE" (charge number, please keep it at 1) and "ID" (a unique identifier for targeted compounds in the spectral library). If missing or NULL, a non-targeted feature screening is performed using MatchedFilter from XCMS. In the current release, this functionality only works when all input files are acquired on the same instrument and in the same ion mode, and they must all contain MS1 scans. Please be aware that non-targeted screening can lead to loss of important features or to unwanted peaks (e.g. noise).
`mslevel`	Must be 1 (if only MS1 scans/isotopic patterns of targeted m/z are extracted), 2 (if only MS2 scans are extracted) or c(1,2) (if both MS1 and MS2 scans are extracted). Note: isotopic patterns in MS1 scans are useful for determining the precursor formula.
`MS2_type`	A single character string ("DDA" or "Targeted") if all raw_data_files are acquired in the same mode, or a character vector specifying the acquisition mode of each file in raw_data_files (e.g. c("DDA","Targeted","DDA")).
`isomers`	Logical. TRUE if isomers are kept (scans with same precursor mass but with difference in retention time higher than 2 * rt_search). If FALSE, only the isomer with highest TIC is kept.
`adduct_type`	Vector of character. Adduct types of ions considered. Its elements must be among "Default","M+H","M+Na","M+K","M+NH4","M-H" and "M+Cl". No additional ion species will be calculated if "Default".
`max.charge`	Integer. Maximal charge number. Must be a positive integer, e.g. 2 if +1, +2 (or -1, -2) ions are considered.
`rt_search`	Retention time search tolerance (in seconds) for targeted RT
`ppm_search`	m/z search tolerance (in ppm) for targeted m/z
`baseline`	Numeric. The absolute intensity threshold above which a signal is considered a mass peak and written into the library. Peaks above both absolute and relative thresholds are saved in the library.
`relative`	Numeric between 0 and 100. The relative intensity threshold, as a percentage of the highest peak in each spectrum. Peaks above both absolute and relative thresholds are saved in the library.
`normalized`	Logical. TRUE if the intensities of extracted spectra need to be normalized so that the intensity of the highest peak is 100
`user`	Character. Name or ID of the user(s) that created or updated the library.
`write_files`	Logical. TRUE if user wishes to write the mgf and metadata (txt) file in the folder
`input_library`	Character or library object. If character, name of the library into which new scans are added, the file extension must be mgf; please set to empty string "" or NULL if the new library has no dependency with previous ones.
`output_library`	Character. Name of the output library; the file extension must be mgf
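
The interplay of baseline, relative and normalized can be illustrated with a small base R sketch. It is not the package's internal code: the helper name filter_spectrum is hypothetical, and treating the thresholds as inclusive (>=) and normalizing after filtering are assumptions.

```r
# Hypothetical helper (not MergeION's own code): keeps peaks that pass both
# the absolute and the relative threshold, then optionally rescales to 100.
filter_spectrum <- function(sp, baseline = 1000, relative = 5,
                            normalized = TRUE) {
  if (nrow(sp) == 0) return(sp)
  intensity <- sp[, 2]
  # Assumption: both thresholds are inclusive
  keep <- intensity >= baseline &
    intensity >= max(intensity) * relative / 100
  out <- sp[keep, , drop = FALSE]
  if (normalized && nrow(out) > 0) {
    out[, 2] <- out[, 2] / max(out[, 2]) * 100
  }
  out
}

# Made-up spectrum: two columns, m/z and intensity
sp <- cbind(mz = c(100.05, 150.10, 200.20, 250.30),
            intensity = c(500, 2500, 40000, 1500))
filtered <- filter_spectrum(sp, baseline = 1000, relative = 5)
print(filtered)
```

Here the relative threshold is 5% of 40000, i.e. 2000. The peak at 500 fails the baseline, the peak at 1500 passes the baseline but fails the relative threshold, and the two remaining peaks are rescaled to 6.25 and 100.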

Value

complete: The entire spectral library (historical + newly added records), a list object with two elements: "library$sp" ~ List of all extracted spectra. Each spectrum is a data matrix with two columns: m/z and intensity; "library$metadata" ~ Data frame containing metadata of extracted scans. PEPMASS and RT are updated based on the scans actually detected. The following metadata columns are added: FILENAME (the raw data file from which the scan was isolated), MSLEVEL (1 or 2), TIC, PEPMASS_DEV (ppm error of the detected precursor mass) and SCANNUMBER (scan number in the raw chromatogram). The parameters used for library generation are appended. The last three columns are PARAM_USER (user name), PARAM_CREATION_TIME (date and time when the MS record was added) and SCANS (unique identifier for each record, unchanged).
current: Temporary spectral library that only contains newly added scans.
<output_library>: An mgf spectral library file (the complete spectral library) will be written to the user's working directory. It contains both spectra and metadata.
<output_library.txt>: Metadata will be written as a tab-separated .txt file to the user's working directory. Users can open this file in Excel or OpenOffice.

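The layout of the returned object can be explored without any raw data by building a mock with the same shape. Everything in the sketch below is made up for illustration (values, the names mock_library and mock_result, and the reduced set of metadata columns); only the element names complete, current, sp and metadata come from the description above.

```r
# Mock library with the documented layout; all values are invented
mock_library <- list(
  sp = list(
    cbind(mz = c(478.096, 479.099), intensity = c(100, 24)),  # MS1 scan
    cbind(mz = c(120.081, 478.096), intensity = c(100, 35))   # MS2 scan
  ),
  metadata = data.frame(
    PEPMASS = c(478.096, 478.096),
    RT = c(5.01, 5.02),
    MSLEVEL = c(1, 2),
    SCANS = c(1, 2),
    stringsAsFactors = FALSE
  )
)
# For a brand-new library, complete and current hold the same records
mock_result <- list(complete = mock_library, current = mock_library)

lib <- mock_result$complete
# Row i of the metadata describes spectrum i, so one index selects both
ms2_idx <- which(lib$metadata$MSLEVEL == 2)
ms2_spectra <- lib$sp[ms2_idx]
# m/z of the most intense peak in each selected MS2 spectrum
base_peaks <- sapply(ms2_spectra, function(s) s[which.max(s[, 2]), 1])
print(lib$metadata[ms2_idx, ])
```

The same indexing pattern should apply to a real result, as long as the rows of the metadata and the entries of sp stay in the same order.
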
Author(s)

Youzhong Liu, Youzhong.Liu@uantwerpen.be

Examples

### We download four test data sets:

url = "https://zenodo.org/record/2581847/files/"
original_files = c("NA_170405_MAS006_10.mzML",
                  "TESTMIX2_180504_MAS011_06.mzXML",
                  "JNJ42165279_171214_MAS006_14.mzXML",
                  "GMP_R601592_150925_MAS006_04.mzXML")
download.file(paste0(url,original_files[1]),destfile="MIX1.mzML") # Download and rename the files
download.file(paste0(url,original_files[2]),destfile="MIX2.mzXML")
download.file(paste0(url,original_files[3]),destfile="JNJ.mzXML")
download.file(paste0(url,original_files[4]),destfile="GMP.mzXML")

### We create the first library
raw_data_files = c("MIX1.mzML","MIX2.mzXML","JNJ.mzXML")
metadata_file = "https://raw.githubusercontent.com/daniellyz/MergeION/master/inst/library_metadata.csv"

mslevel = c(1,2)  # Both MS1 and MS2 scans are extracted!
MS2_type = c("DDA","DDA","Targeted") # Mode of MS/MS experiment for the three files
adduct_type = c("Default") # Only looking for default ion types (ion types provided by users in metadata)
max.charge = 1 # Only looking for +1 charged ions
isomers = FALSE # If isomers are present, only the peak with higher TIC is extracted.

rt_search = 12 # Retention time tolerance (s)
ppm_search = 10  # Mass tolerance (ppm)
baseline = 1000  # Baseline level 1000 is fixed for 3 datasets.
relative = 1 # Relative intensity level 1% is fixed. Peaks below either the baseline or the relative level are treated as noise.
normalized = TRUE # The intensities of extracted spectra will be normalized to 100 (the highest peak)

write_files = FALSE # The library (mgf) and metadata will not be written to the user's folder
input_library = "" # A brand new library, there's no previous dependency
output_library = "library_V1.mgf" # Name of the library
user_name = "Florian" # User name for uploading

# Positional arguments must follow the order given in Usage: isomers comes before adduct_type and max.charge
library1 = library_generator(raw_data_files, metadata_file, mslevel, MS2_type, isomers, adduct_type, max.charge,
                            rt_search, ppm_search, baseline, relative, normalized,
                            user = user_name, write_files, input_library, output_library)

library1 = library1$complete # Important! We extract the library object. "$complete" for extracting the entire library including historical mass spectra. Here since we create a brand-new library, "library1$complete" and "library1$current" are the same.

### Now we process a new data file, GMP.mzXML, and add it to the existing library:
raw_data_files = "GMP.mzXML"
adduct_type = c("M+H", "M+Na") # Two adduct types are now considered
MS2_type = "Targeted"
isomers = TRUE # We would like now to record all isomers in the library

write_files = TRUE # We want to directly write the library mgf + metadata files
input_library = library1
output_library = "library_V2.mgf"
user_name = "Thomas" # Another user adds records into the library
library2 = library_generator(raw_data_files, metadata_file, mslevel, MS2_type, isomers, adduct_type, max.charge,
                            rt_search, ppm_search, baseline, relative, normalized,
                            user = user_name, write_files, input_library, output_library)

# In the end, "library_V2.mgf" should appear in the working directory along with its metadata table (txt file)

# Now we check whether the desired precursor m/z is present among the newly added scans:

tmp_library = library2$current
query = library_manager(tmp_library, query = c("PEPMASS = 478.096"), ppm_search = 20)
library_visualizer(query)
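
To run the targeted workflow on other data, a metadata file with the six leading columns described under Arguments is needed. The base R sketch below shows one way to prepare and check such a file. The compounds, IDs and file name are made up, and a comma-separated layout with "N/A" for unknown retention times is an assumption about what the function accepts.

```r
# Made-up targets for illustration only
metadata <- data.frame(
  PEPMASS = c(478.096, 325.120),
  RT = c(5.0, NA),                       # minutes; NA is written as "N/A"
  IONMODE = c("Positive", "Positive"),
  ADDUCT = c("M+H", "M+Na"),
  CHARGE = c(1, 1),
  ID = c("CPD_001", "CPD_002"),
  stringsAsFactors = FALSE
)

# Basic checks against the documented constraints
required <- c("PEPMASS", "RT", "IONMODE", "ADDUCT", "CHARGE", "ID")
stopifnot(identical(names(metadata)[1:6], required))
stopifnot(all(metadata$IONMODE %in% c("Positive", "Negative")))
stopifnot(all(metadata$ADDUCT %in% c("M+H", "M+Na", "M+K", "M-H", "M+Cl")))
stopifnot(!anyDuplicated(metadata$ID))   # IDs must be unique

# Written to a temporary folder here; the file name is hypothetical
metadata_path <- file.path(tempdir(), "my_metadata.csv")
write.csv(metadata, metadata_path, row.names = FALSE, na = "N/A")
```

The resulting path could then be passed as metadata_file in place of the hosted example file.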

daniellyz/MergeION documentation built on Oct. 19, 2022, 1:56 p.m.