View source: R/mslibrary-msp.R
loadMSLibraryMSP | R Documentation |
This function loads, verifies and curates MS library data from MSP files.
loadMSLibraryMSP(
file,
parseComments = TRUE,
prefCalcChemProps = TRUE,
neutralChemProps = FALSE,
potAdducts = TRUE,
potAdductsLib = TRUE,
absMzDev = 0.002,
calcSPLASH = TRUE
)
file |
A |
parseComments |
If |
prefCalcChemProps |
If |
neutralChemProps |
If |
potAdducts , potAdductsLib |
If and how missing adducts (
|
absMzDev |
The maximum absolute m/z deviation when guessing missing adducts. |
calcSPLASH |
If set to |
This function uses an efficient C++ MSP loader to load MS library data. It is called by loadMSLibrary when algorithm="msp" is set.
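The two entry points described above can be sketched as follows. The file name is a hypothetical placeholder, and the guard only exists so the snippet does nothing when patRoon or the file is unavailable; the argument names are those listed in the usage section.

```r
# Hypothetical path: replace with a real '.msp' export (e.g. from MassBank EU or MoNA)
mspFile <- "library.msp"

if (requireNamespace("patRoon", quietly = TRUE) && file.exists(mspFile)) {
    # Direct call, with two of the documented defaults written out
    mslibrary <- patRoon::loadMSLibraryMSP(mspFile, parseComments = TRUE, absMzDev = 0.002)

    # Same loader, reached through the generic function
    mslibrary <- patRoon::loadMSLibrary(mspFile, algorithm = "msp")
}
```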
This function uses C++ with Rcpp to efficiently load and parse MSP files, and is mainly optimized for loading the ‘.msp’ files from MassBank EU and MoNA. Files from other sources may also work; any feedback on this is welcome!
The loaded data is returned in an MSLibrary object.
Several strategies are applied to automatically verify and improve library data. This is important, since library records may have inconsistent or erroneous data, which makes them unsuitable for automated workflows such as compound annotation with generateCompoundsLibrary.
The loaded library data is post-treated as follows:
The DB# field is renamed to DB_ID to improve compatibility with R column names.
Synonyms (Synon fields) are merged together, mainly to reduce memory usage.
Inconsistently formatted NA data (e.g. "n/a", "N/A" or empty strings) are set to regular R NA values.
The case of record field names is made consistent.
The Formula and ExactMass fields are renamed to formula and neutralMass, respectively. This is for consistency with other data generated with patRoon.
Leading and trailing whitespace is trimmed from character field data.
Mass data is verified to be properly numeric, and set to NA otherwise.
The format of formula data is made consistent: ionic species (with or without square brackets) are converted to a regular formula format.
Chemical identifiers such as SMILES and formulae are verified and missing values are calculated if possible. See below for more details.
Shortened data in the Ion_mode field (P/N) is converted to the long format (POSITIVE/NEGATIVE).
Many different adduct flavors typically found as Precursor_type data are converted and normalized to the generic textual format used by patRoon (see as.adduct).
If potAdducts!=FALSE then missing or invalid adduct data in Precursor_type is guessed based on the difference between the neutral and ionic mass. If multiple adducts explain the mass difference the result is NA.
Missing ion m/z data (PrecursorMZ field) is calculated from adduct data, if possible.
Missing SPLASH data is calculated with the splashR package if calcSPLASH=TRUE.
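A few of the post-treatment steps listed above can be illustrated with a minimal base R sketch. This is not the package's C++ implementation: the toy records and their values are made up, and only the field names (DB#, DB_ID, ExactMass, neutralMass, Ion_mode) come from the description above.

```r
# Toy records standing in for parsed MSP fields (values are made up)
records <- data.frame(
    "DB#" = c("R1", "R2", "R3"),
    Name = c(" caffeine ", "n/a", "atrazine"),
    ExactMass = c("194.0804", "N/A", "abc"),
    Ion_mode = c("P", "N", "POSITIVE"),
    check.names = FALSE, stringsAsFactors = FALSE
)

# Rename DB# so that it is a syntactically valid R column name
names(records)[names(records) == "DB#"] <- "DB_ID"

# Trim whitespace and map inconsistent NA spellings to regular NA values
charCols <- vapply(records, is.character, logical(1))
records[charCols] <- lapply(records[charCols], function(x) {
    x <- trimws(x)
    x[x %in% c("n/a", "N/A", "")] <- NA_character_
    x
})

# Rename ExactMass to neutralMass; anything that is not numeric becomes NA
records$neutralMass <- suppressWarnings(as.numeric(records$ExactMass))
records$ExactMass <- NULL

# Expand shortened ion modes (P/N) to the long format
shortModes <- c(P = "POSITIVE", N = "NEGATIVE")
isShort <- records$Ion_mode %in% names(shortModes)
records$Ion_mode[isShort] <- unname(shortModes[records$Ion_mode[isShort]])
```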
Chemical properties such as SMILES, InChIKey and formula in the MS library are automatically validated and calculated if missing/invalid.
The internal validation/calculation process performs the following steps:
Validation of SMILES, InChI, InChIKey and formula data (if present). Invalid entries will be set to NA.
If neutralChemProps=TRUE then chemical data (SMILES, formulae etc.) is neutralized by (de-)protonation (using the --neutralized option of OpenBabel). An additional column molNeutralized is added to mark those molecules that were neutralized. Note that neutralization requires either SMILES or InChI data to be available.
The SMILES and InChI data are used to calculate missing or invalid SMILES, InChI, InChIKey and formula data. If prefCalcChemProps=TRUE then existing InChIKey and formula data is overwritten by calculated values whenever possible.
The chemical formulae which were not calculated are verified and normalized. This process may be time consuming, and can largely be avoided by setting prefCalcChemProps=TRUE.
Neutral masses are calculated for missing values (prefCalcChemProps=FALSE) or whenever possible (prefCalcChemProps=TRUE).
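The difference between the two prefCalcChemProps settings can be sketched as a simple merge rule. The helper mergeChemProp is hypothetical and not part of patRoon, and the formula vectors are made-up inputs; the sketch only mirrors the "overwrite whenever possible" versus "fill missing values only" behaviour described above.

```r
# Hypothetical helper (not a patRoon function): combine library and calculated values
mergeChemProp <- function(existing, calculated, prefCalc) {
    if (prefCalc) {
        # Prefer calculated values; keep existing ones where calculation was not possible
        ifelse(is.na(calculated), existing, calculated)
    } else {
        # Keep existing values; only fill in what is missing
        ifelse(is.na(existing), calculated, existing)
    }
}

# Made-up example data: the third library formula disagrees with the calculated one
libFormula  <- c("C8H10N4O2", NA,     "C8H15ClN5", "C2H6O")
calcFormula <- c("C8H10N4O2", "C6H6", "C8H14ClN5", NA)

overwritten <- mergeChemProp(libFormula, calcFormula, prefCalc = TRUE)
filledOnly  <- mergeChemProp(libFormula, calcFormula, prefCalc = FALSE)
```

Only the third record differs between the two results: with prefCalc = TRUE the calculated formula replaces the library value, otherwise the library value is kept.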
Note that calculation of formulae for molecules that are isotopically labelled is currently only supported for deuterium (2H) elements.
This functionality relies heavily on OpenBabel, so please make sure it is installed.
The mass spectrum parser currently only supports space-separated entries (MSP formally also allows other formats).
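As an illustration of the supported layout, the following base R sketch parses space-separated peak entries into an m/z and intensity matrix. It is not the package's C++ parser, and the peak values are made up.

```r
# Peak lines in the space-separated layout, one "m/z intensity" pair per line (made-up values)
peakLines <- c("72.0444 120", "138.0662 999", "195.0877  450")

# Split on runs of whitespace and convert to a numeric two-column matrix
fields <- strsplit(trimws(peakLines), "[[:space:]]+")
peaks <- matrix(as.numeric(unlist(fields)), ncol = 2, byrow = TRUE,
                dimnames = list(NULL, c("mz", "intensity")))
```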
Guessing adducts from neutral/ionic mass differences was inspired by MetFrag.
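The idea of guessing an adduct from the neutral/ionic mass difference can be sketched as below. The helper guessAdduct, the short candidate list and the rounded mass shifts are assumptions for illustration only; the adducts actually considered by the package are controlled by potAdducts and potAdductsLib. The tolerance plays the role of absMzDev.

```r
# Illustrative candidates with approximate m/z shifts for singly charged positive ions
adductShifts <- c("[M+H]+" = 1.00728, "[M+NH4]+" = 18.03383, "[M+Na]+" = 22.98922)

# Hypothetical helper (not a patRoon function)
guessAdduct <- function(neutralMass, precursorMZ, shifts, absMzDev = 0.002) {
    # Which candidates explain the observed mass difference within the tolerance?
    hits <- names(shifts)[abs((precursorMZ - neutralMass) - shifts) <= absMzDev]
    # An unambiguous single hit is accepted; no hit or several hits give NA
    if (length(hits) == 1) hits else NA_character_
}

protonated <- guessAdduct(194.08038, 195.08765, adductShifts)
sodiated   <- guessAdduct(194.08038, 217.06960, adductShifts)
unmatched  <- guessAdduct(194.08038, 200.00000, adductShifts)
ambiguous  <- guessAdduct(194.08038, 195.08765, adductShifts, absMzDev = 25)
```

The last call uses a deliberately oversized tolerance so that all candidates match, which shows the "multiple adducts explain the difference" case ending in NA.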
loadMSLibrary for more details and other algorithms.
The MSLibrary documentation for various methods to post-process the data, and generateCompoundsLibrary for annotation of features with the library data.