knitr::opts_chunk$set(echo = TRUE)
masstrixR performs annotation of MS^1^ data with putative metabolites. This annotation requires a suitable database of metabolites. masstrixR uses the SQLite database framework for fast and efficient searching. Measured masses are compared with the theoretical adduct masses of the metabolites of interest within a given mass error, defined either as an absolute error in Da or as a relative error in ppm.
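The two error definitions can be illustrated with a few lines of base R. This is only a sketch of the arithmetic; `mass_error()` and `is_match()` are hypothetical helpers written for this illustration and are not part of masstrixR, and the measured value is invented.

```r
# Sketch: compare a measured m/z with a theoretical m/z.
# mass_error() and is_match() are illustrative helpers, not masstrixR functions.

mass_error <- function(measured, theoretical, type = c("abs", "ppm")) {
  type <- match.arg(type)
  delta <- measured - theoretical
  if (type == "abs") {
    delta                       # absolute error in Da
  } else {
    delta / theoretical * 1e6   # relative error in ppm
  }
}

is_match <- function(measured, theoretical, tolerance, type = c("abs", "ppm")) {
  type <- match.arg(type)
  abs(mass_error(measured, theoretical, type)) <= tolerance
}

theoretical_mz <- 768.122486  # assumed theoretical m/z of an [M+H]+ ion
measured_mz    <- 768.1240    # hypothetical measured m/z

mass_error(measured_mz, theoretical_mz, "abs")
mass_error(measured_mz, theoretical_mz, "ppm")

# the same peak can pass a ppm window and fail a tight Da window
is_match(measured_mz, theoretical_mz, tolerance = 5, type = "ppm")
is_match(measured_mz, theoretical_mz, tolerance = 0.001, type = "abs")
```

A ppm tolerance scales with m/z, whereas a Da tolerance is constant over the whole mass range, so the choice depends on how the mass accuracy of the instrument behaves.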
Databases for masstrixR are generated from compound lists and adduct definitions. masstrixR contains calculation rules for several possible adducts. Adducts are referenced by their usual nomenclature, e.g. [M+H]+.
The adduct calculation rules are defined in a list. Each rule contains a multiplicative and an additive part for the m/z calculation, as well as a subtractive and an additive part for the generation of ion sum formulas. The function getAdductCalc() returns a list with these calculation rules.
library(masstrixR)

adductCalc <- getAdductCalc()
adductCalc["[M+H]+"]
The function getAdductNames() returns a list with all adduct names. The functions getAllPosModeAdducts() and getAllNegModeAdducts() return the adduct names separately for each ionization mode.
# all adducts
getAdductNames()

# all positive mode adducts
getAllPosModeAdducts()

# all negative mode adducts
getAllNegModeAdducts()
The supplied values can be used to calculate adduct masses of molecules. The example below shows how to calculate m/z values for different adducts.
# exact mass for coenzyme A
exactMass <- 767.115210

# calculate m/z of [M+H]+ adduct
exactMass * as.numeric(adductCalc[["[M+H]+"]][1]) + as.numeric(adductCalc[["[M+H]+"]][2])

# calculate m/z of [M+Na]+ adduct
exactMass * as.numeric(adductCalc[["[M+Na]+"]][1]) + as.numeric(adductCalc[["[M+Na]+"]][2])

# calculate m/z of [M+2H]2+ adduct
exactMass * as.numeric(adductCalc[["[M+2H]2+"]][1]) + as.numeric(adductCalc[["[M+2H]2+"]][2])

# calculate m/z of [M-H]- adduct
exactMass * as.numeric(adductCalc[["[M-H]-"]][1]) + as.numeric(adductCalc[["[M-H]-"]][2])
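The same arithmetic can be written out with explicit numbers, which may help when checking results by hand. The sketch below does not use masstrixR. The table `adduct_rules` and the function `adduct_mz()` are hypothetical, and the multiplicative and additive values are commonly used approximations that are assumed here; the values returned by getAdductCalc() may differ in the last decimal places.

```r
# Sketch: m/z = exact mass * multiplicative part + additive part.
# adduct_rules and adduct_mz() are illustrative only, not masstrixR objects.
# The numeric values are assumed, commonly used approximations.
adduct_rules <- data.frame(
  adduct = c("[M+H]+", "[M+Na]+", "[M+2H]2+", "[M-H]-"),
  mult   = c(1, 1, 0.5, 1),                           # 1 / charge
  add    = c(1.007276, 22.989218, 1.007276, -1.007276),
  stringsAsFactors = FALSE
)

adduct_mz <- function(exact_mass, adduct, rules = adduct_rules) {
  rule <- rules[rules$adduct == adduct, ]
  if (nrow(rule) != 1) stop("unknown adduct: ", adduct)
  exact_mass * rule$mult + rule$add
}

# exact mass for coenzyme A
exact_mass <- 767.115210

# m/z for every adduct in the illustrative table
sapply(adduct_rules$adduct, adduct_mz, exact_mass = exact_mass)
```

For a doubly charged ion the multiplicative part is 0.5, so the neutral mass is halved before the mass of one proton is added.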
In addition to adduct masses, the adduct or ion formula can also be generated. This formula is useful for the generation of isotopic patterns. The function calcAdductFormula() directly accepts a chemical formula and an adduct name.
# chemical formula of coenzyme A
chemFormula <- "C21H36N7O16P3S"

# generate formula of [M+H]+ adduct
calcAdductFormula(chemFormula, "[M+H]+")

# generate formula of [M+Na]+ adduct
calcAdductFormula(chemFormula, "[M+Na]+")

# generate formula of [M+2H]2+ adduct
calcAdductFormula(chemFormula, "[M+2H]2+")
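The idea of an additive and a subtractive part for ion formulas can be sketched in base R. The helpers `parse_formula()` and `shift_formula()` below are hypothetical and only handle simple formulas without brackets, charges or repeated elements. The exact output format of calcAdductFormula() is not reproduced here.

```r
# Sketch: add or remove atoms from a simple sum formula.
# parse_formula() and shift_formula() are illustrative, not masstrixR functions.

# split "C21H36N7O16P3S" into a named vector of element counts
parse_formula <- function(formula) {
  tokens <- regmatches(formula, gregexpr("[A-Z][a-z]?[0-9]*", formula))[[1]]
  elements <- sub("[0-9]*$", "", tokens)
  counts <- sub("^[A-Za-z]+", "", tokens)
  counts[counts == ""] <- "1"           # no number means one atom
  setNames(as.integer(counts), elements)
}

shift_formula <- function(formula, add = integer(0), remove = integer(0)) {
  atoms <- parse_formula(formula)
  # additive part: atoms gained by the ion
  for (el in names(add)) {
    current <- if (el %in% names(atoms)) atoms[[el]] else 0
    atoms[el] <- current + add[[el]]
  }
  # subtractive part: atoms lost by the ion
  for (el in names(remove)) {
    if (!(el %in% names(atoms)) || atoms[[el]] < remove[[el]]) {
      stop("cannot remove ", remove[[el]], " x ", el, " from ", formula)
    }
    atoms[el] <- atoms[[el]] - remove[[el]]
  }
  atoms <- atoms[atoms > 0]
  paste0(names(atoms), ifelse(atoms == 1, "", atoms), collapse = "")
}

chem_formula <- "C21H36N7O16P3S"
shift_formula(chem_formula, add = c(H = 1))     # [M+H]+
shift_formula(chem_formula, add = c(Na = 1))    # [M+Na]+
shift_formula(chem_formula, remove = c(H = 1))  # [M-H]-
```

New elements are appended at the end of the formula in this sketch; no particular element ordering convention is applied.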
Based on the functions for the calculation of adduct masses, masstrixR can generate complete SQLite databases that can be used for the annotation workflow. If the user supplies input in a defined format, different functions allow the generation of a SQLite database in the format required by masstrixR. Metabolites for database generation can be read from Excel, the clipboard or a text file. The data requires the following headers: id, smiles, inchi, inchikey, formula, name and exactmass.
The column id shall contain a unique identifier. The fields smiles, inchi and inchikey are required as columns, but are not used further at the moment. formula has to contain a valid chemical formula for the respective metabolite, while name contains the name and exactmass the exact mass with at least four decimal places. If no exact mass is supplied, it can be calculated based on the formula.
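How an exact mass follows from a formula can be sketched in base R. This is not the routine masstrixR uses internally. The function `formula_exact_mass()` and the small mass table are hypothetical, cover only the six elements needed for the example and only handle simple formulas without brackets or charges.

```r
# Sketch: monoisotopic exact mass from a simple sum formula.
# formula_exact_mass() is illustrative, not a masstrixR function.

# monoisotopic masses of the most abundant isotopes (rounded to 8 decimals)
monoisotopic <- c(
  C = 12.00000000, H = 1.00782503, N = 14.00307401,
  O = 15.99491462, P = 30.97376163, S = 31.97207100
)

formula_exact_mass <- function(formula, masses = monoisotopic) {
  # split the formula into element symbols and atom counts
  tokens <- regmatches(formula, gregexpr("[A-Z][a-z]?[0-9]*", formula))[[1]]
  elements <- sub("[0-9]*$", "", tokens)
  counts <- sub("^[A-Za-z]+", "", tokens)
  counts[counts == ""] <- "1"

  unknown <- setdiff(elements, names(masses))
  if (length(unknown) > 0) {
    stop("no mass available for: ", paste(unknown, collapse = ", "))
  }
  sum(masses[elements] * as.integer(counts))
}

# coenzyme A
formula_exact_mass("C21H36N7O16P3S")
```

For coenzyme A the result agrees with the value 767.115210 used in this vignette to four decimal places.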
The example below shows how to load data and prepare a SQLite database for masstrixR from it. An example .txt file is loaded from the installation of masstrixR, but any data frame from any source with the same formatting works.
# load required library
library(readr)

# get example file from package (forward slashes work on all platforms)
exampleFile <- system.file("extdata", "exampleData/databases/ymdb_example.txt", package = "masstrixR")

# read file into tibble
compoundList <- read_tsv(exampleFile, col_types = cols(exactmass = col_double()))
head(compoundList)
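Because any data frame with the same formatting works, an input list can also be built by hand. The sketch below creates a one-row list in base R and checks it against the columns described above. The identifier "EX0001" and the helper `check_input_list()` are hypothetical, and the column set is taken from the description in this vignette. Whether masstrixR accepts NA in the unused structure columns is an assumption.

```r
# Sketch: build a minimal input compound list by hand and check its layout.
# check_input_list() is illustrative, not a masstrixR function.

required_headers <- c("id", "smiles", "inchi", "inchikey",
                      "formula", "name", "exactmass")

compound_list <- data.frame(
  id        = "EX0001",          # hypothetical unique identifier
  smiles    = NA_character_,     # placeholder, column must exist
  inchi     = NA_character_,     # placeholder, column must exist
  inchikey  = NA_character_,     # placeholder, column must exist
  formula   = "C21H36N7O16P3S",
  name      = "Coenzyme A",
  exactmass = 767.115210,
  stringsAsFactors = FALSE
)

check_input_list <- function(x, required = required_headers) {
  missing_headers <- setdiff(required, colnames(x))
  if (length(missing_headers) > 0) {
    message("missing columns: ", paste(missing_headers, collapse = ", "))
    return(FALSE)
  }
  # ids must be unique and exact masses numeric
  anyDuplicated(x$id) == 0 && is.numeric(x$exactmass)
}

check_input_list(compound_list)
```

This check only covers the column layout; it does not replace the validation that masstrixR performs on a prepared compound list.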
Next, the adducts that have to be covered in the database need to be defined. Since no further matching of adducts is performed during the database search in the later steps of the annotation workflow, it is advisable to generate a database for a single ionization mode only. To improve performance, only the adducts that are really needed should be defined. In the example below, the [M+H]+ and [M+Na]+ adducts are chosen.
# adducts used for DB generation
adducts <- c("[M+H]+", "[M+Na]+")

# create compound list for DB creation
newCompoundList <- prepareCompoundList(compoundList, adductList = adducts)
Based on a valid compound list and the selected adducts, the prepareCompoundList() function generates a new compound list that can be used with masstrixR. This list now contains several additional columns.
Many of these columns are required for more advanced workflows, e.g. combined m/z and RT search. They are explained in the respective vignettes. Since the new compound list is a simple data frame, it can be generated with any other software, e.g. Excel, and then read into R. To check whether the supplied list is valid, the validateCompoundList() function is used. This function returns TRUE or FALSE. In the last step, a SQLite database file is generated with the createDb() function, which returns the file name of the generated database. The file is stored in the current working directory. SQLite files are portable and can be shared between users. The .sqlite file only has to be generated once and can be reused at any time.
# check if compound list is valid and create SQLite DB
if (validateCompoundList(newCompoundList)) {
  dbFileName <- createDb(newCompoundList, "example_pos_MH_MNa")
  # print only if the database was created
  print(dbFileName)
}
Congratulations! After running the code of this vignette, you have generated your first .sqlite database file that can be used with masstrixR. You can proceed with the vignette on m/z matching, which explains how a .sqlite file can be used with masstrixR to annotate MS^1^ data.
When working with isotopically labeled substances, e.g. in isotope tracer or labeling experiments, masses show characteristic shifts. masstrixR offers the possibility to calculate the masses (not the abundances) of isotopically labeled substances, using the chemical formula as the basis. An input compound list is used and modified to contain the isotopically labeled exact masses. This list can be further processed like a normal compound list.
# get all supported elements for labeling
getIsotopeNameList()

# create fully labeled metabolite masses
isoCompoundList_full <- prepareIsoCompoundList(compoundList, isoLabel = "full", labeledElement = "C")

# create partially labeled metabolite masses with 0 to 3 labeled carbons
isoCompoundList_partial <- prepareIsoCompoundList(compoundList, isoLabel = "partial", labeledElement = "C", noOfLabel = c(0, 1, 2, 3))

# get examples
isoCompoundList_partial[which(isoCompoundList_partial$id == "YMDB00002"),]

# adducts used for DB generation
adducts <- c("[M+H]+", "[M+Na]+")

# create compound list for DB creation
newisoCompoundList_partial <- prepareCompoundList(isoCompoundList_partial, adductList = adducts)

# check if compound list is valid and create SQLite DB
if (validateCompoundList(newisoCompoundList_partial)) {
  dbFileName <- createDb(newisoCompoundList_partial, "example_isoLabel_pos_MH_MNa")
  # print only if the database was created
  print(dbFileName)
}
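The characteristic mass shift can be checked by hand. The sketch below assumes 13C labeling and uses the mass difference between 13C and 12C. `labeled_mass()` is a hypothetical helper and does not reproduce the output format of prepareIsoCompoundList().

```r
# Sketch: exact mass of a 13C-labeled molecule from the unlabeled exact mass.
# labeled_mass() is illustrative, not a masstrixR function.

# mass difference between 13C and 12C in Da
c13_shift <- 13.0033548378 - 12.0000000000

labeled_mass <- function(exact_mass, n_label, shift = c13_shift) {
  exact_mass + n_label * shift
}

# coenzyme A (C21H36N7O16P3S) with 0 to 3 labeled carbons
labeled_mass(767.115210, 0:3)

# fully labeled: all 21 carbons replaced by 13C
labeled_mass(767.115210, 21)
```

Each labeled carbon adds about 1.00335 Da, so fully labeled coenzyme A is shifted by roughly 21.07 Da relative to the unlabeled molecule.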