michaelwitting/masstrixR: Annotation of MS-based metabolomics data

Introduction

masstrixR annotates MS^1^ data with putative metabolites. This annotation requires a suitable metabolite database, and masstrixR uses SQLite for fast and efficient searching. Measured masses are compared with the theoretical adduct masses of the metabolites of interest within a given mass error, defined either as an absolute error in Da or as a relative error in ppm. Databases for masstrixR are generated from compound lists and adduct definitions. masstrixR contains calculation rules for several possible adducts, which are referenced by their usual notation, e.g. [M+H]+.
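The two kinds of mass error mentioned above can be illustrated with a few lines of base R. This is only a sketch: the m/z values are made up, and withinTolerance is a hypothetical helper, not a masstrixR function.

```r
# measured m/z and theoretical adduct m/z (illustrative values)
measuredMz <- 768.1230
theoreticalMz <- 768.1225

# absolute error in Da and relative error in ppm
absError <- measuredMz - theoreticalMz
ppmError <- absError / theoreticalMz * 1e6

# hypothetical helper: TRUE if the measured m/z lies within the tolerance
withinTolerance <- function(measured, theoretical, tol, type = c("ppm", "abs")) {
  type <- match.arg(type)
  delta <- abs(measured - theoretical)
  if (type == "ppm") {
    delta <- delta / theoretical * 1e6
  }
  delta <= tol
}

withinTolerance(measuredMz, theoreticalMz, tol = 5, type = "ppm")
withinTolerance(measuredMz, theoreticalMz, tol = 0.001, type = "abs")
```

A relative tolerance scales with m/z, so the same ppm value allows a larger absolute deviation for heavier ions.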

Working with adducts

The adduct calculation rules are defined in a list. Each rule contains a multiplicative and an additive part for the mass calculation, as well as a subtractive and an additive part for generating ion sum formulas. The function getAdductCalc() returns a list with these calculation rules.

library(masstrixR)

adductCalc <- getAdductCalc()
adductCalc["[M+H]+"]

The function getAdductNames() returns a list of all adduct names; getAllPosModeAdducts() and getAllNegModeAdducts() return the adducts of each ionization mode separately.

# all adducts
getAdductNames()

# all positive mode adducts
getAllPosModeAdducts()

# all negative mode adducts
getAllNegModeAdducts()

The supplied values can be used to calculate adduct masses of molecules. The example below shows how to calculate m/z values for different adducts.

# exact mass for coenzyme A
exactMass <- 767.115210

# calculate m/z of [M+H]+ adduct
exactMass * as.numeric(adductCalc[["[M+H]+"]][1]) + as.numeric(adductCalc[["[M+H]+"]][2])

# calculate m/z of [M+Na]+ adduct
exactMass * as.numeric(adductCalc[["[M+Na]+"]][1]) + as.numeric(adductCalc[["[M+Na]+"]][2])

# calculate m/z of [M+2H]2+ adduct
exactMass * as.numeric(adductCalc[["[M+2H]2+"]][1]) + as.numeric(adductCalc[["[M+2H]2+"]][2])

# calculate m/z of [M-H]- adduct
exactMass * as.numeric(adductCalc[["[M-H]-"]][1]) + as.numeric(adductCalc[["[M-H]-"]][2])
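To make the multiplicative and additive parts concrete without relying on the internal layout of the package's rule list, the same calculation can be sketched with a hand-written rule table. The table below is an assumption for illustration only; its ion masses are rounded literature values and may differ slightly from those returned by getAdductCalc().

```r
# hypothetical rule table (not the package's internal list):
# m/z = exactMass * mult + add
protonMass <- 1.007276      # mass of H+ in Da (hydrogen minus one electron)
sodiumIonMass <- 22.989221  # mass of Na+ in Da (sodium minus one electron)

rules <- data.frame(
  adduct = c("[M+H]+", "[M+Na]+", "[M+2H]2+", "[M-H]-"),
  mult   = c(1, 1, 0.5, 1),
  add    = c(protonMass, sodiumIonMass, protonMass, -protonMass),
  stringsAsFactors = FALSE
)

# exact mass for coenzyme A
exactMass <- 767.115210

# apply every rule at once
rules$mz <- exactMass * rules$mult + rules$add
rules[, c("adduct", "mz")]
```

For the doubly charged ion the multiplicative part is 0.5 because the neutral mass is divided by the charge, while the additive part is still one proton mass (two protons divided by two charges).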

In addition to adduct masses, the adduct (ion) formula can be generated. This formula is useful for generating isotopic patterns. The function calcAdductFormula() accepts a chemical formula and an adduct name directly.

# chemical formula of coenzyme A
chemFormula <- "C21H36N7O16P3S"

# generate formula of [M+H]+ adduct
calcAdductFormula(chemFormula, "[M+H]+")

# generate formula of [M+Na]+ adduct
calcAdductFormula(chemFormula, "[M+Na]+")

# generate formula of [M+2H]2+ adduct
calcAdductFormula(chemFormula, "[M+2H]2+")
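The idea behind the additive and subtractive formula parts can be sketched in base R. parseFormula and modifyFormula are hypothetical helpers written for this illustration; they handle only simple formulas without brackets or isotopes, and the output format of calcAdductFormula may differ.

```r
# hypothetical helper: parse a simple formula into named element counts
parseFormula <- function(formula) {
  tokens <- regmatches(formula, gregexpr("[A-Z][a-z]?[0-9]*", formula))[[1]]
  elements <- sub("[0-9]+$", "", tokens)
  counts <- sub("^[A-Za-z]+", "", tokens)
  counts[counts == ""] <- "1"          # no number means one atom
  counts <- as.integer(counts)
  # sum counts per element, keeping the original element order
  vapply(split(counts, factor(elements, levels = unique(elements))),
         sum, integer(1))
}

# hypothetical helper: add (positive) or remove (negative) atoms
modifyFormula <- function(counts, change) {
  for (el in names(change)) {
    current <- if (el %in% names(counts)) counts[[el]] else 0L
    counts[el] <- current + change[[el]]
  }
  if (any(counts < 0)) stop("cannot remove more atoms than present")
  counts <- counts[counts > 0]
  paste0(names(counts), ifelse(counts == 1, "", counts), collapse = "")
}

coA <- parseFormula("C21H36N7O16P3S")
modifyFormula(coA, c(H = 1L))    # one H added, as for [M+H]+
modifyFormula(coA, c(Na = 1L))   # one Na added, as for [M+Na]+
modifyFormula(coA, c(H = -1L))   # one H removed, as for [M-H]-
```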

Generating adduct databases

Based on the functions for calculating adduct masses, masstrixR can generate complete SQLite databases for use in the annotation workflow. Given input in a defined format, several functions generate a SQLite database in the format masstrixR expects. Metabolites for database generation can be read from Excel, the clipboard or a text file. The data requires the following headers:

id
smiles
inchi
inchikey
formula
name
exactmass

The column id must contain a unique identifier. The columns smiles, inchi and inchikey are required, but are not used further at the moment. formula must contain a valid chemical formula for the respective metabolite, name contains the metabolite name, and exactmass contains the exact mass with at least 4 decimal places. If no exact mass is supplied, it can be calculated from the formula.
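How an exact (monoisotopic) mass follows from a formula can be sketched as below. formulaToMass is a hypothetical helper, not the routine masstrixR uses, and the mass table covers only the elements needed for the example.

```r
# monoisotopic masses in Da for a few elements (illustrative subset)
monoMass <- c(C = 12.000000, H = 1.00782503, N = 14.00307400,
              O = 15.99491462, P = 30.97376200, S = 31.97207117)

# hypothetical helper: monoisotopic mass of a simple formula
# (no brackets, charges or isotope labels)
formulaToMass <- function(formula) {
  tokens <- regmatches(formula, gregexpr("[A-Z][a-z]?[0-9]*", formula))[[1]]
  elements <- sub("[0-9]+$", "", tokens)
  counts <- sub("^[A-Za-z]+", "", tokens)
  counts[counts == ""] <- "1"          # no number means one atom
  unknown <- setdiff(elements, names(monoMass))
  if (length(unknown) > 0) {
    stop("no mass for element(s): ", paste(unknown, collapse = ", "))
  }
  sum(monoMass[elements] * as.integer(counts))
}

# coenzyme A
round(formulaToMass("C21H36N7O16P3S"), 4)
```

For coenzyme A the result agrees with the exact mass of 767.115210 used in the adduct examples.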

The example below shows how to load data and prepare a SQLite database for masstrixR from it. An example .txt file is loaded from the masstrixR installation, but any data frame with the same formatting works, regardless of its source.

# load required library
library(readr)

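# get example file from package; path components are passed separately
# so that the call does not depend on the Windows path separator
exampleFile <- system.file("extdata", "exampleData", "databases", "ymdb_example.txt", package = "masstrixR")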

# read file into tibble
compoundList <- read_tsv(exampleFile, col_types = cols(exactmass = col_double()))
head(compoundList)
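A compound list can also be assembled directly in R. The sketch below builds a one-row data frame with the required headers and checks them before further use. The identifier is made up, and whether masstrixR accepts missing values in the currently unused structure columns is an assumption.

```r
requiredHeaders <- c("id", "smiles", "inchi", "inchikey",
                     "formula", "name", "exactmass")

# illustrative single-row compound list
myCompounds <- data.frame(
  id = "CPD0001",              # hypothetical identifier
  smiles = NA_character_,      # assumption: unused columns may be NA
  inchi = NA_character_,
  inchikey = NA_character_,
  formula = "C21H36N7O16P3S",
  name = "Coenzyme A",
  exactmass = 767.115210,
  stringsAsFactors = FALSE
)

# stop early if a required header is missing
missingHeaders <- setdiff(requiredHeaders, colnames(myCompounds))
if (length(missingHeaders) > 0) {
  stop("missing columns: ", paste(missingHeaders, collapse = ", "))
}

# identifiers must be unique and masses numeric
stopifnot(!anyDuplicated(myCompounds$id), is.numeric(myCompounds$exactmass))
```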

Next, the adducts to be covered in the database need to be defined. Since no further matching of adducts is performed during the database search in the later steps of the annotation workflow, it is advisable to generate databases for a single ionization mode only. To improve performance, only the adducts that are really needed should be defined. In the example below the [M+H]+ and [M+Na]+ adducts are chosen.

# adducts used for DB generation
adducts <- c("[M+H]+", "[M+Na]+")

# create compound list for DB creation
newCompoundList <- prepareCompoundList(compoundList, adductList = adducts)

Based on a valid compound list and selected adducts the prepareCompoundList function generates a new compound list that can be used with masstrixR. This list now contains several additional columns:

metaboliteID
adductType
adductMass
neutralMass
neutralFormula
ionFormula
metaboliteName
inchikey
inchi
smiles
rt
ccs
kegg
hmdb
chebi
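Conceptually, the prepared list holds one row per combination of metabolite and adduct. The base R sketch below shows that expansion for two compounds and two adducts. It is not the implementation of prepareCompoundList; the identifiers and the rule table are assumptions made for illustration.

```r
compounds <- data.frame(
  id = c("A", "B"),                        # hypothetical identifiers
  exactmass = c(180.063388, 767.115210),   # glucose, coenzyme A
  stringsAsFactors = FALSE
)

# hypothetical adduct rules: m/z = exactmass * mult + add
adductRules <- data.frame(
  adductType = c("[M+H]+", "[M+Na]+"),
  mult = c(1, 1),
  add = c(1.007276, 22.989221),
  stringsAsFactors = FALSE
)

# cross join: every compound is combined with every adduct
expanded <- merge(compounds, adductRules, by = NULL)
expanded$adductMass <- expanded$exactmass * expanded$mult + expanded$add

expanded[order(expanded$id), c("id", "adductType", "exactmass", "adductMass")]
```

Because the table grows with the number of compounds times the number of adducts, restricting the adduct list keeps the resulting database small.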

Many of these columns are required for more advanced workflows, e.g. a combined m/z and RT search, and are explained in the respective vignettes. Since the new compound list is a simple data frame, it can also be generated with other software, e.g. Excel, and then read into R. The validateCompoundList function checks whether a supplied list is valid and returns TRUE or FALSE. In the last step the createDb function generates a SQLite database file and returns its file name. The file is stored in the current working directory. SQLite files are portable and can be shared between users, so the .sqlite file has to be generated only once and can be reused at any time.

# check if compound list is valid and create SQLite DB
if (validateCompoundList(newCompoundList)) {
  dbFileName <- createDb(newCompoundList, "example_pos_MH_MNa")
  # print the file name only if the database was actually created
  print(dbFileName)
}

Congratulations! By running the code of this vignette you have generated your first .sqlite database file for use with masstrixR. You can proceed with the vignette on m/z matching, which explains how a .sqlite file is used with masstrixR to annotate MS^1^ data.

Creating database files with isotopically labeled substances

When working with isotopically labeled substances, e.g. in isotope tracer or labeling experiments, masses show characteristic shifts. masstrixR can calculate the masses (not the abundances) of isotopically labeled substances from the chemical formula. An input compound list is modified to contain the isotopically labeled exact masses and can then be processed like a normal compound list.

# get all supported elements for labeling
getIsotopeNameList()

# create fully labeled metabolite masses
isoCompoundList_full <- prepareIsoCompoundList(compoundList, isoLabel = "full", labeledElement = "C")

# create partially labeled metabolite masses with 0 to 3 labeled carbons
isoCompoundList_partial <- prepareIsoCompoundList(compoundList, isoLabel = "partial", labeledElement = "C", noOfLabel = c(0, 1, 2, 3))

# get examples
isoCompoundList_partial[which(isoCompoundList_partial$id == "YMDB00002"),]

# adducts used for DB generation
adducts <- c("[M+H]+", "[M+Na]+")

# create compound list for DB creation
newisoCompoundList_partial <- prepareCompoundList(isoCompoundList_partial, adductList = adducts)

# check if compound list is valid and create SQLite DB
if (validateCompoundList(newisoCompoundList_partial)) {
  dbFileName <- createDb(newisoCompoundList_partial, "example_isoLabel_pos_MH_MNa")
  # print the file name only if the database was actually created
  print(dbFileName)
}
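The mass shift caused by carbon labeling can be checked by hand. The sketch below assumes 13C labeling of glucose (C6H12O6) and uses only base R; it illustrates the arithmetic and is not the code behind prepareIsoCompoundList.

```r
# mass difference between 13C and 12C in Da
deltaC13 <- 13.0033548 - 12.0000000

# glucose, C6H12O6, used as illustration
neutralMass <- 180.063388
nCarbon <- 6

# partially labeled: 0 to 3 carbons replaced by 13C
noOfLabel <- 0:3
partialMasses <- neutralMass + noOfLabel * deltaC13
names(partialMasses) <- paste0("13C", noOfLabel)
partialMasses

# fully labeled: all carbons replaced by 13C
fullMass <- neutralMass + nCarbon * deltaC13
fullMass
```

Each labeled carbon shifts the neutral mass by about 1.00335 Da, so the adduct m/z of a singly charged ion shifts by the same amount.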