In michaelwitting/masstrixR: Annotation of MS based metabolomics data

Introduction

MassTRIX was built for the annotation of m/z values from high-resolution mass spectrometry with putative metabolites. Different versions of MassTRIX have been published as online servers [1-3]. The new version of MassTRIX is implemented as an R package called masstrixR. Its main purpose is to annotate m/z features and to support the analysis of MS^2^ data for metabolite identification.

This vignette describes the basic workflow for annotation of m/z values obtained from MS^1^. The input can be derived from direct infusion or chromatographic experiments. masstrixR uses SQLite database files to store metabolite databases and calculated adducts. The generation of such a database is explained in the vignette "Create SQLiteDBs for masstrixR".

Loading a database

For the following example we load an example database file installed with masstrixR. This database contains precalculated adducts ([M+H]+ and [M+Na]+) for a selection of metabolites. The database file is simply loaded via its file name and only opened on demand.

# build the path from separate components so it resolves on every operating system
dbFileName <- system.file("extdata", "exampleData", "databases", "example_pos_MH_MNa.sqlite", package = 'masstrixR')
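To make concrete what "precalculated adducts" means, the following minimal sketch computes [M+H]+ and [M+Na]+ m/z values from a neutral monoisotopic mass in base R. The helper adductMz and the mass shift constants are illustrative assumptions and are not taken from masstrixR; the package may calculate and store adducts differently.

```r
# Hypothetical helper (not part of masstrixR): m/z of singly charged
# positive-mode adducts, calculated from a neutral monoisotopic mass.
# Mass shifts in Da with the electron mass already subtracted
# (commonly used values, assumed here for illustration).
adductShift <- c("[M+H]+" = 1.007276, "[M+Na]+" = 22.989218)

adductMz <- function(neutralMass, adducts = names(adductShift)) {
  # single charge (z = 1), so m/z is the neutral mass plus the adduct shift
  neutralMass + adductShift[adducts]
}

# example: glucose, C6H12O6, monoisotopic mass 180.06339 Da
glucoseAdducts <- adductMz(180.06339)
print(round(glucoseAdducts, 4))
```

Storing such values in the database once means they do not have to be recalculated for every search.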

Reading m/z data

Data to be annotated must be supplied as a data frame with column headers. The column containing the m/z values is identified by its name. Currently supported header names are:

mz ($mz)
m.z ($m.z)
mzmed ($mzmed)
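The name-based lookup described above can be sketched in base R as follows. The helper findMzColumn and the example feature table are hypothetical and only illustrate the idea; this is not how masstrixR necessarily implements the check internally.

```r
# Hypothetical helper (not part of masstrixR): return the name of the
# first supported m/z column found in a data frame.
findMzColumn <- function(df, supported = c("mz", "m.z", "mzmed")) {
  hits <- intersect(supported, colnames(df))
  if (length(hits) == 0) {
    stop("no m/z column found; expected one of: ",
         paste(supported, collapse = ", "))
  }
  hits[1]
}

# made-up feature table for illustration
features <- data.frame(mzmed = c(181.0707, 203.0526), rt = c(65.2, 65.4))
mzColumn <- findMzColumn(features)
print(features[[mzColumn]])
```

Running such a check before annotation gives an early, readable error if a data frame uses a different header for its m/z values.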

In this example a .gda file from Genedata Expressionist for MS is loaded using the readGdaFile function, but any other data frame containing the m/z values will work. In the future, more specific reader functions will be implemented, e.g. for Agilent .cef files. The function readGdaFile returns a list with the actual intensity values, the annotation of the samples, and the annotation of the features. The feature annotation, which is required for m/z annotation, is the third element of the list.

# load masstrixR
library(masstrixR)

# read example .gda file
gdaFile <- system.file("extdata", "exampleData", "Celegans_mz", "NaAcHILICPos_Cluster.gda", package = 'masstrixR')
exampleGDA <- readGdaFile(gdaFile)

# get row annotations with m/z values
rowAnno <- exampleGDA[[3]]
rowAnno$ClusterName <- row.names(rowAnno)
head(rowAnno)
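For data that does not come from a .gda file, a feature table can be built with base R alone. The sketch below reads made-up CSV content; the feature names and values are invented for illustration. It also shows that read.csv converts an "m/z" header into the syntactic name m.z, one of the supported column names. Whether mzSearch expects further columns is not stated here, so treat this as a starting point rather than a guaranteed input format.

```r
# made-up CSV content with an "m/z" header, for illustration only
csvText <- "feature,m/z,rt
F1,181.0707,65.2
F2,203.0526,65.4"

# read.csv() applies make.names() to headers by default (check.names = TRUE),
# which turns "m/z" into the syntactic name "m.z"
customFeatures <- read.csv(text = csvText, row.names = 1)
print(colnames(customFeatures))

# keep the feature identifier as a regular column, as done for rowAnno
customFeatures$ClusterName <- row.names(customFeatures)
print(customFeatures)
```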

Perform m/z annotation

Once the data is available, the m/z annotation can be performed with the function mzSearch. It requires three pieces of information: the data frame to annotate (in this case rowAnno), the path to the .sqlite file containing the database (dbFileName), and the tolerance for the mass search. The tolerance type is set with the argument mzTolType, either as an absolute value in Da ("abs") or as a relative error in ppm ("ppm"). The argument mzTol defines the maximum tolerance in the chosen unit. The function mzSearch returns a data frame with all features that have been annotated. It contains all the original columns plus all columns from the database.

# annotate
annotationResults <- mzSearch(rowAnno, dbFileName, mzTol = 0.005, mzTolType = "abs")

head(annotationResults)
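The difference between the two tolerance types can be illustrated with a short base R sketch. The helper mzWindow is hypothetical and not part of masstrixR; how mzSearch treats values exactly on the boundary is not documented here, so the inclusive comparison below is an assumption.

```r
# Hypothetical helper (not part of masstrixR): lower and upper m/z bounds
# for an absolute (Da) or relative (ppm) tolerance.
mzWindow <- function(mz, mzTol, mzTolType = c("abs", "ppm")) {
  mzTolType <- match.arg(mzTolType)
  # ppm tolerance scales with the m/z value, absolute tolerance does not
  delta <- if (mzTolType == "abs") mzTol else mz * mzTol / 1e6
  data.frame(mz = mz, lower = mz - delta, upper = mz + delta)
}

# the same feature with 0.005 Da and with 5 ppm tolerance
absWindow <- mzWindow(181.0707, mzTol = 0.005, mzTolType = "abs")
ppmWindow <- mzWindow(181.0707, mzTol = 5, mzTolType = "ppm")
print(rbind(absWindow, ppmWindow))

# a candidate matches when its m/z lies inside the window (inclusive, assumed)
candidateMz <- 181.0720
isMatchAbs <- candidateMz >= absWindow$lower & candidateMz <= absWindow$upper
isMatchPpm <- candidateMz >= ppmWindow$lower & candidateMz <= ppmWindow$upper
print(c(abs = isMatchAbs, ppm = isMatchPpm))
```

At m/z 181, a tolerance of 5 ppm corresponds to roughly 0.0009 Da, so it is considerably stricter than 0.005 Da. Because a ppm window widens with increasing m/z, the two settings are not interchangeable across the mass range.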

If a large database is used for annotation, the database file can first be read into memory for improved performance. This is done with the argument mode. The default value for this argument is "onDisk". The value "inMemory" creates a copy of the database in memory, which is then used for annotation.

# annotate with DB in memory
annotationResults <- mzSearch(rowAnno, dbFileName, mode = "inMemory", mzTol = 0.005, mzTolType = "abs")

The function mzSearch contains several other options for more advanced workflows, which include RT and CCS matching. They are explained in the respective vignettes.