BiocStyle::markdown()

Package: r Biocpkg("MsBackendHmdb")
Authors: r packageDescription("MsBackendHmdb")[["Author"]]
Last modified: r file.info("MsBackendHmdb.Rmd")$mtime
Compiled: r date()

library(Spectra)
library(BiocStyle)

Introduction

The Spectra package provides a central infrastructure for the handling of Mass Spectrometry (MS) data. The package supports interchangeable use of different backends to import MS data from a variety of sources (such as mzML files). The MsBackendHmdb package enables, with the MsBackendHmdbXml object, import of MS/MS spectrum data from xml files from The Human Metabolome Database HMDB. This vignette illustrates the usage of the MsBackendHmdb package to enable HMDB data usage in Spectra.

Importing MS/MS data from HMDB xml files

Spectral data from HMDB can be downloaded in xml format, one xml file per spectrum. In our short example we load 4 such xml files which are provided with this package. Below we first load all required packages and define the file names of the MS/MS spectra xml files.

library(Spectra)
library(MsBackendHmdb)

fls <- dir(system.file("xml", package = "MsBackendHmdb"),
           full.names = TRUE, pattern = "xml$")

MS data can be accessed and analyzed through Spectra objects. Below we create a Spectra with the data from the above xml files. To this end we provide the file names and specify to use a MsBackendHmdbXml() backend as source to enable data import and MsBackendDataFrame() as backend to store/handle the data.

sps <- Spectra(fls, source = MsBackendHmdbXml(),
               backend = MsBackendDataFrame())

With that we have now full access to all imported spectra variables that we list below.

spectraVariables(sps)

Besides default spectra variables, such as msLevel, rtime, precursorMz (most of which are however not defined in the xml files from HMDB) we have also additional spectra variables such as the spectrum_id (the ID of the spectrum in the HMDB database), compound_id (the metabolite identifier), splash or instrument_type. Below we list the instrument type for the 4 spectra.

sps$instrument_type

The last spectrum was predicted and the instrument type is thus set to NA.

In addition we can also access the m/z and intensity values of each spectrum.

mz(sps)
intensity(sps)

Importing all MS/MS spectra from HMDB

It is also possible to import all MS/MS spectra from HMDB into a Spectra object. For this we have to firstly download all xml files from the downloads page of HMDB. HMDB allows to download all files in a single archive, unzipping that will result in a folder with a very large number of small files, one file per spectrum. Note also that this folder will contain also NMR and other types of spectra. The variable path below is supposed to point to the folder where all xml files can be found. We first list all xml files in that folder using the pattern "ms_ms_spectrum" to get only file names for MS/MS spectra.

path <- "~/data/hmdb_all_spectra/"
fls <- dir(path, pattern = "ms_ms_spectrum", full.names = TRUE)

With that we can now import the data and create a Spectra object representing the collection of all HMDB MS2 spectra. Setting nonStop = TRUE prevents the call to stop whenever it encounters problematic xml files (like xml files without peaks). Note that the import of about 400,000 MS/MS spectra can take a long time (in the range of one to several hours).

sps_hmdb <- Spectra(fls, source = MsBackendHmdbXml(), nonStop = TRUE,
                    backend = MsBackendDataFrame())

Session information

sessionInfo()


rformassspectrometry/MsBackendHmdb documentation built on May 15, 2022, 4:30 a.m.