msms_spectra_hmdb: Import MS/MS spectra from HMDB xml files

View source: R/spectrum-import-functions.R

msms_spectra_hmdb    R Documentation

Import MS/MS spectra from HMDB xml files


Description

msms_spectra_hmdb imports MS/MS spectra from the corresponding xml files from HMDB and returns the data as a data.frame. HMDB stores MS/MS spectrum data in xml files, one file per spectrum.

Depending on the parameter collapsed, the returned data.frame is either collapsed, meaning that each row represents data from one spectrum xml file, or expanded with one row for each m/z and intensity pair for each spectrum. Columns "mz" and "intensity" are of type list for collapsed = TRUE and numeric for collapsed = FALSE.
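The difference between the two layouts can be illustrated with base R alone, without reading any HMDB files. The following is a minimal sketch on made-up peak values; the object names collapsed_df and expanded_df and all numbers are hypothetical and only mimic the shape described above, not the function's internals.

```r
## Hypothetical collapsed layout: one row per spectrum, with list columns
## holding the m/z and intensity values of each spectrum.
collapsed_df <- data.frame(spectrum_id = 1:2)
collapsed_df$mz <- list(c(100.1, 150.2, 200.3), c(80.5, 120.7))
collapsed_df$intensity <- list(c(10, 100, 35), c(55, 100))

## Expand to one row per m/z and intensity pair by repeating the spectrum
## ID once for every peak of that spectrum.
n_peaks <- lengths(collapsed_df$mz)
expanded_df <- data.frame(
    spectrum_id = rep(collapsed_df$spectrum_id, n_peaks),
    mz = unlist(collapsed_df$mz),
    intensity = unlist(collapsed_df$intensity)
)

nrow(collapsed_df)          # one row per spectrum
nrow(expanded_df)           # one row per peak
is.list(collapsed_df$mz)    # list column in the collapsed layout
is.numeric(expanded_df$mz)  # plain numeric column in the expanded layout
```

The collapsed layout is compact when working with whole spectra, while the expanded layout is easier to filter or plot peak by peak.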


Usage

msms_spectra_hmdb(x, collapsed = TRUE)



Arguments

x: character(1) with the path to the directory containing the xml files.


collapsed: logical(1) indicating whether the returned data.frame should be collapsed or expanded. See the description for more details.


Value

data.frame with one row per spectrum (collapsed = TRUE) or one row per peak (collapsed = FALSE), and columns:

  • spectrum_id (integer): an arbitrary, unique ID identifying values from one xml file.

  • original_spectrum_id (character): the HMDB-internal ID of the spectrum.

  • compound_id (character): the HMDB compound ID the spectrum is associated with.

  • polarity (integer): 0 for negative, 1 for positive, NA for not set.

  • collision_energy (numeric): collision energy voltage.

  • predicted (logical): whether the spectrum is predicted (TRUE) or experimentally verified (FALSE).

  • splash (character): the SPLASH (SPectraL hASH) key of the spectrum (Wohlgemuth 2016).

  • instrument_type (character): the type of MS instrument on which the spectrum was measured.

  • instrument (character): the MS instrument (not available for all spectra in HMDB).

  • precursor_mz (numeric): not provided by HMDB and thus NA.

  • mz (numeric or list of numeric): m/z values of the spectrum.

  • intensity (numeric or list of numeric): intensity of the spectrum.
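The columns listed above make it straightforward to subset the result with base R. The following is a minimal sketch on a hypothetical table that contains only a few of these columns; the values are made up for illustration and do not come from HMDB.

```r
## Hypothetical result table with a subset of the columns listed above.
spectra <- data.frame(
    spectrum_id = 1:4,
    compound_id = c("HMDB0000001", "HMDB0000001",
                    "HMDB0000002", "HMDB0000003"),
    polarity = c(1L, 0L, 1L, NA),
    predicted = c(FALSE, FALSE, TRUE, FALSE),
    stringsAsFactors = FALSE
)

## Keep experimental spectra in positive polarity. which() drops rows with
## polarity NA instead of returning rows filled with NA.
keep <- which(spectra$polarity == 1L & !spectra$predicted)
positive_experimental <- spectra[keep, ]
positive_experimental$spectrum_id
```

Wrapping the condition in which() matters here because polarity can be NA, and logical subsetting with NA would otherwise produce an all-NA row.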


Note

The HMDB xml files are expected to be extracted from the downloaded zip file into a folder and should not be renamed, because the function identifies xml files containing MS/MS spectra by their file name.
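Why renaming matters can be sketched with base R file listing. Both the file names and the regular expression below are assumptions chosen for illustration; the pattern the function uses internally is not stated here and may differ.

```r
## Create a temporary folder with empty placeholder files. The file names
## are made up to resemble HMDB naming and are an assumption.
dr <- tempfile("hmdb_xml_")
dir.create(dr)
fls <- c("HMDB0000001_ms_ms_spectrum_2_experimental.xml",
         "HMDB0000001_c_ms_spectrum_1_predicted.xml",
         "README.txt")
invisible(file.create(file.path(dr, fls)))

## Select MS/MS files by name. The pattern is a hypothetical stand-in for
## whatever the function uses internally.
msms_files <- list.files(dr, pattern = "ms_ms_spectrum.*\\.xml$")
msms_files
```

Only the file whose name contains the MS/MS marker is selected, so a renamed file would be skipped by any name-based selection of this kind.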

The same spectrum ID can be associated with multiple compounds. Thus, the function assigns an arbitrary ID (column "spectrum_id") to the values from each file. The original ID of the spectrum in HMDB is provided in column "original_spectrum_id".
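The need for a separate key can be shown on a small hypothetical table. The IDs below are made up, and using a running integer per file is an assumption; the text only states that the assigned ID is arbitrary and unique.

```r
## Hypothetical per-file metadata: the HMDB spectrum ID "1234" appears in
## files of two different compounds.
per_file <- data.frame(
    original_spectrum_id = c("1234", "1234", "5678"),
    compound_id = c("HMDB0000001", "HMDB0000002", "HMDB0000003"),
    stringsAsFactors = FALSE
)

## The original ID is not unique across files ...
anyDuplicated(per_file$original_spectrum_id) > 0

## ... so assign a running integer per file as a unique key.
per_file$spectrum_id <- seq_len(nrow(per_file))
anyDuplicated(per_file$spectrum_id) == 0
```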


Author(s)

Johannes Rainer


References

Wohlgemuth G, Mehta SS, Mejia RF, Neumann S, Pedrosa D, Pluskal T, Schymanski EL, Willighagen EL, Wilson M, Wishart DS, Arita M, Dorrestein PC, Bandeira N, Wang M, Schulze T, Salek RM, Steinbeck C, Nainala VC, Mistrik R, Nishioka T, Fiehn O. SPLASH, a hashed identifier for mass spectra. Nature Biotechnology 2016; 34(11):1099-1101.

See Also

createCompDb() for the function to create a CompDb database with compound annotation and spectrum data.

Other spectrum data import functions: msms_spectra_mona()


Examples

library(CompoundDb)

## Locate the folder within the package containing test xml files.
pth <- system.file("xml", package = "CompoundDb")

## List all files in that directory
dir(pth)

## Import spectrum data from HMDB MS/MS spectrum xml files in that directory
msms_spectra_hmdb(pth)

## Import the data as an *expanded* data frame, i.e. with a row for each
## single m/z (intensity) value.
msms_spectra_hmdb(pth, collapsed = FALSE)

EuracBiomedicalResearch/CompoundDb documentation built on March 17, 2023, 3:47 p.m.