ReadMsp: Read mass spectra from an msp-file (NIST format)

View source: R/msp_format.R

Description

Read an msp-file containing mass spectra in the NIST format. The complete description of the format can be found in the NIST Mass Spectral Search Program manual. A summary is presented below in the "Description of the NIST format" section.

Usage

ReadMsp(input_file)

Arguments

input_file

A string. The name of a file.

Details

Data are read from an msp-file without any modification: for example, the order of mass values is not changed and zero-intensity peaks are preserved.
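Because no cleaning is applied on reading, any filtering or sorting has to be done afterwards. The sketch below uses a hypothetical helper, `clean_spectrum()`, which is not part of mssearchr. It assumes only the documented `mz` and `intst` fields, drops zero-intensity peaks and sorts the remaining peaks by m/z. If a spectrum also stores a peak count, this sketch does not update it.

```r
# Hypothetical helper (not part of mssearchr): returns a copy of one spectrum
# with zero-intensity peaks removed and peaks sorted by ascending m/z.
# Assumes the spectrum is a list with numeric vectors 'mz' and 'intst'.
clean_spectrum <- function(spectrum) {
  keep <- spectrum$intst > 0          # drop zero-intensity peaks
  mz <- spectrum$mz[keep]
  intst <- spectrum$intst[keep]
  ord <- order(mz)                    # ascending m/z order
  spectrum$mz <- mz[ord]
  spectrum$intst <- intst[ord]
  spectrum                            # all other fields are left untouched
}

# Usage with a hand-built (made-up) spectrum mimicking the documented structure
spectrum <- list(name = "Example", mz = c(43, 15, 29, 41),
                 intst = c(999, 0, 120, 300))
cleaned <- clean_spectrum(spectrum)
cleaned$mz     # 29 41 43
cleaned$intst  # 120 300 999
```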

Value

A list of nested lists, each representing one mass spectrum. Almost all metadata fields (e.g., "Name", "CAS#", "Formula", "MW", etc.) are represented as strings. All "Synon" fields are merged into a single character vector. Mass values and intensities are stored as numeric vectors named mz and intst. Field names are slightly modified:

  • names are converted to lowercase;

  • hash symbols are replaced with _no;

  • any other special character is replaced with an underscore character.
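The renaming rules can be illustrated with a short function. `normalize_field_name()` is a hypothetical name, and the function is only an approximation of the rules as stated above; the package's own code may treat edge cases (for example, digits or non-ASCII characters) differently.

```r
# Hypothetical sketch of the field-renaming rules; not the mssearchr code.
normalize_field_name <- function(x) {
  x <- tolower(x)                          # rule 1: convert to lowercase
  x <- gsub("#", "_no", x, fixed = TRUE)   # rule 2: '#' becomes '_no'
  gsub("[^a-z0-9_]", "_", x)               # rule 3: other specials become '_'
}

normalize_field_name(c("Name", "CAS#", "NIST#", "Num Peaks", "MW"))
# "name" "cas_no" "nist_no" "num_peaks" "mw"
```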

Description of the NIST format

The summary was prepared using the NIST Mass Spectral Search Program manual v.2.4 (2020).

  • An msp-file can contain any number of spectra.

  • Each spectrum must start with the "Name" field. This field must not be empty.

  • The "Num Peaks" field is also required. It must contain the number of mass/intensity pairs.

  • Optional fields (e.g., "Comments", "Formula", "MW") can appear between the "Name" and "Num Peaks" fields.

  • When a spectrum is exported from the NIST library, it also contains the "NIST#" and "DB#" fields. The "NIST#" field is on the same line as the "CAS#" field and is separated from it by a semicolon.

  • Each field should be on a separate line (the "NIST#" field is an exception to this rule).

  • The mass/intensity list begins on the line following the "Num Peaks" field. The peaks need not be normalized, and the masses need not be ordered. The exact spacing and delimiters used for the mass/intensity pairs are unimportant. The following characters are accepted as delimiters: 'space', 'tab', ',', ';', ':'. Parentheses, square brackets, and curly braces ('(', ')', '[', ']', '{', and '}') are also allowed.

  • The "Name" field can be up to 511 characters.

  • The "Comments" field can be up to 1023 characters.

  • The "Formula" field can be up to 23 characters.

  • The "Synon" field may be repeated.
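To make the rules above concrete, here is a deliberately simplified parser for msp text that has already been read into a character vector (for example with `readLines()`). All function names are hypothetical and the sample spectra are made up. This is not the mssearchr implementation: it performs almost no validation and ignores parts of the full NIST format that are not summarized here. It starts a new spectrum at each "Name" field, splits the combined "CAS#"/"NIST#" line at the semicolon, collects repeated "Synon" fields, and reads mass/intensity pairs after "Num Peaks", treating every accepted delimiter and bracket as whitespace.

```r
# Hypothetical, simplified msp parser illustrating the format rules above.

# Split "Key: value" at the first colon; returns NULL if there is no colon.
split_field <- function(text) {
  pos <- regexpr(":", text, fixed = TRUE)
  if (pos < 0) return(NULL)
  list(key = trimws(substr(text, 1, pos - 1)),
       value = trimws(substr(text, pos + 1, nchar(text))))
}

# Field-name rules from the "Value" section (assumed implementation).
normalize_field_name <- function(x) {
  x <- gsub("#", "_no", tolower(x), fixed = TRUE)
  gsub("[^a-z0-9_]", "_", x)
}

# Replace whitespace, ',', ';', ':' and all brackets with spaces, then read numbers.
parse_peaks <- function(text) {
  text <- gsub("[\\s,;:(){}\\[\\]]+", " ", text, perl = TRUE)
  tokens <- strsplit(trimws(text), " ", fixed = TRUE)[[1]]
  as.numeric(tokens[tokens != ""])
}

parse_msp_lines <- function(lines) {
  spectra <- list()
  spectrum <- NULL
  n_expected <- 0L      # number of pairs announced by "Num Peaks"
  peaks <- numeric(0)   # flat buffer: mz, intensity, mz, intensity, ...
  for (line in lines) {
    if (n_expected > 0L) {
      # Inside the mass/intensity list; pairs may span several lines
      peaks <- c(peaks, parse_peaks(line))
      if (length(peaks) >= 2L * n_expected) {
        idx <- seq_len(n_expected)
        spectrum$mz <- peaks[2L * idx - 1L]
        spectrum$intst <- peaks[2L * idx]
        spectra[[length(spectra) + 1L]] <- spectrum
        spectrum <- NULL
        n_expected <- 0L
        peaks <- numeric(0)
      }
      next
    }
    if (trimws(line) == "") next
    # "CAS#" and "NIST#" share one line, separated by a semicolon
    parts <- if (grepl("^CAS#:.*;\\s*NIST#:", line, perl = TRUE)) {
      strsplit(line, ";", fixed = TRUE)[[1]]
    } else {
      line
    }
    for (part in parts) {
      field <- split_field(part)
      if (is.null(field)) next
      key <- normalize_field_name(field$key)
      value <- field$value
      if (key == "name") {
        if (value == "") stop("the Name field must not be empty")
        spectrum <- list(name = value)              # a new spectrum begins
      } else if (is.null(spectrum)) {
        next                                        # ignore fields before "Name"
      } else if (key == "synon") {
        spectrum$synon <- c(spectrum$synon, value)  # repeated field
      } else if (key == "num_peaks") {
        spectrum$num_peaks <- value
        n_expected <- as.integer(value)
        if (n_expected == 0L) {                     # no peak lines follow
          spectrum$mz <- numeric(0)
          spectrum$intst <- numeric(0)
          spectra[[length(spectra) + 1L]] <- spectrum
          spectrum <- NULL
        }
      } else {
        spectrum[[key]] <- value
      }
    }
  }
  spectra
}

# Usage with made-up spectra that exercise several accepted delimiters
msp_text <- c("Name: Compound A", "Synon: A-1", "Synon: A-2",
              "CAS#: 123-45-6; NIST#: 1", "Num Peaks: 3",
              "12 30; 13 90", "(16, 999)", "",
              "Name: Compound B", "Num Peaks: 2", "[20:100] {21:50}")
spectra <- parse_msp_lines(msp_text)
spectra[[1]]$mz  # 12 13 16
```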

Examples

library(mssearchr)

# Reading the 'alkanes.msp' file
msp_file <- system.file("extdata", "alkanes.msp", package = "mssearchr")
msp_objs <- ReadMsp(msp_file)

# Plotting the first mass spectrum from the 'msp_objs' list
par_old <- par(yaxs = "i")
plot(msp_objs[[1]]$mz, msp_objs[[1]]$intst,
     ylim = c(0, 1000), main = msp_objs[[1]]$name,
     type = "h", xlab = "m/z", ylab = "Intensity", bty = "l")
par(par_old)


mssearchr documentation built on April 3, 2025, 8:28 p.m.