LibrarySearch: Perform the library search within R

View source: R/library_search_r.R

LibrarySearch    R Documentation


Description

Performs a library search using a custom implementation of the Identity (EI Normal) or Similarity (EI Simple) algorithm. The pairwise comparison of two mass spectra is implemented in C.
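To illustrate what a pairwise comparison of two spectra computes, here is a minimal sketch of a generic weighted dot-product match factor: the squared cosine similarity of two weighted intensity vectors, scaled to the range 0 to 1000. It assumes both spectra are already aligned on a common m/z axis. The function name WeightedDotProduct and the weighting exponents mz_power and intst_power are hypothetical and are not taken from mssearchr; the package's Identity and Similarity algorithms may weight peaks and combine terms differently.

```r
# Hypothetical helper (not part of mssearchr): a generic weighted
# dot-product match factor on a 0-1000 scale. The default exponents are
# illustrative assumptions, not the values used by LibrarySearch.
WeightedDotProduct <- function(mz, intst_u, intst_l,
                               mz_power = 1, intst_power = 0.5) {
  # Weight every peak by m/z^mz_power * intensity^intst_power
  w_u <- mz^mz_power * intst_u^intst_power
  w_l <- mz^mz_power * intst_l^intst_power
  denom <- sum(w_u^2) * sum(w_l^2)
  if (denom == 0) return(0)  # an empty spectrum matches nothing
  # Squared cosine of the angle between the weighted vectors, times 1000
  1000 * sum(w_u * w_l)^2 / denom
}

# Two toy spectra sharing the same m/z axis (values are made up)
mz      <- c(43, 57, 71, 85)
intst_u <- c(100, 999, 450, 200)
intst_l <- c(120, 999, 400, 150)
WeightedDotProduct(mz, intst_u, intst_l)
```

Identical spectra score 1000 and spectra with no shared peaks score 0; everything in between depends on the chosen weighting.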

Usage

LibrarySearch(
  msp_objs_u,
  msp_objs_l,
  algorithm = c("identity_normal", "similarity_simple"),
  search_type = c("standard", "reverse"),
  n_hits = 100L,
  hitlist_columns = c("formula", "mw", "smiles"),
  mz_min = NULL,
  mz_max = NULL,
  comments = NULL
)

Arguments

msp_objs_u, msp_objs_l

A list of nested lists, where each nested list is a mass spectrum. Each nested list must contain at least three elements: (1) name (a string), the compound name (or a short description); (2) mz (a numeric/integer vector), the m/z values of the mass spectral peaks; (3) intst (a numeric/integer vector), the intensities of the mass spectral peaks. The letters 'u' and 'l' stand for 'unknown' and 'library', respectively. Mass spectra should be pre-processed using the PreprocessMassSpectra function.
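A minimal sketch of two such lists built by hand in base R, together with a check for the three required elements. The spectra and the HasRequiredFields helper are illustrative assumptions; in practice the lists would typically come from ReadMsp (or a bundled data set) and be passed through PreprocessMassSpectra before calling LibrarySearch.

```r
# Hand-made spectra in the nested-list format (values are illustrative only)
msp_objs_u <- list(
  list(name = "Unknown 1", mz = c(43L, 57L, 71L, 85L),
       intst = c(100, 999, 450, 200))
)
msp_objs_l <- list(
  list(name = "Compound A", mz = c(43L, 57L, 71L), intst = c(150, 999, 300)),
  list(name = "Compound B", mz = c(41L, 55L, 69L), intst = c(999, 800, 500))
)

# Hypothetical helper (not part of mssearchr): TRUE if a spectrum has a
# single-string name and numeric mz/intst vectors of equal length
HasRequiredFields <- function(spectrum) {
  all(c("name", "mz", "intst") %in% names(spectrum)) &&
    is.character(spectrum$name) && length(spectrum$name) == 1L &&
    is.numeric(spectrum$mz) && is.numeric(spectrum$intst) &&
    length(spectrum$mz) == length(spectrum$intst)
}

vapply(c(msp_objs_u, msp_objs_l), HasRequiredFields, logical(1))
```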

algorithm

A string. Library search algorithm. Either the Identity EI Normal (identity_normal) or Similarity EI Simple (similarity_simple) algorithm.

search_type

A string. The library search type: standard search (standard) or reverse search (reverse). In a standard search, all peaks present in either the library or the unknown spectrum are taken into account. In a reverse search, peaks that are absent from the library spectrum are ignored.
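The difference between the two search types can be sketched as a choice of m/z axis before the spectra are compared: the union of both peak lists for a standard search, or only the library peaks for a reverse search. This is a minimal base-R sketch under that assumption; AlignSpectra is a hypothetical helper, and how the package's C code actually matches peaks is not described here.

```r
# Hypothetical helper (not part of mssearchr): put two spectra on a common
# m/z axis. "standard" uses the union of both peak lists; "reverse" keeps
# only m/z values present in the library spectrum.
AlignSpectra <- function(spec_u, spec_l,
                         search_type = c("standard", "reverse")) {
  search_type <- match.arg(search_type)
  mz_axis <- if (search_type == "standard") {
    sort(union(spec_u$mz, spec_l$mz))
  } else {
    sort(unique(spec_l$mz))
  }
  # Peaks missing from a spectrum get zero intensity
  lookup <- function(spec) {
    out <- spec$intst[match(mz_axis, spec$mz)]
    out[is.na(out)] <- 0
    out
  }
  data.frame(mz = mz_axis, intst_u = lookup(spec_u), intst_l = lookup(spec_l))
}

spec_u <- list(name = "Unknown", mz = c(43, 57, 71, 85), intst = c(100, 999, 450, 200))
spec_l <- list(name = "Library", mz = c(43, 57, 71), intst = c(150, 999, 300))

AlignSpectra(spec_u, spec_l, "standard")  # m/z 85 kept, with intst_l = 0
AlignSpectra(spec_u, spec_l, "reverse")   # m/z 85 dropped
```

In the reverse case the unknown's extra peak at m/z 85 no longer penalises the match, which is the usual motivation for a reverse search when the unknown spectrum may contain co-eluting impurities.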

n_hits

An integer value. The maximum number of hits (i.e., candidates) in each hitlist.

hitlist_columns

A character vector. Three columns are always present in the returned hitlist: name, mf or rmf (i.e., the match factor or the reverse match factor), and idx (i.e., the index of the respective library mass spectrum in the msp_objs_l list). Additional columns (e.g., cas_no, formula, inchikey) can be added using the hitlist_columns argument. Only elements holding scalar values (i.e., atomic vectors of length one) can be used.
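Because only scalar elements are allowed, it can help to check which elements qualify in every library spectrum before choosing hitlist_columns. A minimal base-R sketch; ScalarFields is a hypothetical helper and the toy library (including its db_no values) is made up for illustration.

```r
# Hypothetical helper (not part of mssearchr): names of elements that hold
# an atomic, length-one value in every spectrum of a list
ScalarFields <- function(msp_objs) {
  is_scalar <- function(x) is.atomic(x) && length(x) == 1L
  per_spectrum <- lapply(msp_objs, function(s) {
    names(s)[vapply(s, is_scalar, logical(1))]
  })
  Reduce(intersect, per_spectrum)
}

# Toy library: mz and intst have several values, so they are not scalar
msp_objs_l <- list(
  list(name = "Compound A", mz = c(43, 57), intst = c(150, 999), db_no = "TOY-001"),
  list(name = "Compound B", mz = c(41, 55), intst = c(999, 800), db_no = "TOY-002")
)
ScalarFields(msp_objs_l)
```

Note that a spectrum with a single peak would make mz and intst look scalar for that spectrum; taking the intersection across all spectra guards against that in typical libraries.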

mz_min, mz_max

Integer values. The lower and upper boundaries of the m/z range; peaks with m/z values outside this range are not taken into account when the match factor is calculated.
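The effect of these bounds can be sketched as a filter applied to each spectrum before comparison. This is a minimal base-R sketch under two assumptions not stated above: the boundaries are inclusive, and NULL means "no bound on that side". TrimMzRange is a hypothetical helper, not part of mssearchr.

```r
# Hypothetical helper (not part of mssearchr): drop peaks outside
# [mz_min, mz_max]; a NULL bound is treated as "no limit" (assumption)
TrimMzRange <- function(spectrum, mz_min = NULL, mz_max = NULL) {
  keep <- rep(TRUE, length(spectrum$mz))
  if (!is.null(mz_min)) keep <- keep & spectrum$mz >= mz_min
  if (!is.null(mz_max)) keep <- keep & spectrum$mz <= mz_max
  spectrum$mz <- spectrum$mz[keep]
  spectrum$intst <- spectrum$intst[keep]
  spectrum
}

spec <- list(name = "Unknown",
             mz = c(29, 43, 57, 71, 85, 156),
             intst = c(300, 700, 999, 450, 200, 15))
TrimMzRange(spec, mz_min = 40, mz_max = 100)$mz  # 43 57 71 85
```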

comments

Any R object. Additional information, saved as the 'comments' attribute of the returned list.

Value

A list of data frames. Each data frame is a hitlist (i.e., a list of possible candidates) for one unknown mass spectrum. Each hitlist always contains three columns: name, mf or rmf (i.e., the match factor or the reverse match factor), and idx (i.e., the index of the respective library mass spectrum in the msp_objs_l list). Additional columns can be added using the hitlist_columns argument. The library search options are saved as the library_search_options attribute.
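A common next step is to collect the best candidate for each unknown into a single table. The sketch below uses a toy object shaped like the documented return value (the values are made up) so that it runs without the package; BestHits is a hypothetical helper that picks whichever of mf or rmf is present.

```r
# Toy object shaped like the documented return value (values are made up):
# one hitlist (data frame) per unknown mass spectrum
hitlists <- list(
  data.frame(name = c("A", "B"), mf = c(910.2, 850.7), idx = c(3L, 1L)),
  data.frame(name = c("C", "A"), mf = c(780.0, 640.5), idx = c(2L, 3L))
)

# Hypothetical helper (not part of mssearchr): highest-scoring hit of every
# hitlist, using "mf" (standard search) or "rmf" (reverse search)
BestHits <- function(hitlists) {
  rows <- lapply(seq_along(hitlists), function(i) {
    h <- hitlists[[i]]
    if (nrow(h) == 0L) return(NULL)  # skip empty hitlists
    score_col <- intersect(c("mf", "rmf"), names(h))[1]
    top <- h[which.max(h[[score_col]]), , drop = FALSE]
    data.frame(unknown = i, top, row.names = NULL)
  })
  do.call(rbind, rows)
}

BestHits(hitlists)

# With a real result, the search settings can be inspected with
# attr(hitlists, "library_search_options")
```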

Examples

# Locating the 'alkanes.msp' file
msp_file <- system.file("extdata", "alkanes.msp", package = "mssearchr")

# Pre-processing
msp_objs_u <- PreprocessMassSpectra(ReadMsp(msp_file)) # unknown mass spectra
msp_objs_l <- PreprocessMassSpectra(massbank_alkanes)  # library mass spectra

# Searching using the Identity algorithm
hitlists <- LibrarySearch(msp_objs_u, msp_objs_l,
                          algorithm = "identity_normal", n_hits = 10L,
                          hitlist_columns = c("formula", "smiles", "db_no"))

# Printing a hitlist for the first compound from the 'alkanes.msp' file
print(hitlists[[1]][1:5, ])

#>        name       mf idx formula        smiles                db_no
#> 1  UNDECANE 950.5551  11  C11H24   CCCCCCCCCCC MSBNK-{...}-JP006877
#> 2  UNDECANE 928.4884  72  C11H24   CCCCCCCCCCC MSBNK-{...}-JP005760
#> 3  DODECANE 905.7546  74  C12H26  CCCCCCCCCCCC MSBNK-{...}-JP006878
#> 4 TRIDECANE 891.7862  41  C13H28 CCCCCCCCCCCCC MSBNK-{...}-JP006879
#> 5  DODECANE 885.6247  42  C12H26  CCCCCCCCCCCC MSBNK-{...}-JP005756


mssearchr documentation built on April 3, 2025, 8:28 p.m.