

pattern.search2    R Documentation

Detecting and grouping isotope m/z relations among LC-HRMS centroid peaks, based on quantized reference data

Description

Algorithm for grouping isotope pattern centroids of chemical components by querying quantized simulation data

Usage

pattern.search2(peaklist,quantiz,mztol=2,ppm=TRUE,inttol=0.5,rttol=0.3,
use_isotopes=c("13C","37Cl","15N","81Br","34S","18O"),use_charges=c(1,2),
use_marker=TRUE,quick=FALSE,isotopes, get_pairs = FALSE, get_matches = FALSE)

Arguments

peaklist

Dataframe of HRMS peaks with three numeric columns for (a) m/z, (b) intensity and (c) retention time, such as peaklist.

quantiz

Quantized instrument-specific (!) simulation data of feasible centroid-centroid relations as provided by package nontargetData.

mztol

m/z tolerance setting: value by which the m/z of a peak may vary from its expected value. If ppm=TRUE (see below), mztol is given in ppm; if ppm=FALSE, it is given in absolute m/z [u].

ppm

Should mztol be given in ppm (TRUE) or in absolute m/z (FALSE)?

inttol

Intensity tolerance setting = fraction by which peak intensities may vary; e.g., if set to 0.2, a peak with expected intensity 10000 may range between 8000 and 12000.

rttol

Retention time tolerance (+/-), in the units of column 3 of the peaklist argument, e.g. [min].

use_isotopes

Restrict query to certain isotopes dominating centroid relations; set to FALSE to use all available isotopes.

use_charges

Vector of signed integers. Restrict query to certain charges z; set to FALSE to use all charge states.

use_marker

Should the query consider marker peaks (TRUE) or not (FALSE)?

quick

Move on once the query finds a first hit (TRUE)? Speeds up execution, but leaves the resulting information on the underlying isotopes incomplete.

isotopes

Dataframe of relevant isotopes as provided by package enviPat; used for checking user inputs.

get_pairs

enviMass output, please ignore.

get_matches

enviMass output, please ignore.
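The tolerance arguments above define acceptance windows around expected values. The following base-R sketch shows only the arithmetic implied by the descriptions of mztol/ppm, inttol and rttol; the helper names mz_window, int_window and rt_window are hypothetical and not part of nontarget, and how pattern.search2 combines these windows internally is not shown.

```r
# Hypothetical helpers (not exported by nontarget) that turn the tolerance
# arguments of pattern.search2 into lower/upper acceptance bounds.

# m/z window: mztol in ppm of the m/z (ppm = TRUE) or in absolute m/z [u]
mz_window <- function(mz, mztol = 2, ppm = TRUE) {
  tol <- if (ppm) mz * mztol / 1e6 else mztol
  c(lower = mz - tol, upper = mz + tol)
}

# intensity window: inttol is a fraction of the expected intensity
int_window <- function(expected, inttol = 0.5) {
  c(lower = expected * (1 - inttol), upper = expected * (1 + inttol))
}

# retention time window: +/- rttol, in the units of peaklist column 3
rt_window <- function(rt, rttol = 0.3) {
  c(lower = rt - rttol, upper = rt + rttol)
}

int_window(10000, inttol = 0.2)            # lower 8000, upper 12000
mz_window(400, mztol = 2, ppm = TRUE)      # lower 399.9992, upper 400.0008
mz_window(400, mztol = 0.001, ppm = FALSE) # lower 399.999, upper 400.001
```

Note that a ppm tolerance scales with m/z: mztol=2 with ppm=TRUE corresponds to +/- 0.0008 u at m/z 400, but to +/- 0.002 u at m/z 1000.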

Details

As an alternative to the rule-based pattern.search, differences among measured centroids (peaklist) are queried for matches with those of compressed (= quantized) simulation data, within the bounds of measurement tolerances and the quantization distortion. Hence, in comparison to pattern.search, this approach accounts for centroid mass shifts induced by peak profile interferences, which are prevalent even at high m/z resolution.
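The query of centroid differences described above can be pictured with the following simplified base-R sketch. It assumes a single hypothetical reference box of m/z differences and intensity ratios (the bounds are illustrative, placed around the 13C-12C spacing of about 1.00335 u at z = 1, and are not taken from nontargetData) and uses brute-force pairwise comparison. The function names are hypothetical; pattern.search2 itself works with the quantized categories and tree structures described in the following paragraph.

```r
# Hypothetical sketch, not nontarget's implementation: enumerate co-eluting
# centroid pairs, then test each pair against one reference hyperrectangle
# ("box") of m/z difference and intensity ratio.

candidate_pairs <- function(peaklist, rttol = 0.3) {
  mz <- peaklist[, 1]; int <- peaklist[, 2]; rt <- peaklist[, 3]
  # pairs (i, j) with m/z_i < m/z_j that elute within +/- rttol
  hit <- outer(mz, mz, "<") & abs(outer(rt, rt, "-")) <= rttol
  idx <- which(hit, arr.ind = TRUE)
  data.frame(
    from  = idx[, 1],                      # lighter peak (row in peaklist)
    to    = idx[, 2],                      # heavier peak
    dmz   = mz[idx[, 2]] - mz[idx[, 1]],   # m/z difference
    ratio = int[idx[, 2]] / int[idx[, 1]]  # intensity ratio heavy / light
  )
}

match_box <- function(pairs, box, peaklist, mztol = 2, ppm = TRUE, inttol = 0.5) {
  # simplification: widen the m/z-difference bounds by the heavier peak's
  # m/z tolerance and the ratio bounds by the intensity tolerance
  tol <- if (ppm) peaklist[pairs$to, 1] * mztol / 1e6 else mztol
  pairs$dmz >= box$dmz[1] - tol & pairs$dmz <= box$dmz[2] + tol &
    pairs$ratio >= box$ratio[1] * (1 - inttol) &
    pairs$ratio <= box$ratio[2] * (1 + inttol)
}

# made-up centroids: peaks 1 and 2 co-elute, peak 3 elutes much later
peaks <- data.frame(mz  = c(200.0000, 201.0034, 250.0000),
                    int = c(10000, 1100, 5000),
                    rt  = c(5.00, 5.05, 8.00))
box <- list(dmz = c(1.0030, 1.0037), ratio = c(0.05, 0.30))  # illustrative
pairs <- candidate_pairs(peaks, rttol = 0.3)
pairs$match <- match_box(pairs, box, peaks)
pairs  # a single pair (1 -> 2), which falls inside the box
```

The quadratic pairwise step is what the kd-tree restructuring of peaklist mentioned below is meant to avoid; the sketch keeps it only for readability.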

To derive the quantized data, isotope pattern centroids of several million organic molecular formulas from the PubChem database were calculated for various classes of adducts. Molecular formulas were filtered to be unique and to contain only C, H, O, N, Cl, Br, K, Na, S, Si, F, P and/or I. The resulting >250 million centroid pairs from individual patterns were then categorized by their dominant isotopologues, their charge and the possible presence of another centroid of higher intensity than that of the pair (= marker peak). Within these categories, data on centroid pair (a) m/z, (b) m/z differences, (c) intensity ratios and (d) marker m/z were quantized by a recursive partitioning procedure. The resulting compressed data representation was extended by nearest neighbour estimates in the above dimensions (a) to (d) to account for queries with molecular formulas possibly not present in the PubChem set. Internally, the quantized simulation data are queried via a tree-like space-partitioning structure for hyperrectangles, while centroids from peaklist are restructured into kd-trees.
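A toy version of recursive partitioning into hyperrectangles, and of querying those boxes, might look as follows. This is an assumption-laden sketch, not the procedure used to build nontargetData: it splits at the median of the widest dimension, stores each leaf cell as the bounding box of its points, and uses a linear scan over boxes in place of the tree-like query structure. The nearest neighbour extension and the kd-tree handling are not reproduced; the tol argument only loosely stands in for widening boxes by measurement tolerances. All names are hypothetical.

```r
# Hypothetical quantization sketch: rows of x are simulated centroid pairs,
# columns are dimensions such as m/z and m/z difference.
partition_boxes <- function(x, leaf = 4) {
  x <- as.matrix(x)
  bounding_box <- function(m) rbind(lower = apply(m, 2, min),
                                    upper = apply(m, 2, max))
  ranges <- apply(x, 2, function(v) diff(range(v)))
  d <- which.max(ranges)                 # widest dimension
  # stop when the cell is small enough or cannot be split
  if (nrow(x) <= leaf || ranges[d] == 0) return(list(bounding_box(x)))
  cut <- median(x[, d])
  left <- x[, d] <= cut
  if (all(left)) left <- x[, d] < cut    # ties at the maximum: split below it
  c(partition_boxes(x[left, , drop = FALSE], leaf),
    partition_boxes(x[!left, , drop = FALSE], leaf))
}

# Indices of boxes containing query point q (each box widened by tol);
# a linear scan stands in for a proper space-partitioning tree.
in_box <- function(boxes, q, tol = 0) {
  inside <- vapply(boxes, function(b) {
    all(q >= b["lower", ] - tol & q <= b["upper", ] + tol)
  }, logical(1))
  which(inside)
}

sim <- cbind(
  mz  = c(100, 110, 120, 130, 300, 310, 320, 330),
  dmz = c(1.003, 1.003, 1.004, 1.003, 1.997, 1.998, 1.997, 1.998)
)
boxes <- partition_boxes(sim, leaf = 4)
length(boxes)                  # 2
in_box(boxes, c(115, 1.0035))  # returns 1
in_box(boxes, c(200, 1.5))     # returns integer(0)
```

Storing only box bounds instead of every simulated pair is what makes the representation compact; the price is the quantization distortion mentioned above, since any point inside a box is accepted even if no simulated pair lies exactly there.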

Value

A list of type pattern with 12 entries.

pattern[[1]]

Patterns. Dataframe with peaks (mass, intensity, rt, peak ID) and their isotope pattern relations (to ID, isotope(s), mass tolerance (deprecated), charge level) within isotope pattern groups (group ID, interaction level (deprecated)).

pattern[[2]]

Parameters. Parameters used.

pattern[[3]]

Peaks in pattern groups. Dataframe listing all peaks (peak IDs) per isotope pattern group (group ID) at the given z-level(s) (charge level).

pattern[[4]]

Atom counts. Deprecated.

pattern[[5]]

Count of pattern groups. Number of isotope pattern groups found on the different z-levels used.

pattern[[6]]

Removals by rules. Deprecated.

pattern[[7]]

Number of peaks with pattern group overlapping. Deprecated.

pattern[[8]]

Number of peaks per within-group interaction level.

pattern[[9]]

Counts of isotopes. Number of times an m/z isotope difference was detected (raw measure / number of isotope pattern groups).

pattern[[10]]

Elements. Elements used via argument iso derived by make.isos.

pattern[[11]]

Charges. z-levels used.

pattern[[12]]

Rule settings. Deprecated.

Warning

Acceptable outcomes strongly depend on an appropriate parametrization of the algorithm and on using the correct quantiz data set from package nontargetData. Overly large values for rttol and/or mztol may lead to slow execution.
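One reason large rttol values slow execution is that more centroid pairs fall inside the retention time window and have to be compared. The hypothetical helper below (not part of nontarget) counts such pairs using base R sort() and findInterval(); the evenly spaced retention times are made up for illustration.

```r
# Hypothetical helper: number of peak pairs whose retention times differ by
# at most rttol (rttol >= 0 assumed).
count_rt_pairs <- function(rt, rttol) {
  rt <- sort(rt)
  # upper[i] = number of peaks with rt <= rt[i] + rttol
  upper <- findInterval(rt + rttol, rt)
  # peaks after position i (in sorted order) that lie inside the window
  sum(upper - seq_along(rt))
}

rt <- seq(0, 20, by = 0.25)     # 81 evenly spaced retention times [min]
count_rt_pairs(rt, rttol = 0.3) # 80
count_rt_pairs(rt, rttol = 1)   # 314
```

With peaks every 0.25 min, widening rttol from 0.3 to 1 raises the number of candidate pairs from 80 to 314; in real, denser peak lists the growth is correspondingly larger.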

Note

Peak IDs refer to the order in which peaks are provided.

If you do not find quantized simulation data for your instrument in package nontargetData and you can provide resolution = f(m/z) information, please contact the maintainer.

Author(s)

Martin Loos

See Also

rm.sat peaklist plotisotopes plotdefect combine plotgroup pattern.search

Examples


######################################################
# load packages: #####################################
library(nontarget)
library(nontargetData) # quantized simulation data
library(enviPat)       # isotopes data
# load HRMS centroid list: ###########################
data(peaklist)
# load isotope data ##################################
data(isotopes)
# load quantized simulation data #####################
data(OrbitrapXL_VelosPro_R60000at400_q)
######################################################
# run isotope pattern grouping #######################
# save the list returned as "pattern" ################
pattern<-pattern.search2(
	peaklist,
	OrbitrapXL_VelosPro_R60000at400_q,
	mztol=2,
	ppm=TRUE,
	inttol=0.5,
	rttol=0.3,
	use_isotopes=FALSE,
	use_charges=FALSE,
	use_marker=TRUE,
	quick=FALSE,
	isotopes=isotopes
)
names(pattern)
######################################################


blosloos/nontarget documentation built on June 2, 2022, 3:53 p.m.