pattern.search: Detecting and grouping isotope m/z relations among peaks in a...


Description

Algorithm for detecting isotope pattern peak groups generated by an unknown candidate chemical component.

Usage

pattern.search(peaklist, iso, cutint = min(peaklist[, 2]), rttol = c(-0.5, 0.5), 
mztol = 3, mzfrac = 0.1, ppm = TRUE, inttol = 0.5, 
rules = c(TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE), 
deter = FALSE, entry = 20)

Arguments

peaklist

Dataframe of HRMS peaks with three numeric columns for (a) m/z, (b) intensity and (c) retention time, such as peaklist.

iso

Object generated by make.isos from isotopes, defining the isotope m/z differences to be screened for.

cutint

Cutoff intensity. Peaks below this intensity will be (a) omitted and (b) not expected by any of the plausibility rules (see details). See parameter rules below.

rttol

Lower (negative) and upper (positive) retention time tolerance. Units as given in column 3 of the peaklist argument, e.g. [min].

mztol

m/z tolerance setting: value by which the m/z of a peak may vary from its expected value. Given in ppm if ppm=TRUE (see below), otherwise in absolute m/z [u]. Defines the "large" mass tolerance used.

mzfrac

"Small" mass tolerance used. Given as a fraction of mztol, see above.

ppm

Should mztol be set in ppm (TRUE) or in absolute m/z (FALSE)?

inttol

Intensity tolerance setting: fraction by which peak intensities may vary. E.g. if set to 0.2, a peak with expected intensity 10000 may range between 8000 and 12000.

rules

Enabling (TRUE) or disabling (FALSE) of rules[1] to rules[11], see details. Logical vector with eleven entries.

deter

If using deter.iso instead of make.isos, set to TRUE. This disables all rules and makes pattern.search compatible with argument iso inputs from deter.iso. Otherwise, ignore.

entry

Memory allocation setting. Increase value if the corresponding warning is issued. Otherwise, ignore.
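
As a rough illustration of how the tolerance arguments above relate to each other, the following sketch converts mztol into an absolute m/z window, derives the "small" tolerance via mzfrac, and computes the intensity band implied by inttol. The helpers tol_window and int_range are hypothetical and not part of nontarget; in particular, scaling the ppm value to the m/z of a single peak is an assumption about how the tolerance is applied.

```r
# Hypothetical helper (not part of nontarget): absolute m/z tolerances for a
# peak at m/z `mz`, assuming a ppm tolerance is scaled to that peak's m/z.
tol_window <- function(mz, mztol = 3, mzfrac = 0.1, ppm = TRUE) {
  large <- if (ppm) mz * mztol / 1e6 else mztol
  c(large = large, small = large * mzfrac)
}

# "Large" and "small" tolerance for a peak at m/z 500 with the default settings.
tol <- tol_window(mz = 500, mztol = 3, mzfrac = 0.1, ppm = TRUE)
tol

# Hypothetical helper: fractional intensity band around an expected intensity,
# following the description of inttol.
int_range <- function(expected, inttol = 0.5) {
  expected * c(lower = 1 - inttol, upper = 1 + inttol)
}
rng <- int_range(10000, inttol = 0.2)
rng
```

With ppm=FALSE the value of mztol is used directly as the "large" tolerance in absolute m/z.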

Details

Detecting groups of isotope pattern peaks involves two steps.

In a first step, and within the given tolerances rttol and mztol, m/z differences between any two peaks are screened for matches with the m/z differences among isotopes of an element, as provided by the iso argument. This leads to a set of candidate isotope m/z differences, each of which subsequently undergoes plausibility checks (rules parameter entries 1 to 7).
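
The pairwise screening of this first step can be pictured with a minimal base R sketch on a toy peak list. It is not the package's implementation: the peak values are made up, the 13C/12C mass difference of about 1.003355 u is inserted by hand instead of being taken from an iso object, and scaling the ppm tolerance to the heavier peak's m/z is an assumption.

```r
# Toy peak list: m/z, intensity, retention time (made-up values).
peaks <- data.frame(
  mz  = c(300.1000, 301.1034, 302.0971, 450.2000),
  int = c(1e6, 2e5, 3e4, 5e5),
  rt  = c(5.00, 5.01, 5.02, 9.00)
)

dmz_13C <- 1.003355        # assumed 13C - 12C mass difference at z = 1
mztol   <- 3               # ppm
rttol   <- c(-0.05, 0.05)  # same units as peaks$rt

# All pairs of (lighter peak i, heavier peak j).
idx <- expand.grid(i = seq_len(nrow(peaks)), j = seq_len(nrow(peaks)))
idx <- idx[peaks$mz[idx$i] < peaks$mz[idx$j], ]

# Differences in m/z and retention time for each pair.
d_mz <- peaks$mz[idx$j] - peaks$mz[idx$i]
d_rt <- peaks$rt[idx$j] - peaks$rt[idx$i]

# Assumption: the ppm tolerance is taken relative to the heavier peak.
tol_abs <- peaks$mz[idx$j] * mztol / 1e6

# Keep pairs matching the isotope m/z difference within both tolerances.
hit <- abs(d_mz - dmz_13C) <= tol_abs & d_rt >= rttol[1] & d_rt <= rttol[2]
candidates <- idx[hit, ]
candidates
```

In this toy data only the pair of peaks 1 and 2 survives as a candidate 13C relation; such candidates would then be passed on to the plausibility checks.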

In a second step, the remaining candidate m/z isotope differences are sorted in tree-like structures (so-called isotope pattern groups), starting from the lowest m/z peak of the data set. Thus, a tree consists of several (>=2) peaks related by isotope m/z differences; the peak with lowest m/z in the tree (root node) represents the monoisotopic peak of the associated candidate molecular component. This does not require prior knowledge about the chemical nature of the components assigned. Again, the resulting trees undergo plausibilization (rules parameter entries 8 to 11).

In addition, groups with m/z isotope differences being detected within "small" mztol are used to calculate a minimum number of atoms per element associated with that m/z isotope difference.

Value

List of type pattern with 12 entries

pattern[[1]]

Patterns. Dataframe with peaks (mass,intensity,rt,peak ID) and their isotope pattern relations (to ID,isotope(s),mass tolerance,charge level) within isotope pattern groups (group ID,interaction level).

pattern[[2]]

Parameters. Parameters used.

pattern[[3]]

Peaks in pattern groups. Dataframe listing all peaks (peak IDs) per isotope pattern group (group ID) at the given z-level(s) (charge level).

pattern[[4]]

Atom counts. Groups with m/z isotope differences being detected within "small" mztol are used to calculate a minimum number of atoms per element associated with that m/z isotope difference.

pattern[[5]]

Count of pattern groups. Number of isotope pattern groups found on the different z-levels used.

pattern[[6]]

Removals by rules. Number of times each rule led to a rejection (rules[1] to rules[10]) or to a merging of nested groups (rules[11]).

pattern[[7]]

Number of peaks with pattern group overlapping. Number of overlapping groups; overlap = 1 corresponds to no overlap.

pattern[[8]]

Number of peaks per within-group interaction levels.

pattern[[9]]

Counts of isotopes. Number of times an m/z isotope difference was detected (raw measure / number of isotope pattern groups).

pattern[[10]]

Elements. Elements used via argument iso derived by make.isos.

pattern[[11]]

Charges. z-levels used.

pattern[[12]]

Rule settings. rules[1] to rules[11] settings used.
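
The link between the peak IDs listed per group and the rows of the peak table can be sketched with toy stand-ins. The data frames below only mimic pattern[[1]] and pattern[[3]]; their column names and values are illustrative assumptions, not the package's actual output. The only property relied upon is that the second column of the group table holds comma-separated peak IDs.

```r
# Toy stand-in for pattern[[1]]: one row per peak, row number = peak ID.
toy_peaks <- data.frame(
  mz        = c(300.1000, 301.1034, 450.2000),
  intensity = c(1e6, 2e5, 5e5),
  rt        = c(5.00, 5.01, 9.00)
)

# Toy stand-in for pattern[[3]]: peak IDs per group as a comma-separated string.
toy_groups <- data.frame(
  group_ID = 1,
  peak_IDs = "1,2",
  stringsAsFactors = FALSE
)

# Split the peak IDs of group 1 and use them as row indices into the peak table.
ids <- as.numeric(strsplit(as.character(toy_groups[1, 2]), ",")[[1]])
group_peaks <- toy_peaks[ids, ]
group_peaks
```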

rules setting

rules[1]: Intensities between two peaks associated via any of the candidate m/z isotope differences of the iso argument are compared. Given this difference in intensity, the minimum number of atoms for the element with highest abundance in argument iso is calculated. If (minimum number of atoms)*(minimum mass) > (m/z of lighter peak * maximum charge in argument iso), the candidate m/z difference is found implausible and therefore rejected. The minimum mass is set to that of protium (1H) plus its minimum association to numbers of carbon atoms, i.e. 1.0078 + (1/6 * 12.0000). Fast precheck to rules[2] and rules[3].

rules[2]: Repeats rules[1], but uses abundances and minimum masses (including the C-ratios of isotopes) for only those isotope(s) of argument iso ranging within the "large" m/z tolerance set by mztol.

rules[3]: Repeats rules[1], but now uses abundance and minimum masses (including the C-ratios of isotopes) individually for only those isotope(s) of argument iso ranging within the "small" m/z tolerance set by mztol*mzfrac.

rules[4]: If the intensity ratio between two peaks associated via any of the candidate m/z isotope differences of the iso argument is smaller than the smallest isotope abundance ratio of an element of argument iso, the candidate m/z difference is found implausible and therefore rejected. Fast precheck to rules[5] and rules[6].

rules[5]: Repeats rules[4], but now uses abundances for only those isotope(s) of argument iso ranging within the "large" m/z tolerance set by mztol.

rules[6]: Repeats rules[4], but now uses abundances for only those isotope(s) of argument iso ranging within the "small" m/z tolerance set by mztol*mzfrac.

rules[7]: Given those isotopes of argument iso ranging within the "small" m/z tolerance set by mztol and mzfrac and their C-ratio set in isotopes, the minimum number of carbon atoms and the associated 13C peak intensity to be expected at M+1 can be calculated. Checks if this expected 13C peak is present in the data set. If not, the candidate m/z difference is rejected.

rules[8]: Given the intensity and m/z of the monoisotopic peak in a growing isotope pattern tree and values from argument iso, the maximum m/z to which a tree can grow is restricted.

rules[9]: Given (a) the intensities of the monoisotopic peak (= tree root node, interaction level 1) and its first isotopic daughter peaks (tree interaction level 2) and (b) the candidate m/z isotope(s) within the "small" m/z tolerance set by mztol and mzfrac associated with (a), the occurrence of expected peaks (interaction level >2) above the value set by argument cutint is checked. If expected but not found, the peak at interaction level 1 is rejected as the monoisotopic candidate peak, and a tree is grown on the remaining interrelated peaks. For example, if a monoisotopic peak (= tree interaction level 1) is associated with an intense 13C isotope peak (= tree interaction level 2), a second peak from two 13C vs. 12C isotope replacements can be expected and must be checked for.

rules[10]: Restriction to rules[7] and rules[9]: expected peaks are searched for only if no other measured peaks of higher intensity exist in a tolerance window of absolute m/z = 0.5 around the m/z of the expected peak. This allows skipping the search for expected peaks in cases of intensity masking by other peaks. For example, intense 37Cl peaks often mask the second 13C peak expected from rules[9], depending on the number of Cl and C atoms and the measurement resolution used.

rules[11]: In some cases, trees may - if several charges z are used - be nested within each other. This rule merges the nested group of charge z=x into the nesting peak group of z>x.
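
The arithmetic behind rules[1] can be illustrated with a short sketch. The function rule1_reject is hypothetical and not part of nontarget; the 13C/12C abundance ratio of roughly 0.0108, the way inttol widens the intensity ratio, and the absence of any rounding are all assumptions made for illustration. Only the minimum mass of 1.0078 + (1/6 * 12.0000) and the rejection criterion are taken from the rule description.

```r
# Hypothetical re-statement of the rules[1] check (not the package's code).
rule1_reject <- function(mz_light, int_light, int_heavy,
                         ab_ratio, max_z = 1, inttol = 0.5) {
  # Minimum mass per atom as given for rules[1]: 1H plus 1/6 of a carbon atom.
  min_mass <- 1.0078 + (1 / 6 * 12.0000)
  # Assumption: lowest heavy/light intensity ratio still allowed by inttol.
  ratio_low <- (int_heavy * (1 - inttol)) / (int_light * (1 + inttol))
  # Minimum number of atoms needed to explain that ratio.
  min_atoms <- ratio_low / ab_ratio
  # Reject if these atoms alone would outweigh the lighter peak.
  min_atoms * min_mass > mz_light * max_z
}

# Moderate isotope peak: plausible for a peak at m/z 150.05.
keep_case <- rule1_reject(150.05, int_light = 1e5, int_heavy = 4e4,
                          ab_ratio = 0.0108, inttol = 0.2)

# Very intense isotope peak: would require more atoms than the mass allows.
drop_case <- rule1_reject(150.05, int_light = 1e5, int_heavy = 9e4,
                          ab_ratio = 0.0108, inttol = 0.2)

c(keep_case = keep_case, drop_case = drop_case)
```

In the second case the intensity ratio implies more than 50 atoms at a minimum of about 3 u each, which exceeds the mass of the lighter peak, so the candidate m/z difference would be rejected.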

Warning

Acceptable outcomes strongly depend on appropriate parametrization of the algorithm.

Including many isotopes and overly large values for rttol and/or mztol may lead to overflows. In this case, a warning is issued to increase parameter entry or to adjust values of rttol and/or mztol.

Group IDs are valid both for pattern[[1]] and pattern[[3]].

Note

Peak IDs refer to the order in which peaks are provided. Different IDs exist for adduct groups, isotope pattern groups, grouped homologue series (HS) peaks and homologue series clusters. Moreover, at the highest level, yet other IDs exist for the individual components (see note section of combine).

Depending on values of mztol, several m/z isotope differences from argument iso may match a measured m/z difference between two peaks.

rules[1] to rules[11] encompass uncertainties in intensity set by parameter inttol.

In some cases, two or more isotope pattern trees may overlap. rules[11] merges only fully nested trees, not trees that merely overlap.

Disabling rules[10] may in some cases lead to false rejections of candidate m/z isotope differences for rules[7] and rules[9], especially for low resolutions.

rules[9] is recursive, i.e. may be applied several times on an ever decreasing number of peaks per tree, until plausibility holds or no m/z isotopic differences remain.
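
The masking check described for rules[10] can be pictured as follows. The helper is_masked is hypothetical and not part of nontarget, the peak values are made up, and reading the "tolerance window of absolute m/z = 0.5" as plus/minus 0.5 around the expected m/z is an assumption.

```r
# Hypothetical helper (not part of nontarget): could an expected peak be
# masked by a more intense measured peak close to its expected m/z?
is_masked <- function(expected_mz, expected_int, peaks_mz, peaks_int,
                      window = 0.5) {
  near <- abs(peaks_mz - expected_mz) <= window
  any(peaks_int[near] > expected_int)
}

# Toy measured peaks (made-up values).
peaks_mz  <- c(300.1000, 301.1034, 302.0971)
peaks_int <- c(1e6, 2e5, 3e5)

# An intense peak at m/z 302.0971 lies next to the expected peak.
masked_case <- is_masked(302.1067, 2e4, peaks_mz, peaks_int)

# No measured peak lies within the window of this expected peak.
free_case <- is_masked(303.1100, 2e4, peaks_mz, peaks_int)

c(masked_case = masked_case, free_case = free_case)
```

With such a check in place, a missing expected peak would count against a candidate only in the unmasked case, which is why disabling rules[10] can cause false rejections.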

Author(s)

Martin Loos

See Also

pattern.search2 rm.sat peaklist make.isos plotisotopes plotdefect combine plotgroup isotopes resolution_list

Examples

######################################################
# load required data: ################################
# HRMS peak list: ####################################
data(peaklist)
peaklist<-rm.sat(peaklist,dmz=0.3,drt=0.1,intrat=0.015,spar=0.8,corcut=-1000,plotit=TRUE);
peaklist<-peaklist[peaklist[,4],1:3];
# list of isotopes ###################################
data(isotopes)
######################################################
# (1) run isotope pattern grouping ###################
# (1.1) define isotopes and charge (z) argument ######
iso<-make.isos(isotopes,
	use_isotopes=c("13C","15N","34S","37Cl","81Br","41K","13C","15N","34S","37Cl","81Br","41K"),
	use_charges=c(1,1,1,1,1,1,2,2,2,2,2,2))
# (1.2) run isotope grouping #########################
# save the list returned as "pattern" ################
pattern<-pattern.search(
  peaklist,
  iso,
  cutint=10000,
  rttol=c(-0.05,0.05),
  mztol=2,
  mzfrac=0.1,
  ppm=TRUE,
  inttol=0.2,
  rules=c(TRUE,TRUE,TRUE,TRUE,TRUE,TRUE,TRUE,TRUE,TRUE,TRUE,TRUE),
  deter=FALSE,
  entry=50
);
names(pattern);
# extract peaks listed in isotope pattern group no.1 #
# under pattern[[3]] from pattern[[1]] ###############
pattern[[1]][as.numeric(strsplit(as.character(pattern[[3]][1,2]),",")[[1]]),];
# (1.3) plot results #################################
plotisotopes(pattern);
plotdefect(pattern,elements=c("N"));
######################################################

nontarget documentation built on May 2, 2019, 2:32 a.m.