homol.search: Homologue series extraction from LC-MS data.


Description

Dynamic programming algorithm for unsupervised detection of homologue series in LC-(HR)MS data.

Usage

homol.search(peaklist,isotopes,elements=c("C","H","O"),use_C=FALSE,minmz=5,
maxmz=120,minrt=-2,maxrt=2,ppm=TRUE,mztol=3.5,rttol=0.5,minlength=5,
mzfilter=FALSE,vec_size=3E6,mat_size=3,R2=.98,spar=.45,plotit=FALSE,deb=0)

Arguments

peaklist

Dataframe of picked LC-MS peaks with three numeric columns for (a) m/z, (b) intensity and (c) retention time, such as peaklist.

isotopes

Dataframe of isotope data, such as isotopes.

elements

FALSE, or a character vector of the chemical elements in the repeating units of the homologue series, e.g. c("C","H") for alkane chains. Used to restrict the search.

use_C

Only relevant if elements is given: should the ratio of each element to the number of C atoms be taken into account? Used to restrict the search.

minmz

Defines the lower limit of the m/z window in which to search for homologue series peaks, relative to the m/z of the peak being searched from. Absolute m/z value [u].

maxmz

Defines the upper limit of the m/z window in which to search for homologue series peaks, relative to the m/z of the peak being searched from. Absolute m/z value [u].

minrt

Defines the lower limit of the retention time (RT) window in which to look for other homologue peaks, relative to the RT of the peak being searched from, i.e., RT+minrt. If RT decreases with increasing homologue series (HS) mass, use negative values of minrt.

maxrt

Defines the upper limit of the RT window in which to look for other homologue peaks, relative to the RT of the peak being searched from, i.e., RT+maxrt. See minrt.

ppm

Should mztol be set in ppm (TRUE) or in absolute m/z [u] (FALSE)?

mztol

m/z tolerance: the +/- value by which the m/z of a peak may deviate from its expected value. Given in ppm if ppm=TRUE (see above), otherwise, if ppm=FALSE, in absolute m/z [u].

rttol

Retention time (RT) tolerance by which the RT differences of two adjacent peak pairs in a homologue series are allowed to differ. Units as given in column 3 of the peaklist argument, e.g. [min].

minlength

Minimum number of peaks in a homologue series.

mzfilter

FALSE or a numeric vector to filter for homologue series with specific m/z differences of their repeating units, within the tolerance set by mztol. Mind the charge z: m/z differences scale with 1/z.

vec_size

Vector size. Ignore unless a relevant error message is printed (then try increasing the size).

mat_size

Matrix size for recombining tuples, as a multiple of the number of input tuples. Ignore unless a relevant error message is printed (then try increasing the size).

R2

FALSE or 0<numeric<=1. Coefficient of determination for cubic smoothing spline fits of m/z versus retention time; homologue series with lower R2 are rejected. See smooth.spline.

spar

Smoothing parameter, typically (but not necessarily) in (0,1]. See smooth.spline.

plotit

Logical FALSE or 0<integer<5. Intermediate plots of nearest neighbour paths, spline fits of individual homologue series with >=minlength peaks, clustered HS pairs, etc.

deb

Debug returns, ignore.
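The interplay of mztol, ppm and mzfilter can be illustrated with a short base-R sketch. The helpers mz_tolerance and matches_mzfilter are hypothetical and not part of nontarget; converting a ppm tolerance at the m/z of the peak being searched from is an assumption about how homol.search applies it.

```r
# Hypothetical helper (not part of nontarget): absolute m/z tolerance [u].
# With ppm = TRUE the tolerance scales with the m/z it is applied at (assumed
# here to be the m/z of the peak searched from); otherwise mztol is absolute.
mz_tolerance <- function(mz, mztol = 3.5, ppm = TRUE) {
  if (ppm) mz * mztol * 1e-6 else mztol
}

# Hypothetical helper: does an observed m/z increment match any repeating-unit
# difference listed in mzfilter? Values in mzfilter must already be divided by
# the charge z (e.g. CH2 at z = 2 gives 14.01565 / 2).
matches_mzfilter <- function(increment, mzfilter, mz, mztol = 3.5, ppm = TRUE) {
  any(abs(increment - mzfilter) <= mz_tolerance(mz, mztol, ppm))
}

mz_tolerance(500)                                  # 3.5 ppm at m/z 500
matches_mzfilter(14.0157, c(14.01565, 44.02621), mz = 500)
matches_mzfilter(7.00785, 14.01565 / 2, mz = 400)  # doubly charged CH2 series
```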

Details

A dynamic programming approach is used to extract series of peaks that differ by constant m/z units and show smooth changes in their retention time, within the bounds of feasible mass defect changes. First, a nearest neighbour path through a kd-tree representation of the data is used to extract all feasible peak triplets. These triplets are then combined into all plausible n-tuples in n-3 steps. At each such step, each newly formed n-tuple is checked for smooth changes of RT with increasing m/z of the homologues, using cubic smoothing splines and an R2-based threshold on the model fit.
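A minimal base-R sketch of the triplet step, for illustration only. It swaps the kd-tree nearest neighbour path for a brute-force scan, assumes an absolute m/z tolerance tol, and interprets rttol as the maximum allowed difference between the two consecutive RT increments of a triplet. find_triplets is a hypothetical helper, not the nontarget implementation.

```r
# Hypothetical helper: brute-force search for peak triplets (i, j, k) with
# (nearly) equal consecutive m/z increments and similar RT increments.
# NOT the kd-tree nearest neighbour approach used by homol.search.
find_triplets <- function(mz, rt, minmz = 5, maxmz = 120,
                          minrt = -2, maxrt = 2, tol = 0.001, rttol = 0.5) {
  out <- list()
  for (i in seq_along(mz)) {
    # partners j inside the window relative to peak i
    js <- which(mz - mz[i] >= minmz & mz - mz[i] <= maxmz &
                rt - rt[i] >= minrt & rt - rt[i] <= maxrt)
    for (j in js) {
      dmz <- mz[j] - mz[i]  # candidate repeating-unit m/z difference
      drt <- rt[j] - rt[i]  # RT increment of the first pair
      # third peaks k continuing the same m/z step with a similar RT step
      ks <- which(mz > mz[j] &
                  abs((mz - mz[j]) - dmz) <= tol &
                  abs((rt - rt[j]) - drt) <= rttol)
      for (k in ks) out[[length(out) + 1]] <- c(i, j, k)
    }
  }
  if (length(out) == 0) return(matrix(integer(0), ncol = 3))
  do.call(rbind, out)
}

# Three CH2-spaced peaks plus one unrelated peak
mz <- c(200, 214.01565, 228.0313, 300)
rt <- c(5, 5.5, 6.0, 3)
find_triplets(mz, rt)
```

Each row holds the indices of three peaks; in homol.search such triplets are then extended step by step into longer n-tuples.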

Value

List of type homol with 6 entries

homol[[1]]

Homologue Series. Dataframe with peaks (mass, intensity, rt, peak ID) and their homologue series relations (to ID, m/z increment, RT increment) within different homologue series (HS IDs, series level). The last column, HS cluster, states the HS clusters to which a peak was assigned via its HS.

homol[[2]]

Parameters. Parameters used.

homol[[3]]

Peaks in homologue series. Dataframe listing all peaks (peak IDs) per homologue series (HS IDs), the underlying mean m/z & RT increments (m/z increments, RT increments) and the minimum and maximum RT changes between individual peaks of the series.

homol[[4]]

m/z restrictions used. See function argument mzfilter.

homol[[5]]

Peaks per level. List of peak IDs per level in the individual series.

homol[[6]]

Ignore. List with superjacent HS IDs per group - for setdeb=c(3,...)

Warning

The rttol argument of homol.search must not be mixed with that of pattern.search or pattern.search2.

Note

Arguments isotopes and elements are needed to limit the intermediate number of m/z differences to screen, based on feasible changes in mass defect. Similarly, intermediate numbers are also limited by the m/z and retention time windows defined by minmz/maxmz and minrt/maxrt/rttol, respectively. These windows are always set relative to the individual m/z and RT values of the peaks being searched from. Overall, these parameters must be chosen carefully to avoid a combinatorial explosion of triplet m/z differences, which leads to slow computation, memory problems or meaningless results.
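To get a rough feeling for the combinatorial load before running homol.search, one can count how many partner peaks fall into the relative m/z and RT window of each peak. count_partners is a hypothetical base-R helper; it ignores the mass defect restrictions derived from isotopes and elements, so it only approximates what homol.search actually screens.

```r
# Hypothetical helper: for each peak, count the peaks inside the relative
# window mz + [minmz, maxmz] and rt + [minrt, maxrt].
count_partners <- function(mz, rt, minmz = 5, maxmz = 120,
                           minrt = -2, maxrt = 2) {
  vapply(seq_along(mz), function(i) {
    dmz <- mz - mz[i]
    drt <- rt - rt[i]
    sum(dmz >= minmz & dmz <= maxmz & drt >= minrt & drt <= maxrt)
  }, integer(1))
}

mz <- c(200, 214.01565, 228.0313, 300)
rt <- c(5, 5.5, 6.0, 3)
count_partners(mz, rt)               # default RT window [-2, 2]
count_partners(mz, rt, minrt = -0.5) # narrower window, as in the example
```

With a real peaklist, sum(count_partners(peaklist[, 1], peaklist[, 3])) gives the total number of candidate pairs for the chosen windows; a large value suggests narrowing minmz/maxmz or minrt/maxrt.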

Values for spar and R2 have to be adjusted to different chromatographic settings; the smoothing spline fits are used to eliminate homologue series candidates with erratic RT behaviour. Spline fits of series with >=minlength peaks can be viewed with plotit=2.
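The R2 criterion can be sketched with stats::smooth.spline. Which variable is regressed on which, and how homol.search computes R2 internally, are assumptions here: the sketch fits RT as a function of m/z and uses R2 = 1 - SS_res/SS_tot on the fitted values. series_r2 is a hypothetical helper.

```r
# Hypothetical helper: coefficient of determination of a cubic smoothing
# spline fit of RT versus m/z for one candidate homologue series.
series_r2 <- function(mz, rt, spar = 0.45) {
  fit <- stats::smooth.spline(mz, rt, spar = spar)
  pred <- stats::predict(fit, mz)$y          # fitted RT at each m/z
  1 - sum((rt - pred)^2) / sum((rt - mean(rt))^2)
}

mz <- 200 + 14.01565 * 0:5                   # six CH2-spaced homologues
rt_smooth  <- c(5, 5.6, 6.1, 6.5, 6.8, 7.0)  # steady RT increase
rt_erratic <- c(5, 7, 4.5, 7.5, 4, 8)        # jumping RT values
series_r2(mz, rt_smooth)
series_r2(mz, rt_erratic)
```

A candidate would then be kept only if its R2 reaches the R2 argument (e.g. 0.98). Note that smooth.spline needs at least four distinct x values, which is compatible with the default minlength=5.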

Peak IDs refer to the order in which peaks are provided. Separate IDs exist for adduct groups, isotope pattern groups, grouped homologue series (HS) peaks and homologue series clusters. Yet other IDs exist for the individual components (see the note section of combine).

Here, homologue series group IDs are given in the function output homol[[1]], homol[[3]] and homol[[6]], with each homologue series constituting one group of interrelated peaks.

Author(s)

Martin Loos

See Also

rm.sat, isotopes, peaklist, plothomol

Examples

library(nontarget)
data(peaklist)
data(isotopes)
homol <- homol.search(
	peaklist,
	isotopes,
	elements = c("C", "H", "O"),
	use_C = TRUE,
	minmz = 5,
	maxmz = 120,
	minrt = -.5,
	maxrt = 2,
	ppm = TRUE,
	mztol = 3.5,
	rttol = 0.5,
	minlength = 5,
	mzfilter = FALSE,
	vec_size = 3E6,
	mat_size = 3,
	spar = .45,
	R2 = .98,
	plotit = FALSE
)
plothomol(homol)

nontarget documentation built on May 2, 2019, 2:32 a.m.