homol.search: Homologue series extraction from LC-MS data.
In nontarget: Detecting Isotope, Adduct and Homologue Relations in LC-MS Data


Dynamic programming algorithm for unsupervised detection of homologue series in LC-(HR)MS data.


homol.search(peaklist,isotopes,elements=c("C","H","O"),use_C=FALSE,minmz=5,
maxmz=120,minrt=-2,maxrt=2,ppm=TRUE,mztol=3.5,rttol=0.5,minlength=5,
mzfilter=FALSE,vec_size=3E6,mat_size=3,R2=.98,spar=.45,plotit=FALSE,deb=0)

`peaklist`	Dataframe of picked LC-MS peaks with three numeric columns for (a) m/z, (b) intensity and (c) retention time, such as the example dataset `peaklist`.
`isotopes`	Dataframe of isotope data, as loaded by `data(isotopes)`.
`elements`	FALSE or chemical elements in the changing units of the homologue series, e.g. c("C","H") for alkane chains. Used to restrict search.
`use_C`	For `elements`: take element ratio to C-atoms into account? Used to restrict search.
`minmz`	Lower limit of the m/z window in which to search for homologue series peaks, relative to the m/z of the peak being searched from. Absolute m/z value [u].
`maxmz`	Upper limit of the m/z window in which to search for homologue series peaks, relative to the m/z of the peak being searched from. Absolute m/z value [u].
`minrt`	Lower limit of the retention time (RT) window in which to look for other homologue peaks, relative to the RT of the peak being searched from, i.e., RT+minrt. For decreasing RT with increasing homologue series (HS) mass, use negative values of minrt.
`maxrt`	Defines the upper limit of the RT window to look for other homologue peaks, relative to the RT of the one peak to search from, i.e., RT+maxrt. See `minrt`.
`ppm`	Should `mztol` be set in ppm (`TRUE`) or in absolute m/z [u] (`FALSE`)?
`mztol`	m/z tolerance setting: +/- value by which the m/z of a peak may vary from its expected value. Given in ppm if `ppm=TRUE` (see above), otherwise, if `ppm=FALSE`, in absolute m/z [u].
`rttol`	Retention time (RT) tolerance by which the RT between two adjacent pairs of a homologue series is allowed to differ. Units as given in column 3 of peaklist argument, e.g. [min].
`minlength`	Minimum number of peaks in a homologue series.
`mzfilter`	Vector of numerics to filter for homologue series with specific m/z differences of their repeating units, given the tolerances in `mztol`. Mind charge z!
`vec_size`	Vector size. Ignore unless a relevant error message is printed (then try to increase size).
`mat_size`	Matrix size for recombining, multiple of input tuples. Ignore unless a relevant error message is printed (then try to increase size).
`R2`	FALSE or 0<numeric<=1. Coefficient of determination for cubic smoothing spline fits of m/z versus retention time; homologue series with lower R2 are rejected. See `smooth.spline`.
`spar`	Smoothing parameter, typically (but not necessarily) in (0,1]. See `smooth.spline`.
`plotit`	Logical FALSE or 0<integer<5. Intermediate plots of nearest neighbour paths, spline fits of individual homologue series >=`minlength`, clustered HS pairs, etc.
`deb`	Debug returns, ignore.
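To make the interplay of `ppm` and `mztol` concrete, the following is a minimal sketch of how a relative (ppm) tolerance translates into an absolute m/z window. The helpers `mz_tolerance` and `within_tolerance` are hypothetical and not part of nontarget; the sketch also assumes the tolerance is taken relative to the expected m/z, which the package may handle differently.

```r
# Hypothetical helper (not part of nontarget): absolute +/- m/z tolerance.
# Assumption: a ppm tolerance is scaled by the expected m/z value.
mz_tolerance <- function(mz, mztol = 3.5, ppm = TRUE) {
  if (ppm) mz * mztol / 1e6 else rep(mztol, length(mz))
}

# Is an observed m/z within +/- tolerance of the expected m/z?
within_tolerance <- function(observed, expected, mztol = 3.5, ppm = TRUE) {
  abs(observed - expected) <= mz_tolerance(expected, mztol, ppm)
}

tol_200 <- mz_tolerance(200)  # 3.5 ppm at m/z 200 corresponds to 0.0007 u
tol_800 <- mz_tolerance(800)  # same ppm value, wider absolute window: 0.0028 u

hit  <- within_tolerance(200.0005, 200)  # deviation of 0.0005 u, inside the window
miss <- within_tolerance(200.0010, 200)  # deviation of 0.0010 u, outside the window
```

With `ppm=TRUE` the absolute window grows with m/z, whereas `ppm=FALSE` applies the same absolute window to all peaks.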

A dynamic programming approach is used to extract series of peaks that differ by constant m/z increments and show smooth changes in their retention time, within bounds of mass defect changes. First, a nearest neighbour path through a kd-tree representation of the data is used to extract all feasible peak triplets. These triplets are then combined to all plausible n-tuples in n-3 steps. At each such step, each newly formed n-tuple is checked for smooth changes of RT with increasing m/z of the homologues, using cubic splines and an R2-based threshold of the model fit.
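The triplet extraction step can be illustrated with a deliberately simplified sketch. It uses a brute-force loop instead of the kd-tree search, ignores the mass defect restrictions, and runs on a toy peak list with made-up values; the function `find_triplets` is hypothetical and only mirrors the meaning of the `minmz`, `maxmz`, `minrt`, `maxrt`, `mztol` and `rttol` arguments as described above.

```r
# Toy peak list (hypothetical values): peaks 1-4 form a series with a
# constant m/z increment of 44.0262 u; peak 5 is unrelated.
peaks <- data.frame(
  mz = c(300.0000, 344.0262, 388.0524, 432.0786, 350.1234),
  rt = c(5.0, 5.6, 6.1, 6.5, 9.0)
)

# Brute-force triplet search (a simplification, not the package's kd-tree method).
find_triplets <- function(peaks, minmz = 5, maxmz = 120, minrt = -2, maxrt = 2,
                          mztol = 3.5, rttol = 0.5) {
  out <- list()
  n <- nrow(peaks)
  for (i in seq_len(n)) {
    for (j in seq_len(n)) {
      dmz <- peaks$mz[j] - peaks$mz[i]
      drt <- peaks$rt[j] - peaks$rt[i]
      # Peak j must lie in the m/z and RT windows set relative to peak i
      if (dmz < minmz || dmz > maxmz || drt < minrt || drt > maxrt) next
      expected <- peaks$mz[j] + dmz   # third peak: same m/z increment again
      tol <- expected * mztol / 1e6   # ppm tolerance as absolute m/z
      for (k in seq_len(n)) {
        drt2 <- peaks$rt[k] - peaks$rt[j]
        if (abs(peaks$mz[k] - expected) <= tol &&
            drt2 >= minrt && drt2 <= maxrt &&
            abs(drt2 - drt) <= rttol) {  # adjacent RT increments must be similar
          out[[length(out) + 1]] <- c(i, j, k)
        }
      }
    }
  }
  do.call(rbind, out)  # one row of peak IDs per triplet
}

triplets <- find_triplets(peaks)
```

On this toy input the search yields the two overlapping triplets (1, 2, 3) and (2, 3, 4), which a later recombination step could merge into the 4-tuple (1, 2, 3, 4).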

List of type homol with 6 entries

`homol[[1]]`	`Homologue Series`. Dataframe with peaks (`mass`,`intensity`,`rt`,`peak ID`) and their homologue series relations (`to ID`,`m/z increment`,`RT increment`) within different homologue series (`HS IDs`,`series level`). The last column, `HS cluster`, lists the HS clusters to which a peak was assigned via its HS.
`homol[[2]]`	`Parameters`. Parameters used.
`homol[[3]]`	`Peaks in homologue series`. Dataframe listing all peaks (`peak IDs`) per homologue series (`HS IDs`), the underlying mean m/z & RT increments (`m/z increments`, `RT increments`) and the minimum and maximum RT changes between individual peaks of the series.
`homol[[4]]`	`m/z restrictions used`. See function argument `mzfilter`.
`homol[[5]]`	`Peaks per level`. List of peak IDs per level in the individual series.
`homol[[6]]`	Ignore. List with superjacent HS IDs per group; only relevant when `deb=c(3,...)` is set.

The rttol argument of homol.search must not be confused with that of pattern.search or pattern.search2.

Arguments isotopes and elements are needed to limit the intermediate number of m/z differences to screen over, based on feasible changes in mass defect. Similarly, intermediate numbers are also limited by the m/z and retention time windows defined by minmz/maxmz and minrt/maxrt/rttol, respectively. These windows are always set relative to the individual m/z and RT values of the peaks to be searched from. Overall, these parameters must be chosen carefully to avoid a combinatorial explosion of triplet m/z differences, leading to slow computation, memory problems or senseless results.
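How strongly the window settings affect the number of candidates can be gauged with a rough sketch. The peak list below is simulated random data, not a real measurement, and `count_candidates` is a hypothetical helper that only counts peak pairs falling inside the relative m/z and RT windows; it does not reproduce the package's internal search.

```r
set.seed(1)
# Simulated peak list (random toy data, for illustration only)
peaks <- data.frame(mz = runif(500, 100, 1000), rt = runif(500, 0, 30))

# Count ordered peak pairs (i, j) where peak j lies inside the m/z and RT
# windows set relative to peak i.
count_candidates <- function(peaks, minmz, maxmz, minrt, maxrt) {
  dmz <- outer(peaks$mz, peaks$mz, function(a, b) b - a)  # mz[j] - mz[i]
  drt <- outer(peaks$rt, peaks$rt, function(a, b) b - a)  # rt[j] - rt[i]
  sum(dmz >= minmz & dmz <= maxmz & drt >= minrt & drt <= maxrt)
}

narrow <- count_candidates(peaks, minmz = 5, maxmz = 120, minrt = -0.5, maxrt = 2)
wide   <- count_candidates(peaks, minmz = 5, maxmz = 300, minrt = -5, maxrt = 5)
```

Every pair admitted by the narrow windows is also admitted by the wide ones, so `wide` can never be smaller than `narrow`. Because each candidate pair can seed further triplets, the cost of wider windows compounds in the later steps.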

Values for spar and R2 have to be adjusted for different chromatographic settings; the smoothing spline fits are used to eliminate homologue series candidates with erratic RT behaviour. Spline fits of series with >=minlength peaks can be viewed with plotit=2.
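The effect of `spar` and `R2` can be explored outside of homol.search with a small sketch based on `smooth.spline` from the stats package. The helper `spline_r2` is hypothetical, the series values are made up, and the R2 definition used here (1 minus residual over total sum of squares, with RT modelled as a function of m/z) is an assumption that need not match the package's internal computation.

```r
# Hypothetical helper: R2 of a cubic smoothing spline fit of RT against m/z.
spline_r2 <- function(mz, rt, spar = 0.45) {
  fit <- smooth.spline(mz, rt, spar = spar)
  fitted_rt <- predict(fit, mz)$y
  1 - sum((rt - fitted_rt)^2) / sum((rt - mean(rt))^2)
}

# Two toy series with the same m/z values (made-up numbers)
mz <- 300 + 44.0262 * 0:5
smooth_rt  <- c(5.0, 5.6, 6.1, 6.5, 6.8, 7.0)  # RT rises steadily
erratic_rt <- c(5.0, 6.9, 5.2, 7.0, 5.1, 6.8)  # RT jumps back and forth

r2_smooth  <- spline_r2(mz, smooth_rt)
r2_erratic <- spline_r2(mz, erratic_rt)

# A series would be kept only if its R2 reaches the chosen threshold
threshold <- 0.98
keep <- c(smooth = r2_smooth >= threshold, erratic = r2_erratic >= threshold)
```

The erratic series scores a lower R2 than the smooth one. Rerunning the sketch with other values of `spar` shows how the amount of smoothing shifts both scores, which is why the two parameters need to be tuned together.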

Peak IDs refer to the order in which peaks are provided. Different IDs exist for adduct groups, isotope pattern groups, grouped homologue series (HS) peaks and homologue series clusters. Yet other IDs exist for the individual components (see note section of combine).

Here, IDs of homologue series groups are given in the function outputs homol[[1]], homol[[3]] and homol[[6]], with one homologue series representing one group of interrelated peaks.

Martin Loos

rm.sat, isotopes, peaklist, plothomol

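library(nontarget)

# Load the example peak list and the isotope table
data(peaklist)
data(isotopes)

# Search for homologue series with repeating units made of C, H and O
homol <- homol.search(
	peaklist,
	isotopes,
	elements=c("C","H","O"),
	use_C=TRUE,
	minmz=5,
	maxmz=120,
	minrt=-.5,
	maxrt=2,
	ppm=TRUE,
	mztol=3.5,
	rttol=0.5,
	minlength=5,
	mzfilter=FALSE,
	vec_size=3E6,
	mat_size=3,
	spar=.45,
	R2=.98,
	plotit=FALSE
)

# Plot the detected homologue series
plothomol(homol)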