Warpgroup Examples

This is a set of example Warpgroup uses, each illustrating a different feature of the algorithm.

library(knitr)
opts_chunk$set(fig.width=11, fig.height=7, base64_images=F, fig.align="center", message=F, warning=F)
library(warpgroup)
library(ggplot2)

Data Format

data(example_1_3)
head(eic.mat)

Extracted Ion Chromatograms representative of the supplied peak bounds are supplied as a matrix, rows representing scans and columns representing samples.

peak.bounds

Peak bounds detected in a prerequisite step are supplied as a matrix. Columns scmin, scmax, and sc are integers refering to rows of the EIC matrix. Column sample refers to the column in the EIC matrix.

1. Consensus Peak Bound Determination

data(example_1_3)
plot_peaks_bounds(eic.mat, peak.bounds)

Our initial EICs and bounds all agree well (they were previously warpgrouped). Lets add some variance to the peak bounds to simulate real world peak detection.

bad.bounds = aperm(apply(peak.bounds, 1, function(x) {
  x[2] = x[2] + floor(runif(1, 5,11)) * sample(c(-1,1),1)
  x[3] = x[3] + floor(runif(1, 5,11)) * sample(c(-1,1),1)
  x
  }))
plot_peaks_bounds(eic.mat, bad.bounds)

We see the peak bounds are not self-consistent, each set of bounds describing a different peak region. This artifically increases our measurement variance and decreases our statstical power.

Lets see if we can fix this.

for (i in seq(ncol(eic.mat))) { eic.mat[,i] = eic.mat[,i]/max(eic.mat[,i]) } #Normalize to 1
wg.bounds = warpgroup(bad.bounds, eic.mat, sc.aligned.lim = 20)[[1]]
plot_peaks_bounds(eic.mat, wg.bounds)

After submitting the EIC traces and detected peak bounds to warpgrouping we see that all the samples have similar integration regions detected.

2. Peak Subregion Detection

Still need to find a good example of this

3. Determination of Undetected Peak Bounds (Fillpeaks)

data(example_1_3)
plot_peaks_bounds(eic.mat, peak.bounds)

Again our inital EICs all look good. Lets delete all the peaks but one and see how warpgroup handles this.

bad.bounds = peak.bounds[1,,drop=F]
for (i in seq(ncol(eic.mat))) { eic.mat[,i] = eic.mat[,i]/max(eic.mat[,i]) } #Normalize to 1

plot_peaks_bounds(eic.mat, bad.bounds)

We see only one sample had a peak detected in this region. Traditionally a large, indiscriminate region of the chromatogram is integrated to fill these missing values. The warpgroup solution follows.

wg.bounds = warpgroup(bad.bounds, eic.mat, sc.aligned.lim = 6)[[1]]
plot_peaks_bounds(eic.mat, wg.bounds)

After submitting the EIC traces and detected peak bound to warpgrouping we see that all the samples now have similar integration regions.

4. Grouping Peaks Which Deviate From Global Retention Time Correction

data(example_4)

plot_peaks_bounds(eic.mat, peak.bounds)

This time there are two peaks per sample in this rough group. These have been warpgrouped, but in real life samples each peak's drift will deviate somewhat from the global retention time shift. Lets add some extreme drift.

eic.mat.bad = array(numeric(), dim=dim(eic.mat))
peak.bounds.bad = peak.bounds

for (r in seq(ncol(eic.mat))) {
  shift = floor(runif(1,0,470))
  eic.mat.bad[,r] = c(
    eic.mat[shift:nrow(eic.mat),r], 
    rep(0,times=shift-1)
    )

  peak.bounds.bad[peak.bounds.bad[,"sample"] == r,1:3] = peak.bounds[peak.bounds[,"sample"] == r,1:3] - shift
}

plot_peaks_bounds(eic.mat.bad, peak.bounds.bad)

Here is a case that current algorithms would have no chance at grouping. The inter-sample drift is greater than the separation between the peaks to be grouped. Lets see how warpgrouping does.

for (i in seq(ncol(eic.mat.bad))) { eic.mat.bad[,i] = eic.mat.bad[,i]/max(eic.mat.bad[,i]) } #Normalize to 1
wg.bounds = warpgroup(peak.bounds.bad, eic.mat.bad, sc.aligned.lim = 6)

for (g in wg.bounds) print(plot_peaks_bounds(eic.mat.bad, g))

We see that warpgroup has correctly grouped each peak into the proper, distinct group.

5. An Extreme Example

This is an extreme example, data this unreliable probably shouldn't be trusted... but it provides a nice challenge.

data(example_5)

plot_peaks_bounds(eic.mat, peak.bounds)

We can clearly see two peaks in most samples. There is a large retention time drift. There is also a varying degree of merging between the two peaks. In some samples two distinct peaks were detected, in others a single peak was detected. Lets see how warpgroup handles this.

for (i in seq(ncol(eic.mat))) { eic.mat[,i] = eic.mat[,i]/max(eic.mat[,i]) } #Normalize to 1
wg.bounds = warpgroup(peak.bounds, eic.mat, sc.aligned.lim = 8)

for (g in wg.bounds) print(plot_peaks_bounds(eic.mat, g))

Warpgroup grouped the peaks into three distinct groups, each describing a different chromatographic region.



nathaniel-mahieu/warpgroup documentation built on May 23, 2019, 12:19 p.m.