library(BiocStyle) library(pander) knitr::opts_chunk$set( collapse = TRUE, comment = "#>", fig.width = 8, fig.height = 4.5, fig.align = 'center', message = FALSE ) BiocStyle::markdown()
This vignette demonstrates tools provided by twoDxc
for analyzing
two-dimensional (2D) chromatography/mass spectrometry (MS) data. This package
relies on xcms
and CAMERA
to read and process MS data.
The example data file used will be an LC$\times$LC/MS run using a quadrupole-time-of-flight (QTOF) MS. The sample analyzed was a tea leaf extraction for a metabolomics study. The modulation time for the run was 60 seconds with approximately 30 seconds before 2D valve modulation began, specified as the delay time. Mass spectra were acquired at 2 Hz from 100 - 1700 m/z.
library(twoDxc) library(dplyr) library(future.apply) library(ggplot2) tea.file <- system.file('tea_data', 'tea_pool_1.mzML', package = 'twoDxc') tea.data <- readMSData(tea.file, mode = 'onDisk')
xcms
preprocessing2D chromatographic data can be processed the same as 1D data, except each compound can appear in multiple consecutive modulations. Below, the base peak chromatogram is plotted as if the data were 1D.
tea.bpi <- chromatogram(tea.data, aggregationFun = 'max') plot(tea.bpi)
The chromatogram's sawtooth pattern arises from the gradient separations performed at
fast, ultra high pressure conditions. We can see that at least two peaks from
caffeine elute in the modulations around 2700 seconds. The extracted ion
chromatogram (EIC) for caffeine can also be plotted to show multiple peaks per
compound. twoDxc
provides a function for calculating m/z ranges given a
parts-per-million (ppm) tolerance. The MS used to acquire this data has a
tolerance of 20 ppm. Note the retention time range in the plot is adjusted.
mz.range.195 <- calc.mz.Window(195.0882, 20) # m/z tolerance window for caffeine's accurate mass is: print(mz.range.195) eic.195 <- chromatogram(tea.data, mz = mz.range.195, rt = c(2500, 3000)) plot(eic.195)
Now peak detection will be performed on the data as if it were 1D with xcms
.
cwp <- CentWaveParam(snthresh = 100, peakwidth = c(6, 15), prefilter = c(5, 1000)) tea.peaks <- findChromPeaks(tea.data %>% filterRt(rt = c(4.8 * 60, 60 * 60)), param = cwp) pdp <- PeakDensityParam(sampleGroups = 1, minFraction = 1, bw = 3) tea.peaks <- groupChromPeaks(tea.peaks, pdp) feature.chroms <- featureChromatograms(tea.peaks, features = 1:4) par(mar = rep(2, 4)) plot(feature.chroms)
Notice the same ions are found about 60 seconds apart.
We will now group ions into pseudospectra using the CAMERA package.
tea.xsa <- xsAnnotate(as(tea.peaks, 'xcmsSet')) # Group by fwhm tea.xsaF <- groupFWHM(tea.xsa, perfwhm = 0.6) # Label isotopes tea.xsaI <- findIsotopes(tea.xsaF, mzabs = 0.01) # Group by correlation tea.xsaC <- groupCorr(tea.xsaF) # Find adducts tea.xsaFA <- findAdducts(tea.xsaC, polarity = 'positive', ppm = 25, intval = 'into') # Plot example spectrum plotPsSpectrum(tea.xsaFA, pspec = c(1:4), maxlabel = 5)
Three of the first four pspectra are 60 seconds apart from each other and exhibit similar spectra.
We will use twoDxc
to group these spectra together. The
group2D
function takes its arguments as: 1) the xsAnnotate object, 2) the
modulation time for the comprehensive 2D run, and 3) the delay time, or where the
time at which modulations start, to adjust the 2D retention time.
The algorithm groups features by their m/z, 1D RT, and 2D RT, with default tolerances of 20 ppm, 3 minutes, and 5 seconds, respectively. Matching features are grouped into one psg.2d group and their intensities summed. This way, we can make more accurate comparisons between different phenotypes and cut down on redundant features.
filter.Ion <- function(pspectra, ion, ppm = 20){ ion.range <- calc.mz.Window(ion, ppm) filtered.pspectra <- as.data.frame(pspectra) %>% filter(mz > ion.range[1] & mz < ion.range[2]) return(filtered.pspectra) } print(getpspectra(tea.xsaFA, grp = c(1:10)) %>% filter.Ion(195.0882)) tea.xsa2D <- group2D(tea.xsaFA, 60, 30) print(tea.xsa2D@pspec2D %>% filter.Ion(195.0882))
Below follows the sample sample processing steps but for three replicates of the tea extraction.
teas.files <- dir(system.file('tea_data', package = 'twoDxc'), full.names = T)[1:3] teas.group.info <- data.frame(sample_name = sub(basename(teas.files), pattern = '.mzML', replacement = '', fixed = T), sample_group = c(rep('lvl_1', 3)), stringsAsFactors = F ) teas.data <- readMSData(teas.files, pdata = new('NAnnotatedDataFrame', teas.group.info), mode = 'onDisk') xchr.multi <- findChromPeaks(teas.data %>% filterRt(rt = c(4.8 * 60, 60 * 60)), param = cwp) pdp.multi <- PeakDensityParam(sampleGroups = teas.data$sample_group, minFraction = (2/3), bw = 3) xchr.multi <- groupChromPeaks(xchr.multi, param = pdp.multi) xchr.multi <- fillChromPeaks(xchr.multi) feature.chroms.multi <- featureChromatograms(xchr.multi, features = 1:4)
xsa.multi <- xsAnnotate(as(xchr.multi, 'xcmsSet')) xsaF.multi <- groupFWHM(xsa.multi, perfwhm = 0.6) xsaI.multi <- findIsotopes(xsaF.multi) xsaC.multi <- groupCorr(xsaI.multi) xsaFA.multi <- findAdducts(xsaC.multi, polarity = 'positive') plotPsSpectrum(xsaFA.multi, pspec = 1, maxlabel = 5)
group2D
on multiple samplestimer <- proc.time() xsa2D.multi <- group2D(xsaFA.multi, 60, 30) proc.time() - timer
group2D
can run in parallel if future.apply
is installed. The below block
should demonstrate an improvement in processing time when data are grouped in
parallel.
plan(multiprocess) timer <- proc.time() xsa2D.multi.p <- group2D(xsaFA.multi, 60, 30, parallelized = T) proc.time() - timer identical(xsa2D.multi, xsa2D.multi.p)
Visualizing 2D chromatography data is easier as a heatmap plot, in which the x-axis shows 1D RT, the y-axis shows 2D RT, and the z-axis (color) shows ion intensity.
tea.2d = plot2D(tea.data, file = 1, 60, 25, plot.type = '2D', rt.max = 62*60) tea.2d tea.2d = plot2D(tea.data, file = 1, 60, 25, plot.type = '2Di') tea.2d
Because the total ion chromatogram (TIC) is a combination of intensities of all ion channels, this information underrepresents the chemical diversity in this metabolomics sample. Below is an example of an extracted ion chromatogram (EIC) for ion 195, representing caffeine.
plot2D(tea.data, file = 1, mod.time = 60, delay.time = 30, ion = 195.0882)
It may be useful to iterate through the m/z acquisition range and generate EICs for each ion. The example code below would do so for m/z range 100 to 108. The upper range for the loop can be adjusted to the max m/z recorded.
data.mz.range <- c(round(min(tea.data@featureData@data$lowMZ)):108) eics.2d <- lapply(data.mz.range, function(x){ plot2D(tea.data, file = 1, mod.time = 60, delay.time = 30, ion = x, mz.tol = 'abs', save.output = F) })
GCxGC example
gc.file = system.file('tea_data', 'GCxGC_ex.mzML', package = 'twoDxc') gc.data = readMSData(gc.file, mode = 'onDisk') gc.plot = plot2D(gc.data, mod.time = 4.00, plot.type = '2D') gc.plot
plot2D(tea.data, file = 1, mod.time = 60, delay.time = 25, ion = 195.0882, plot.type = '3D') test.subtract = subtractIon(tea.data, solvent.ions, 25) test.subtract.x = tea.data test.subtract.x = apply(solvent.ions, function(x){ subtractIon(test.subtract.x, x, 25) }) test.subtract.1 = subtractIon(tea.data, solvent.ions[2], 25) test.subtract.2 = subtractIon(test.subtract.1, solvent.ions[1], 25) plot2D(test.subtract.2, mod.time = 60, delay.time = 25, plot.type = '2D') solvent.ions = c(158.9614, 172.9769, 922.0098, 190.1622, 141.9588, 126.9716, 182.9849, 222.9593) test.subtract.x = tea.data for(ion in solvent.ions){ test.subtract.x = subtractIon(test.subtract.x, ion, 25) } #plot2D(tea.data, mod.time = 60, delay.time = 25, plot.type = '2D') plot2D(test.subtract.x, mod.time = 60, delay.time = 25, plot.type = '2D')
plot2D(tea.data, mod.time = 60, delay.time = 25, plot.type = '2D', rt.max = 62 * 60)
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.