HiResTEC-workflow"

knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)

HiResTEC aims to find and validate measurement signals in mass spectrometry data that indicate the incorporation of a tracer molecule, i.e. $^{13}C$ which is a common approach in metabolic flux analysis.

To this end, at least 3 measurement files containing non-labeled samples and 3 measurement files containing labeled samples need to be available. This vignette will demonstrate the general workflow on a set of 18 samples from two groups and three time points.

Let's start with loading the example data set provided with the package and setting up a sample table describing these files.

raw <- HiResTEC::raw
sam <- data.frame(
  "gr" = rep(rep(c("A","B"), each=3), 3), 
  "tp" = rep(0:2, each=6), 
  "rep" = rep(1:3,6)
)
sam[,"ids"] <- apply(sam, 1, paste0, collapse="_")
head(sam)

It contains 18 measurements stored in a list as xcmsRawLike objects. The samples belong to two groups (annotated as A and B in column gr) and were taken at three different time points (annotated as 0, 1 and 2 in column tp). We can plot the base peak chromatograms (BPCs) for these files to observe that they contain only a small retention time window (appr. 10s) and a limited mass range.

length(raw)
class(raw[[1]])
raw[[1]]@mzrange
bpc <- lapply(raw, HiResTEC::getMultipleBPC)
HiResTEC::plotBPC(bpc, type = "bpc", mfrow = c(3,6), ids = sam[,"ids"])

There appears to be a peak at rt = 1026.5s in all files. Let's deconvolute the ion intensities at this rt and plot the spectrum.

rt <- 1026.5
s <- HiResTEC::DeconvoluteSpectrum(dat = raw, rt = rt)
InterpretMSSpectrum::PlotSpec(x = s)

The base peak is observed at m/z = 556.263. We can extract the mass isotopomer distribution (MID) of m/z 556.263 and its isotopes for all files. For better visibility we omit plotting the intermediate time point 1 and group B.

mz <- 556.263
mz_dev <- 0.4
bpc <- lapply(raw, function(x) { HiResTEC::getMultipleBPC(x = x, mz = mz + c(0:6)*1.003355, mz_dev = mz_dev) })
flt <- sam[,"tp"] %in% c(0, 2) & sam[,"gr"] == "A"
HiResTEC::plotBPC(bpc[flt], mfrow = c(2,3), ids = sam[flt,"ids"])

The figure shows that the MID changes systematically between replicate samples of time points 0 and 2. To test hypotheses of m/z pairs indicating label incorporation HiResTEC needs a peak list. You can use any peak picking algorithm to get such a peak list. HiResTEC accepts an xcmsSet result as input or alternatively a numeric matrix containing mz and rt information in the first two columns followed by peak intensities of all samples.

We can prepare such a peak list for the example data using the maxima from the BPCs we just extracted.

int <- sapply(bpc, function(x) { x[attr(x, "maxBPC"),] })
colnames(int) <- sam[,"ids"]
xg <- data.frame("mz" = as.numeric(rownames(int)), "rt" = rep(rt,7), int, row.names = NULL)
xg[,1:8]

Finally, we can apply the two main functions of HiResTEC. First, we evaluate our peak list to identify interesting m/z pairs.

preCL <- HiResTEC::EvaluatePairsFromXCMSSet(xg = xg, tp = sam$tp, gr = sam$gr, dmz = 0.04)
head(preCL[order(preCL[,"P"]),3:7])

The function finds all relevant pairs within the specified parameters (allowed mass difference, allowed rt window, see help file of EvaluatePairsFromXCMSSet() for details) and tests each pair using an ANOVA model based on the group and time point information of each sample.

Above, we sorted the output according to the P-value (column P). We could also have sorted according to column dR which gives the change in the intensity ratio between the first and the last time point specified. However, the lowest P-value is obtained for an m/z pair {559, 561} within the MID of our peak. The correct solution would be m/z pair {556, 561}.

The task to pick the best candidate, to apply rigorous quality control and avoid redundancy in the final result is achieved by the second step of the evaluation, where we cross check the preliminary candidate list preCL against the raw data.

finCL <- HiResTEC::EvaluateCandidateListAgainstRawData(
  x = preCL, tp = sam$tp, gr = sam$gr, dat = raw, dmz = 0.04, rolp = "all"
)

EvaluateCandidateListAgainstRawData() will result in a loooong list in a real non-targeted experiment. However, as we only tested the MID of a single peak and decided to remove redundancy (parameter rolp), only a single candidate remains.

This candidate can be exported to Excel, and can be checked by generating quality control plots in a PDF using function GenerateQCPlots(). Four QC plots are generated per candidate. First, the enrichment in all samples is depicted.

HiResTEC:::CandidateBoxplot(res = finCL[[1]])

The enrichment is estimated based on the assumption that n, the difference between mz1 and mz2 or the number of incorporated tracer atoms, is equivalent to the total number of tracer atoms in the molecule. This assumption is generally incorrect, but the obtained values are a good approximation for the amount of labeling.

Time points are color coded and different plotting symbols are used for groups. Individual samples can be identified in the left subplot due to their number from the sample list. The annotation 'dE' in the left subplot provides the maximum difference in enrichment between samples. The P-values annotated in the right subplot are obtained from the ANOVA result. In short, if 'dE' is large and 'P_tp' is small, it is worth looking at the other QC plots.

The second QC plot provides the BPCs as shown before. Let's limit this figure this time to the first replicate of each time point and group.

flt <- sam[,"rep"] == 1
bpc <- finCL[[1]][["bpc"]]
HiResTEC::plotBPC(bpc[flt], mfrow = c(3,2), ids = sam[flt,"ids"])

Numerous things can be checked in this plot. Obviously the peak shape of all masses should be nice and co-located. Note! that the intensities are depicted log10-transformed to enhance small signals. The spectra on the right hand subplots are not log10-transformed to emphasize differences between time points. Each spectrum depicts the intensities from the scan in the left subplot indicated by the grey line, usually the peak center. In the spectra, one should confirm that the ratio of mz1 (black) and mz2 (purple) is really changing. This sounds trivial, but is sometimes not the case if errors inpeak picking or deconvolution occurred.

Also, the small numbers at the top of each mass intensity are very informative. The indicate the difference of the measured mass from the theoretical mass in mDa. The theoretical mass of a M+5 isotope in a tracer experiment using $^{13}C$ would be calculated by $M + 5 \times 1.0034$ (because 1.0034 is the difference between $^{12}C$ and $^{13}C$). In GC-APCI-MS (which this example is coming from), analytes are derivatized before analysis, often using TMS groups (Tri-Methyl-Silyl). TMS contains silicon and silicon isotopes have a different mass difference than carbon. This leads to the effect of negative mass deviations in the above spectra for higher isotopes in non labeled samples, as here silicon determines the mass and not carbon. In labeled samples (tp=1 and tp=2), carbon starts to determine the mass deviation instead of silicon.

In consequence, observing a strong negative mass deviation in tp=0 samples for mz2 (purple) and a minor mass deviation in labeled samples is a strong indicator for successful tracer incorporation.

The third and fourth QC plots provide spectra deconvoluted from raw data. These should be checked to confirm if the candidate m/z pair is representative for the compound (is it the base peak? is it the likely M+H?).

x <- finCL[[1]]
mz1 <- x[["mz1"]]
mz2 <- x[["mz2"]]
tmp.txt <- data.frame("pos" = c(mz1, mz2), "val" = c("mz1", "mz2"))
ylim <- c(0, max(c(x[["s"]][abs(x[["s"]][, "mz"] - (mz1 + 4)) < 10, "int"], x[["s2"]][abs(x[["s2"]][, "mz"] - (mz1 + 4)) < 10, "int"]), na.rm = T))
graphics::layout(matrix(c(1, 2), ncol = 1))
InterpretMSSpectrum::PlotSpec(x[["s"]], rellab = x[["s"]][which.min(abs(x[["s"]][, 1] - mz1)), 1], masslab = 0, xlim = mz1 + c(-6, 15), txt = tmp.txt, ylim = ylim)
InterpretMSSpectrum::PlotSpec(x[["s2"]], rellab = x[["s2"]][which.min(abs(x[["s2"]][, 1] - mz2)), 1], masslab = 0, xlim = mz + c(-6, 15), txt = tmp.txt, ylim = ylim)

For those candidates looking promising, one can use the InterpretMSSpectrum package to identify likely sum formulas for each compound, and use the CorMID package to correct the intensities obtained by HiResTEC for natural abundance.

To demonstrate this shortly, InterpretMSSpectrum suggests C~22~H~46~N~5~O~4~Si~4~ as a possible sum formula.

fml <- "C22H46N5O4Si4"
attr(fml, "nbio") <- 5

# re-extract the BPCs including [M+]
mz <- finCL[[1]]$mz1
rt <- finCL[[1]]$rt
bpc <- lapply(raw, function(x) { 
  HiResTEC::getMultipleBPC(x = x, mz = mz + c(-1:6)*1.003355, mz_dev = 0.04, rt = rt) 
})

# BPCs show about 2.5% [M+] intensity, define r accordingly
r <- setNames(c(0.975, 0.025), nm = c("M+H", "M+"))
int <- sapply(bpc, function(x) { x[attr(x, "maxBPC"),] })
rownames(int) <- paste0("M",-1:6)
colnames(int) <- sam$ids
mid <- apply(int, 2, function(x) { 
  CorMID::CorMID(int = x, fml = fml, r = r) 
})

# show mean group corrected MID
sapply(split(as.data.frame(t(mid)), interaction(sam[,"gr"], sam[,"tp"])), function(x) {
  round(apply(x,2,mean),1)
})

As can be seen from the mean corrected MID per group, the estimate of enrichment shown in the QC plots above (70% and 40% at tp 2 for group A and B respectively) is pretty close to the results obtained by CorMID.

Have fun, using HiResTEC and, in the likely event that something does not work as expected, let me know.



Try the HiResTEC package in your browser

Any scripts or data that you put into this service are public.

HiResTEC documentation built on April 3, 2025, 9:35 p.m.