BiocStyle::markdown()
library(BiocStyle)
This document describes how to use r Biocpkg("xcms")
for the analysis of
direct injection mass spec data, including peak detection, calibration and
correspondence (grouping of peaks across samples).
Prior to any other analysis step, peaks have to be identified in the mass spec
data. In contrast to the typical metabolomics workflow, in which peaks are
identified in the chromatographic (time) dimension, in direct injection mass
spec data sets peaks are identified in the m/z dimension. r Biocpkg("xcms")
uses functionality from the MassSpecWavelet
package to identify such peaks.
Below we load the required packages. For information on the parallel processing
setup please see the BiocParallel
vignette.
library(xcms) library(MassSpecWavelet) register(SerialParam())
In this documentation we use an example data set from the r Biocpkg("msdata")
package. Assuming that r Biocpkg("msdata")
is installed, we locate the path of
the package and load the data set. We create also a data.frame
describing the
experimental setup based on the file names.
mzdata_path <- system.file("fticr", package = "msdata") mzdata_files <- list.files(mzdata_path, recursive = TRUE, full.names = TRUE) ## Create a data.frame assigning samples to sample groups, i.e. ham4 and ham5. grp <- rep("ham4", length(mzdata_files)) grp[grep(basename(mzdata_files), pattern = "^HAM005")] <- "ham5" pd <- data.frame(filename = basename(mzdata_files), sample_group = grp) ## Load the data. ham_raw <- readMSData(files = mzdata_files, pdata = new("NAnnotatedDataFrame", pd), mode = "onDisk")
The data files are from direct injection mass spectrometry experiments, i.e. we have only a single spectrum available for each sample and no retention times.
## Only a single spectrum with an *artificial* retention time is available ## for each sample rtime(ham_raw)
Peaks are identified within each spectrum using the mass spec wavelet method.
## Define the parameters for the peak detection msw <- MSWParam(scales = c(1, 4, 9), nearbyPeak = TRUE, winSize.noise = 500, SNR.method = "data.mean", snthresh = 10) ham_prep <- findChromPeaks(ham_raw, param = msw) head(chromPeaks(ham_prep))
The calibrate
method can be used to correct the m/z values of identified
peaks. The currently implemented method requires identified peaks and a list of
m/z values for known calibrants. The identified peaks m/z values are then
adjusted based on the differences between the calibrants' m/z values and the m/z
values of the closest peaks (within a user defined permitted maximal
distance). Note that this method does presently only calibrate identified peaks,
but not the original m/z values in the spectra.
Below we demonstrate the calibrate
method on one of the data files with
artificially defined calibration m/z values. We first subset the data set to the
first data file, extract the m/z values of 3 peaks and modify the values
slightly.
## Subset to the first file. first_file <- filterFile(ham_prep, file = 1) ## Extract 3 m/z values calib_mz <- chromPeaks(first_file)[c(1, 4, 7), "mz"] calib_mz <- calib_mz + 0.00001 * runif(1, 0, 0.4) * calib_mz + 0.0001
Next we calibrate the data set using the previously defined artificial
calibrants. We are using the "edgeshift"
method for calibration that adjusts
all peaks within the range of the m/z values of the calibrants using a linear
interpolation and shifts all chromatographic peaks outside of that range by a
constant factor (the difference between the lowest respectively largest
calibrant m/z with the closest peak's m/z). Note that in a real use case, the
m/z values would obviously represent known m/z of calibrants and would not be
defined on the actual data.
## Set-up the parameter class for the calibration prm <- CalibrantMassParam(mz = calib_mz, method = "edgeshift", mzabs = 0.0001, mzppm = 5) first_file_calibrated <- calibrate(first_file, param = prm)
To evaluate the calibration we plot below the difference between the adjusted and raw m/z values (y-axis) against the raw m/z values.
diffs <- chromPeaks(first_file_calibrated)[, "mz"] - chromPeaks(first_file)[, "mz"] plot(x = chromPeaks(first_file)[, "mz"], xlab = expression(m/z[raw]), y = diffs, ylab = expression(m/z[calibrated] - m/z[raw]))
Correspondence aims to group peaks across samples to define the features (ions
with the same m/z values). Peaks from single spectrum, direct injection MS
experiments can be grouped with the MZclust method. Below we perform the
correspondence analysis with the groupChromPeaks
method using default
settings.
## Using default settings but define sample group assignment mzc_prm <- MzClustParam(sampleGroups = ham_prep$sample_group) ham_prep <- groupChromPeaks(ham_prep, param = mzc_prm)
Getting an overview of the performed processings:
ham_prep
The peak group information, i.e. the feature definitions can be accessed with
the featureDefinitions
method.
featureDefinitions(ham_prep)
Plotting the raw data for direct injection samples involves a little more
processing than for LC/GC-MS data in which we can simply use the chromatogram
method to extract the data. Below we extract the m/z-intensity pairs for the
peaks associated with the first feature. We thus first identify the peaks for
that feature and define their m/z values range. Using this range we can
subsequently use the filterMz
function to sub-set the full data set to the
signal associated with the feature's peaks. On that object we can then call the
mz
and intensity
functions to extract the data.
## Get the peaks belonging to the first feature pks <- chromPeaks(ham_prep)[featureDefinitions(ham_prep)$peakidx[[1]], ] ## Define the m/z range mzr <- c(min(pks[, "mzmin"]) - 0.001, max(pks[, "mzmax"]) + 0.001) ## Subset the object to the m/z range ham_prep_sub <- filterMz(ham_prep, mz = mzr) ## Extract the mz and intensity values mzs <- mz(ham_prep_sub, bySample = TRUE) ints <- intensity(ham_prep_sub, bySample = TRUE) ## Plot the data plot(3, 3, pch = NA, xlim = range(mzs), ylim = range(ints), main = "FT01", xlab = "m/z", ylab = "intensity") ## Define colors cols <- rep("#ff000080", length(mzs)) cols[ham_prep_sub$sample_group == "ham5"] <- "#0000ff80" tmp <- mapply(mzs, ints, cols, FUN = function(x, y, col) { points(x, y, col = col, type = "l") })
To access the actual intensity values of each feature in each sample the
featureValue
method can be used. The setting value = "into"
tells the
function to return the integrated signal for each peak (one representative peak)
per sample.
feat_vals <- featureValues(ham_prep, value = "into") head(feat_vals)
NA
is reported for features in samples for which no peak was identified at the
feature's m/z value. In some instances there might still be a signal at the
feature's position in the raw data files, but the peak detection failed to
identify a peak. For these cases signal can be recovered using the
fillChromPeaks
method that integrates all raw signal at the feature's
location. If there is no signal at that location an NA
is reported.
ham_prep <- fillChromPeaks(ham_prep, param = FillChromPeaksParam()) head(featureValues(ham_prep, value = "into"))
Further analysis, i.e. detection of features/metabolites with significantly
different abundances, or PCA analyses can be performed on the feature matrix
using functionality from other R packages, such as r Biocpkg("limma")
.
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.