_Cardinal 2_: User guide for mass spectrometry imaging analysis

BiocStyle::markdown()
library(Cardinal)
setCardinalBPPARAM(SerialParam())
setCardinalVerbose(FALSE)
RNGkind("Mersenne-Twister")

Introduction

Cardinal 2 provides new classes and methods for the manipulation, transformation, visualization, and analysis of imaging experiments--specifically mass spectrometry (MS) imaging experiments.

MS imaging is a rapidly advancing field with consistent improvements in instrumentation for both MALDI and DESI imaging experiments. Both mass resolution and spatial resolution are steadily increasing, and MS imaging experiments grow increasingly complex.

The first version of Cardinal was written with certain assumptions about MS imaging data that are no longer true. For example, the basic assumption that the raw spectra can be fully loaded into memory is unreasonable for many MS imaging experiments today.

Cardinal 2 was re-written from the ground up to handle the evolving needs of high-resolution MS imaging experiments. Some advancements include:

Classes from older versions of Cardinal should be coerced to their Cardinal 2 equivalents. For example, to return an updated MSImageSet object called x, use as(x, "MSImagingExperiment").

Installation

Cardinal can be installed via the BiocManager package.

install.packages("BiocManager")
BiocManager::install("Cardinal")

The same function can be used to update Cardinal and other Bioconductor packages.

Once installed, Cardinal can be loaded with library():

library(Cardinal)

Components of an MSImagingExperiment

In Cardinal, imaging experiment datasets are composed of multiple sets of metadata, in addition to the actual experimental data. These are (1) pixel metadata, (2) feature ($m/z$) metadata, (3) the actual imaging data, and (4) a class that holds all of these and represents the experiment as a whole.

MSImagingExperiment is a matrix-like container for complete MS imaging experiments, where the "rows" represent the mass features, and the columns represent the pixels. An MSImagingExperiment object contains the following components:

Unlike many software packages designed for analysis of MS imaging experiments, Cardinal is designed to work with multiple datasets simultaneously and can integrate all aspects of experimental design and metadata.

Example data

We will use simulateImage() to prepare an example dataset.

set.seed(2020)
mse <- simulateImage(preset=1, npeaks=10, nruns=2, baseline=1)
mse

Pixel metadata with PositionDataFrame

The pixelData() accessor extracts the pixel information for an MSImagingExperiment. The pixel data are stored in a PositionDataFrame object, which is a type of data frame that separately tracks pixel coordinates and experimental run information.

pixelData(mse)

The coord() accessor specifically extracts the data frame of pixel coordinates.

coord(mse)

The run() accessor specifically extracts the vector of experimental runs.

run(mse)[1:10]

Feature metadata with MassDataFrame

The featureData() accessor extracts the feature information for an MSImagingExperiment. The feature data are stored in a MassDataFrame object, which is a type of data frame that separately tracks the m/z-values associated with the mass spectral features.

featureData(mse)

The mz() accessor specifically extracts the vector of m/z-values.

mz(mse)[1:10]

Accessing spectra and image data

The imageData() accessor extracts the image data for an MSImagingExperiment. The data are stored in an ImageArrayList, which is a list of matrix-like objects.

It is possible to store multiple matrices of intensities in this list. Typically, only the first entry will be used by pre-processing and analysis methods.

imageData(mse)

Entries in this list can be extracted by name with iData().

iData(mse, "intensity")

The spectra() accessor is a shortcut for accessing the first data matrix.

spectra(mse)[1:5, 1:5]

Rows of these matrices correspond to mass features. Columns correspond to pixels. In other words, each column is a mass spectrum, and each row is a flattened vector of images.

Specialized classes

In order to specialize to the needs of different datasets, Cardinal provides specialized versions of MSImagingExperiment that reflect different spectra storage strategies.

"Continuous" MS imaging experiments

MSContinuousImagingExperiment is a specialization that enforces the data matrices be stored as dense, column-major matrices. These include R's native matrix and matter_matc objects from the matter package.

mse
imageData(mse)

"Processed" MS imaging experiments

MSProcessedImagingExperiment is a specialization that enforces the data matrices be stored as sparse, column-major matrices. These include sparse_matc objects from the matter package. This specialization is useful for processed data.

set.seed(2020)
mse2 <- simulateImage(preset=3, npeaks=10, nruns=2)
mse3 <- as(mse2, "MSProcessedImagingExperiment")
mse3
imageData(mse3)

Because the data is stored sparsely, spectra from MSProcessedImagingExperiment objects are binned on-the-fly to the m/z-values specified by mz().

The resolution of the binning can be accessed by resolution().

resolution(mse3)

The resolution can be set to change how the binning is performed.

resolution(mse3) <- c(ppm=400)

Changing the binned mass resolution will typically change the effective dimensions of the experiment.

dim(mse2)
dim(mse3)

The effective m/z-values are also updated to reflect the new bins.

mz(mse2)[1:10]
mz(mse3)[1:10]

Note that the underlying data will remain unchanged, but the binned values for the intensities will be different.

Data import and export

Cardinal 2 natively supports reading and writing imzML (both "continuous" and "processed" versions) and Analyze 7.5 formats via the readMSIData() and writeMSIData() functions.

We will focus on imzML.

The imzML format is an open standard designed specifically for interchange of mass spectrometry imaging datasets. Many other formats can be converted to imzML with the help of free applications available online at .

Let's create a small image to demonstrate data import/export.

set.seed(2020)
tiny <- simulateImage(preset=1, from=500, to=600, dim=c(3,3))
tiny

We'll also create a "processed" version for writing "processed" imzML.

tiny2 <- as(tiny, "MSProcessedImagingExperiment")
tiny2

Note that despite the name, the only difference between "continuous" and "processed" imzML is how the data are stored, rather than what processing has been applied to the data. "Continuous" imzML stores spectra densely, with each spectrum sharing the same m/z-values. "Processed" imzML stores spectra sparsely, and each spectrum can have its own distinct m/z-values.

Writing imzML

Use writeMSIData() to write datasets to imzML and Analyze formats.

Writing "continuous" imzML

Internally, writeMSIData() will call either writeImzML() or writeAnalyze() depending on the value of outformat. The default is outformat="imzML".

path <- tempfile()
writeMSIData(tiny, file=path, outformat="imzML")

Two files are produced with extensions ".imzML" and ".ibd". The former contains an XML description of the dataset, and the latter contains the actual intensity data.

grep(basename(path), list.files(dirname(path)), value=TRUE)

Because tiny is a MSContinuousImagingExperiment, it is written as "continuous" imzML.

mzml <- readLines(paste0(path, ".imzML"))
grep("continuous", mzml, value=TRUE)

Writing "processed" imzML

We can also write "processed" imzML if we export a MSProcessedImagingExperiment file.

path2 <- tempfile()
writeMSIData(tiny2, file=path2, outformat="imzML")
mzml2 <- readLines(paste0(path2, ".imzML"))
grep("processed", mzml2, value=TRUE)

Writing datasets with multiple runs

If an experiment contains multiple runs, then each run will be written to a separate imzML file.

set.seed(2020)
tiny3 <- simulateImage(preset=1, from=500, to=600, dim=c(3,3), nruns=2)
runNames(tiny3)
path3 <- tempfile()
writeMSIData(tiny3, file=path3, outformat="imzML")
grep(basename(path3), list.files(dirname(path3)), value=TRUE)

Reading imzML

Use readMSIData() to read datasets in imzML or Analyze formats.

Reading "continuous" imzML

Internally, readMSIData() will guess the format based on the file extension (which must be included) and call either readImzML() or readAnalyze().

path_in <- paste0(path, ".imzML")
tiny_in <- readMSIData(path_in, attach.only=TRUE)
tiny_in

The attach.only argument is used to specify that the intensity data should not be loaded into memory, but instead attached as a file-based matrix using the matter package.

Starting in Cardinal 2, the default is attach.only=TRUE. This is more memory-efficient, but some methods may be more time-consuming due to the file I/O.

imageData(tiny_in)
spectra(tiny_in)

An out-of-memory matter matrix can be subsetted like an ordinary R matrix. The data values are only read from file and loaded into memory when they are requested (via subsetting).

spectra(tiny_in)[1:5, 1:5]

If loading the data fully into memory is desired, either set attach.only=FALSE when reading the data, or use as.matrix() on the intensity matrix.

spectra(tiny_in) <- as.matrix(spectra(tiny_in))
imageData(tiny_in)

Using collect() on the MSImagingExperiment will also load all data into memory.

tiny_in <- collect(tiny_in)

Reading "processed" imzML

For "processed" imzML files, the spectra must be binned to common m/z-values. By default, readImzML() will detect the mass range from the data. This requires reading a large proportion of data from the file, even if attach.only=TRUE.

path2_in <- paste0(path2, ".imzML")
tiny2_in <- readMSIData(path2_in)
tiny2_in

If known, it can be much more efficient to specify mass.range when importing "processed" imzML data. This can also be used to pre-filter the data to a smaller mass range.

The size of the m/z bins can be changed with the resolution argument.

tiny2_in <- readMSIData(path2_in, mass.range=c(510,590),
                        resolution=100, units="ppm")
tiny2_in

The resolution for the m/z bins can be changed after importing the data by setting the resolution() of the dataset.

resolution(tiny2_in) <- c(ppm=400)

Building from scratch

While importing formats besides imzML and Analyze are not explicitly supported by Cardinal, if the data can be read into R, it is easy to construct a MSImagingExperiment object from its components.

set.seed(2020)
s <- simulateSpectrum(n=9, peaks=10, from=500, to=600)

coord <- expand.grid(x=1:3, y=1:3)
run <- factor(rep("run0", nrow(coord)))

fdata <- MassDataFrame(mz=s$mz)
pdata <- PositionDataFrame(run=run, coord=coord)

out <- MSImagingExperiment(imageData=s$intensity,
                            featureData=fdata,
                            pixelData=pdata)
out

For loading data into R, read.csv() and read.table() can be used to read CSV and tab-delimited text files, respectively.

Likewise, write.csv() and write.table() can be used to write pixel metadata and feature metadata after coercing them to an ordinary R data.frame with as.data.frame().

Use saveRDS() and readRDS() to save and read and entire R object such as a MSImagingExperiment. Note that if intensity data is to be saved as well, it should be pulled into memory and coerced to an R matrix with as.matrix() first.

Visualization

Visualization of mass spectra and molecular ion images is vital for exploratory analysis of MS imaging experiments. Cardinal provides a plot() method for plotting mass spectra and an image() method for plotting ion images.

Cardinal 2 provides some useful visualization updates compared to previous versions:

Visualizing mass spectra with plot()

Use plot() to plot mass spectra. Either pixel or coord can be specified to determine which mass spectra should be plotted.

plot(mse, pixel=c(211, 611))

The plusminus argument can be used to specify that multiple spectra should be averaged and plotted together.

plot(mse, coord=list(x=10, y=10), plusminus=1, fun=mean)

A formula can be specified to further customize mass spectra plotting. The LHS of the formula should specify one or more elements of imageData() and the RHS of the formula should be a variable in featureData().

plot(mse, intensity + I(-log1p(intensity)) ~ mz, pixel=211, superpose=TRUE)

Visualizing images with image()

Use image() to plot ion images. Either feature or mz can be specified to determine which mass features should be plotted.

image(mse, mz=1200)

The plusminus argument can be used to specify that multiple mass features should be averaged and plotted together.

image(mse, mz=1227, plusminus=0.5, fun=mean)

By default, images from all experimental runs are plotted. If this is not desired, a subset can be specified instead.

image(mse, mz=1227, subset=run(mse) == "run0")
image(mse, mz=c(1200, 1227), subset=circle)

Multiplicative variance in spectral intensities can cause images to be noisy and dark due to hot spots.

Often, images may require some type of processing and enhancement to improve interpretation.

image(mse, mz=1200, smooth.image="gaussian")
image(mse, mz=1200, contrast.enhance="histogram")

The default viridis colorscale is chosen to enable ease of interpretation. However, other colorscales can be chosen. All colorscales from the viridisLite package are available in Cardinal, including cividis, magma, inferno, and plasma.

The magma colorscale is used below with a different dataset.

image(mse2, mz=1136, colorscale=magma)

Cardinal will try to choose an appropriate layout for the images. However, a user-defined one can be specified by layout. Use layout=NULL to use the current plotting device as-is.

image(mse2, mz=c(781, 1361), layout=c(1,4), colorscale=magma)

Multiple images can be superposed with superpose=TRUE. Use normalize.image="linear" to normalize all images to the same intensity range.

image(mse2, mz=c(781, 1361), superpose=TRUE, normalize.image="linear")

Like plot(), a formula can be specified. The LHS should specify one or more elements of imageData() and the RHS should specify two to three variables from pixelData().

image(mse2, log1p(intensity) ~ x * y, mz=1136, colorscale=magma)

Visualizing 3D imaging experiments

3D imaging datasets can be plotted with image3D().

set.seed(1)
mse3d <- simulateImage(preset=9, nruns=7, dim=c(7,7), npeaks=10,
                        peakheight=c(2,4), representation="centroid")

image3D(mse3d, mz=c(1001, 1159), colorscale=plasma, cex=3, theta=30, phi=30)

Any dataset with 3 or more spatial coordinates (columns in coord()), can be plotted in 3D.

Region-of-interest selection

Use selectROI() to select regions-of-interest on an ion image. It is important to specify a subset so that selection is only made on a single experimental run, otherwise results may be unexpected. The form of selectROI() is the same as image().

sampleA <- selectROI(mse, mz=1200, subset=run(mse) == "run0")
sampleB <- selectROI(mse, mz=1200, subset=run(mse) == "run1")

selectROI() returns a logical vector specifying which pixels from the imaging experiment are contained in the selected region.

makeFactor() can then be used to combine multiple logical vectors (e.g., from selectROI()) into a single factor.

regions <- makeFactor(A=sampleA, B=sampleB)

Saving plots and images

Plots and images can be saved to a file by using R's built-in graphics devices.

Use pdf() to initialize a PDF graphics device, create the plot, and then use dev.off() to turn off the graphics device.

Any plots printed while the graphics device is active will be saved to the specified file(s).

pdf_file <- paste0(tempfile(), ".pdf")

pdf(file=pdf_file, width=9, height=4)
image(mse, mz=1200)
dev.off()

Graphics devices for png(), jpeg(), bmp(), and tiff() are also available. See their documentation for usage.

Dark themes

While many software for MS imaging data use a light-on-dark theme, Cardinal uses a transparent background by default. However, a dark theme is also provided through darkmode().

darkmode()
image(mse, mz=1200)
darkmode()
image(mse2, mz=1135, colorscale=magma)

Any future plots will use the new mode. The default mode can be restored with lightmode().

lightmode()

A note on plotting speed

While plotting spectra should typically be fast for most datasets regardless of data format, plotting images will typically be slower for out-of-memory (file-based) datasets and for MSProcessedImagingExperiment objects in general.

This is due to the way the spectra are stored in imzML and Analyze files, and the way sparse spectra are stored (regardless of location). Calculating and decompressing the images simply takes longer than reading the spectra.

For the fastest visualization of images, experiments should be coerced to an in-memory MSContinuousImagingExperiment.

Also note that all Cardinal 2 visualizations produce a print()-able object that can be assigned to a variable and print()-ed later without the need to read the data again.

g <- image(mse, mz=1200)
print(g)

This is useful for re-creating or updating plots without accessing the data again.

Common operations on MSImagingExperiment

Subsetting

MSImagingExperiment are matrix-like objects that can be subsetted using the [ operator.

When subsetting, the "rows" are the mass features, and the "columns" are the pixels.

# subset first 10 mass features and first 5 pixels
mse[1:10, 1:5]

Subsetting the dataset this way requires knowing the desired row and column indices.

features() returns row indices based on specified feature metadata.

# get row index corresponding to m/z 1200
features(mse, mz=1200)

# get row indices corresponding to m/z 1199 - 1201
features(mse, 1199 < mz & mz < 1201)

pixels() returns column indices based on specified pixel metadata.

# get column indices corresponding to x = 10, y = 10 in all runs
pixels(mse, coord=list(x=10, y=10))

# get column indices corresponding to x <= 3, y <= 3 in "run0"
pixels(mse, x <= 3, y <= 3, run == "run0")

These methods can be used to determine row/column indices of particular m/z-values or pixel coordinates to use for subsetting.

fid <- features(mse, 1199 < mz & mz < 1201)
pid <- pixels(mse, x <= 3, y <= 3, run == "run0")
mse[fid, pid]

Slicing

MSImagingExperiment represents the data as a matrix, where each column is a mass spectrum, rather than as a true "data cube". This is typically simpler when operating on the mass spectra, and more space efficient when the data is pixel-sparse (i.e., non-rectangular).

Sometimes, however, it is useful to index into the data as an actual "data cube", with explicit array dimensions for each spatial dimension.

Use slice() to slice an MSImagingExperiment as a data cube and extract images.

# slice image for first mass feature
a <- slice(mse, 1)
dim(a)

Any arguments to slice() are passed to features(), making it easy to select the desired image slices.

By default, array dimensions with only one level are dropped; use .preserve=TRUE to keep all dimensions.

# slice image for m/z 1200
a2 <- slice(mse, mz=1200, drop=FALSE)
dim(a2)

Note that when plotting images from raw arrays, the images are upside-down due to differing coordinate conventions used by graphics::image().

image(a2[,,1,1], col=bw.colors(100))

Combining

Because MSImagingExperiment is matrix-like, rbind() and cbind() can be used to combine multiple MSImagingExperiment objects by "row" or "column", assumping certain conditions are met.

Use cbind() to combine datasets from different experimental runs. The m/z-values must match between all datasets to succesfully combine them.

# divide dataset into separate objects for each run
mse_run0 <- mse[,run(mse) == "run0"]
mse_run1 <- mse[,run(mse) == "run1"]
mse_run0
mse_run1
# recombine the separate datasets back together
mse_cbind <- cbind(mse_run0, mse_run1)
mse_cbind

Some processing may be necessary to ensure datasets are compatible before combining them.

Getters and setters

Most components of an MSImagingExperiment that can be accessed through getter functions like pixelData(), featureData(), and imageData(), can also be re-assigned with analogous setter functions. These can likewise be used to get and set columns of the pixel and feature metadata.

pData() and fData() are aliases for pixelData() and featureData(), respectively.

The $ operator will access the corresponding columns of pixelData().

mse$region <- makeFactor(circle=mse$circle,
                            bg=!mse$circle)
pData(mse)

iData() can be used to access elements of the imageData() list by name or index.

Using iData() with no arguments besides the dataset will get or set the first element of imageData(). Providing a name or index will get or set that element.

iData(mse, "log2intensity") <- log2(iData(mse) + 1)
imageData(mse)

For MSImagingExperiment, spectra() is an alias for iData().

spectra(mse, "log2intensity")[1:5, 1:5]

Whether or not the spectra have been centroided or not can be accessed using centroided()

centroided(mse)

This can also be used to set whether the spectra should be treated as centroided or not.

centroided(mse) <- FALSE

Summarization (e.g., mean spectra)

Cardinal 2 implements several convenient data manipulation verbs for subsetting and summarizing MSImagingExperiment objects.

The %>% operator can be used to chain these operations together. For file-based data, the subsetting should be quick, as only metadata is modified.

# subset by mass range
subsetFeatures(mse, mz > 700, mz < 900)

# subset by pixel coordinates
subsetPixels(mse, x <= 3, y <= 3, run == "run0")

# subset by mass range + pixel coordinates
subset(mse, mz > 700 & mz < 900, x <=3 & y <= 3 & run == "run0")

# chain verbs together
mse %>%
    subsetFeatures(mz > 700, mz < 900) %>%
    subsetPixels(x <= 3, y <= 3, run == "run0")

# calculate mean spectrum
summarizeFeatures(mse, "mean", as="DataFrame")

# calculate tic
summarizePixels(mse, c(tic="sum"), as="DataFrame")

# calculate mean spectrum of circle region
mse %>%
    subsetPixels(region == "circle") %>%
    summarizeFeatures("mean", as="DataFrame")

Pull data into memory

By default, Cardinal 2 does not load the spectra from imzML and Analyze files into memory, but retrieves them from files when necessary.

For very large datasets, this is necessary and memory-efficient.

However, for datasets that are known to fit in computer memory, this may be unnecessarily slow, especially when plotting images (which are perpendicular to how data are stored in the files).

# coerce spectra to file-based matter matrix
spectra(mse) <- matter::as.matter(spectra(mse))

spectra(mse)
imageData(mse)

Use pull() to pull all imageData() elements in an MSImagingExperiment into memory.

mse <- pull(mse)
imageData(mse)

By default, the spectra from MSProcessedImagingExperiment objects will still be represented as sparse matrices after using pull().

To coerce sparse spectra to an R matrix as well, use as.matrix=TRUE.

mse3 <- pull(mse3, as.matrix=TRUE)

Coercion to/from other classes

Use as() to coerce between different MSImagingExperiment sub-classes.

mse3
as(mse3, "MSContinuousImagingExperiment")

This will often change the underlying data representation, so some information may be lost depending on the coercion.

Objects of the older MSImageSet class can also be coerced to MSImagingExperiment this way.

# requires CardinalWorkflows installed
data(cardinal, package="CardinalWorkflows")
cardinal2 <- as(cardinal, "MSImagingExperiment")

Pre-processing

A major change in Cardinal 2 from earlier versions is how pre-processing is handled.

Instead of applying pre-processing immediately, each pre-processing step is queued to the dataset, and only applied once process() is called.

This approach is more computationally and memory efficient in most cases, as ideally each spectrum is only processed once, and no extraneous copies of the data are made.

Delayed processing with process()

On its own, the process() method queues a new pre-processing function, and then applies all currently queued processing functions.

For example, the following code applies very basic total-ion-current (TIC) normalization to all spectra.

mse_tic <- process(mse, function(x) length(x) * x / sum(x), label="norm")
mse_tic

By default, this is applied immediately. However, delay=TRUE delays this, and allows us to queue multiple pre-processing functions at once.

mse_pre <- mse %>%
    process(function(x) length(x) * x / sum(x), label="norm", delay=TRUE) %>%
    process(function(x) signal::sgolayfilt(x), label="smooth", delay=TRUE)

processingData(mse_pre)

We can view all pending and completed pre-processing steps in more detail.

mcols(processingData(mse_pre))[,-1]

Calling process() on the dataset again without any other arguments will apply all queued pre-processing steps.

mse_proc <- process(mse_pre)
mse_proc

Note that subsetting retains any queued pre-processing steps.

mse_pre %>%
    subsetPixels(x <= 3, y <= 3) %>%
    subsetFeatures(mz <= 1000) %>%
    process()

All pre-processing methods for MSImagingExperiment queue delayed processing by default.

If this is not desired, you can set options(Cardinal.delay=FALSE) to apply all pre-processing steps immediately.

Normalization

Use normalize() to queue normalization to an MSImagingExperiment.

mse_pre <- normalize(mse, method="tic")

TIC normalization is one of the most common normalization methods for mass spectrometry imaging. For comparison between datasets, TIC normalization requires that all spectra are the same length. RMS normalization is more appropriate when spectra are of different lengths.

Normalization to a reference is the most reliable form of normalization, but is only possible when the experiment contains a known reference peak with a constant abundance throughout the dataset. This is often not possible in unsupervised and exploratory experiments.

Spectral smoothing

Use smoothSignal() to queue spectral smoothing to an MSImagingExperiment.

mse %>% smoothSignal(method="gaussian") %>%
    subsetPixels(x==10, y==10, run=="run0") %>%
    process(plot=TRUE,
            par=list(main="Gaussian smoothing", layout=c(3,1)))

mse %>% smoothSignal(method="ma") %>%
    subsetPixels(x==10, y==10, run=="run0") %>%
    process(plot=TRUE, par=list(main="Moving average smoothing"))

mse %>% smoothSignal(method="sgolay") %>%
    subsetPixels(x==10, y==10, run=="run0") %>%
    process(plot=TRUE, par=list(main="Savitzky-Golay smoothing"))
mse_pre <- smoothSignal(mse_pre, method="gaussian")

Baseline correction

Use reduceBaseline() to queue baseline correction to an MSImagingExperiment.

mse %>% reduceBaseline(method="locmin") %>%
    subsetPixels(x==10, y==10, run=="run0") %>%
    process(plot=TRUE, par=list(main="Local minima", layout=c(2,1)))

mse %>% reduceBaseline(method="median") %>%
    subsetPixels(x==10, y==10, run=="run0") %>%
    process(plot=TRUE, par=list(main="Median interpolation"))
mse_pre <- reduceBaseline(mse_pre, method="locmin")

Peak processing

Peak processing encompasses multiple steps, including (1) picking peaks, (2) aligning peaks, (3) filtering peaks, and (4) binning profile spectra to the detected peaks.

Prior to peak detection is a good time to apply the previous processing steps.

mse_pre <- process(mse_pre)

(This is optional, and not necessary if only the peaks are desired, and if it is acceptable to have peaks with intensities of zero in pixels where that peak was not detected.)

Peak picking

Use peakPick() to queue peak picking to an MSImagingExperiment.

mse_pre %>% subsetPixels(x==10, y==10, run=="run0") %>%
    process() %>%
    peakPick(method="mad") %>%
    process(plot=TRUE, par=list(main="MAD noise", layout=c(3,1)))

mse_pre %>% subsetPixels(x==10, y==10, run=="run0") %>%
    process() %>%
    peakPick(method="simple") %>%
    process(plot=TRUE,
            par=list(main="Simple SD noise"))

mse_pre %>% subsetPixels(x==10, y==10, run=="run0") %>%
    process() %>%
    peakPick(method="adaptive") %>%
    process(plot=TRUE, par=list(main="Adaptive SD noise"))
mse_peaks <- peakPick(mse_pre, method="mad")

Peak alignment

Use peakAlign() to queue peak alignment to an MSImagingExperiment.

mse_peaks <- peakAlign(mse_pre, tolerance=200, units="ppm")

Peaks are matched based on proximity of their m/z-values, according to tolerance, in either parts-per-millin ("ppm") or absolute ("mz") units.

The m/z-values of known reference peaks can be provided. If no reference is provided, the mean spectrum is calculated, and the local maxima of the mean spectrum are used as the reference.

Peak filtering

Use peakFilter() to queue peak filtering to an MSImagingExperiment.

mse_peaks <- peakFilter(mse_pre, freq.min=0.05)

The proportions of pixels where a peak was detected at each m/z-value are calculated. Only peaks with frequencies greater than freq.min are retained.

Peak binning

Use peakBin() to queue binning of spectra to reference peaks.

Typically, this is applied to processed profile spectra after peak detection, to extract a more accurate representation of the peak intensities.

mse_peaks <- process(mse_peaks)
mse_peaks <- peakBin(mse_pre, ref=mz(mse_peaks), type="height")
mse_peaks <- mse_peaks %>% process()

A tolerance in either parts-per-million ("ppm") or absolute ("mz") units is used to match the reference peaks to local maxima in each spectrum.

The peak is then expanded to the nearest local minima in both directions. The intensity of the peak is then summarized either by the maximum intensity (type="height") or sum of intensities (type="area").

Rather than use the m/z-values of the detected peaks, we can also use known reference peaks (in this case, from the design of the simulated data).

mzref <- mz(metadata(mse)$design$featureData)
mse_peaks <- peakBin(mse_pre, ref=mzref, type="height")
mse_peaks <- mse_peaks %>% subsetPixels(x %in% 9:11, y %in% 9:11) %>% process()
plot(mse_peaks, coord=list(x=10, y=10))

We typically recommend binning the processed profile spectra to detected or reference peaks to produce centroided spectra suitable for downstream analysis.

It is possible to use the detected and aligned peaks directly (after applying peakAlign() and peakFilter()), but pixels where a peak was not detected will have zero intensities for those peaks. Moreover, all peak intensities will be the peak heights, even when peak areas may be desired instead. However, using the detected peaks directly does mean that there is no need to calculate and store the processed profile spectra, so this saves on storage space.

Generally, it is preferable and more accurate to bin the processed profile spectra to the detected peaks whenever possible and reasonable.

Mass alignment

Although peak alignment and peak binning will generally account for small differences in m/z between spectra, alignment of the profile spectra is sometimes desireable as well.

Use mzAlign() to queue alignment of spectra so that peaks will have a consistent m/z-value.

First, we need to simulate spectra that are in need of calibration.

set.seed(2020)
mse4 <- simulateImage(preset=1, npeaks=10, from=500, to=600,
                            sdmz=750, units="ppm")

plot(mse4, pixel=185:195, xlim=c(535, 570), key=FALSE, superpose=TRUE)

To align the spectra, we need to provide a vector of reference m/z values of expected peaks. Here, we will simply use the peaks of the mean spectrum.

mse4_mean <- summarizeFeatures(mse4)
mse4_peaks <- peakPick(mse4_mean, SNR=2)
mse4_peaks <- peakAlign(mse4_peaks, tolerance=1000, units="ppm")
mse4_peaks <- process(mse4_peaks)
fData(mse4_peaks)

mse4_align1 <- mzAlign(mse4, ref=mz(mse4_peaks), tolerance=2000, units="ppm")
mse4_align1 <- process(mse4_align1)
plot(mse4_align1, pixel=185:195, xlim=c(535, 570), key=FALSE, superpose=TRUE)

If no reference spectrum is provided, the mean spectrum is calculated automatically and used as the reference.

mse4_align2 <- mzAlign(mse4, tolerance=2000, units="ppm")
mse4_align2 <- process(mse4_align2)
plot(mse4_align2, pixel=185:195, xlim=c(535, 570), key=FALSE, superpose=TRUE)

The algorithm will try to match the most intense peaks in the reference to local maxima in each spectrum, within tolerance. If tolerance is too small, the matching local maxima may not be found. If tolerance is too large, then peaks may be matched to the wrong local maxima.

Mass binning

While the m/z binning scheme of MSProcessedImagingExperiment objects can be adjusted on-the-fly, this does not apply to other types of MSImagingExperiment.

Use mzBin() to queue binning of spectra to new m/z-values.

mse_bin <- mzBin(mse_pre, from=1000, to=1600, resolution=1000, units="ppm")
mse_bin <- subsetPixels(mse_bin, x %in% 9:11, y %in% 9:11) %>% process()
plot(mse_bin, coord=list(x=10, y=10))

This is useful if you need to combine datasets with different m/z-values.

Mass filtering

Sometimes it is necessary to filter the mass features of a dataset that has not been peak picked and aligned. For example, to remove noisy or low-intensity m/z-values.

Use mzFilter() to queue filtering of mass features.

mse_filt <- mzFilter(mse_pre, freq.min=0.05)
mse_filt <- subsetPixels(mse_filt, x %in% 9:11, y %in% 9:11) %>% process()
plot(mse_filt, coord=list(x=10, y=10), type='h')

mzFilter() and peakFilter() are actually the same function internally, but with different defaults for profile and centroided spectra, respectively.

Note that mzFilter() this does not set centroided() to TRUE; it is up to the user to decide whether the result represent centroided spectra or not, and set centroided() appropriately.

Example processing workflow

Delayed pre-processing makes it easy to chain together multiple pre-processing steps with the %>% operator, and then apply them all at once.

mse_proc <- mse %>%
    normalize() %>%
    smoothSignal() %>%
    reduceBaseline() %>%
    peakPick()

# preview processing
mse_proc %>%
    subsetPixels(x == 10, y == 10, run == "run0") %>%
    process(plot=TRUE, par=list(layout=c(2,2)))
# process detected peaks
mse_peakref <- mse_proc %>%
    peakAlign() %>%
    peakFilter() %>%
    process()

# bin profile spectra to peaks
mse_peaks <- mse %>%
    normalize() %>%
    smoothSignal() %>%
    reduceBaseline() %>%
    peakBin(mz(mse_peakref))

Advanced operations on MSImagingExperiment

Internally, most methods that iterate over pixels or mass features use some combination of pixelApply(), featureApply(), and spatialApply().

While summarize() is useful for summarizing a MSImagingExperiment, it is limited in that it can only apply summary functions that return a single value, which can be simplified into the columns of a MassDataFrame or PositionDataFrame.

By contrast, these functions (like apply() and lapply()) can return any value.

Using pixelApply() and featureApply()

pixelApply() is used to apply functions over pixels.

# calculate the TIC for every pixel
tic <- pixelApply(mse, sum)

image(mse, tic ~ x * y)

featureApply() is used to apply functions over features.

# calculate the mean spectrum
smean <- featureApply(mse, mean)

plot(mse, smean ~ mz)

Note that featureApply() will typically be slower than pixelApply() for out-of-memory data for MSProcessedImagingExperiment objects, due to the way the data is stored.

Using spatialApply()

spatialApply() is used to iterate over spatial neighborhoods of an imaging experiment.

Rather than vectors, it operates on matrices, where each column is a mass spectrum from a different pixel in the neighborhood.

# calculate a spatially-smoothed TIC
sptic <- spatialApply(mse, .r=1, .fun=mean)

image(mse, sptic ~ x * y)

See ?spatialApply for more details on attributes that are made available to the calling function (e.g., information about the neighborhood offsets, center pixel, etc.).

Parallel computing using BiocParallel

All pre-processing methods and most statistical analysis methods in Cardinal 2 can be executed in parallel using BiocParallel.

By default, a serial backend is used (no parallelization). This is for maximum stability and compatibility.

Using BPPARAM

Any method that supports parallelization includes BPPARAM as an argument (see method documentation). The BPPARAM argument can be used to specify a parallel backend for the operation, such as SerialParam(), MulticoreParam(), SnowParam(), etc.

# run in parallel, rather than serially
tic <- pixelApply(mse, sum, BPPARAM=MulticoreParam())

Backend types

Several parallelization backends are available, depending on OS:

Use of MulticoreParam() will frequently improve speed on macOS and Linux dramatically. However, due to the extra overhead of SnowParam(), Windows users may prefer the default SerialParam(), depending on the size of the dataset.

Getting available backends

Available backends can be viewed with BiocParallel::registered().

BiocParallel::registered()

The current backend used by Cardinal can be viewed with getCardinalBPPARAM():

getCardinalBPPARAM()

Setting a parallel backend

A new default backend can be set for use with Cardinal by calling setCardinalBPPARAM().

# register a SNOW backend
setCardinalBPPARAM(SnowParam())

See the BiocParallel package documentation for more details on available parallel backends.

RNG and reproducibility

For methods that rely on random number generation to be reproducible when run in parallel, the RNG must be set to "L'Ecuyer-CMRG" before setting a seed.

RNGkind("L'Ecuyer-CMRG")
set.seed(1)

Session information

sessionInfo()


Try the Cardinal package in your browser

Any scripts or data that you put into this service are public.

Cardinal documentation built on Nov. 8, 2020, 11:10 p.m.