In kbseah/mass2adduct: Mass Spectrometry Imaging Molecular Adduct Finder

knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)

Background: Molecular adducts in mass spectrometry imaging

This package presents tools for counting and identifying possible adducts in MS data, and accompanies Janda et al. (in prep.). In mass spectrometry imaging (MSI), adducts can form between target molecules (e.g. metabolites) and other substances such as matrix or salt ions.

Terminology

Each peak in a mass spectrum represents ions of a given mass/charge ratio (m/z) that have been detected by the instrument. In this documentation, we use the term mass peak (in short: mass or peak) to refer to both the ions themselves and to their nominal m/z values.[^1]

Two ions may differ chemically from each other by a certain chemical moiety, e.g. the gain/loss of a H2O molecule. This may represent two different metabolic compounds naturally present in the sample (e.g. sucrose vs. glucose, which differ from each other by a fructose unit). They may also represent changes that occur during the processing of a sample for MSI, or during the ionization process itself. We use the term chemical transformation to refer agnostically to the chemical difference between two ions. We coin the abbreviation massdiff for "mass difference" to refer to the absolute difference in m/z values between two mass peaks.

An adduct refers specifically to a transformation caused by the addition of a chemical moiety, often during the MSI procedure. It can also refer to the product of the adduct transformation, which would be a molecular ion. The precursor to an adduct is the parent ion/molecule.

In our analysis we look at pairs of peaks (peak pairs, ion pairs), and attempt to match their massdiffs to known chemical transformations, in order to find adducts derived from the chemical matrix used for MALDI-MSI.

[^1]: The terms "mass peak", "adduct", "transformation" are defined in the IUPAC Gold Book.

Import data

Raw MSI data come in different formats, including the emerging standard imzML format, which is based on XML. For the purposes of mass2adduct, we are interested in data after preprocessing (e.g. baseline correction, peak-binning/picking). This should be exported as plain-text CSV files from standard MSI software such as SCiLS or MSiReader.

You can find an example CSV file in the folder inst/extdata in the source package. The columns represent mass peaks, with m/z values as column names, and rows represent pixels, with row names (these can be arbitrary as they are not used in this workflow). The entries in the table represent intensity values for a given peak and pixel.

library(mass2adduct)
d <- msimat(system.file("extdata","msi.csv",package="mass2adduct"),sep=";")

If the data matrix is very large, it may need to be reformatted to be loaded into memory during an R session.

If you are using Cardinal to process your MSI data, data objects in the MSProcessedImagingExperiment or MSContinuousImagingExperiment formats can be converted to mass2adduct's msimat format with the cardinal2msimat() function. Cardinal version 2.2+ is required.

Typing the object name or using the summary() function gives you a summary of its contents.

Default plot method for an msimat object will display a mass spectrum, with vertical lines representing the total intensity of each peak.

plot(d)

Tabulate all pairwise mass differences

Take all possible pairs of masses and calculate the mass difference for each pair. These mass differences represent potential molecular adducts.

d.diff <- massdiff(d) # Returns object of classes data.frame and massdiff

View a summary of the contents:

summary(d.diff)

Preview the first few lines of the results:

head(d.diff)

There are three columns:

A is the parent ion peak mass
B is the putative adduct ion peak mass (higher than A)
diff is the difference between A and B

At this point, the intensity information is not yet used.

Quickly identify possible adducts

Of course, most pairs do not represent real or meaningful chemical transformations. However, we expect that common chemical transformations, e.g. the difference between carbon-12 and carbon-13 isotopes in organic compounds, will be more often encountered than the background "noise".

There are two built-in data sets adducts and adducts2 (shorter), which list biologically-relevant chemical species that might occur in biological samples.

data(package="mass2adduct") # View list of included datasets
data(adducts) # Load dataset
head(adducts) # Preview contents

If you are interested in something that's not in the built-in list, simply make your own table with the same three columns.

Visualize with a histogram

A histogram of all mass difference values will provide an overview. We expect that interesting transformations or adducts will be peaks in the histogram, i.e. they are more commonly encountered as parent-adduct ion pairs than random values.

d.diff.hist <- hist(d.diff) # Object of classes histogram and massdiffhist
plot(d.diff.hist,labels=10)

The plot shows peaks representing common chemical transformations in this sample. With the labels=10 option, you can label the top 10 peaks which have close matches to the adducts data passed to parameter add=.

We can examine certain areas of the histogram in more detail.

plot(d.diff.hist, add=adducts, labels=10, xlim=c(100,200))

Also note that the background of likely non-meaningful mass differences is not uniformly distributed, but instead has peaks at integer values. This can be understood from the fact that most chemical masses are themselves close to integer values, hence the differences between them should be too.

Summarize possible matches to chemical adducts

The package comes with a small data set of common molecular adducts, called adducts (use the command help(adducts) to see a description). Users can supply their own custom sets of adducts as long as they are in the same format (as a data.frame with three columns: name, formula, mass)

The following function looks for known adducts by finding the closest-matching bin in the mass difference histogram produced above. It reports the number of counts (i.e. how many pairs of MS peaks have that mass difference) and the quantile.

Note that quantile values will usually be quite high because the majority of mass differences have zero to few counts.

head(adductMatch(d.diff.hist),n=10) # Show the top ten matches to known adducts

However this only shows known adducts. What about peaks in the histogram which don't have a good match to the reference list? topAdducts ranks mass differences by their occurrences, and reports them in descending order, as well as matches to known adducts, if any

topAdducts(d.diff.hist, n=10) # Report top ten hits

Examine specific adducts in detail

Using the histogram gives us a useful overview and helps us to see if there are possible chemical transformations that we do not know about (i.e. unlabelled peaks). However, there are some drawbacks. The histogram bins massdiff values into bins of fixed width, whereas we know that the mass resolution of a mass spectrometer varies with the mass. What do we do if we would like to identify massdiffs that correspond to specific adducts of interest?

We can match massdiffs to specific adduct types using the same function adductMatch that we applied to the histogram above. This adds an additional column to the massdiff object called matches, which lists the closest-matching adduct (if any) corresponding to a given ion pair.

d.diff.annot <- adductMatch(d.diff,add=adducts2)
summary(d.diff.annot)
head(d.diff.annot)

The object d.diff.annot is a subset of the original d.diff object, as ion pairs without a matching adduct are excluded.

Test for correlations between parent and adduct ions

Now that we have a list of annotated massdiffs, we want to exploit the spatial information contained in the MSI data to test if each ion pair is actually spatially correlated in the sample. This performs a correlation test (by default two-tailed with Pearson's method) on each pair of peaks in the massdiff object. You'll have to supply the msimat object that was originally used to create the massdiff object, too.

d.diff.annot.cor <- corrPairsMSI(d,d.diff.annot)

For a small subset of the original list of pairs, this should not take too long. With a larger dataset, you can speed up the process with the option how=parallel if you have multiple processors on your computer. Use ncores=N to specify the number of processors N. For even larger datasets (e.g. hundreds or thousands of peaks), you might run out of memory and crash. To avoid this, use corrPairsMSIchunks instead of corrPairsMSI, and specify the maximum available memory with mem.limit=N where N is in Gb. This estimates the amount of memory needed, and splits up the job into "chunks" that will fit, also taking multiple processors into account. This comes at the expense of more overhead and less speed.

The correlation tests results will be added to the massdiff object, with three values reported: Estimate (correlation coefficient), P.value, and Significance (whether or not p-value is below the threshold after Bonferroni correction). As there are different ways to evaluate significance, the raw p-values (not Bonferroni corrected) are reported for you to run your own analysis if desired.

head(d.diff.annot.cor)

Visualizing adducts

Annotation of mass spectrum

You can annotate the original mass spectrum using a massdiff object, to mark peaks corresponding to parent ions and adduct ions in different colors.

For example, if one is interested only in sodium adducts:

d.diff.annot.cor.Na <- subset(d.diff.annot.cor,matches=="Na adduct")
plot(d)
pointsAdducts(d, d.diff.annot.cor.Na, which="adduct", signif=TRUE, pch=20, cex=0.5, col="red") # Adduct ions in red
pointsAdducts(d, d.diff.annot.cor.Na, which="parent", signif=TRUE, pch=1, cex=0.5, col="blue") # Parent ions in blue outlines