The purpose of this document is to explain how to use the IsotopicLabelling R package to analyse mass spectrometric isotopic patterns obtained following isotopic labelling experiments.
A typical labelling experiment makes use of substrates enriched in one stable isotope, such as ^2^H or ^13^C; consequently, after the growth period, some metabolites are expected to have incorporated the labelling isotope, and therefore its relative distribution within them will be different from its natural occurrence. The IsotopicLabelling R package is based on the principle that, since the isotopic patterns obtained in mass spectrometry reflect the isotopic compositions of the elements making up the observed species, the amount of labelling can be assessed by their proper examination.
Worth of note, because there could be overlapping between the isotopic patterns of different species, the isotopic pattern analysis is better suited for LC-MS or GC-MS data rather than for direct-infusion MS, where the chromatographic step prior to MS detection reduces such issues. Therefore, the current implementation of the package only works for LC-MS or GC-MS data.
Below is a step-by-step explanation on how to use the IsotopicLabelling package; the example data set that comes with the package will be used, and the involved functions will be properly introduced and discussed.
The IsotopicLabelling package requires the MS data to be a data frame with the first two columns representing m/z and retention time (RT) of the mass peaks, and the other columns containing peak intensities or areas (one column for each sample to be analysed).
Since a popular R package for handling MS data is xcms, the IsotopicLabelling R package can also read in xcmsSet objects, which basically contain the peak intensities or areas associated to each of the processed samples, together with their average retention time and m/z. In this case, the function table_xcms
is available for converting such objects to the required data frame.
The example data set included in the package is easily accessible:
data("xcms_obj")
This is an xcmsSet object representing lipid extracts of 8 samples from ^13^C labelling experiments:
This LC-MS data was kindly provided by Dr. Julian L Griffin and Dr. Nyasha Munjoma (Department of Biochemistry, University of Cambridge - UK).
The conversion of the xcmsSet object to the required data frame is achieved through:
peak_table <- table_xcms(xcms_obj)
Here are a few rows of the obtained data frame:
library(IsotopicLabelling) data("xcms_obj") peak_table <- table_xcms(xcms_obj)
peak_table[57:59,]
In addition to xcms, this data frame can be obtained in a number of other independent ways, such as through proprietary software of the vendor of the MS instrument; the important point is for the data frame to be properly formatted:
From the MS data frame, the whole isotopic pattern analysis can be performed through the single, compact function main_labelling
. As explained in the reference manual, it requires some input parameters:
compound, a character vector specifying the chemical formula of the compound of interest. A special notation should be used, whereby the character "X" denotes the element with unknown isotopic distribution. For example, the proton adduct of phosphocholine 32:2, [PC 32:2 + H]^+^, has chemical formula C40H77NO8P, but it should be written "X40H77NO8P" for ^13^C labelling experiments, and "C40X76HNO8P" for ^2^H experiments, in this last case keeping in mind that one hydrogen atom comes from the solvent, and has therefore fixed natural abundance. Please note that adduct ions should be specified, and not the neutral molecular species;
charge, a natural number representing the charge state of the target adduct (1,2,3,...)
Using the example data set, the parameters to enter for [PC 32:2 + H]^+^ are:
fitted_abundances <- main_labelling(peak_table, compound="X40H77NO8P", charge=1, labelling="C", mass_shift=0.05, RT=285, RT_shift=20, chrom_width=7, initial_abundance=NA)
The output is an object of class labelling, a list containing the results of the analysis.
Further details will be given in the following sections, where each of the steps undertaken by main_labelling
will be discussed and critically discussed.
The IsotopicLabelling package aims to find the abundance of the isotope used during the labelling experiment within the compounds of interest, based on their MS isotopic patterns. A number of algorithms are available to a-priori compute the isotopic patterns knowing the isotopic abundances; the goal of the IsotopicLabelling package is the opposite: starting from measured experimental patterns, this package aims at understanding which is the isotopic distribution giving rise to those patterns. This is basically achieved by a weighted non-linear least squares fitting procedure that finds the abundance giving rise to the patterns that best match the experimental ones.
The main_labelling
function basically performs three successive steps:
isotopic_information
function);isotopic_pattern
function);An alternative to using main_labelling
is to run the three distinct functions it is made of, and this route is covered below.
The first function used by main_labelling
is isotopic_information
, which summarizes important isotopic information in a single object, a list required by the subsequent functions. The input parameters are the chemical formula, the type of labelling and, if different from 1, the charge state of the target analyte; for [PC 32:2 + H]^+^, the list can be obtained through:
info <- isotopic_information(compound="X40H77NO8P", charge=1, labelling="C")
As detailed in the reference manual, the output is a named list:
info <- isotopic_information(compound="X40H77NO8P", labelling="C")
attributes(info)
In particular, "isotopes" is a table with the natural isotopic abundances (numbers between 0 and 1) of the elements present in the compound. The two isotopes of the labelling element X are given NA values:
info$isotopes
Importantly, "target" is a named vector with the exact m/z of all the possible isotopologues arising because of the labelling isotope; in the example, [PC 32:2 + H]^+^ has 40 carbon atoms, and therefore (considering that charge=1) the possible isotopologues coming from ^13^C span a 41 mass range: the lightest one is the monoisotopic species (with 40 ^12^C atoms), whereas the heaviest is the species with 40 ^13^C atoms. However, the isotopic patterns also depend on the other elements, and therefore the list of target isotopologues is further extended by two m/z units, enough for small and medium-sized molecules such as lipids and metabolites.
The naming of the target masses follows this convention:
These target masses already take into consideration the charge state of the analyte.
Crucially, the underlying assumption of IsotopicLabelling is that the MS resolution is not high enough to resolve the isotopic fine structure; consequently, the replacement of, for example, ^1^H with ^2^H is indistinguishable from ^12^C with ^13^C. This is true for most of the instruments currently used in LC-MS measurements.
Once the target masses are known, the experimental isotopic patterns are extracted from the MS data through the function isotopic_pattern
. Keeping on with the example of [PC 32:2 + H]^+^, this can be achieved through the command:
experimental_patterns <- isotopic_pattern(peak_table, info, mass_shift=0.05, RT=285, RT_shift=20, chrom_width=7)
In addition to the table of peaks and the list obtained above, the user also has to provide the parameters mass_shift
, RT
, RT_shift
and chrom_width
(already discussed above and further detailed in the reference manual).
The output is a matrix containing the extracted signals, with its first two columns reserved for the exact m/z and the retention times of the peaks:
experimental_patterns <- isotopic_pattern(peak_table, info, mass_shift=0.05, RT=285, RT_shift=20, chrom_width=7)
head(experimental_patterns, n=3)
Each of its columns, therefore, represents the extracted experimental pattern for that sample. To get this result, the isotopic_pattern
function does the following:
Two of the patterns extracted for [PC 32:2 + H]^+^ are shown in Figure 1: the first (to the left) is relative to an unlabelled sample, whereas the second one (to the right) is relative to a labelled sample (99% ^13^C labelling).
In this simple case, the difference is straightforward: in the labelled sample the most intense signal is shifted 40 mass units upwards with respect to the monoisotopic peak, indicating that the most abundant species is the one where all 40 carbon atoms have been replaced by the labelling isotope, ^13^C.
The extracted patterns are then analysed by the function find_abundance
, which takes each of them and finds the best theoretical pattern that reproduces it by using a weighted non-linear squares fitting procedure where the single parameter to fit is the relative abundance of the labelling isotope.
In the example, this information can be achieved through:
fitted_abundances <- find_abundance(patterns=experimental_patterns, info=info, initial_abundance=NA, charge=1)
The output is a class labelling object, which is a list containing the results of the isotopic pattern analysis:
fitted_abundances <- find_abundance(patterns=experimental_patterns, info, initial_abundance=NA, charge=1)
attributes(fitted_abundances)
The estimated percentage abundances of the labelling isotope are in "best_estimate" (numbers between 0 and 100), the standard errors from the fitting procedure are in "std_error", whereas the percentage deviations between best fitted and experimental patterns are in "dev_percent". The m/z values are in "x_scale", while in "y_exp" are their normalised intensities and in "y_theor" are those of the best fitted theoretical patterns. The differences between the two are in "residuals".
The find_abundance
function performs the following:
There are a number of ways to look at and save the results of the isotopic pattern analysis:
The generic function summary
allows to quickly glance at the estimated percentage abundances:
r
summary(object=fitted_abundances)
The generic function plot
can be used to produce three types of plots, depending on the parameter "type". By default (type="patterns") a series of plots is returned, one for each sample, showing the normalised experimental patterns superimposed to their fitted theoretical patterns:
r
plot(x=fitted_abundances, type="patterns", saveplots=F)
Two of the plots obtained in this example are in Figure 2.
If "type" is set to "residuals", the residuals are plotted:
r
plot(x=fitted_abundances, type="residuals", saveplots=F)
This is shown in Figure 3.
Finally, with type="summary", a summary plot with the estimated percentage abundances is provided (see Figure 4).
If the parameter "saveplots" is set to TRUE
, the plots are saved as a *.pdf file in the working directory.
The save_labelling
function allows to save the results to a *.csv file in the working directory:
r
save_labelling(fitted_abundances)
For each sample, this file reports: a. The estimated percentage abundance of the labelling isotope; b. The related standard error; c. The percentage deviation between theoretical and experimental isotopic patterns; d. The outcome message from the fitting procedure.
Since it is usual in biochemistry to investigate different sample groups and to have more biological replicates within the same group, the IsotopicLabelling package provides an additional function that allows to condense the results from the individual fits (carried out on a sample-to-sample basis) into a single estimate for each of the sample groups.
For each group, the estimated isotopic abundance of the labelling isotope is calculated by simply averaging the estimates for each sample in that group.
As for the standard error of the mean, this is obtained by taking into account both the individual standard errors coming from the fittings (one for each sample) and the variability of the estimates across samples. According to the law of total variance, the overall variance is given by the sum of the variances of each estimate (the "within-sample variance"), plus the variance caused by the distribution of the obtained estimates (the "across-sample variance"). The standard error of the mean is simply the square root of this amount divided by the number of samples.
This additional information can be achieved starting from the class labelling object; for example:
grouped_estimates <- group_labelling(fitted_abundances, groups=factor(c(rep("C12",4), rep("C13",4))))
The second input object (groups
) is a factor specifying the groupings. The output is a data frame containing the results for each group:
grouped_estimates
The last three columns of the data frame contain the critical value for a 95% confidence interval of the t distribution with N-1 degrees of freedom, and the lower and upper values for the 95% confidence interval.
This data frame contains all the information needed for additional statistical analysis, such as the comparison across sample groups in order to understand wheteher or not two (or more) groups have statistically different estimates.
IsotopicLabelling also makes a function available, called batch_labelling
, that allows to batch-process the MS data. This is useful when the users have in mind a list of target analytes over which they would like to run the pattern analysis.
One of the crucial parameters of batch_labelling
is a data frame containing all the required information from the user, including the list of target analytes and their chromatographic characteristics (same inputs required by main_labelling
). A more extensive explanation is reported in the manual.
Here is a practical script on how to batch-process the data. In the example, the data frame that comes with the package is used.
# Load the package library("IsotopicLabelling") # Load the xcmsSet object data(xcms_obj) # Convert the object into the required data frame peak_table <- table_xcms(xcms_obj) # Batch-process the data batch_grouped_estimates <- batch_labelling(targets=targets, groups=factor(c(rep("C12",4), rep("C13",4))), plot_patterns=FALSE, plot_residuals=FALSE, plot_results=FALSE, save_results=FALSE)
In this document we introduced the main functions of the IsotopicLabelling R package, explaining their basic working principles. Using the provided data set, we illustrated how to use the package in practice and how to quickly assess the results, with the hope that this package will prove to be a useful tool to researchers dealing with labelling experiments.
The package, in its current implementation, allows to deal with either ^2^H or ^13^C enrichments in the whole range 0-100\%. A note of caution is here necessary, though: during the isotopic pattern analysis, the package assumes that for each sample and analyte there is a single (unknown) isotopic distribution of the label. Consequently, IsotopicLabelling cannot deal with samples containing multiple sources of label abundances, such as biological samples that were spiked with a labelled analyte, or samples resulting from the pooling of several ones with different label enrichments. Therefore, pooled samples should be avoided in the isotopic pattern analysis.
After having discussed the principles behind this package, our final aim here is to provide a compact summary that may be used as a script for analysing LC-MS data relative to labelling experiments.
# Load the package library("IsotopicLabelling") # Load the xcmsSet object data(xcms_obj) # Convert the object into the required data frame peak_table <- table_xcms(xcms_obj) # Process the data fitted_abundances <- main_labelling(peak_table, compound="X40H77NO8P", charge=1, labelling="C", mass_shift=0.05, RT=285, RT_shift=20, chrom_width=7, initial_abundance=NA) # Quickly look at the results summary(object=fitted_abundances) # Plot the patterns plot(x=fitted_abundances, type="patterns", saveplots=F) # Plot the residuals plot(x=fitted_abundances, type="residuals", saveplots=F) # Plot the overall results plot(x=fitted_abundances, type="summary", saveplots=F) # Save the results to a *.csv file save_labelling(fitted_abundances) # Group the samples and obtain grouped estimates grouped_estimates <- group_labelling(fitted_abundances, groups=factor(c(rep("C12",4), rep("C13",4))))
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.