knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)

library(kableExtra)
library(DT)
library(htmlTable)
library(formattable)
library(dplyr)
#library(flexdashboard)

Data import, quality control and spectral pre-processing

 

This tutorial runs through a typical data import and pre-processing workflow for Nuclear Magnetic Resonance (NMR) spectroscopy - based metabolic profilig with the MetaboMate R package. If you no idea how to declare vectors, lists and data.frames in R, then I would recommend to have a quick look at DataCamp’s free R Tutorial first.

Let's get started by loading the MetaboMate package:

# load package
library(MetaboMate)

 

Data

Thirty raw Proton NMR (^1^H NMR) spectra of murine urine samples come with the MetaboMate package and will be used for this tutorial. Urine samples were collected in a time-resolved fashion with a single collection before and multiple collections after a bariatric surgery was performed. 1D ^1^H NMR spectra were acquired using a standard 90˚-FID pulse sequence on a 600 MHz Bruker Avance III spectrometer, equipped with a 5 mm triple resonance (TXI) probe operating at 300 K. Further information on sample collection, processing and data acquisition can be found in the original publication by Jia V. Li et al.^[Li, Jia V., et al. (2011) Metabolic surgery profoundly influences gut microbial-host metabolic cross-talk. Gut 60.9, 1214-1223.]

 

Import of NMR spectra and metadata

The readBruker() function reads Bruker NMR spectra into the R workspace. The function requires one input argument, that is the path to the computer folder that contains the desired NMR experiments. All 1D spectra in the respective directory (including all subdirectories) will be imported. Here we go:

# Specify path variable to folder containing NMR experiments
# If you would like to work with your own data, specify a different path: Within RStudio, navigate to
# the experiment folder via 'set working directory'  (Session tab) and then set path='./'
path=system.file("extdata/", package = "MetaboMate")

readBruker(path)

Please note that there is no variable assignment when the function is called, but instead the following three variables are routinely exported into the R workspace:

# create table
df=data.frame('Variable'=c('X', 'ppm', 'meta'), 
             'Description'=c('NMR data matrix (rows = spectra, columns = spectral variables)', 
                             'Chemical shift (in ppm) of each spectral variable (vector)', 
                             'Acquisition metadata (dataframe)'))

kable(df, align=c('c', 'l')) %>%
  kable_styling(full_width = F) %>%
  column_spec(1, bold = T, border_right = F) %>%
  column_spec(2, width = "30em")

Rownames of X indicate the location of individual experiments, which can further be used to match sample annotation data (e.g., group-memberships). The dataframe meta is row-matched to the NMR data matrix (X) and contains detailed information on acquisition and processing parameters, including acquistion date and analytical run-order. See the help page (enter ?readBruker() into the R console) for more information on metadata variables.

 

Visualisation of NMR spectra

Basic graphics

For the visualisation of NMR spectra there are two low-level plotting functions: spec() and matspec(), which allow plotting of a single spectrum or an overlay of multiple spectra, respectively. Both of these functions are fairly fast. Let's have a first look at all of the imported spectra over the entire ppm range.

# use 'spec' to plot a single pectrum, e.g., in row position 15:
#spec(ppm, X[15,], shift = range(ppm))

# use 'matspec' to overlay spectra, in ppm range 0 to 10
matspec(ppm, X, shift = range(ppm))

From this overview we can see that the spectral width ranges from -5 to approximately 15 ppm with the residual water signal resonating around 4.8 ppm. All ^1^H NMR signals are situated within the ppm range of 0 and 10, therefore both ends can be capped and this will be illustrated further below.

ggplot2

There are different higher-level plotting functions available that allow a more comprehensive spectral visualisation of individual peaks or spectral ranges. One function is specOverlay(), which is comparable to matspec() shown above, but enables to create different subplots (aka facetting) and to include colour scales and linetypes in a straightforward way. The order in which the plotting parameters are specified is the same for all functions of this type: an=list(facet, colour, lineype), whereas each list element is a vector with as many elements as there are spectra in X.

For illustration purposes, let's plot the TSP signal (a singlet present in all spectra) located in the ppm range of -0.05 to 0.05. Added is information on acquisition paramters that can be found in the meta dataframe: the different panels (facetting) show different experiment types, the colour maps to the analytical run order (a_RunOrder) and the linetype indicates the NMR pulse program (a_PULPROG):

# plot TSP signal
specOverlay(X, ppm, shift=c(-0.05,0.05), 
    an=list( 'Facet'=meta$a_EXP, # facet
             'Run Order'=meta$a_RunOrder, # colour
             'Pulse Program'=meta$a_PULPROG) # linetype
    ) # linetype

The plot above shows that three different experiment types were performed (i.e., three different panels), all based on the pulse program \<noesypr1d>, which a Bruker term for a standard 90˚ pulse experiment with water pre-saturation. Experiment types labelled \<> and \<OB_flowtest> are calibration experiemnts and were performed before type \<JL-noe1D> (see colour scale, i.e., run order).

Spectral referencing

Spectral referencing refers to horizontally shifting an entire spectrum unitl a signal of a standard compound reaches a defined chemical shift position. In urine metabolic profiling, the standard compound is TSP^[3-(trimethylsilyl)-2,2′,3,3′-tetradeuteropropionic acid] (usually added to the sample buffer), which gives rise to a singlet that is defined to resonate at 0 ppm.^[Dona, A.C., et al. (2014) Precision high-throughput proton NMR spectroscopy of human urine, serum, and plasma for large-scale metabolic phenotyping. Analytical Chemistry. 86.19. 9887-94.]

We can reference the urine spectra using the calibrate() function, as shown with the following code:

# calibrate urinary NMR spectra to TSP
X.cali=calibration(X, ppm, type='Urine')

# plot TSP overlay with calibrated data, facetted for each different experiment type
specOverlay(X.cali, ppm, shift=c(-0.05,0.05), 
    an=list( 'Facet'=meta$a_EXP, # facet
             'Run Order'=meta$a_RunOrder, # colour
             'Pulse Program'=meta$a_PULPROG) # linetype
    ) # linetype

Now you can see that for all spectra, the TSP peak apex is centered at a chemical shift of zero ppm (that was not the case before).

 

Filtering based on spectrometer parameters

For statistical analysis we are only interested in \<JL-noe1d> experiment type, as the other calibration experiments are not suitable for quantitative analysis. To exclude the latter, execute the following R code which also re-runs the previous plotting command:

# Exclude calibration experiments
idx=grep('noe1d', meta$a_EXP) 
X=X[idx,]
meta=meta[idx,]

# plot TSP signal
specOverlay(X, ppm, shift=c(-0.05,0.05), 
    an=list( 'Facet'=meta$a_EXP, # facet
             'Receiver Gain'=meta$a_RG, # colour
             'Pulse Program'=meta$a_PULPROG) # linetype
    ) # linetype

 

Assessment of spectral quality

In metabolic phenotyping and in any other field where multiple NMR spectra are compared quantitatively, the quality of spectra is of particular importance. Basic spectral quality assessement includes visual inspection of the water suppresion quality, spectral line widths, baseline level and stability as well as the average signal to noise (SN) ratio.

High throughput NMR often does not allow a manual inspection of each individual spectrum and automatically generated quality indices are used to exclude low quality spectra. The spec.qc() function derives several spectral quality control indices and produces an overview plot if the function argument plot is set to TRUE:

# calculate quality control measures
spec.qc=spec.quality(X.cali, ppm, ppm.noise=c(9.4,9.5), plot=T)

The output of this function is a dataframe containing the quality control indices for each sample:

The latter two paramters are primarily influenced by biological factors. For example, the area of the baseline depends on sample dilution (higher diluted samples have lower signals intensities, which leads to smaller baseline areas). However, these can still be useful for spotting extreme outliers of biological as well as technical nature.

The first few rows of the quality indices table are shown just below. Like for the NMR matrix X, the row names are the experiment folders.

kable(head(spec.qc, 4)) %>%
kable_styling(bootstrap_options = "striped", full_width = T, position = "float_left")
#knitr::kable(head(spec.qc, 6))
#library(DT)
#datatable(head(spec.qc, 6), rownames = FALSE, filter="top", options = list(pageLength = 5, scrollX=T) )

Currently, the line width estimate of the spec.quality() function (TSP.lw.ppm) requires a TSP signal or any other external reference compound that resonates at zero ppm. The respective line width in Hertz (Hz) can be derived by multiplying TSP.lw.ppm with the spectrometer frequency in the meta dataframe:

# TSP line width in Hz
TSP.lw.Hz=spec.qc$"TSP.lw.ppm" * meta$"a_SFO1"
hist(TSP.lw.Hz, xlab='TSP line width (Hz)', main='Histogram', breaks = "FD", xlim=c(1.5,3.5))

 

Excision of chemical shifts regions

Further downstream analysis requires the excision of chemical shift regions where signals from external sources (e.g. buffer components) are present or are non-quantitative. In urinary NMR analyses these include the TSP signal, the residual water and urea signal as well as ppm regions where there is no signal but only noise.

The function get.idx() can be used to obtain indices of the desired shift range from the ppm vector. These indices can then be further used to exclude the relevant columns in the NMR matrix and ppm vector. This is illustrated in the following code snippet.

# Indexing TSP region and upfield noise...
idx.TSP=get.idx(range=c(min(ppm), 0.5), ppm)
# ... water region...
idx.water=get.idx(range=c(4.6, 5), ppm)
# ... as well as downfield noise regions
idx.noiseDF=get.idx(range=c(9.5, max(ppm)), ppm)

# Exision of TSP, res. water and noise regions
X.cali=X.cali[,-c(idx.TSP, idx.water, idx.noiseDF)]
ppm=ppm[-c(idx.TSP, idx.water, idx.noiseDF)]

 

Baseline correction

In 1D 1H NMR spectroscopy using simple pulse sequences, broad resonanes originating from macromolecules lead to increased spectral baselines. Baseline differences across spectra complicates the analysis and the the removal of often leads to more accurate results. In MetaboMate, a non-linear baseline correction can be performed with the bline() function. See the exmpale below for its input arguments and ?bline for methodological information.

# Baseline correction
X.bl=bline(X.cali)

# compare spectra before and after baseline correction
specOverlay(X = X.cali, 
            ppm = ppm, 
            shift=c(3,4), 
            an=list(
              panel="Not BL corrected", 
              "SN ratio"=spec.qc$"SN.ratio"),
              title="Raw"
            )

specOverlay(X = X.bl, 
            ppm = ppm, 
            shift=c(3,4), 
            an=list(
              panel="Not BL corrected", 
              "SN ratio"=spec.qc$"SN.ratio"),
              title="Baseline corrected"
            )

 

Spectral normalisation

Depending on the sample type, spectra may require normalisation prior to statistical analysis. For example, urine dilutions vary across samples, perhaps due to the uptake of different amounts of water. Normalisation methods can account for these kind of systematic differences in spectral intensities.

There are several normalisation methods avaiable. Among the most commonly applied ones are Total Area (TA) normalisation and Probablistic Quotient Normalisation (PQN).^[Dieterly, F., \emph{et al.} (2006), Probabilistic Quotient Normalization as Robust Method to Account for Dilution of Complex Biological Mixtures. Application in 1H NMR Metabonomics, \emph{Analytical Chemistry}, 78.3, 4281-90] In this tutorial, we normalise the spectra using both methods and compare the results. Therefore, the functions pqn() and totaArea() are called, both returning the normalised NMR matrix. The additional input argument add.DilF indicates if the calculated dilution factors should be exported to the R workspace. Having a look at the calculated dilution factors can be very informative. For example, if pooled quality control samples were periodically included in the study run, then these should have very comparable dilution factors. The add.DilF argument specifies a user-given variable name and a vector variable named like this (containing the dilution factors) is automatically added to the R workspace after calling the function. Here's an example:

# PQN normalisation
X.pqn=pqn(X.bl, add.DilF = 'dilF.pqn')

# Total area normalisation
X.ta=totalArea(X.bl, add.DilF='dilF.ta')

# Dilution method comparsion using statistical dilution factors 
plot(log10(dilF.pqn), log10(dilF.ta), xlab='PQN Dilution Factor', ylab='Total Area (scaled)', xlim=c(-1,1.5), ylim=c(-1,1.5))
abline(a = c(0,1), col='green')

Plotting the dilution factors of both normalisation factors against each other, shows that these are genearlly lower for PQN than for TA normalisation. Which normalisation method is more appropriate generally depends on the sample type, experimental setup and analysis platform. In NMR-based untargeted metabolic phenotyping studies using urine as a sample matrix I recommend using PQN instead of TA normalisation.

The final step in this preprocessing tutorial is the visual inspection of the pre-processed NMR spectra:

matspec(ppm, X.pqn, shift = range(ppm))
matspec(ppm, X.pqn, shift = c(2,4))
matspec(ppm, X.pqn, shift = c(4,6))
matspec(ppm, X.pqn, shift = c(6,9))

 

Summary and further steps

Raw 1D NMR spectra were imported, quality checked and pre-processed which included referencing to TSP, baseline correction and excision of spectral areas bearing no quantitative biological information. Further, the urine-derived spectra were normalised with PQN to account for different sample dilutions. Further filtering steps could be applied based on the established quality indices (eg., using TSP line widths as an estimate for shimming quality).

Spectra are now optimally prepared for statistical analysis. Commonly applied analysis methods in metabolic profiling include Prinicpal Component Analysis (PCA) and Orthogoanl-Partial-Least Squares (O-PLS). You can find more information on this in the vignette Multivariate Statistical Analysis of the MetaboMate package.



kimsche/MetaboMate documentation built on Aug. 8, 2020, 1:14 a.m.