In danielcanueto/rDolphin: Reliable automatic profiling of 1D 1H NMR Spectra, with additional tools for analysis

rDolphin is an R package that performs the automatic profiling of 1D 1H NMR spectra datasets and outputs several indicators of quality of quantification and identification for each signal area quantification. To perform a reliable automatic profiling, resilient to the multiple factors that can incorporate variability to a dataset, rDolphin splits the spectrum into Regions of Interest (ROIs). Each ROI has a specific method of quantification (integration or deconvolution; with or without baseline fitting), a list of signals to quantify and a list of signal parameters to monitor (e.g. the chemical $chemical_shift variability).

In addition, rDolphin comes with tools to load individual quantifications to be evaluated and updated or to perform different kinds of exploratory analyses. For researchers with non-expert programming skills, the package incorporates a Shiny GUI.

Package Installation

Please run these commands in the R console (preferably through RStudio) to install the package:

source("https://bioconductor.org/biocLite.R"); biocLite("impute"); biocLite("MassSpecWavelet") #installation of packages that cannot be installed from Github
install.packages("devtools") #installation of package necessary to install rDolphin from Github
devtools::install_github("danielcanueto/rDolphin") #installs rDolphin
library(rDolphin) #loads rDolphin

At some moment, you may be asked to install Rtools:

The installation may stop at some point because of the multiple dependencies from different repositories required to install rDolphin:

These issues tend to be solved when running the previous commands again. If not, please find the package with failed installation (in this example, MassSpecWavelet) and install it before running again the commands.

If you have problems during installation, please contact me at daniel.canueto@urv.cat

Tutorial Structure

The introduction will show how to perform in the R console the profiling of a subset of 30 spectra of MTBLS242, a Metabolights dataset of blood spectra. The 30 spectra belong to 15 patients with blood samples taken at two different times.

First a CSV file is read to import all the necessary information to perform the profiling.
Then, an exploratory analysis of the spectra dataset is performed.
Next, the quality of the ROI information is evaluated before beginning the automatic profiling.
A profiling of all spectra is performed and an output of quantifications (with associated quality indicators) and figures is generated.
Lastly, it is shown how to evaluate (and to update if necessary) the quantifications performed and how to watch the results of basic uni and multivariate analyses of the achieved quantification dataset.

A link to an online document showing how to perform similar steps through the Shiny GUI is available in the Readme file of the Github website: https://github.com/danielcanueto/rDolphin.

Download and unzip the contents of this Dropbox link.
Take a look to the three folders inside the unzipped folder. They correspond to three different Metabolights experiments:
MTBLS1: human urine.
MTBLS237: faecal extract.
MTBLS374: blood.

Each one has an input folder with the next components

file.path(system.file(package = "rDolphin"),"extdata")

You will find in the folder the next files:

A Parameters CSV file with the necessary information to import with the desired preprocessing parameters. A brief description is given inside the CSV file for each parameter.
A Dataset CSV file with a matrix with each row a spectrum and each observation a bin. The header has the information to which ppm belongs every bin. Spectra can also be inputted as Bruker experiments if specified in the Parameters file.
A Metadata CSV file with three columns:
- Sample is the name of the sample. If reading Bruker processed spectra, it needs to be the same than the one in the Bruker folder.
- Individual is a number specifying each individual. For example, in the case of this MTBLS242 dataset, the 1-15 sequence is repeated 2 times because there are two samples for each individual.
- Type is a number specifying the kind of sample.
Accurate information about Individual and Type is not necessary for the program to be run, but it will be useful to maximize the quality of exploratory analyses.
An ROI_profiles CSV file with information of every signal to be quantified. Every signal is located in an ROI that can contain one or more signals to quantify. Several parameters of the signal and the kind of quantification are also shown.

A thorough description of the structure of every file necessary to input to the tool is available here: https://docs.google.com/document/d/1t--gF5mCBNhbGvn53vKth2nlTzucLx55EfUWIgrMh_o/edit?usp=sharing

Run these commands to set as directory the extdata folder and to import the data contained there:

setwd(file.path(system.file(package = "rDolphin"),"extdata"))
imported_data=import_data("Parameters_MTBLS242_15spectra_5groups.csv")

An imported_data list is generated with several variables. Feel free to investigate the information present in each one of them. Some important ones are dataset, ppm or ROI_data.

Exploratory analysis of spectra dataset

rDolphin eases the analysis of the dataset complexity through two kinds of interactive Plotly figures:

a subset of spectra that are exemplars of the variability in the dataset:

exemplars_plot(imported_data)

the median spectrum of each kind of sample:

median_plot(imported_data)

For each kind of figure, a red trace appears below the spectra. This red trace gives the results of a univariate analysis for every bin and helps the user detect interesting regions to profile. Feel free to play with the interactivity of the Plotly figure.

If you do not know how to annotate a signal in the dataset, you can evaluate possible options in the biofluid studied (blood) ranked by probability thanks to signal repository included with the package. For example, these commands:

ind=intersect(which(imported_data$repository$Shift..ppm.<3.37),which(imported_data$repository$Shift..ppm.>3.35))
View(imported_data$repository[ind,])

help see which signals in blood matrix are seen in the 3.37-3.35 ppm region. A methanol singlet stands out in the results. Accordingly, a high-intensity singlet appears in our dataset. The filtering by biofluid avoids wrong annotations typical when using general databases.

In addition, the user can also visualize the results of STOCSY or RANSY in a region of the dataset. These are the results achieved for a glutamine signal at 2.14-2.12 ppm:

identification_tool(imported_data$dataset,imported_data$ppm,c(2.14,2.12),method='spearman')

The other important glutamine signal at 2.45-2.4 ppm stands out.

Evaluation of ROI information

Looking to the performance of profiling in a model spectrum can help to improve the parameters in some ROIs and to check additional regions of the spectrum with the potential to give interesting insights into a study.

Run the next command:

profiling_model=profile_model_spectrum(imported_data,imported_data$ROI_data)

profiling_model has three elements:

p, a Plotly figure showing the fitting applied to every ROI in the model spectrum as well as a red trace below showing the univariate analysis on each bin.
ROI_data, a data frame of the ROI data used during the process.
total_signals_parameters, a data frame with parameters of fitting and of quality of fitting (e.g. the fitting error) for every quantified signal.

Feel free to modify cells of the ROI information contained in imported_data$ROI_data or to add or remove ROIs and to repeat the model spectrum profiling process.

Automatic profiling of spectra

When you are satisfied with the ROI data contained in imported_data$ROI_data, you can perform an automatic profiling of the spectra through this command:

profiling_data=automatic_profiling(imported_data,imported_data$ROI_data)

If you do not want to wait for the completion of the profiling of the dataset, you can stop the profiling and load this .RData file, where the data of an already performed profiling session is saved:

load(file.path(system.file(package = "rDolphin"),"extdata","MTBLS242_subset_profiling_data.RData"))

profiling_data has two variables:

final_output, a list of lists that contains the performed quantifications (in quantification) as well as several indicators of quality and signal parameter information.
reproducibility_data, a list of lists where useful information of each quantification is stored, for reproducibility and profiling update purposes.

To output in your computer CSV files with the final_output data, run this command:

write_info(file.path(system.file(package = "rDolphin"),"extdata"),profiling_data$final_output,imported_data$ROI_data)

Go to the extdata folder mentioned in Import of necessary data and you will find the outputted information.

To output in your computer PDFs with plots of every quantification, run this command:

write_plots(file.path(system.file(package = "rDolphin"),"extdata"),profiling_data$final_output,profiling_data$reproducibility_data)

A plots folder is stored in the extdata folder mentioned in Import of necessary data. You can see a PDF file for every signal with all quantifications.

Validation of quantifications

The validation function allows to analyse possible wrong identifications or quantifications. For example, this command:

View(validation(profiling_data$final_output,1)$shown_matrix)

shows the fitting error in every signal quantification. And this command:

View(validation(profiling_data$final_output,3)$shown_matrix)

shows the difference between the expected chemical shift and the calculated chemical shift in every signal quantification.

If the user wants to load the information and the plot of these suspicious quantifications, he can do it through the load_quantification function. Run these commands to load the lineshape fitting of the TSP signal in the first spectrum:

loaded_quantification=load_quantification(profiling_data$reproducibility_data,imported_data,profiling_data$final_output,list(row=1,col=1),imported_data$ROI_data)
loaded_quantification$plot

One can see that the signal should have even more half bandwidth than the large one already assigned. This effect is caused by the binding of TSP to protein. If one is still interested in quantifying the TSP signal area, it seems better to perform an integration of thi signal for example from 0.1 to -0.015 ppm. Change the TSP ROI profile in 'ROI_data' to adapt it to this new fitting type.

imported_data$ROI_data[1,1:3]=c(0.1,-0.15,"Clean Sum")

And now perform the individual TSP quantification in this first spectrum through the individual_profiling function:

updated_profiling_data=individual_profiling(imported_data,imported_data$final_output,1,imported_data$ROI_data[1,,drop=F],imported_data$reproducibility_data)

And compare the results:

loaded_updated_quantification=load_quantification(updated_profiling_data$reproducibility_data,imported_data,updated_profiling_data$final_output,list(row=1,col=1),imported_data$ROI_data)
loaded_updated_quantification$plot

If satisfied with the repeated lineshape fitting, the updated_profiling_data already contains the updated quantification.

Uni and multivariate analysis of fingerprinting and profiling data

Univariate analyses of every bin could be already seen with the profile_model_spectrum function. The p values calculated can be outputted with the p_values function:

pval=p_values(imported_data$dataset,imported_data$Metadata)

However, the big advantage of profiling data to fingerprinting data is its resilience to overlapping and to chemical shift variability (typical in urine) or baseline (typical in blood). Basic univariate analyses of profiling data can be evaluated changing the data called in p_values:

pval=p_values(profiling_data$final_output$quantification,imported_data$Metadata)

Finally, dendrogram heatmaps can show us interesting subsets of samples or signals with the information provided. For example, a not identified signal can be highly correlated in quantification with another signal because they are signals from the same metabolite:

type_analysis_plot(profiling_data$final_output$quantification,profiling_data$final_output,imported_data,'dendrogram_heatmap')

For example, the heat map shows that the 'U3_61' signal correspond probably to another signal of L-valine.

And now, what do I have to do?

After this introduction to the wide range of options to perform the automatic profiling of spectra datasets and to maximize the quality of this profiling, feel free to investigate the inputs and outputs of every function.

You should investigate how to adapt the ROI profiles to the matrix studied. We share on the rDolphin website the ROI information we have found optimal in several matrixes. This ROI information will have to be adapted to the changes caused by the lab protocol during sample preparation or during spectrum acquisition and pre-processing.

danielcanueto/rDolphin documentation built on May 14, 2019, 4:03 p.m.

rdrr.io home R language documentation Run R code online

CRAN packages Bioconductor packages R-Forge packages GitHub packages

Note that we can't provide technical support on individual packages. You should contact the package authors for that.

danielcanueto/rDolphin
Reliable automatic profiling of 1D 1H NMR Spectra, with additional tools for analysis

In danielcanueto/rDolphin: Reliable automatic profiling of 1D 1H NMR Spectra, with additional tools for analysis

Package Installation

Tutorial Structure

Exploratory analysis of spectra dataset

Evaluation of ROI information

Automatic profiling of spectra

Validation of quantifications

Uni and multivariate analysis of fingerprinting and profiling data

And now, what do I have to do?

R Package Documentation

Browse R Packages

We want your feedback!

danielcanueto/rDolphin Reliable automatic profiling of 1D 1H NMR Spectra, with additional tools for analysis

In danielcanueto/rDolphin: Reliable automatic profiling of 1D 1H NMR Spectra, with additional tools for analysis

Package Installation

Tutorial Structure

Exploratory analysis of spectra dataset

Evaluation of ROI information

Automatic profiling of spectra

Validation of quantifications

Uni and multivariate analysis of fingerprinting and profiling data

And now, what do I have to do?

R Package Documentation

Browse R Packages

We want your feedback!

danielcanueto/rDolphin
Reliable automatic profiling of 1D 1H NMR Spectra, with additional tools for analysis