BiocParallel::register(BiocParallel::SerialParam())
knitr::opts_chunk$set(echo = TRUE, warning=FALSE, message = FALSE)
knitr::opts_chunk$set(collapse = TRUE)
library(plotly)

RHermes is an untargeted metabolomics data processing package. Its objective is to identify a customizable set (usually >10K) of metabolites in a biological or environmental sample.

This document will guide you through the package functions and show you what a typical RHermes workflow looks like.

Installation

The recommended specs for RHermes are:

You can download the development version from GitHub with:

if(!requireNamespace("devtools", quietly = TRUE))
    install.packages("devtools")
devtools::install_github("RogerGinBer/RHermes")

MS1 workflow

Setting up the RHermesExp object

All the information generated by RHermes is stored in a single RHermesExp object. To generate it, start with:

library(RHermes)
myHermes <- RHermesExp()

This object is the input of most of the package functions, which return an updated version of the object.

After creating the object, set your MS1 experimental parameters. Use the function setExpParam(), which accepts either a custom ExpParam object or a template name:

myHermes <- setExpParam(myHermes, params = ExpParam(ppm = 4, res = 120000,
                                                    ion = "+"))
myHermes <- setExpParam(myHermes, template = "orbi-pos")

You can use the following template names : "orbi-pos", "orbi-neg", "qtof-pos" and "qtof-neg". Check ?ExpParam to see all customizable fields.

After setting up the ExpParam, decide which set of formulas and adducts to detect in our data with setDB():

myHermes <- setDB(myHermes, db = "hmdb") #Loads a small HMDB subset

However this "hmdb" database is just a small subset of the the real HMDB. To load a real (usually >10^3 formulas) database, use db = "custom":

#Use either a csv or a xlsx (Excel) file
myHermes <- setDB(myHermes, db = "custom", DBfile = "myDatabase.csv") 
myHermes <- setDB(myHermes, db = "custom", DBfile = "myDatabase.xlsx")

This database needs at least two columns named exactly like this: Name and MolecularFormula. You may have other columns with metadata related to the compounds, RHermes just needs those two.

A general list of adducts will be set according to the experimental parameters (positive or negative ionizzation). You can choose multicharged and multimer adducts by using the adcharge and admult parameters in setDB(). For instance, you could do:

myHermes <- setDB(myHermes, db = "hmdb", admult = 2, adcharge = 2)

And this would set your adduct list to all adducts with charge <=2 and multiplicity <= 2 (for instance [2M+H]+, [M+2H]2+ and so on).

You may also just select a subset of adducts using with adlist param:

myHermes <- setDB(myHermes, db = "hmdb", admult = 2, adcharge = 2,
                  adlist = c("M+H", "M+Na", "M+K", "M+2H", "2M+H"))

In general, the less formula/adduct combinations, the faster the program runs. It is advisable to remove very uncommon adducts from the adduct list (such as those coming from solvents not used in your particular experiment, eg. M+ACN+H). To check which adducts are selected, use adlist(). You can manually add new adduct profiles with addAd() and remove them with remAd():

#We will manually add the "M+2H" adduct
myHermes <- addAd(myHermes, name = "M+2H", deltam = 2*1.00727600, ch = 2,
                    mult = 1, toadd = "H2")

#For instance, remove adducts of unused solvents
myHermes <- remAd(myHermes, c("M+DMSO+H", "M+IsoProp+H"))
adlist(myHermes)
knitr::kable(adlist(myHermes))

Wrapping up: here's what the object looks like up to this point:

myHermes #Shows a summary of all set parameters and object info

As you can see above, the RHermesExp object keeps track of all processing steps it's gone through, as well as some other troubleshooting info like the OS and the RHermes version it was originally generated with.

Processing the MS1 files into PL

The main interface to process files is the function processMS1(), which accepts an RHermesExp object and a list of paths to the MS1 files to process. All the files must share the used experimental parameters (meaning data acquired in different polarities or in different instruments can't be processed at the same time).

For this guide, we will use a small test mzML provided with the package:

myHermes <- processMS1(myHermes,
                        system.file("extdata", "MS1TestData.mzML",
                                    package = "RHermes"))

Once done, you can check some stats about the files you've just processed:

#You could do either of these, but PL it's more direct if you just want to check
#one file:
# myHermes
PL(myHermes, 1)

SOI detection

Before diving into the process itself, we should talk first about the detection parameters; There are two ways to generate them:

It works either way, but I'd advise to just use getSOIpar() because it's easier and the templates have already been tested. There are six templates available, which only differ by the number and width of the applied bins: simple, double, triple, simple-x, double-x, and triple-x. -x meaning extended bins (15s) as opposed to the default 5s bins. It's useful to use -x if you are using a regular HPLC setup to avoid SOI splitting and to improve running time. On a UPLC setup, you can skip the -x.

So when should one use SOIParam directly? Well, it's convenient if you want to change the bin sizes used for the scan density calculation and also if you want to be stricter about the minimum required density or the minimum data point intensity (!).

Once you have your SOI parameters, pass them to findSOI and specify which files you want to use:

s <- getSOIpar("double")
myHermes <- findSOI(myHermes, s, 1)

And what about blank subtraction? It's actually really simple: just add an extra blank argument to findSOI(). You can also generate as many SOIs as you want in just one function call by passing a vector of sample IDs and blank-to-be-used IDs (keep in mind that "No blank substraction" must be represented as 0).

Since our test dataset doesn't include blank data (it's just one file), this next chunk will not be actually executed, but you get the gist of it:

#Generate a SOI list of file 2 without blank subtraction and another one
#using file 1 as blank
myHermes <- findSOI(myHermes, s, c(2,2), c(0,1))

SOI filtering

Now that you generated a SOI list, you can now filter it to remove low intensity SOIs, bad isotopic fidelity SOIs and likely in-source fragments (ISF).

For intensity and isotopic fidelity filtering we use filterSOI():

myHermes <- filterSOI(myHermes, id = 1, minint = 20000, isofidelity = TRUE)

Generating an Inclusion List

Once you have a SOI list you're satisfied with, you can generate an Inclusion List to perform the MS2 acquisition with:

To do it, write:

#Only uses SOIs of adducts in "ad". In this case, only M+H adducts
myHermes <- generateIL(myHermes, id = 1, par = ILParam(priorization = "only", ad = "M+H"))

#If there's an M+H adduct annotation, don't include other adducts.
#Only works if you've performed filterSOI first, since it requires adduct 
#annotation grouping
myHermes <- generateIL(myHermes, id = 1, par = ILParam(priorization = "yes", ad = "M+H"))

#Use all SOIs in the SOI list
myHermes <- generateIL(myHermes, id = 1, par = ILParam(priorization = "all"))

Once generated, you can plot it, filter it or export it.

If you think the IL contains too many entries in a certain RT interval, you may want to filter them like this:

#This removes all IL entries below 50000 intensity in the 0s-150s RT interval
myHermes <- filterIL(myHermes, 1, rts = c(0,150), minint = 5e4)

To export the IL entries into individual csv files (one per injection), type:

exportIL(myHermes, id = 1, file = "InclusionList", maxOver = 5, sepFiles = TRUE)

Here we plan the IL allowing up to 5 entries to be monitored by the instrument at the same time. Depending on your instrument MS2 injection time (IT), you may want to increase maxOver to between 10-15 to reduce the number of injections while keeping a good scan rate.

We are currently developing new acquisition strategies to improve the IL coverage while retaining high spectral quality and a low number of injections. Expect these previous functions to be expanded and change over time.

MS2 workflow

Once you have acquired your MS2 data, use processMS2 just like this:

myHermes <- processMS2(myHermes, id = 1,
                       MS2files = c("./file1.mzML", "./file2.mzML",
                                    "./file3.mzML", "./file4.mzML"),
                       sstype = "regular", useDB = FALSE)

Since the package doesn't come with the MS2 mzML files corresponding to the MS1 test dataset (as it would be very heavy) we have already pre-processed the IL:

myHermes <- readRDS(system.file("extdata", "exampleObject.rds",
                                package = "RHermes"))

You can extract the processed MS2 spectra with Ident():

Ident(myHermes, 1)

Saving your progress

At any point of the processing (for instance, after generating an Inclusion List), you may save your object for later use. To do so, type:

saveRDS(myHermes, "testRHermes.rds")

To load it, type:

myHermes <- readRDS("testRHermes.rds")

This RDS file you've just generated can be loaded into the GUI to continue the processing there or to easily view the plots. It also serves a a "save point" in your processing, in case you want to start again from some step

Plotting

In case you prefer to generate the plots "manually" (as opposed to using the GUI), here are some instructions for the different plotting functions:

plotPL

plotPL(myHermes, 1, "C5H11NO2", c("M+H", "M+Na", "M+K"), c(80, 120))

plotSOI

plotSOI(myHermes, 1, "C5H11NO2", c("M+H", "M+Na", "M+K"), c(80, 120))

plotIL

Bear in mind that this IL is very limited because of the small number of formulas and the filtered mzML we used.

plotIL(myHermes, id = 1)

plotRawMS2

p <- plotRawMS2(myHermes, ms2id = 1, entryid = 2)
ggplotly(p$p_bymz)
ggplotly(p$p_bygroup)

Using the RHermes Shiny app

To start the app, simply type:

RHermesGUI()

And the app should start in your default browser.

Advanced functionality

Parallelization

Another parameter you want to set up (especially if you're dealing with many formulas and adducts - say >10K formulas) is a custom parallel backend.

RHermes functions use your default backend (bpparam()), but you can set any backend you like with register(). You can use setCluster to automatically select an appropiate backend based on your OS and RAM.

BiocParallel::register(setCluster()) 

Session Info

sessionInfo()


RogerGinBer/RHermes documentation built on Nov. 6, 2022, 11:34 a.m.