knitr::opts_chunk$set(echo = TRUE)
htmltools::tagList(rmarkdown::html_dependency_font_awesome())
htmltools::tagList(rmarkdown::html_dependency_bootstrap("default"))


#library(Metaboseek)
print(paste("Metaboseek version:",packageVersion("Metaboseek")))

Metaboseek Documentation

Metaboseek offers a graphical user interface to set up data analysis with the xcms package to detect and align molecular features from LC/MS data across multiple samples. You can then load xcms results into the app as a "Feature Table" (using xcms and MSnbase packages, mzR-based) and run statistical analyses to identify molecular features of interest.

This document describes all UI elements in Metaboseek and is meant to be a comprehensive user manual.

Install Metaboseek

System Requirements

Recommended minimal system requirements:

We recommend computers with a monitor with at least full HD (1920 x 1080 pixels) resolution. You can use the zoom function of your web browser to scale the interface to your liking.

All files are loaded into memory, so that browsing will be very quick: It is easy to look at extracted ion chromatograms (EICs) for many MS features of interest across dozens of files within a fraction of a second. However, the initial loading of the data will take some time, and you may experience issues if you load many files at a time. We strongly recommend using centroided data files, as they will have a smaller memory footprint. Loading 50 data files from 20-minute high resolution LC/MS data acquisition should not be a problem on a computer with 16 GB of RAM.

Java {#javainfo}

If installed from an R session, Metaboseek requires Java for full functionality, in particular for molecular structure plotting in the SIRIUS module. Java is also a requirement for installing SIRIUS itself. Make sure to install 64-bit Java if you are running 64-bit R (which is most likely), or 32-bit Java if you are running 32-bit R. The download buttons on java.com send you to the version that corresponds to your browser (32- or 64-bit) by default, which may or may not be the version you need. Get the appropriate Java version from this page: https://www.java.com/en/download/manual.jsp.
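To decide which Java build to download, it can help to check whether the current R session is 32- or 64-bit and whether a Java executable is already visible to R. The following is a minimal sketch using only base R. It does not verify the bitness of the Java installation itself, and the variable names are illustrative.

```r
# Pointer size is 8 bytes in 64-bit R and 4 bytes in 32-bit R.
r_bits <- 8 * .Machine$sizeof.pointer

# Sys.which() returns an empty string when no java executable is on the PATH.
java_path <- unname(Sys.which("java"))
java_found <- nzchar(java_path)

message("R is running as a ", r_bits, "-bit process (", R.version$arch, ").")
if (java_found) {
  message("Found a Java executable at: ", java_path)
} else {
  message("No Java executable found on the PATH.")
}
```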

Install on Windows

Using the Installer

  1. Download the installer of the most recent release version here
  2. Follow the installation steps.
  3. Metaboseek 0.9 should now be installed and can be launched like any other Windows program. When Metaboseek launches, a command line window will appear, and the user interface will open up in your default web browser. To close the program, close the Metaboseek command line window.

The installer version of Metaboseek has one limitation: it does not plot molecular structures for predicted structures in the SIRIUS module. This is a compromise made so that this installation of Metaboseek does not require Java to be installed on your system.

Using a .zip File

  1. Download the .zip file of the most recent release version here
  2. Unzip the file on your computer (this may take a while!)
  3. Run Metaboseek by executing runMetaboseek.exe in the unzipped folder.

The .zip version of Metaboseek has one limitation: it does not plot molecular structures for predicted structures in the SIRIUS module. This is a compromise made so that this installation of Metaboseek does not require Java to be installed on your system.

Install on Mac / Linux

Consider getting the Metaboseek Docker image, or follow these steps to install Metaboseek:

  1. Download.
  2. Mac users: Get Xcode by entering this line into your Terminal window:
xcode-select --install
source("http://metaboseek.com/files/install_Metaboseek.R") 
Metaboseek::runMseek()
remove.packages('rcdk')

Then try to run Metaboseek again.

Get the Docker Image {#Docker}

As they put it on their website, "Docker provides a way to run applications securely isolated in a container, packaged with all its dependencies and libraries." This is also a convenient way to reproduce analysis results that were generated with a particular version of Metaboseek. Once you have set up Docker on your computer, this is the easiest and most reproducible way to get a fully functional Metaboseek, including SIRIUS integration.

  1. Install and set up Docker. Please note that there are limitations for Windows users (Windows 10 Pro is required and using Docker prevents running Virtual Machines with VM VirtualBox).
  2. You can now get the Metaboseek Docker image using this terminal command:
docker pull mjhelf/metaboseek

The metaboseek Docker image is based on the bioconductor/release_metabolomics2 image.
  3. Running this command will execute the latest version of the Metaboseek container (and download it if not already available on your computer):

docker run -d -v HOSTFOLDER:/home/shiny/data -p 3840:80 -e PASSWORD=YOURPASSWORD mjhelf/metaboseek

Let's take a look at some key settings here:

docker run -d -v /home/user123:/home/shiny/data -p 3840:80 -e PASSWORD=YOURPASSWORD mjhelf/metaboseek

NOTE: The apps hosted inside the container will be accessible from the internet (for anyone connecting to your computer's IP address and the correct port number). By default, they will be protected by HTTP basic authentication, but that is not 100% secure. Once authenticated, the apps allow seeing the data structure of the specified HOSTFOLDER, and it is possible to download arbitrary .csv files and MS data from that folder. We are not liable for any data exposure to unauthorized parties or other damages.

* -v /home/user123:/home/shiny/data means that all contents of the /home/user123 folder will be accessible in Metaboseek.
* -p 3840:80 means that port 80 from the container will be accessible as port 3840 on the host computer.

docker ps
  1. Open your web browser and go to localhost:3840, where the port number after the colon may differ based on your -p setting (see above). By default, you will have to log in with the username metaboseek and the password you specified (YOURPASSWORD in our example). This will open a website hosted inside the metaboseek container. Select the app you want to run and analyze your data!

Experienced R users (Windows, Mac or Linux):

If you have installed R (and the devtools package) already, you can install Metaboseek like this:

devtools::install_github("mjhelf/MassTools")
devtools::install_github("mjhelf/Metaboseek")

If you want to make sure you get all the required packages, run the install script with this line:

source("http://metaboseek.com/files/install_Metaboseek.R") 

Use the web version

If you have trouble installing Metaboseek and want to just try it out with an example dataset, use the web version.

Data Analysis with Metaboseek

With Metaboseek, you can quickly visualize data from batches of high-resolution LC/MS data files and find differences between groups of samples. It is not necessary to do any analysis before looking at your data, but a typical workflow starts with a data analysis step:

Then, you can use Metaboseek to browse the data, find molecular features of interest, predict the molecular formula and make structure predictions based on MS2 data.

Overview

Metaboseek is structured into two major sections: the Data Explorer section for visualization and statistical analysis tasks, and the XCMS analysis section for identifying LC/MS features in MS data files. You can switch between these sections with the navigation menu on the left of the screen.

Navigation Bar Items{#NavbarItems}

The buttons in the navigation bar either help with the user interface or allow quick access to important functionalities.

The Navigation bar, always showing up at the top of the Metaboseek interface{width=100%}

Interface Buttons

The leftmost Menu button can be used to hide the navigation bar on the left side, allowing you to maximize the available screen space. Likewise, you can use the Fullscreen button to maximize the size of the browser window.

Functional Buttons

Load MS Data, Feature Tables or Sessions

The Load button allows you to load MS data, Feature Tables and entire Metaboseek projects, as detailed here.

Save Session

Use the Save button to save the current Metaboseek session. This will save all Feature Tables, Molecular Networks and MS data files that you have loaded into the current session. You can choose to include the MS data in the session file (e.g. for simple sharing of an analysis with colleagues). However, this will increase file size significantly and may slow down the saving process. If MS data is not included in the saved session, Metaboseek will expect the MS data files to be in the same location when you load the session.

Global Options

Settings available:

Start Page / Loading Data {#loadMSData}

The Metaboseek start page with data loading options and update news{width=100%}

The Start page provides you with information about the newest version of Metaboseek, and also allows you to load data into Metaboseek. You can also click on the Load icon on the left side of the navigation bar at the top of the page to get the same set of options for loading data:

Load Feature Tables

You can load any .csv or .mskFT file into Metaboseek. You can then go to the "Regroup Table" tab to specify or change the columns that contain intensity values. Feature Tables contain the results from feature detection with xcms, along with results from statistical analysis. If you load an .mskFT file, important metadata, such as processing history and sample grouping are loaded along with the result table. If you have loaded a project folder into the current session, there is a convenient option to select all compatible table files from the project folder as well.

Loading MS Data Files Directly.{.unnumbered}

All files with supported file extensions in the selected folder and all of its subfolders can be imported, either by selecting files individually (selecting multiple files at a time is possible), or by importing an entire folder that contains MS data (this will import all compatible files from all subfolders, too). To save time, it makes sense to pre-sort your files in a reasonable folder structure (e.g. separate positive mode data from negative mode so you don't get both kinds when selecting a folder to load into Metaboseek). Loading MS data files after you have already loaded a project folder allows you to visually inspect files that you had excluded from the xcms analysis, such as blanks.

Load a Metaboseek Project Folder.{#projectFolders}

When you run xcms through Metaboseek, the program generates a project folder that contains the results from that xcms analysis run, and all settings that were used in it. In addition, all output feature tables you requested will be saved in the project folder during the xcms run. You can load this result folder into Metaboseek, making it easier to keep all analysis results related to this xcms run in one place.

You can either select a project folder anywhere on your computer, or select a project folder from the recent project selection window that lists the most recently used project folders (load the selected folder with the Load Recent button). If you choose to load a project folder, all MS data files from the xcms run will be loaded and sample grouping information from the xcms analysis will be applied. Metaboseek will ask you which feature table you want to load from the project folder. If you select an .mskFT file (recommended) instead of the corresponding .csv file, you will benefit from the additional information embedded in these files. .csv files are primarily there for export and viewing in other tools (including Microsoft Excel), while .mskFT files are designed to be loaded back into Metaboseek. The advantage of .mskFT files is that they contain the complete processing history (including settings used for the xcms run, CAMERA analysis and post-processing). .mskFT files are technically .RDS files containing an MseekFT object and can be loaded into any R session with the readRDS() function.
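Because .mskFT files are .RDS files, they can be read with base R alone. The sketch below illustrates this with a stand-in object so that it runs without Metaboseek installed; the file name and the contents of the list are placeholders, and the internal structure of a real MseekFT object is deliberately not assumed here.

```r
# Stand-in object: a plain list is used ONLY so the example is runnable
# without Metaboseek; a real .mskFT file holds an MseekFT object instead.
demo_object <- list(note = "placeholder for an MseekFT object")
demo_path <- file.path(tempdir(), "demo_table.mskFT")
saveRDS(demo_object, demo_path)

# .mskFT files are .RDS files, so readRDS() works regardless of the extension.
ft <- readRDS(demo_path)

# Inspect the top-level structure without assuming any particular field names.
str(ft, max.level = 1)
```

For a real file, replace demo_path with the path to an .mskFT file inside your project folder.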

Load Example Data

You can select "example_projectfolder" from the "Recent projects" selection box and click on "Load recent". Metaboseek will ask you which table you would like to load into the session along with the MS data that is associated with the example project folder.

Load a Metaboseek Session

You can load a Metaboseek session that you saved previously in an .msks file. This will restore all feature tables and MS data files you had loaded into that session along with many of the layout settings. Note: This will currently only work if the MS data file locations have not changed from the paths used in the old session. Some aspects of the session will not be restored (notably, molecular networks are not saved in the session file).

Supported File Types {#supportedFiles}

Metaboseek uses the MSnbase and xcms packages to load MS data files of the following formats:

Note: Data needs to be centroided.

Feature Tables can be loaded in these formats:

Data Explorer

Sirius Options{width=100%}

At the heart of Metaboseek is the interaction between data visualization in the "Data viewer" box and a table of LC/MS data features in the "Feature table" box.

Options Box {#OptionsBox}

This box provides a number of optional functions, including setting up SIRIUS, calculating molecular formulas and controlling the appearance of extracted ion chromatograms (EICs) in the Data Viewer box.

Sirius Options

Sirius Options{width=100%}

The settings here are passed on to the SIRIUS executable. Please have a look at the SIRIUS documentation to learn more about them.

Molecular Formula Prediction

Molecular formula prediction{width=100%}

In this Tab, you can calculate molecular formulas that match the currently selected feature's m/z value. All settings are passed to the calcMF function from the MassTools package. Molecular formulas are generated with the Rdisop package and can then be filtered using the rules proposed by the "Seven Golden Rules for heuristic filtering of molecular formulas obtained by accurate mass spectrometry" (@Kind2007), as well as some additional filters. For detailed documentation, click here.

RT Correction

RT correction{width=100%}

If you load a Project Folder from a finished xcms job that included retention time correction, you can review the effect of retention time correction across your files here. Retention time is plotted on the x-axis for each file, and deviation from the uncorrected retention time is shown on the y-axis. Very large RT deviations or very different behavior between groups of samples can point to problems with your chromatography setup or retention time correction settings.

Mass Shifts

Mass shifts{width=100%}

You can define mass shifts that will be shown in the Data viewer -> Grouped EICs window as additional EIC traces (in dashed lines). Mol_formula and charge columns are currently ignored. Click Update mass shifts to update the EIC view and to save your edits to the mass shift table (will be restored in your next session).
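To make the idea of mass-shifted EIC traces concrete, the following sketch computes the m/z windows that would be extracted for a feature and for each shifted trace. This is an illustration only: eic_windows() is a hypothetical helper, not a Metaboseek function, and the ppm-based window is an assumption about how the tolerance is applied.

```r
# Hypothetical helper: m/z windows for a feature and its mass-shifted traces.
eic_windows <- function(mz, shifts, ppm = 5) {
  centers <- mz + c(0, shifts)
  data.frame(
    trace = c("feature", paste0("shift_", seq_along(shifts))),
    center = centers,
    lower = centers * (1 - ppm * 1e-6),
    upper = centers * (1 + ppm * 1e-6)
  )
}

# Example: a feature at m/z 300.1000 with one illustrative shift of +1.003355
# (the 13C - 12C mass difference).
windows <- eic_windows(300.1000, shifts = 1.003355)
print(windows)
```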

EIC Options

EIC options{width=100%}

This allows control of various formatting options for the EICs in Data viewer -> Grouped EICs as well as Data viewer -> MS2 Browser -> Feature Report, and in part also for Data viewer -> MS Browser.

Data Viewer

This box provides plots of the data selected in the Feature Table box. Different kinds of data plots and browsing options are available, mostly for extracted ion chromatograms (EICs) and spectrum plots, but also bar plots, Venn diagrams and plots from principal component analysis.

MS2 Browser

The MS2 browser{width=100%}

If you have loaded MS data files which contain MS2 (tandem-MS) data, you can go to the MS2 browser for a variety of data analysis options specifically for MS2 data. The MS2 Browser box is the most complex tab in Metaboseek, and its user interface is divided into three parts: The two sub-tabs Feature report and Compare MS2, and a bottom part that is always visible, independent of which sub-tab is selected. This General MS2 Browser part includes the SIRIUS module and the list of MS2 scans associated with the selected row in the Feature Table.

Sub-Tabs

You can switch between two views in the MS2 browser:

Feature Report{.unnumbered}

The Feature Report sub-tab is designed to show all information about a molecular feature at a glance and make it exportable as a single-page .pdf document. This includes Grouped EICs at the top (see the "Grouped EICs" section for a description of the controls). If MS2 data is available for a molecular feature selected in the Feature Table, MS1 (left) and MS2 (right) spectra are also shown below the EICs.

You can generate single page reports for a feature, including EICs, MS1 and MS2 spectra and SIRIUS results.{width=100%}

Compare MS2{.unnumbered}

In this sub-tab, you can compare MS2 spectra with each other. On the left side, you see space for the molecular network viewer. MS/MS spectra are shown on the right side.

Molecular Network Module{.unnumbered}

MS2 scans are averaged for each molecular feature, and then the averaged spectra are compared with each other. In the next step, you can select the parameters used for spectrum similarity calculations. Only peaks that match within the m/z and ppm tolerance between two spectra will be used to calculate the similarity score, and peaks at an intensity below a set percentage of the maximum peak intensity in a scan (Noise level in %) will be excluded. You can also ignore small fragments, an experimental feature that will exclude peaks with m/z < 100 from the spectra. This can be used, for instance, to exclude phosphate peaks, which can be very dominant in negative mode data. The similarity score will be considered 0 if fewer than min. peaks match between two spectra.

settings for Network generation{width=70%}

The intensities of the matching peaks for each spectrum are extracted, and the similarity score is calculated as the cosine between these intensity vectors, i.e. a similar relative intensity distribution is expected for the matching peaks of similar compounds. If you select Use parent masses, neutral losses are also used for peak matching, which will increase the number of matching peaks for compounds that have different parent masses (e.g. because of a methyl group or adduct difference). This step can take minutes or even hours, depending on the number of molecular features with MS2 scans that are compared to each other. For more details, look at the documentation for the makeEdges() and network1() functions from the MassTools package.
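The scoring idea described above can be sketched in a few lines of R. This is an illustration under simplifying assumptions, not the MassTools implementation: cosine_score() and its arguments are hypothetical, peaks are matched greedily using a ppm tolerance only, and neutral-loss matching (Use parent masses) is left out.

```r
# Hypothetical helper: cosine score between two spectra given as two-column
# matrices (column 1 = m/z, column 2 = intensity).
cosine_score <- function(spec_a, spec_b, ppm = 5, noise_pct = 2, min_peaks = 3) {
  # Drop peaks below noise_pct percent of each spectrum's most intense peak.
  keep <- function(s) s[s[, 2] >= max(s[, 2]) * noise_pct / 100, , drop = FALSE]
  a <- keep(spec_a)
  b <- keep(spec_b)

  # For each peak in a, take the closest unused peak in b if it is within ppm.
  int_a <- numeric(0)
  int_b <- numeric(0)
  used <- logical(nrow(b))
  for (i in seq_len(nrow(a))) {
    diffs <- abs(b[, 1] - a[i, 1])
    diffs[used] <- Inf
    j <- which.min(diffs)
    if (length(j) == 1 && diffs[j] <= a[i, 1] * ppm * 1e-6) {
      used[j] <- TRUE
      int_a <- c(int_a, a[i, 2])
      int_b <- c(int_b, b[j, 2])
    }
  }

  # Too few matching peaks: the score is defined as 0.
  if (length(int_a) < min_peaks) return(0)

  # Cosine between the intensity vectors of the matched peaks.
  sum(int_a * int_b) / sqrt(sum(int_a^2) * sum(int_b^2))
}

# Two made-up spectra whose matching peaks have the same relative intensities.
spec1 <- cbind(mz = c(100.0000, 150.0000, 200.0000, 250.0000),
               intensity = c(10, 50, 100, 1))
spec2 <- cbind(mz = c(100.0002, 150.0003, 200.0001, 300.0000),
               intensity = c(20, 100, 200, 50))
score <- cosine_score(spec1, spec2)
```

In this example three peaks match and their intensities are proportional, so the score is 1 even though the unmatched peaks differ. Only matched peaks enter the calculation.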

Finishing the {width=70%}

After calculation of the similarity scores, you can give your new network a name and select a threshold for which comparisons to keep (above a given Cosine threshold). Stricter (higher) values generate less data and less complicated networks, generally with fewer network clusters. A less strict (lower) Cosine threshold will keep more of the comparison information, which you can remove later with the Simplify network button:
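The effect of the Cosine threshold can be illustrated with a small edge list. The data frame, its column names and the keep_edges() helper are hypothetical and only mimic the idea of keeping comparisons above a threshold.

```r
# Hypothetical edge list: one row per pairwise spectrum comparison.
edges <- data.frame(
  from = c("F1", "F1", "F2", "F3"),
  to = c("F2", "F3", "F3", "F4"),
  cosine = c(0.95, 0.40, 0.72, 0.85)
)

# Keep only comparisons with a score above the threshold.
keep_edges <- function(edges, threshold) {
  edges[edges$cosine > threshold, , drop = FALSE]
}

strict <- keep_edges(edges, 0.8)   # fewer edges, simpler network
lenient <- keep_edges(edges, 0.5)  # more edges retained for later simplification
```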

You can remove edges (connections) from the network to only see the most significant connections{width=70%}

You can save networks in either the .graphML or .mskg format.

Using the network viewer{width=100%}

You can move nodes by dragging them with your mouse while holding the CTRL key (this helps make all labels visible in a dense network). Return to the network overview by double-clicking on the graph. If double-clicking does not work, you can also zoom out by clicking while holding the Z key.

The processing history for the current network can be viewed with the History button.

Mapping to reference{width=100%}

"Match Feature Table" is an experimental beta feature: You can map the current Feature Table on the currently active MS2 network, re-using the network layout. This is still in development and will change over time.

Compare Spectra{.unnumbered}

In the "MS2 spectra" box on the right, you can choose to keep a spectrum view - it will then not be refreshed when you select a new Feature table entry or network node. Instead, a new spectrum plot will show up below. You can show up to 5 spectrum views at the same time. By default, all peaks that occur in more than one of the shown spectra are highlighted in blue. You can disable this comparison with the Compare checkbox. You can also download the shown spectrum views in .pdf format by clicking Download spectra, or in .tsv format (Save as table).

General MS2 Browser

Below the sub-tab selection, you can see these elements:

SIRIUS Module{.unnumbered}

SIRIUS (@Duhrkop2019) is a stand-alone software developed in the Boecker lab at the University of Jena that can use MS/MS data to predict the molecular formulas of fragment and parent ion peaks. It also offers an interface to CSI:FingerID to match fragmentation patterns with structure databases.

When you first go to the MS2 Browser, this is what it looks like{width=100%}

Information about completed SIRIUS analyses will show up here if available for the active molecular feature from the Feature Table.

Get Structure Predictions with SIRIUS

MS2 data can be analyzed with SIRIUS from inside the Metaboseek app. All settings for SIRIUS can be found in the Options box. In the Sirius options, you first need to tell Metaboseek where the SIRIUS executable is located ("SIRIUS folder"). Metaboseek will generate a new folder there to store results from Sirius runs. NOTE: Make sure you have write access to the SIRIUS location.

To run Sirius, use the "Run SIRIUS" Button above the MS2 scan table. Make sure to select appropriate options in the Sirius options section at the top of the app Options box. The results can be accessed through Metaboseek as soon as a Sirius analysis run finishes by clicking "Show SIRIUS" in the Spectra list. Select items in the tables that show up to view fragmentation trees and proposed structures. Two buttons for SIRIUS are in the Spectra list section below: The Run SIRIUS button will use the currently selected spectra with the current Sirius options to run a SIRIUS analysis. This will typically take a few seconds. The Show SIRIUS button will show SIRIUS results for the selected MS2 spectra when available. The color of the button indicates if SIRIUS results are available (green), not available (red), or available with settings that differ from the current settings in Sirius options (yellow).

You can select molecular formulas from the SIRIUS result table on the left to display the corresponding fragmentation tree. The annotated fragments will also be highlighted in the Feature report subtab MS2 spectrum view. If you selected Get FingerID in the Sirius options, a list of candidate molecules will show up on the right side. Select one to view the molecular structure. NOTE: Viewing molecular structures requires installation of the rcdk package, which is not included in the Metaboseek Windows installer, and not automatically installed when installing Metaboseek from R.

Click on the Browse SIRIUS searches section to show a list of SIRIUS jobs. Select a job here to look at SIRIUS results independent from the current Feature Table selection.

When you first go to the MS2 Browser, this is what it looks like{width=100%}

Spectra List{.unnumbered}

Spectra list{width=100%}

When you select one (or multiple) entries in the Feature Table, Metaboseek will find any MS/MS scans that have a parent mass matching the selected Feature Table entry (e.g. within 5 ppm and 200 seconds, customizable). All MS/MS scans matching a selection (from a network or from the Feature Table) are shown in a table in the MS2 browser tab.

You can define the parent ion m/z tolerance (in ppm) and retention time window at the top, allowing you to only show MS2 scans that are within these tolerances from your selection in the Feature Table. You can also sort this table with the controls at the bottom of the table. An average spectrum of all scans shown in this scan table is displayed on the left. You can select single or multiple scans in the scan table to show the spectrum of only the selected scan(s). The MS2 scans selected here are also used and displayed by both the Feature Report and Compare MS2 sub-tabs.
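The scan lookup described in this section can be sketched as a simple filter on parent m/z and retention time. The scan table, its column names and find_ms2_scans() are hypothetical, and the retention time window is interpreted here as a maximum distance in either direction, which is an assumption.

```r
# Hypothetical scan table; column names are illustrative only.
scans <- data.frame(
  scan = 1:4,
  precursor_mz = c(301.1410, 301.1450, 301.1412, 450.2000),
  rt = c(610, 615, 900, 612)
)

# Return scans whose parent m/z is within ppm of mz and whose retention time
# is within rt_window seconds of rt.
find_ms2_scans <- function(scans, mz, rt, ppm = 5, rt_window = 200) {
  mz_ok <- abs(scans$precursor_mz - mz) <= mz * ppm * 1e-6
  rt_ok <- abs(scans$rt - rt) <= rt_window
  scans[mz_ok & rt_ok, , drop = FALSE]
}

hits <- find_ms2_scans(scans, mz = 301.1411, rt = 600)
```

Here only the first scan is returned: the second is outside the 5 ppm tolerance, the third is outside the retention time window, and the fourth has an unrelated parent mass.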

PCA Viewer

Shows interactive plots with results from the principal component analysis (PCA) from the Feature Table Actions Analysis Options if available.

Venn Diagrams

This module allows you to filter the Feature Table into up to three different groups and show the number of features shared between the groups. You can define the grouping by applying up to three different filters to the current Feature Table. The filters work like those in the Filter Table tab.
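The overlap counts shown in a three-group Venn diagram can be reproduced from three logical filters. The table, its column names and the filter criteria below are made up for illustration.

```r
# Hypothetical feature table with three illustrative columns.
ft <- data.frame(
  mz = c(150.1, 220.2, 310.3, 405.4, 512.5),
  rt = c(120, 300, 450, 610, 700),
  fold_change = c(8, 1.2, 5, 12, 0.9)
)

# Three filters define group membership for every row.
in_a <- ft$fold_change >= 5
in_b <- ft$mz > 200
in_c <- ft$rt < 500

# Number of features in each region of the Venn diagram.
counts <- c(
  A_only = sum(in_a & !in_b & !in_c),
  B_only = sum(!in_a & in_b & !in_c),
  C_only = sum(!in_a & !in_b & in_c),
  A_and_B = sum(in_a & in_b & !in_c),
  A_and_C = sum(in_a & !in_b & in_c),
  B_and_C = sum(!in_a & in_b & in_c),
  A_B_C = sum(in_a & in_b & in_c)
)
print(counts)
```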

Quickplots

In this tab, you can view the data in summary plots. The left side uses the intensity values from the feature table as input, while the right side allows you to plot arbitrary Feature Table columns against each other.

MS Browser

Here, you can select individual files to show their EICs for the selected feature or a custom m/z value. You can use SHIFT + click to select a data point to display the corresponding MS1 spectrum below. See the "Navigating plots" section for more information on how to interact with the spectrum and EIC plots.

You can display multiple independent EIC views at the same time. Each of them has these settings:

Other settings for the EIC plots, such as mass tolerance and color palette, can be changed in the EIC options in the Options box and will apply to all EIC plots in Metaboseek.

Grouped EICs{#GroupedEICs}

Similar to the MS Browser (see above), but enabling different layouts of grouped EICs. Some plotting parameters can be changed in the EIC options in the Options box, and some can be changed here directly:

Regroup MS Data

You can group the MS data independently from the grouping in the Feature Table. This grouping can be used to define color schemes or which files should be plotted together in Grouped EICs. It is possible to assign each file to two different groups to allow switching plot layouts using EIC options in the Options box. You can define multiple grouping schemes here with the 'new Grouping' and 'Update Grouping' buttons and switch between these schemes from the Grouped EICs Tab.

Feature Table

The Feature Table{width=100%}

This box contains the most important element in the app: the Feature Table. Most plots in the Data Viewer will use this table as input to show you information that is related to the molecular feature that is defined in the selected row.

Feature Table Actions

In this box you can run analyses on the currently selected Feature Table and filter it.

Special Columns in the Metaboseek Feature Table{#columnExplain}

Some column names and name schemes are generated by the actions you can take in the Analyze Table Tab. You can use these columns to filter your Feature Table in the Filter Table Tab.

coldf <- read.csv("assets/columnLegend.csv")

colnames(coldf) <- c("Column", "Description", "calculated by", "method")

knitr::kable(coldf[,1:3])

Filter Table{#FilterTable}

Filter Table{width=100%}

You can filter the Feature Table here by specifying a column and filter criteria. Columns containing text can be filtered for text patterns, and numeric columns for values within a range. You can define an arbitrary number of filters and it is possible to activate or deactivate individual filter steps. IMPORTANT: when you save a Feature Table, the currently active filters will be applied before saving.
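The filter logic (text patterns for text columns, value ranges for numeric columns, and filters that can be switched on or off) can be sketched as follows. The table, the filter list format and apply_filters() are hypothetical and only illustrate the behavior described above.

```r
# Hypothetical feature table.
ft <- data.frame(
  comment = c("compound_A", "unknown", "compound_B", "lipid"),
  mz = c(301.17, 150.05, 303.18, 760.58),
  stringsAsFactors = FALSE
)

# Each filter names a column and its criteria; inactive filters are skipped.
filters <- list(
  list(column = "comment", type = "text", pattern = "^compound", active = TRUE),
  list(column = "mz", type = "numeric", min = 302, max = 800, active = TRUE),
  list(column = "mz", type = "numeric", min = 0, max = 100, active = FALSE)
)

apply_filters <- function(ft, filters) {
  keep <- rep(TRUE, nrow(ft))
  for (f in filters) {
    if (!f$active) next
    values <- ft[[f$column]]
    if (f$type == "text") {
      keep <- keep & grepl(f$pattern, values)
    } else {
      keep <- keep & values >= f$min & values <= f$max
    }
  }
  ft[keep, , drop = FALSE]
}

filtered <- apply_filters(ft, filters)
```

Only the row that passes both active filters remains; the deactivated third filter has no effect.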

Analyze Table

Filter Table{width=100%}

The Analyze Table tab is the central hub for data analysis on your Feature Table. Most analysis steps will generate new columns in the Feature Table which you can then use to filter your table to get to your features of interest. See below for a guide to the columns generated by the analysis steps.

Analysis Options{#AnaOptions}

For more in-depth information on the underlying functions in R, see the Metaboseek::analyzeFT documentation.

Basic Analysis
Advanced Analysis{#AdvancedAnalysis}

For the Labelfinder, follow these steps:

  1. Run two xcms analyses independently for the labeled and the unlabeled samples.

  2. Load the results from both analyses into the Metaboseek session (potentially use the renaming functionality in the Feature Table box to keep track of which results come from the labeled and unlabeled samples).

  3. Make sure to also load all MS files into the session, for both labeled and unlabeled samples.

  4. Select the unlabeled Feature Table as active table in the Feature Table box.

  5. Open the Labelfinder dialog and select the labeled sample feature table.

  6. Read the tooltips on the settings for explanations on the individual settings. You can deselect samples from both the labeled and unlabeled feature tables if necessary.

  7. Press Go to start the Labelfinder analysis. This will generate a new Feature Table with likely labeled compounds using the selected name (by default has "Labelfinder_" as a prefix). The unlabeled features will be reported in the resulting table.

  8. To browse the results, you can add the label m/z of interest to the Options -> Mass shifts. This will allow you to see overlays of EICs for the labeled and unlabeled compounds. Note that you may have to manually load additional raw files (e.g. those for the labeled samples) to display all relevant information.

Click here for details on the Labelfinder algorithm

The findLabels() function compares two Feature Tables with each other, assuming that one of them contains an enrichment of labeled compounds.

In a first step, featlistCompare() is used to identify entries in the reference (unlabeled) Feature Table which have a corresponding, labeled feature in the comparison (labeled) Feature Table (m/z in comparison Feature Table should be within tolerance of reference m/z + expected label and also within retention time tolerance).

Each entry from the reference Feature Table (dubbed I1S1, for Isotopologue 1, Sample Group 1) can have multiple matches in each of these categories:

  1. m/z + label match in reference table (I2S1)
  2. m/z match in comparison table (I1S2)
  3. m/z + label in comparison table (I2S2)

For each match category, only the match closest in retention time to I1S1 is kept for further processing. Intensities are re-extracted for all matched peaks (I1S1, I2S1, I1S2, I2S2), using the m/z values identified for I1S1 (for I1S1 and I1S2) and I2S2 (for I2S1 and I2S2), and the rt values for I1S1 (for I1S1 and I2S1) and I2S2 (for I1S2 and I2S2). The extracted intensities are used to calculate mean intensity across the unlabeled (S1) and labeled (S2) samples for both isotopologues.

Key filter criteria that are user-controlled are the minimum ratio of I1S1/I2S1 (because a high ratio is expected in the unlabeled sample S1 where the unlabeled compound I1 is expected to be more abundant than the labeled compound) and the maximum ratio of I1S2/I2S2 (where a low value is indicative of the label being enriched). The Features from the reference Feature Table which meet the filter criteria are then exported to a new Feature Table that contains intensity information for I1S1, I2S1, I1S2 and I2S2. The reported m/z and rt values are directly carried over from the original reference Feature Table.
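The matching and ratio criteria described above can be sketched in base R. This is not the findLabels() implementation: is_label_match(), the column layout, the tolerances and the threshold values are all illustrative assumptions.

```r
# Hypothetical helper: does a comparison peak match reference m/z + label
# within a ppm tolerance and a retention time tolerance?
is_label_match <- function(ref_mz, ref_rt, cmp_mz, cmp_rt,
                           label_shift, ppm = 5, rt_tol = 10) {
  target <- ref_mz + label_shift
  abs(cmp_mz - target) <= target * ppm * 1e-6 & abs(cmp_rt - ref_rt) <= rt_tol
}

# Hypothetical mean intensities for four reference features.
matches <- data.frame(
  feature = c("f1", "f2", "f3", "f4"),
  I1S1 = c(1e6, 1e6, 5e5, 2e6),  # unlabeled isotopologue, unlabeled samples
  I2S1 = c(1e4, 4e5, 5e3, 1e4),  # labeled isotopologue, unlabeled samples
  I1S2 = c(2e5, 1e6, 6e5, 2e6),  # unlabeled isotopologue, labeled samples
  I2S2 = c(8e5, 5e5, 1e5, 4e4)   # labeled isotopologue, labeled samples
)

min_ratio_s1 <- 10  # illustrative minimum for I1S1/I2S1
max_ratio_s2 <- 1   # illustrative maximum for I1S2/I2S2

ratio_s1 <- matches$I1S1 / matches$I2S1
ratio_s2 <- matches$I1S2 / matches$I2S2
labeled <- matches[ratio_s1 >= min_ratio_s1 & ratio_s2 <= max_ratio_s2, ]
```

In this made-up table only f1 passes both criteria: the unlabeled isotopologue dominates in the unlabeled samples, and the labeled isotopologue dominates in the labeled samples.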

Click here for details on the peak detection algorithm

The peakDetect() function uses a modified version of an algorithm presented by Ma et al. [24], as follows:

For the global noise level, let $N$ be the number of EIC data points, and $S_{i}$ the intensity value of the $i^{th}$ data point. $K$ is a user-definable variable.

$\displaystyle GlobalNoiseThreshold = \frac{GlobalMaximum + GlobalAverage}{100} + K \cdot Deviation$

where:

$GlobalAverage = \displaystyle \frac{\sum_{i=1}^{N}|S_{i}|}{N}$; $Deviation = \displaystyle \frac{\sum_{i=1}^{N}|S_{i} - GlobalAverage|}{N}$

In addition to the global noise threshold, a local noise threshold is calculated for each data point $S_{i}$ in the EIC, using a similar equation limited to a small retention time window around $S_{i}$. Let $n$ be the number of scans to consider for local noise level calculation in each direction, and $noise_{i}$ the local noise level for a data point in the EIC.

$\displaystyle noise_{i} = \frac{LocalMaximum + LocalAverage}{2} + K \cdot LocalDeviation$

$LocalAverage = \displaystyle \frac{\sum_{j=i-n}^{i+n}|S_{j}|}{2n + 1}$; $LocalDeviation = \displaystyle \frac{\sum_{j=i-n}^{i+n}|S_{j} - LocalAverage|}{2n + 1}$

In a first step, all local maxima and their adjacent minima in an EIC are detected, and peak boundaries are defined by the two minima surrounding a maximum. Peaks are selected if their maximum is above both the local noise level at its position in the EIC and the global noise level. If two peaks are adjacent, and the local minimum that separates them is at least 1/3 the intensity of either peak maximum, these two peaks are merged. Additional filters include selection for peaks spanning at least a given number of scans, and a factor by which a peak maximum has to be above the average intensity inside the peak boundaries.

Peaks are merged between files by first matching peaks with maxima within a specified retention time window. The peak boundary and maximum position are then calculated from the weighted average boundary and maximum positions of all peaks that are matched, weighted by the maximum intensity of each peak.
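The two noise thresholds can be computed directly from the formulas above. The following is a sketch, not the peakDetect() source: the function names are hypothetical, and at the edges of the EIC the local window is simply truncated and averaged over the points that exist, which is an assumption the text does not specify.

```r
# Global noise threshold for an EIC intensity vector s.
global_noise_threshold <- function(s, k = 1) {
  avg <- mean(abs(s))
  dev <- mean(abs(s - avg))
  (max(s) + avg) / 100 + k * dev
}

# Local noise level for every data point, using n scans in each direction.
local_noise <- function(s, n = 2, k = 1) {
  vapply(seq_along(s), function(i) {
    # Window of up to n scans on each side, truncated at the EIC boundaries.
    window <- s[max(1, i - n):min(length(s), i + n)]
    avg <- mean(abs(window))
    dev <- mean(abs(window - avg))
    (max(window) + avg) / 2 + k * dev
  }, numeric(1))
}

# Made-up EIC with a single peak on a flat baseline.
eic <- c(10, 10, 12, 40, 150, 45, 11, 10, 10)
global_level <- global_noise_threshold(eic)
local_levels <- local_noise(eic)

# Data points whose intensity exceeds both thresholds.
above_both <- which(eic > global_level & eic > local_levels)
```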

Regroup Table

Figure: Regroup Table

This tab allows you to redefine the columns containing intensity values and how they are grouped.

Navigating Plots {#NavPlots}

Many plots in Metaboseek are interactive and let you get more information by selecting the elements they display. This includes mass spectra, some EIC plots and the network module plot.

To zoom in, drag your mouse while holding the left mouse button. A selection box will appear, and you can double-click to zoom in. To zoom out, double-click on the plot without selecting anything. NOTE: Double-clicks currently do not work on some computers; as an alternative, click while holding the CTRL key to zoom in or out. In the Molecular Network view, hold the Z key while clicking to zoom out instead.

To highlight a peak in a spectrum, or to select a time point in an EIC or a subnetwork or node in a network, hold the left SHIFT key and click on your data point of choice. Some plots allow export of the current view in .pdf or text format.

In Spectra, the selected peak is highlighted, and when you move the mouse over other peaks, you can see their mass difference to the highlighted peak. You can also link the peak selection to the Molecular formula prediction tab in the Options box to get a list of possible molecular formulas for it.

XCMS Analysis {#runXcms}

This section will help you to set up an xcms analysis in Metaboseek in order to identify LC/MS features that are differential between sets of data files. This can, for instance, be useful to assess the impact of a mutation on the metabolome of an organism or to identify compounds associated with the activity of an enzyme.

Figure: Running an xcms analysis. A description of the highlighted steps is below.

  1. Select a folder with MS data files. All files with supported file extensions in the selected folder and all its subfolders will be listed, so it makes sense to pre-sort your files in a reasonable folder structure:
       • All files should be acquired under comparable conditions, especially with the same polarity. Differences in LC gradient or general composition (e.g. through widely different extraction methods, or comparing samples and blanks) can also make it difficult to apply retention time correction and find differential features.
  2. There are 7 tables with xcms settings you can change here. Navigate through them with the drop-down menu highlighted as (2.). A short description for each parameter is given when you hover over the table entries. You can use the default settings and proceed to step 3 without changing any of them. The defaults are intended for highly similar LC/MS runs acquired at high resolution and high accuracy (< 5 ppm), and will find relatively small peaks (even if they only occur in a single replicate). While these settings allow for detection of small peaks, the processing time is relatively long and many false positives (non-peaks) will also end up in the feature table.

Click here for details about the xcms settings

  • Peak Detection: Set parameters for the xcms::findChromPeaks() function.
  • Peak Filling: These settings specify how to look for intensities for molecular features in all files, even in files where no peak was detected for that feature in the initial Peak Detection step. The xcms peak filling parameters will be used if you select the "Fill peaks with xcms..." output option below. Technically, you are setting parameters for the xcms::fillChromPeaks() function. You can also set parameters for the Metaboseek peak intensity functions here, which will extract intensities for all molecular features in all files.
  • Feature Grouping: Set the parameters for how xcms will group peaks from different files together (also known as correspondence analysis) so that intensities can be compared across files. These parameters are used for a call to xcms::groupChromPeaks() with xcms::PeakDensityParam.
  • Output files: Select which output files you want to get. The values in this table can more conveniently be set in the user interface below the tables ("Output selection" section).
  • CAMERA settings: Settings for isotope peak and adduct annotation with the CAMERA package. Metaboseek sequentially runs the CAMERA package functions xsAnnotate, groupFWHM, groupCorr, findIsotopes and findAdducts, which are described in the CAMERA documentation.
  • RT correction: Settings for retention time correction with xcms::adjustRtime(), using either the Obiwarp or the peakGroups method. If Obiwarp is selected and fails, the xcms runner script will attempt to run peakGroups with the given parameters.

  3. Start the analysis with a click on the "Start analysis!" button.

  • Once the analysis is running, Metaboseek will generate a Project Folder for you, containing settings and results from your xcms run. You can load the Project Folder back into Metaboseek to keep all your analysis results in one place. See Project Folders for more information.
  • You can save settings as a .zip file (on Windows computers, 7-Zip or other software that provides the zip command-line tool must be installed), or load a .zip file with settings from a previous run.
  • Note that loading settings will override your selection of MS data files. If you want to apply the settings to a new set of data files, load the settings first and then select a folder (step 1).
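For orientation, the xcms functions named in the settings tables above can be chained roughly as in the following sketch. It is not the Metaboseek xcms runner script: the function name `run_xcms_sketch`, its arguments and all parameter values are placeholders chosen for illustration (they are not Metaboseek's defaults), the CAMERA annotation step is left out, and reading the files with MSnbase::readMSData() is an assumption about how the data are loaded.

```r
# Hypothetical wrapper for illustration; requires the xcms and MSnbase
# packages when it is called. Parameter values are placeholders.
run_xcms_sketch <- function(files, sample_groups) {
  # Load the raw data without reading all spectra into memory
  raw <- MSnbase::readMSData(files, mode = "onDisk")

  # Peak Detection
  xdata <- xcms::findChromPeaks(
    raw, param = xcms::CentWaveParam(ppm = 5, peakwidth = c(3, 20)))

  # Feature Grouping parameters (used for peakGroups and the final grouping)
  pdp <- xcms::PeakDensityParam(sampleGroups = sample_groups,
                                bw = 2, minFraction = 0.5)

  # RT correction: try Obiwarp first and fall back to peakGroups on error.
  # peakGroups needs grouped peaks, so group once before adjusting.
  xdata <- tryCatch(
    xcms::adjustRtime(xdata, param = xcms::ObiwarpParam()),
    error = function(e) {
      grouped <- xcms::groupChromPeaks(xdata, param = pdp)
      xcms::adjustRtime(grouped,
                        param = xcms::PeakGroupsParam(minFraction = 0.5))
    }
  )

  # Feature Grouping (correspondence) on the corrected retention times
  xdata <- xcms::groupChromPeaks(xdata, param = pdp)

  # Peak Filling for features without a detected peak in some files
  xdata <- xcms::fillChromPeaks(xdata, param = xcms::FillChromPeaksParam())

  # Matrix of integrated intensities: features in rows, files in columns
  xcms::featureValues(xdata, value = "into")
}
```

A call would look like `run_xcms_sketch(files = mzml_paths, sample_groups = c("wt", "wt", "mut", "mut"))`, where `mzml_paths` stands for a character vector of your own data file paths.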
References



    mjhelf/METABOseek documentation built on April 27, 2022, 5:13 p.m.