Welcome to the info page on MetaboShiny! MetaboShiny is currently on bioRxiv, and the paper itself doubles as a guide to using the software. A visual manual covering all possible actions within the app can also be viewed below.
For example input files (positive and negative peaklists + metadata), please see the inst/examples folder.
http://biorxiv.org/cgi/content/short/734236v1
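To run MetaboShiny with Docker, pull the image, create the local folders that are mounted into the container, and start the app:

```shell
# pull the MetaboShiny image
docker pull jcwolthuis/metaboshiny
# create the local folders that are mounted into the container
mkdir -p ~/MetaboShiny/{databases,saves/admin}
# start MetaboShiny, exposing port 8080 and mounting ~/MetaboShiny at /root/MetaboShiny
docker run -p 8080:8080 -v ~/MetaboShiny/:/root/MetaboShiny/:cached --rm -it jcwolthuis/metaboshiny:latest Rscript -e "MetaboShiny::start_metshi(inBrowser=F)"
```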
To install MetaboShiny natively instead, run the commands in the inst/install.metshi.R file of this repository in R (terminal or RStudio), then start the app:

```r
library(MetaboShiny)
start_metshi(inBrowser = TRUE)
```
The drop-down list shows all projects that have been saved.
1. Select a project name and press "Apply" to select the project.
2. Click "load" at the bottom of the screen to load the saved progress on the project.
When searching databases for compounds that match your m/z values, you can click on compounds in the match list. If this option is selected, MetaboShiny copies the ticked compound information (SMILES, name, formula) to the clipboard.
The "Definitions" tab shows the current adduct table. Below the table is a field to import another adduct table. The "Rules" field shows the adduct rules that are used to calculate adducts when matching m/z values with database compounds. Below the table is a field to import a new adduct rule table.
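The sketch below is a rough illustration of what an adduct rule encodes. It computes theoretical ion m/z values from a neutral monoisotopic mass. The table layout (columns nM, delta and charge) and the function adduct_mz are hypothetical and do not reflect MetaboShiny's internal adduct format. The adduct masses are commonly used literature values.

```r
# Hypothetical adduct rules (not MetaboShiny's table format):
# nM = number of molecules in the ion, delta = mass added/removed in Da,
# charge = ion charge. Masses are commonly used literature values.
adducts <- data.frame(
  name   = c("[M+H]+", "[M+Na]+", "[M-H]-", "[2M+H]+"),
  nM     = c(1, 1, 1, 2),
  delta  = c(1.007276, 22.989218, -1.007276, 1.007276),
  charge = c(1, 1, -1, 1)
)

# Theoretical m/z of each ion: (nM * neutral mass + delta) / |charge|
adduct_mz <- function(neutral_mass, rules) {
  setNames((rules$nM * neutral_mass + rules$delta) / abs(rules$charge),
           rules$name)
}

# Example: glucose (C6H12O6), monoisotopic mass 180.063388 Da
round(adduct_mz(180.063388, adducts), 4)
```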
Here you can set the parameters and rules to use when predicting chemical formulas.
Select the method to use to score compounds that have the same weight (currently only M-score available). Set the intensity imprecision (default: 2%).
Change MetaboShiny colors, fonts, and font sizes. Restart MetaboShiny to apply changes.
MetaboShiny offers multiple metabolite databases for m/z identification. Before any other steps are taken, it is necessary to build the databases that the user is interested in. Each database only needs to be built once. To check if a database has been built, click the "check if database exists" button below the logo (Figure [Database Tab]). The database version number and download date are listed there as well.
MetaboShiny does not update databases automatically. To re-build a database of interest, click the "build database" button below the logo in the Database Tab.
Users often have their own in-house databases, and these can be added to MetaboShiny as well. To do so, scroll down on the database overview page to the large '+' button. There, you'll be shown examples of the required files (a logo, a base CSV file with the columns displayed in the pop-up, and a database name + description). After you upload these files and restart MetaboShiny, your database is available for building, and you can click the "build database" button as usual.
MetaboShiny does not accept raw peak data, so peak picking must be done beforehand. We suggest using either XCMS (with the MetaboAnalyst export option) or another tool of your choice, such as MSnbase. Examples of the three accepted data formats (MetaboAnalyst-like, MetaboShiny native, and MetaboLights) can be found in the inst/examples folder.
Unless you use the MetaboAnalyst format, MetaboShiny requires an additional metadata table. At minimum, this table needs:
- a 'sample' column containing the same sample identifiers used in the peak table files;
- an 'individual' column, because multiple samples can come from one individual in time-series data;
- at least one column describing the experimental group or a similar variable.

Examples of metadata formats are also present in the inst/examples folder.
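Before importing, it can help to check the metadata against the peak table. The sketch below uses a hypothetical helper, check_metadata, which is not a MetaboShiny function. It tests for the minimal columns described above and reports samples from the peak table that have no metadata.

```r
# Hypothetical pre-import check (not part of MetaboShiny).
# meta: metadata data frame; peak_samples: sample IDs used in the peak table.
check_metadata <- function(meta, peak_samples) {
  problems <- character(0)
  for (col in c("sample", "individual")) {
    if (!col %in% names(meta)) {
      problems <- c(problems, paste0("missing required column '", col, "'"))
    }
  }
  # At least one experimental variable besides the ID columns
  if (length(setdiff(names(meta), c("sample", "individual"))) == 0) {
    problems <- c(problems, "no experimental variable column")
  }
  # Every sample in the peak table should have a metadata row
  if ("sample" %in% names(meta)) {
    missing_ids <- setdiff(peak_samples, meta$sample)
    if (length(missing_ids) > 0) {
      problems <- c(problems, paste("samples without metadata:",
                                    paste(missing_ids, collapse = ", ")))
    }
  }
  problems
}

meta <- data.frame(sample     = c("S1", "S2", "S3"),
                   individual = c("P1", "P1", "P2"),
                   group      = c("control", "control", "treated"))
check_metadata(meta, peak_samples = c("S1", "S2", "S3"))  # character(0): no problems
check_metadata(meta[, c("sample", "group")], peak_samples = c("S1", "S4"))
```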
The data needs to be normalized in order to compare m/z peak values between samples and batches.
If your metadata contains only one batch and no column representing concentration, you can skip this part and continue to the Filtering and normalization step. Otherwise, follow the steps below.
1. Click on the "Get options" button.
2. If applicable, select the variable that represents concentration in your data.
3. If applicable, select the variable that contains your batch IDs.
In this section, you will find multiple options and methods to filter and normalize your data. The best selection depends on your data, and we encourage you to look into the different methods available here. After normalization, the distributions of pre- and post-normalization peak values are plotted for a randomly selected set of m/z values and samples. This lets you see how normalization has changed the data distribution and adjust the parameters if needed.
Select one option for each of the following normalization features and then press "Go". It is advised to save your data after completing this step (button on the bottom center of the screen).
- Filtering options
  - Interquartile range
  - Mean
  - Median absolute deviation
  - Median
  - Non-parametric relative standard deviation (stdev)
  - Relative standard deviation (stdev)
  - Standard deviation
  - None
- Normalization type
  - By reference compound
  - By reference feature
  - By sample-specific factor
  - Median
  - Quantile normalization
  - Sum
  - None
- Data transformation
  - Cubic root transform
  - Log transform
  - None
- Scaling
  - Autoscale/Z-transform
  - Mean-center
  - Pareto scaling
  - Range scaling
  - None
- Missing values
  - Half feature minimum
  - Half sample minimum
  - Total minimum
  - Random forest. You can adjust the number of trees built per variable. You can also choose whether to parallelize over forests, over variables, or not at all.
  - KNN imputation
  - SVD imputation
  - BPCA imputation
  - PPCA imputation
  - Median
  - Mean
  - Leave them out
  - Leave them alone
- Outliers
  - The user can choose whether to exclude outliers from the data analysis by toggling the "Exclude outliers?" tab.
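The sketch below makes a few of these options concrete in base R:
- half-feature-minimum imputation;
- interquartile-range filtering;
- sum normalization;
- log and cube-root transforms;
- the four scaling methods.

It assumes a numeric matrix with samples in rows and m/z features in columns and applies the steps in one plausible order. All function names and the drop_fraction parameter are hypothetical, and MetaboShiny's own implementation and defaults may differ.

```r
# Assumption: x is a numeric matrix, samples in rows, m/z features in columns.

# Missing values, "half feature minimum": replace NAs in each feature
# with half of that feature's smallest observed value.
impute_half_min <- function(x) {
  apply(x, 2, function(v) {
    v[is.na(v)] <- min(v, na.rm = TRUE) / 2
    v
  })
}

# Filtering, "interquartile range": drop the least variable features.
# drop_fraction is an illustrative parameter, not a MetaboShiny setting.
filter_iqr <- function(x, drop_fraction = 0.25) {
  iqr <- apply(x, 2, IQR, na.rm = TRUE)
  x[, iqr > quantile(iqr, drop_fraction), drop = FALSE]
}

# Normalization, "sum": divide every sample (row) by its total intensity.
normalize_sum <- function(x) x / rowSums(x)

# Transformations: log10 (positive values only) or sign-preserving cube root.
transform_log <- function(x) log10(x)
transform_cuberoot <- function(x) sign(x) * abs(x)^(1 / 3)

# Scaling per feature, after mean-centering each column.
scale_features <- function(x, method = c("auto", "mean", "pareto", "range")) {
  method <- match.arg(method)
  centered <- sweep(x, 2, colMeans(x))              # subtract column means
  sds <- apply(x, 2, sd)
  switch(method,
    auto   = sweep(centered, 2, sds, "/"),           # unit variance
    mean   = centered,                               # centering only
    pareto = sweep(centered, 2, sqrt(sds), "/"),     # divide by sqrt(sd)
    range  = sweep(centered, 2, apply(x, 2, function(v) diff(range(v))), "/"))
}

# Example on simulated log-normal intensities with one missing value.
set.seed(1)
x <- matrix(rlnorm(40, meanlog = 8), nrow = 5,
            dimnames = list(paste0("S", 1:5), paste0("mz", 1:8)))
x[2, 3] <- NA
processed <- scale_features(
  transform_log(normalize_sum(filter_iqr(impute_half_min(x)))),
  method = "pareto")
dim(processed)
```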
On the bottom center of the screen, you will find a "save" button. The data set will be saved under the name you chose in the file importing step.
This step is optional.
In the pre-matching panel, you can match all m/z values against all or a subset of the available databases. This can be time-consuming if the dataset is large and many databases are selected. However, it makes searching for possible m/z-metabolite matches much faster in the data analysis step.
1. Toggle the "Do matching beforehand?" button to "Yes".
2. Select the databases you wish to find matches in, or click on the shopping basket to match with all databases.
3. Click on "Find matches". This can take a few minutes.
4. Save your data (button on the bottom center of the screen).

If you wish to match with other or more databases, click the "Clear matches" button and repeat the steps above.
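Matching an m/z value to a database essentially means looking up theoretical ion masses within a mass tolerance. The sketch below shows that idea with a parts-per-million (ppm) window. The function names, the 5 ppm tolerance, and the two-compound database are assumptions for illustration; this is not MetaboShiny's matching code.

```r
# Deviation of an observed m/z from a theoretical one, in ppm.
ppm_error <- function(observed, theoretical) {
  (observed - theoretical) / theoretical * 1e6
}

# For each query m/z, return all database ions within tol_ppm
# (5 ppm is an illustrative value, not MetaboShiny's default).
match_mz <- function(mz, db, tol_ppm = 5) {
  hits <- lapply(mz, function(m) {
    d <- ppm_error(m, db$mz)
    keep <- abs(d) <= tol_ppm
    if (!any(keep)) return(NULL)
    data.frame(query_mz = m, db[keep, , drop = FALSE], dppm = d[keep],
               row.names = NULL)
  })
  do.call(rbind, hits)   # NULL entries (no match) are skipped
}

# Hypothetical database of precomputed ion m/z values
db <- data.frame(name   = c("glucose", "glucose", "citric acid"),
                 adduct = c("[M+H]+", "[M+Na]+", "[M+H]+"),
                 mz     = c(181.070664, 203.052606, 193.034279))
match_mz(c(181.0710, 250.1000), db)
```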
The analysis tab has two sections: the statistics panel and the side bar. The statistics panel contains tabs with different statistical analysis methods. The side bar contains functions for variable choice, data subsetting, plot aesthetics, m/z matching, and plot export.
The side bar contains four tabs, described below.
* Note that the side bar can be resized by dragging its left-hand edge.
See Statistics panel figure.
- Current experiment: shows the variable(s) and subset(s) that are currently being analysed.
- Change of variable of interest: here you can choose to inspect one variable, two variables in combination, a time series, or a time series in combination with one variable. Press "do stats on selected" to change the current experiment for analysis.
- Subset data: the "Current sample count" shows the number of samples analysed in the current experiment. To subset the data, select the variable you want to subset on, then select the group(s) you want to inspect. Click "click to subset" to apply the changes.
- Load existing meta-dataset: every time you use the subset/switch option, the results from that subset are saved. Use the drop-down menu under the subsetting field to go back to previously defined subsets.
This tab displays all database matches for the m/z value selected in the statistics panel.
* The table in the "mz > molecule" tab can be sorted by m/z value, adduct, isotope percentage, or the m/z distance from the database value in parts per million (dppm).
* The table can be copied or exported as a .csv or .xlsx file.
* Click the funnel icon to filter the matched results by adduct, database, and main or minor isotope.
* In the "molecule > mz" tab, you can search for a specific metabolite name. The resulting table lists all m/z values from the data that match that metabolite in any of the databases.
* In the match menu, when selecting a compound:
  - the name and SMILES or formula (specified in settings) are copied to the clipboard;
  - in the compound description field, clicking a database icon copies the database ID to the clipboard.
Click the funnel above the "compound info" section to filter the database matches by adduct, main/minor isotope, or database.
* Adduct: the chart shows the proportion of matches with each adduct. Hover over the slices to see the number of matches, and click the slices for the adducts you want to filter for.
* Isotope: you can filter for main (100% peak) or minor (<100% peak) isotopes. Click the pie slices to filter.
* Databases: the pie chart shows the proportion of results that come from each database. Hover over the slices to see the number of matches, and click the slices of the databases you want to filter for.
1. Settings
- For a PubMed search, enter your search term (e.g. metabolite name) and specify the publishing date range and how many abstracts to use in the search.
- To use the results from your m/z database matches, toggle "own word" to "from matches".
2. Press 'Plot' to start your search and render the word cloud.
3. In the 'filters' tab, you can search for a second set of abstracts that you want to exclude from your previous search.
4. In the 'plot' tab you will find the word cloud. Toggle "cloud" to "barchart" to render a bar chart instead of a word cloud. Select the number of words to use for your plot.
5. Filter out words that are commonly used in the English language or in metabolomics research by selecting predefined word filters. The words from your search in step 3 are listed as a separate set here.
* stopwords: this set contains the 200 most common words in the English language.
* metabolomics: this set contains words that are common in the metabolomics field: metabolism, metabolic, metabolomic, metabolomics, biochemical, mass, spectrometry, nmr, direct, infusion, exposome, papers, compounds, and compound.
* default: this set contains the words "exposome", "synonyms", and all the available database names.
6. Click on a word in the word cloud to show PubMed abstracts mentioning that word, and their PubMed IDs.
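The word counts behind the cloud or bar chart can be thought of as simple term frequencies computed after the filtered words are removed. The base-R sketch below follows that assumption. The function top_words and the short stopword vector are illustrative only; they are not MetaboShiny's predefined word sets or code.

```r
# Count word frequencies in a set of abstracts after removing filtered words.
# The stopword vector passed in is illustrative, not a MetaboShiny set.
top_words <- function(abstracts, stopwords = character(0), n = 10) {
  # lower-case and split on anything that is not a letter or digit
  words <- unlist(strsplit(tolower(abstracts), "[^a-z0-9]+"))
  words <- words[nchar(words) > 0 & !words %in% stopwords]
  counts <- sort(table(words), decreasing = TRUE)
  head(counts, n)
}

abstracts <- c("Glucose metabolism in liver tissue.",
               "Liver glucose levels were measured by mass spectrometry.")
top_words(abstracts,
          stopwords = c("in", "by", "were", "mass", "spectrometry"), n = 3)
```

The resulting named counts could then be passed to a word cloud or bar chart function.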
Changes to plot aesthetics are applied when the plot is re-created.
In this tab you can upload new metadata. The file should be in a .csv format and contain a column with sample IDs and any new metadata as additional columns with new unique headers.
Uploading new metadata replaces the old metadata, so make sure to include all relevant columns in the new file.
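One way to avoid losing columns is to build the new file by merging the new variables into the current metadata. The sketch below assumes the sample ID column is called sample, as in the metadata requirements. It uses a hypothetical bmi column and a hypothetical output file name.

```r
# Current metadata and a table with new variables (bmi is hypothetical)
old_meta <- data.frame(sample     = c("S1", "S2", "S3"),
                       individual = c("P1", "P1", "P2"),
                       group      = c("control", "control", "treated"))
new_cols <- data.frame(sample = c("S1", "S2", "S3"),
                       bmi    = c(22.1, 27.4, 24.0))

# all.x = TRUE keeps every existing sample, even without new values
updated <- merge(old_meta, new_cols, by = "sample", all.x = TRUE, sort = FALSE)

# Write the combined table for upload (file name is illustrative)
write.csv(updated, file.path(tempdir(), "metadata_updated.csv"),
          row.names = FALSE)
```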
The statistics panel has four tabs whose contents change depending on whether the current experiment is a one-factor, two-factor, or time-series analysis. The four statistics categories are:
- dimension reduction methods;
- per-m/z-value analyses;
- overview analyses;
- machine learning.
If you click any m/z value in the result table or in the plotly scatterplots, it is recorded in the side bar. If the pre-matching step was performed, the matches appear in the side bar; otherwise, you can search for matches manually.
The Venn diagram shows which m/z values overlap between the different analyses.
1. Select the analyses you would like to compare by clicking them and pressing the "down" arrow button. To remove an analysis from the comparison, select it in the lower list and press the "up" arrow.
2. Select how many hits to include from each analysis.
3. Press "click to make venn diagram". The diagram appears on the right.
4. Below the diagram, you can select any of the included analyses to view the m/z values that overlap between them.
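Conceptually, the diagram compares the top n hits of each selected analysis. The sketch below computes that overlap with base R set operations. The analysis names and m/z values are made up, and the sketch does not reproduce MetaboShiny's ranking of hits.

```r
# Hypothetical ranked hits (best first) from three analyses
ranked <- list(
  ttest = c(181.0707, 203.0526, 132.0768, 90.0550),
  pca   = c(203.0526, 181.0707, 116.0706, 90.0550),
  ml    = c(203.0526, 146.0924, 181.0707, 175.1190)
)

# m/z values present in the top n hits of every analysis in the list
top_overlap <- function(ranked, n) {
  tops <- lapply(ranked, head, n)
  Reduce(intersect, tops)
}

top_overlap(ranked, n = 3)                     # all three analyses
top_overlap(ranked[c("ttest", "pca")], n = 2)  # a chosen pair
```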
MetaboShiny supports most of the models included in the caret package and adds functionality for training/test set selection and up/downsampling.
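caret itself provides createDataPartition() for stratified splits, plus downSample() and upSample() for class balancing. To show what these steps do without the dependency, the base-R sketch below performs a stratified train/test split and then downsamples the training set. The function names and the 80/20 split are assumptions, not MetaboShiny settings.

```r
# Stratified split: take a fraction p of the rows of each class for training.
stratified_split <- function(y, p = 0.8, seed = 42) {
  set.seed(seed)                                  # reproducible split
  by_class <- split(seq_along(y), y)              # row indices per class
  train <- lapply(by_class, function(i) i[sample.int(length(i), ceiling(p * length(i)))])
  sort(unname(unlist(train)))
}

# Downsampling: keep as many rows per class as the smallest class has.
downsample <- function(rows, y) {
  by_class <- split(rows, droplevels(y[rows]))
  n_min <- min(lengths(by_class))
  picked <- lapply(by_class, function(i) i[sample.int(length(i), n_min)])
  sort(unname(unlist(picked)))
}

y <- factor(c(rep("control", 30), rep("case", 10)))
train_rows <- stratified_split(y, p = 0.8)
test_rows  <- setdiff(seq_along(y), train_rows)
balanced   <- downsample(train_rows, y)
table(y[balanced])
```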
The machine learning results tab features two main plot + table sections. The first is the Receiver Operating Characteristic (ROC) curve, which is often used to visualise the effectiveness of a predictive model. Its axes show the true positive rate and the false positive rate. A non-predictive model appears as a straight diagonal line, and a 'perfect' model appears as a right-angled curve that hits the top left corner of the graph. Each model built has its own curve. With more than two classes this becomes more complex, and curves are shown for each pair of possible class comparisons (e.g. with classes A, B, and C, one would see curves for A vs B, A vs C, and B vs C). When you click a curve, the table below displays the features (m/z + metadata) used in that model and their importance within the model. A higher importance means that the feature contributes more to correct predictions. Models can also be selected in the left-hand table to explore the involved features without using the plot.
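For a single two-class comparison, the ROC curve and its area under the curve (AUC) can be computed from predicted scores as in the sketch below. It is a generic illustration with hypothetical function names, not the code MetaboShiny uses. The last line lists the pairwise comparisons for three classes.

```r
# ROC points: at each threshold, the fraction of positives (TPR) and
# negatives (FPR) whose score is at or above it.
# scores: predicted probability of the positive class; labels: TRUE = positive.
roc_points <- function(scores, labels) {
  thresholds <- sort(unique(scores), decreasing = TRUE)
  tpr <- sapply(thresholds, function(t) mean(scores[labels] >= t))
  fpr <- sapply(thresholds, function(t) mean(scores[!labels] >= t))
  data.frame(fpr = c(0, fpr), tpr = c(0, tpr))   # start the curve at (0, 0)
}

# Area under the curve by the trapezoidal rule over consecutive points.
auc <- function(roc) {
  sum(diff(roc$fpr) * (head(roc$tpr, -1) + tail(roc$tpr, -1)) / 2)
}

roc <- roc_points(scores = c(0.9, 0.4, 0.6, 0.2),
                  labels = c(TRUE, TRUE, FALSE, FALSE))
auc(roc)

# Pairwise class comparisons for a three-class problem
combn(c("A", "B", "C"), 2, paste, collapse = " vs ")
# "A vs B" "A vs C" "B vs C"
```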
The second plot is a bar plot of the top features ranked by importance, with the most important features (averaged over all repeats) shown first. Click a bar to register its m/z value for further searching.
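The ranking in the bar plot can be sketched as a mean over repeats followed by a sort. The long-format table below (one row per repeat and feature) and the function top_features are assumptions for illustration.

```r
# Hypothetical importances from three repeated models
imp <- data.frame(
  repeat_id  = rep(1:3, each = 3),
  feature    = rep(c("mz181.0707", "mz203.0526", "age"), times = 3),
  importance = c(80, 60, 10, 90, 50, 20, 70, 70, 0)
)

# Average importance per feature over all repeats, highest first
top_features <- function(imp, n = 10) {
  mean_imp <- vapply(split(imp$importance, imp$feature), mean, numeric(1))
  head(sort(mean_imp, decreasing = TRUE), n)
}

top_features(imp, n = 2)
```

Passing the result to barplot() would draw a simple version of the plot.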
Please report any issues and feedback on the Issues page here, along with suggestions! =)