knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)
knitr::opts_knit$set(root.dir = system.file(package = "tima"))
library("tima")
This vignette describes the main steps of the annotation process.
For the moment, we support three types of annotations: MS1-based, spectral, and SIRIUS-based.
MS1-based annotations are of the lowest possible quality. However, they make it possible to annotate unusual adducts and in-source fragments, thanks to several small tricks implemented. Restrict the adduct list and the structure-organism pairs you want to consider as much as you can, because the number of possibilities explodes rapidly.
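To illustrate why the number of possibilities grows so quickly, here is a minimal sketch of MS1 matching in base R. It is not the package's implementation: the candidate table, the adduct list, the `annotate_ms1()` helper, and the 10 ppm tolerance are all assumptions made for the example.

```r
# Illustrative adduct mass shifts (Da) for singly charged positive ions.
adducts <- c(
  "[M+H]+" = 1.007276,
  "[M+NH4]+" = 18.033823,
  "[M+Na]+" = 22.989218,
  "[M+K]+" = 38.963158
)

# Hypothetical candidate structures with their monoisotopic masses.
candidates <- data.frame(
  name = c("caffeine", "quercetin"),
  exact_mass = c(194.080376, 302.042653)
)

# Every structure is combined with every adduct:
# the search space is n_structures x n_adducts.
grid <- expand.grid(
  structure = seq_len(nrow(candidates)),
  adduct = names(adducts),
  stringsAsFactors = FALSE
)
grid$name <- candidates$name[grid$structure]
grid$mz_theoretical <- candidates$exact_mass[grid$structure] +
  unname(adducts[grid$adduct])

# Hypothetical helper: keep the candidates within a ppm tolerance of a feature m/z.
annotate_ms1 <- function(mz, grid, tolerance_ppm = 10) {
  ppm <- abs(mz - grid$mz_theoretical) / grid$mz_theoretical * 1e6
  grid[ppm <= tolerance_ppm, c("name", "adduct", "mz_theoretical")]
}

nrow(grid) # 8 theoretical m/z values for only 2 structures and 4 adducts
hits <- annotate_ms1(195.0877, grid)
hits
```

With thousands of structures and dozens of adducts, the grid reaches hundreds of thousands of rows, which is why a restricted adduct list matters.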
targets::tar_make(names = tidyselect::matches("^ann_ms1_pre"))
targets::tar_make(names = tidyselect::matches("^ann_spe_pre"))
We use spectral entropy similarity (see https://doi.org/10.1038/s41592-021-01331-z) for matching. If needed, a Python implementation of the spectral matching steps is also available at https://github.com/mandelbrot-project/spectral_lib_matcher. The Python version also includes other similarity measures.
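As a rough illustration of the measure, the sketch below computes the unweighted entropy similarity between two spectra in base R. It assumes that the peaks were already aligned on common m/z bins (with 0 where a peak is absent) and leaves out the intensity weighting described in the paper; the function names are hypothetical and the package's own implementation may differ.

```r
# Shannon entropy of an intensity vector, after normalizing it to sum to 1.
spectral_entropy <- function(intensity) {
  p <- intensity / sum(intensity)
  p <- p[p > 0]
  -sum(p * log(p))
}

# Unweighted entropy similarity between two aligned intensity vectors:
# 1 - (2 * S_AB - S_A - S_B) / ln(4), where S_AB is the entropy of the
# 1:1 mixture of the two normalized spectra.
entropy_similarity <- function(a, b) {
  pa <- a / sum(a)
  pb <- b / sum(b)
  s_ab <- spectral_entropy((pa + pb) / 2)
  1 - (2 * s_ab - spectral_entropy(pa) - spectral_entropy(pb)) / log(4)
}

# Identical spectra are maximally similar, spectra without any shared
# peak are maximally dissimilar.
sim_identical <- entropy_similarity(c(10, 50, 40), c(10, 50, 40))
sim_disjoint <- entropy_similarity(c(1, 0), c(0, 1))
sim_partial <- entropy_similarity(c(10, 50, 40), c(0, 60, 40))
```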
targets::tar_make(names = c(tidyselect::matches("^ann_spe_is")))
targets::tar_make(names = tidyselect::matches("^ann_spe_exp_gnp_pre"))
Because SIRIUS jobs take a long time to run, we provide example SIRIUS workspaces (for both SIRIUS 5 and 6). Note that spectral matches from SIRIUS are not supported for now. The example workspaces were generated on the first 20 lines of the example MGF with the following commands:
```{bash eval=FALSE, include=TRUE}
# SIRIUS 6
sirius \
  --noCite \
  --input=data/source/example_spectra_mini.mgf \
  --output=data/interim/annotations/example_sirius.sirius/ \
  --maxmz=800 \
  config \
  --AlgorithmProfile=orbitrap \
  --StructureSearchDB=BIO \
  --Timeout.secondsPerTree=10 \
  --Timeout.secondsPerInstance=10 \
  formulas \
  zodiac \
  fingerprints \
  classes \
  structures \
  denovo-structures \
  summaries \
  --full-summary

# SIRIUS 5
sirius \
  --noCite \
  --input data/source/example_spectra_mini.mgf \
  --output data/interim/annotations/example_sirius/ \
  --maxmz 800 \
  config \
  --AlgorithmProfile orbitrap \
  --StructureSearchDB BIO \
  --Timeout.secondsPerTree 10 \
  --Timeout.secondsPerInstance 10 \
  formula \
  zodiac \
  fingerprint \
  structure \
  compound-classes \
  write-summaries \
  --full-summary
```

These parameters were not optimized and were only used to give an example output. If you are using the CLI, do not forget to generate the summaries with the `--full-summary` option; if you use the GUI, generate them by clicking the corresponding icon. You can get an example by running:

```r
tima:::get_example_sirius()
```
The SIRIUS workspace should ideally be named `yourPattern_sirius` and placed in `data/interim/annotations`; otherwise, it will not be found by default unless you provide the right path.
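If you want to check the location before running the pipeline, a small sketch such as the following can help. The `sirius_workspace_path()` helper is hypothetical (it is not part of the package) and simply applies the naming convention described above, here with `example` as the pattern.

```r
# Hypothetical helper: build the default location of a SIRIUS workspace
# from a file pattern, following the <pattern>_sirius convention.
sirius_workspace_path <- function(pattern,
                                  dir = file.path("data", "interim", "annotations")) {
  file.path(dir, paste0(pattern, "_sirius"))
}

workspace <- sirius_workspace_path("example")
workspace

# TRUE only if the workspace exists relative to the current working directory.
workspace_found <- dir.exists(workspace)
```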
targets::tar_make(names = tidyselect::matches("^ann_sir_pre"))
If you want to know how we attempt to combine the CSI score with the other scores, see `R/transform_score_sirius_csi.R`. Note that starting from SIRIUS 6, the approximate confidence score is used instead of the exact one.
targets::tar_visnetwork(
  names = tidyselect::starts_with("ann_s"),
  exclude = c(
    tidyselect::contains("benchmark"),
    tidyselect::contains("par_"),
    tidyselect::contains("paths")
  ),
  targets_only = TRUE,
  degree_from = 8
)
Annotations are now prepared and can be used for further processing. Your features now carry not only structural information but also chemical class information. The latter may or may not correspond to the chemical class of your annotated structure, depending on the consistency of your annotations.
Within our workflow, we offer a new way to attribute chemical classes to your features. It is analogous to Network Annotation Propagation, but uses the edges of your network instead of the clusters. This makes more sense in our view, as also recently illustrated by CANOPUS.
We are also currently working on CANOPUS integration for chemical class annotation, but this implies much heavier computations, and we want to offer our users a fast solution.
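To make the idea of edge-based class attribution more concrete, here is a deliberately simplified sketch in base R: each feature receives the most frequent chemical class among its direct neighbors in the network. The edge list, the class labels, and the `neighbor_class()` helper are made up for the example, and the actual workflow relies on more elaborate scoring than a simple majority vote.

```r
# Toy network: each row is an edge between two features.
edges <- data.frame(
  source = c("f1", "f1", "f2", "f4"),
  target = c("f2", "f3", "f3", "f5")
)

# Chemical class of the best annotation of each feature (NA if none).
classes <- c(
  f1 = "Flavonoids",
  f2 = "Flavonoids",
  f3 = NA,
  f4 = "Alkaloids",
  f5 = NA
)

# Hypothetical helper: majority vote over the classes of direct neighbors.
neighbor_class <- function(feature, edges, classes) {
  neighbors <- c(
    edges$target[edges$source == feature],
    edges$source[edges$target == feature]
  )
  votes <- classes[neighbors]
  votes <- votes[!is.na(votes)]
  if (length(votes) == 0) {
    return(NA_character_)
  }
  names(which.max(table(votes)))
}

propagated <- vapply(
  names(classes),
  neighbor_class,
  character(1),
  edges = edges,
  classes = classes
)
propagated
```

Here `f3` and `f5` have no class of their own but inherit one from their neighbors, while `f4` gets nothing because its only neighbor is unannotated.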
A network is generated during the process. The edges are created based on the spectral entropy similarity calculated between your spectra (see https://doi.org/10.1038/s41592-021-01331-z).
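As a sketch of how edges can be derived from pairwise similarities, the example below thresholds a small similarity matrix into an edge list. The similarity values and the 0.7 threshold are arbitrary assumptions for illustration, not the package defaults.

```r
# Hypothetical pairwise similarities between the spectra of three features.
features <- c("f1", "f2", "f3")
sim <- matrix(
  c(
    1.00, 0.90, 0.20,
    0.90, 1.00, 0.75,
    0.20, 0.75, 1.00
  ),
  nrow = 3,
  byrow = TRUE,
  dimnames = list(features, features)
)

threshold <- 0.7

# Keep each pair once (upper triangle) when its similarity reaches the threshold.
idx <- which(upper.tri(sim) & sim >= threshold, arr.ind = TRUE)
edges <- data.frame(
  source = rownames(sim)[idx[, "row"]],
  target = colnames(sim)[idx[, "col"]],
  similarity = sim[idx]
)
edges
```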
targets::tar_make(names = tidyselect::matches("fea_edg_spe"))
If needed, you can get an example of what your minimal feature table should look like by running:
tima::get_example_files(example = "features")
targets::tar_make(names = tidyselect::matches("fea_pre"))
targets::tar_make(names = tidyselect::matches("fea_edg_pre"))
targets::tar_make(names = tidyselect::matches("fea_com"))
targets::tar_make(names = tidyselect::matches("fea_com_pre"))
This step attributes biological source information to your features. If all your features come from a single extract, the biological source of that extract is attributed to all of them. If you have multiple aligned extracts, the step takes the n highest intensities of your aligned feature table (n being set in your parameters) and attributes the biological sources of the corresponding extracts.
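The multi-extract case can be sketched as follows in base R. The intensity table, the extract-to-organism mapping, and the `top_n_sources()` helper are hypothetical; the example also assumes that the selection is done per feature and that zero intensities are ignored.

```r
# Hypothetical aligned feature table: features in rows, extracts in columns.
intensities <- matrix(
  c(
    100, 5, 0,
    10, 80, 60,
    0, 0, 30
  ),
  nrow = 3,
  byrow = TRUE,
  dimnames = list(
    c("f1", "f2", "f3"),
    c("extract_A", "extract_B", "extract_C")
  )
)

# Hypothetical biological source of each extract.
organisms <- c(
  extract_A = "Gentiana lutea",
  extract_B = "Swertia chirayita",
  extract_C = "Arabidopsis thaliana"
)

# Hypothetical helper: organisms of the n extracts where a feature is most intense.
top_n_sources <- function(x, organisms, n = 1) {
  x <- x[x > 0]
  top <- names(sort(x, decreasing = TRUE))[seq_len(min(n, length(x)))]
  unname(organisms[top])
}

sources <- lapply(
  rownames(intensities),
  function(feature) top_n_sources(intensities[feature, ], organisms, n = 2)
)
names(sources) <- rownames(intensities)
sources
```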
targets::tar_make(names = tidyselect::matches("tax_pre"))
This optional step filters the annotations from all the tools used, based on your own internal library of (experimental or predicted) retention times. If you do not have one, it simply groups the annotations of all tools.
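A minimal sketch of such a retention time filter is shown below. The column names, the 0.1 minute tolerance, and the choice to keep candidates that are absent from the library are assumptions made for the example; they do not describe the package's exact behavior.

```r
# Hypothetical annotations, with the retention time (min) of each feature.
annotations <- data.frame(
  feature_id = c("f1", "f1", "f2"),
  rt = c(1.20, 1.20, 5.40),
  candidate = c("cmpd_A", "cmpd_B", "cmpd_C")
)

# Hypothetical in-house retention time library (min).
rt_library <- data.frame(
  candidate = c("cmpd_A", "cmpd_B"),
  rt_library = c(1.25, 3.80)
)

tolerance_min <- 0.1

# Left join, so that candidates absent from the library are not lost.
merged <- merge(annotations, rt_library, by = "candidate", all.x = TRUE)
rt_error <- abs(merged$rt - merged$rt_library)

# Drop only the candidates whose library retention time disagrees.
kept <- merged[is.na(rt_error) | rt_error <= tolerance_min, ]
kept
```

In this toy case, `cmpd_B` is removed because its library retention time is far from the observed one, while `cmpd_C` is kept because the library has no entry for it.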
targets::tar_make(names = tidyselect::matches("^ann_fil"))
You are almost there! Have a look at all the steps accomplished so far:
targets::tar_visnetwork(
  names = tidyselect::matches("^ann"),
  exclude = c(
    tidyselect::contains("benchmark"),
    tidyselect::contains("par_"),
    tidyselect::contains("paths")
  ),
  targets_only = TRUE,
  degree_from = 8
)
We now recommend that you read the next vignette.