knitr::opts_chunk$set( collapse = TRUE, comment = "#>" )
This vignette describes a classified visualization method, called MCnebula, for the analysis of untargeted LC-MS/MS datasets. MCnebula uses the state-of-the-art computational prediction workflow of SIRIUS (SIRIUS, ZODIAC, CSI:FingerID, CANOPUS) for compound formula prediction, structure retrieval and classification prediction. MCnebula integrates an abundance-based class selection algorithm into compound annotation. The benefits of molecular networking, i.e. intuitive visualization and a large amount of integrable information, are incorporated into MCnebula visualization. With MCnebula, we can switch from untargeted to targeted analysis, focusing precisely on the compound or chemical class of interest to the researcher.
library(MCnebula)
library(dplyr)
library(ggplot2)
library(ggraph)
library(grid)
The prerequisite software for MCnebula is SIRIUS 4. Users can download it from https://bio.informatik.uni-jena.de/software/sirius/. The following code was tested on SIRIUS version 4.9.12 (https://github.com/boecker-lab/sirius/releases/tag/v4.9.12).
Note: SIRIUS 5 was released in 2022. The CLI commands may differ between versions.
In addition, users are encouraged to use MZmine 2 (https://github.com/mzmine/mzmine2) for LC-MS/MS data processing.
Before mass spectrometry data are processed, the raw data should in most cases be converted to .mzML or .mzXML files. ProteoWizard (https://proteowizard.sourceforge.io/) can be used for the conversion.
For MZmine 2 processing, an XML batch file outlining example parameters for a Waters QTOF can be found at https://github.com/Cao-lab-zcmu/research-supplementary.
Here we prepared some example files for this vignette to better illustrate the MCnebula workflow.
eg.path <- system.file("extdata", "raw_instance.tar.gz", package = "MCnebula")
tmp <- tempdir()
utils::untar(eg.path, exdir = tmp)
mgf.path <- paste0(tmp, "/", "instance5.mgf")
### show details of .mgf
data.table::fread(mgf.path, header = F, sep = NULL)
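Before starting a long SIRIUS run, a quick sanity check of the .mgf input can be useful. The following is a minimal base-R sketch, not part of MCnebula: it writes a tiny, made-up MGF file (the spectra and the `toy_mgf` name are invented for illustration) and counts the `BEGIN IONS` blocks and reads the `PEPMASS` fields that the MGF format uses. The same two lines of parsing could be pointed at `mgf.path` instead.

```r
# Write a tiny, made-up MGF file so the example is self-contained
toy_mgf <- tempfile(fileext = ".mgf")
writeLines(c(
  "BEGIN IONS", "FEATURE_ID=1", "PEPMASS=301.07", "MSLEVEL=2",
  "153.02 120", "285.04 300", "END IONS",
  "BEGIN IONS", "FEATURE_ID=2", "PEPMASS=449.11", "MSLEVEL=2",
  "287.05 800", "END IONS"
), toy_mgf)

mgf_lines <- readLines(toy_mgf)

# Each spectrum is one BEGIN IONS ... END IONS block
n_spectra <- sum(mgf_lines == "BEGIN IONS")

# Precursor m/z values are stored in the PEPMASS field
pepmass_lines <- grep("^PEPMASS=", mgf_lines, value = TRUE)
precursor_mz <- as.numeric(sub("^PEPMASS=([0-9.]+).*$", "\\1", pepmass_lines))

n_spectra
precursor_mz
```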
If SIRIUS has been downloaded locally and its directory has been added to the PATH environment variable (for example, by setting `PATH=$PATH:/you/your_dir/sirius-gui/bin` in `~/.bashrc`), the following command is available:
```{r, eval = FALSE}
system(
  paste0(
    "sirius -i ", mgf.path,
    " -o test --maxmz 800 formula -c 50 zodiac structure canopus"
  )
)
```
If you run SIRIUS 5, the command is quite different. The following example has not been tested.
system(
  paste0(
    "sirius -i ", mgf.path,
    " -o test --maxmz 800 formula -c 50 zodiac ",
    "fingerprint structure compound-classes ",
    "write-summaries --output test"
  )
)
Unfortunately, the current version of MCnebula may not be compatible with the output of SIRIUS 5 ([new features and changes](https://www.youtube.com/watch?v=Bj0hIrwx9ks&t=5s&ab_channel=BoeckerLab)). SIRIUS 4 is recommended instead.
A simpler approach is to use the SIRIUS GUI version.
### MCnebula processing
The SIRIUS computation is time-consuming; it may take several hours or even days. Here, we prepared a fairly small dataset (a SIRIUS project space) for which the SIRIUS computation has already been done. The following shows the contents of this dataset.
```r
list.files(tmp) %>%
  .[which(. != "instance5.mgf")]
```
To begin with, MCnebula must be initialized in the SIRIUS project space.
MCnebula::initialize_mcnebula(tmp, rm_mc.set = T)
This sets several global variables.
ls(pattern = "^\\.MCn\\.", all.names = T)
Users can manually modify the setup.
MCnebula::collate_structure( exclude_element = c("Cl", "S", "P"), ppm_error = 20 )
The results:
.MCn.formula_set
.MCn.structure_set
Next, we collate the metadata of the chemical class hierarchy.
MCnebula::build_classes_tree_list()
The structure of the returned object, for example:
.MCn.class_tree_list[[4]]
The `collate_ppcp` function retrieves the posterior probabilities of classification prediction (PPCP) for all compounds and gathers them as `.MCn.ppcp_dataset`.
Furthermore, by default, this function also summarises `.MCn.nebula_class` and `.MCn.nebula_index` from the data in `.MCn.ppcp_dataset`.
When done, `.MCn.nebula_index` is returned.
MCnebula::collate_ppcp(
  ## the example dataset is small, so min_possess is set much lower here;
  ## 20 or more may be better
  min_possess = 1,
  ## 0.1 may be better
  max_possess_pct = 0.9
)
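To give a feel for how two thresholds of this kind can narrow down classes, here is a minimal base-R sketch. It rests on an assumption that is not confirmed by this vignette: that `min_possess` is a lower bound on the number of compounds in a class and `max_possess_pct` an upper bound on the fraction of all compounds in a class. The toy data and the helper `select_classes` are hypothetical and are not MCnebula's implementation.

```r
# Toy table (invented): one row per (compound, predicted class) pair
toy <- data.frame(
  .id = c(1, 2, 3, 4, 1, 2, 1),
  class = c("A", "A", "A", "A", "B", "B", "C"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: keep classes that are neither too rare nor too common.
# ASSUMPTION: thresholds are interpreted as described in the text above.
select_classes <- function(df, min_possess, max_possess_pct) {
  n_compounds <- length(unique(df$.id))
  counts <- table(df$class)
  keep <- counts >= min_possess & counts <= max_possess_pct * n_compounds
  names(counts)[as.vector(keep)]
}

selected <- select_classes(toy, min_possess = 2, max_possess_pct = 0.9)
selected
```

Under this reading, class "A" is dropped because every compound belongs to it (too general to be informative) and class "C" is dropped because only one compound belongs to it.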
In addition, `collate_ppcp` sets the following global variables:
head(.MCn.ppcp_dataset, n = 1)
head(.MCn.nebula_class, n = 1) %>%
  lapply(dplyr::as_tibble)
The example data prepared above are too small to demonstrate the following MCnebula functions in detail. Hence, we use another project that has already been collated with the functions above.
### the following datasets are loaded automatically by 'library(MCnebula)'
inst_formula_set
inst_structure_set
head(inst_ppcp_dataset, n = 1)
We set these data as global variables so that the MCnebula functions can recognize them without additional parameters.
.MCn.formula_set <- inst_formula_set
.MCn.structure_set <- inst_structure_set
.MCn.ppcp_dataset <- inst_ppcp_dataset
Re-execute the command:
MCnebula::collate_ppcp( min_possess = 20, max_possess_pct = 0.1 )
Of note, the next function, `generate_parent_nebula`, computes the spectral similarity between fragmentation spectra and reformats these (cosine) similarities as an edges file.
This is performed via `MSnbase::compareSpectra`.
It usually takes some time to get the results.
Here, we use a randomly generated edges file to skip this computation.
edges <- .MCn.formula_set$.id %>%
  combn(2) %>%
  t()
set.seed(100)
### randomly sample 1000 of the pairwise combinations
edges <- edges[sample(1:nrow(edges), 1000), ] %>%
  data.frame() %>%
  dplyr::as_tibble() %>%
  dplyr::rename(.id_1 = 1, .id_2 = 2) %>%
  dplyr::mutate(
    dotproduct = rnorm(1000, mean = 0.5, sd = 0.1),
    mass_diff = rnorm(1000, mean = 200, sd = 50)
  )
tmp.edges <- tempfile()
### write to disk
MCnebula::write_tsv(edges, filename = tmp.edges)
### show details
edges
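For intuition about what the `dotproduct` column stands in for, the following base-R sketch computes a cosine (normalized dot-product) similarity between two binned fragment spectra. It is an illustration only and does not reproduce `MSnbase::compareSpectra`; the helper name `cosine_similarity`, the fixed bin width, and the toy peaks are all assumptions made for this example.

```r
# Hypothetical helper: cosine similarity of two centroided spectra, each a
# data.frame with columns `mz` and `intensity`.
# ASSUMPTION: peaks are matched by fixed-width m/z binning.
cosine_similarity <- function(spec_a, spec_b, bin_width = 0.01) {
  # Assign every peak to an integer m/z bin
  bins_a <- round(spec_a$mz / bin_width)
  bins_b <- round(spec_b$mz / bin_width)
  # Build aligned intensity vectors over the union of occupied bins
  all_bins <- union(bins_a, bins_b)
  vec_a <- vapply(all_bins, function(b) sum(spec_a$intensity[bins_a == b]), numeric(1))
  vec_b <- vapply(all_bins, function(b) sum(spec_b$intensity[bins_b == b]), numeric(1))
  # Normalized dot product
  sum(vec_a * vec_b) / sqrt(sum(vec_a^2) * sum(vec_b^2))
}

# Toy spectra (invented values)
spec_1 <- data.frame(mz = c(100, 150.05, 200), intensity = c(10, 50, 100))
spec_2 <- data.frame(mz = c(300, 400), intensity = c(1, 2))

cosine_similarity(spec_1, spec_1) # identical spectra
cosine_similarity(spec_1, spec_2) # no shared peaks
```

Running such a comparison for every pair of features is what makes the real computation slow, since the number of pairs grows quadratically with the number of features.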
Subsequently, we use this edges file to generate the parent-nebula graph.
MCnebula::generate_parent_nebula(
  rm_parent_isolate_nodes = T,
  ## specify the edges file
  edges_file = tmp.edges
)
This function generates node data (chemical formula and structure annotation) and edge data (computed on the fly if no edges file is specified). Most importantly, the parent-nebula graph is generated.
.MCn.parent_nodes
.MCn.parent_graph
The next function generates multiple network graphs from the parent-nebula (nodes and edges) according to compound classification.
MCnebula::generate_child_nebulae(
  max_edges = 5,
  ## this writes .graphml output, which is supported by Cytoscape
  output_format = "graphml"
)
These graphs are stored in a list.
head(.MCn.child_graph_list, n = 2)
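The idea of deriving one sub-network per chemical class from a parent network can be sketched in base R as follows. This is a simplified illustration with invented data, assuming that a child network keeps only the edges whose two endpoints both belong to the class; it is not the implementation of `generate_child_nebulae` and it ignores options such as `max_edges`. The names `membership`, `toy_edges` and `child_edges` are hypothetical.

```r
# Invented class membership: class name -> feature ids
membership <- list(
  classA = c("1", "2", "3"),
  classB = c("3", "4")
)

# Invented parent-network edges, using the same column names as the edges file
toy_edges <- data.frame(
  .id_1 = c("1", "1", "3"),
  .id_2 = c("2", "4", "4"),
  dotproduct = c(0.8, 0.6, 0.7),
  stringsAsFactors = FALSE
)

# ASSUMPTION: a child network keeps edges with both endpoints in the class
child_edges <- lapply(membership, function(ids) {
  toy_edges[toy_edges$.id_1 %in% ids & toy_edges$.id_2 %in% ids, , drop = FALSE]
})

child_edges
```

A feature such as "3" can appear in more than one child network, because a compound may be assigned to several classes.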
All visualizations in the MCnebula package are output as .svg images.
MCnebula::visualize_parent_nebula(
  layout = "kk",
  ## map nodes color with superclass
  nodes_color = c("hierarchy" = 3),
  width = 25,
  height = 20
)
`rsvg::rsvg_png` can be used to convert the .svg file to a .png file.
from.svg <- paste0(
  .MCn.output, "/", .MCn.results,
  "/parent_nebula/parent_nebula.svg"
)
to.png <- paste0(
  .MCn.output, "/", .MCn.results,
  "/parent_nebula/parent_nebula.png"
)
rsvg::rsvg_png(from.svg, to.png)
The next function draws the child-nebulae in a grid panel. The overview of child-nebulae shows the abundant classes of this LC-MS/MS dataset.
MCnebula::visualize_child_nebulae()
All output files:
list.files(paste0(.MCn.output, "/", .MCn.results), recursive = T)
Other information:
sessionInfo()