knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)

Introduction

This vignette describes MCnebula, a classification-based visualization method for the analysis of untargeted LC-MS/MS datasets. MCnebula relies on the SIRIUS workflow (SIRIUS, ZODIAC, CSI:FingerID, CANOPUS), a state-of-the-art computational prediction toolkit, for compound formula prediction, structure retrieval, and class prediction. It integrates an abundance-based class selection algorithm into compound annotation. The benefits of molecular networking, i.e. intuitive visualization and a large amount of integrable information, are incorporated into the MCnebula visualization. With MCnebula, the analysis can switch from untargeted to targeted, focusing precisely on the compounds or chemical classes of interest to the researcher.

R and other software setup

R setup

library(MCnebula)
library(dplyr)
library(ggplot2)
library(ggraph)
library(grid)

Other software setup

The prerequisite software for MCnebula is SIRIUS 4, which can be downloaded from https://bio.informatik.uni-jena.de/software/sirius/. The following code was tested on SIRIUS version 4.9.12 (https://github.com/boecker-lab/sirius/releases/tag/v4.9.12).

   NEWS! SIRIUS 5 has been available since 2022. Note that the CLI commands may differ between versions.

   In addition, users are encouraged to use MZmine 2 (https://github.com/mzmine/mzmine2) for LC-MS/MS data processing.

   Before mass spectrometry data are processed, the raw data should in most cases be converted to .mzML or .mzXML files. ProteoWizard (https://proteowizard.sourceforge.io/) can be used for the conversion.
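
As a rough sketch of the conversion step, the snippet below builds a ProteoWizard `msconvert` call from R. It assumes that `msconvert` is installed and on the PATH; the input file name and output directory are hypothetical, and the `run_conversion` switch is an assumption of this example so that nothing is executed by default.

```r
# Hypothetical raw file and output directory; replace with your own paths.
raw_file <- "sample_01.raw"
out_dir <- file.path(tempdir(), "mzml")

# --mzML selects the output format; -o sets the output directory.
msconvert_args <- c(shQuote(raw_file), "--mzML", "-o", shQuote(out_dir))

# Safety switch (an assumption of this sketch): nothing is run by default.
run_conversion <- FALSE
msconvert_bin <- Sys.which("msconvert")

if (run_conversion && nzchar(msconvert_bin)) {
  dir.create(out_dir, showWarnings = FALSE, recursive = TRUE)
  system2(msconvert_bin, msconvert_args)
} else {
  # Only print the command that would be executed.
  message("Would run: msconvert ", paste(msconvert_args, collapse = " "))
}
```

Setting `run_conversion <- TRUE` runs the command, provided the executable is found.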

Data preprocessing

Raw data processing

For MZmine 2 processing, an XML batch file with example parameters for a Waters QTOF instrument can be found at https://github.com/Cao-lab-zcmu/research-supplementary.

SIRIUS computation workflow

We prepared some example files for this vignette to better illustrate the MCnebula workflow.

eg.path <- system.file("extdata", "raw_instance.tar.gz", package = "MCnebula")
tmp <- tempdir()
utils::untar(eg.path, exdir = tmp)
mgf.path <- paste0(tmp, "/", "instance5.mgf")
### show details of .mgf
data.table::fread(mgf.path, header = F, sep = NULL)
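
To make the structure of an .mgf file more concrete, here is a minimal base-R sketch that counts spectra and extracts precursor masses. The tiny file written below is a made-up example, not the vignette's `instance5.mgf`; it assumes the common layout in which each spectrum is wrapped in `BEGIN IONS` / `END IONS` and the precursor m/z is stored as `PEPMASS=`.

```r
# Write a toy .mgf file (invented values, for illustration only).
toy_mgf <- tempfile(fileext = ".mgf")
writeLines(c(
  "BEGIN IONS",
  "FEATURE_ID=1",
  "PEPMASS=181.0707",
  "MSLEVEL=2",
  "85.0284 120",
  "127.0390 300",
  "END IONS",
  "BEGIN IONS",
  "FEATURE_ID=2",
  "PEPMASS=303.0499",
  "MSLEVEL=2",
  "153.0182 500",
  "END IONS"
), toy_mgf)

mgf_lines <- readLines(toy_mgf)

# Each spectrum starts with a BEGIN IONS line.
n_spectra <- sum(mgf_lines == "BEGIN IONS")

# PEPMASS may carry an intensity after the m/z, so keep only the first field.
pepmass_lines <- grep("^PEPMASS=", mgf_lines, value = TRUE)
precursor_mz <- as.numeric(sub("^PEPMASS=([0-9.]+).*$", "\\1", pepmass_lines))
```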

If SIRIUS has been downloaded locally and added to the PATH environment variable (for example, by adding PATH=$PATH:/you/your_dir/sirius-gui/bin to ~/.bashrc), the following is available:

## not evaluated when this vignette is built
system(
  paste0(
    "sirius -i ", mgf.path,
    " -o test --maxmz 800 formula -c 50 zodiac structure canopus"
  )
)

If you run SIRIUS 5, the command is quite different.
The following is an example (not tested).

system(
  paste0(
    "sirius -i ", mgf.path,
    " -o test --maxmz 800 formula -c 50 zodiac ",
    "fingerprint structure compound-classes ",
    "write-summaries --output test"
  )
)

Unfortunately, the current version of MCnebula may not be compatible with the output of SIRIUS 5 (new features and changes: https://www.youtube.com/watch?v=Bj0hIrwx9ks&t=5s&ab_channel=BoeckerLab).
Instead, SIRIUS 4 is recommended.
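
Before launching a long computation, it can help to confirm that the `sirius` executable is actually reachable. The following sketch uses only base R; the .mgf path is a placeholder, the subcommands mirror the SIRIUS 4 call above, and the `run_sirius` switch is an assumption of this example so that nothing is executed by accident.

```r
# Placeholder input path; replace with your own .mgf file.
mgf_path <- file.path(tempdir(), "instance5.mgf")

# Sys.which() returns an empty string when the executable is not on the PATH.
sirius_bin <- Sys.which("sirius")

# Same options as the SIRIUS 4 command line used in this vignette.
sirius_args <- c(
  "-i", shQuote(mgf_path),
  "-o", "test",
  "--maxmz", "800",
  "formula", "-c", "50",
  "zodiac", "structure", "canopus"
)

# Safety switch (an assumption of this sketch): nothing is run by default.
run_sirius <- FALSE

if (!nzchar(sirius_bin)) {
  message("'sirius' was not found on the PATH; check the environment setup.")
} else if (run_sirius) {
  system2(sirius_bin, sirius_args)
}
```

Passing the arguments as a vector to `system2` avoids assembling one long command string by hand.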

A simpler approach is to use the GUI version of SIRIUS.

MCnebula processing

Indeed, SIRIUS computation is time-consuming; it may take several hours or even days.
Here, we prepared a fairly small dataset (a SIRIUS project space) for which the SIRIUS computation has already been completed.
The following shows the details of this dataset.

list.files(tmp) %>% 
  .[which(. != "instance5.mgf")]

Data collating

To begin with, MCnebula must be initialized in the SIRIUS project.

MCnebula::initialize_mcnebula(tmp, rm_mc.set = T)

This sets several global variables.

ls(pattern = "^\\.MCn\\.", all.names = T)

Users can modify these settings manually.

Collate structure

MCnebula::collate_structure(
  exclude_element = c("Cl", "S", "P"),
  ppm_error = 20
)
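
To illustrate what these two arguments could mean in practice, here is a minimal base-R sketch. It assumes that `exclude_element` drops candidate formulas containing the listed elements and that `ppm_error` is a mass-error tolerance in parts per million; both are interpretations of the argument names, not documented behavior. The helper functions and the toy candidate table are hypothetical.

```r
# Mass error in ppm between an observed and a theoretical m/z.
ppm_error <- function(observed_mz, theoretical_mz) {
  (observed_mz - theoretical_mz) / theoretical_mz * 1e6
}

# TRUE when a formula contains any of the given element symbols.
# An element symbol is an uppercase letter followed by optional lowercase letters.
contains_element <- function(formula, elements) {
  symbols <- regmatches(formula, gregexpr("[A-Z][a-z]*", formula))
  vapply(symbols, function(s) any(s %in% elements), logical(1))
}

# Toy candidates (invented values, for illustration only).
candidates <- data.frame(
  formula = c("C6H12O6", "C10H15ClN2", "C9H11NO3S"),
  observed_mz = c(181.0712, 199.1005, 214.0540),
  theoretical_mz = c(181.0707, 199.0997, 214.0532),
  stringsAsFactors = FALSE
)

candidates$ppm <- ppm_error(candidates$observed_mz, candidates$theoretical_mz)

# Keep candidates within 20 ppm that contain none of Cl, S, P.
kept <- candidates[
  abs(candidates$ppm) <= 20 &
    !contains_element(candidates$formula, c("Cl", "S", "P")),
]
```

Splitting the formula into element symbols first avoids matching "C" inside "Cl".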

The results:

.MCn.formula_set
.MCn.structure_set

Next, we collate the metadata of chemical classes hierarchy.

MCnebula::build_classes_tree_list()

The structure of the result, e.g.:

.MCn.class_tree_list[[4]]

Collate PPCP data

This function retrieves the posterior probabilities of classification prediction (PPCP) for all compounds and gathers them as .MCn.ppcp_dataset. By default, it also derives .MCn.nebula_class and .MCn.nebula_index from .MCn.ppcp_dataset. When it finishes, .MCn.nebula_index is returned.

MCnebula::collate_ppcp(
  ## because the example dataset is small, min_possess is set much lower here;
  ## 20 or more may be better
  min_possess = 1,
  ## for max_possess_pct, 0.1 may be better
  max_possess_pct = 0.9
)
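
The exact selection rule is implemented inside `collate_ppcp`; the base-R sketch below only illustrates how two such thresholds could act together. It assumes that a compound "possesses" a class when its posterior probability reaches a cutoff (0.5 here, a hypothetical value), that `min_possess` is a minimum compound count, and that `max_possess_pct` is a maximum fraction of all compounds. The toy table, its column names, and the `select_classes` helper are invented for this example.

```r
# Toy PPCP table: 10 compounds, 3 classes (invented values).
ppcp <- data.frame(
  .id = rep(1:10, times = 3),
  class = rep(c("Organic compounds", "Flavonoids", "Rare class"), each = 10),
  pp = c(rep(0.99, 10), rep(0.9, 4), rep(0.1, 6), 0.8, rep(0.05, 9)),
  stringsAsFactors = FALSE
)

# Hypothetical helper: keep classes that are neither too rare nor too common.
select_classes <- function(ppcp, min_possess, max_possess_pct, pp_cutoff = 0.5) {
  n_compounds <- length(unique(ppcp$.id))
  possessed <- ppcp[ppcp$pp >= pp_cutoff, ]
  counts <- table(possessed$class)
  keep <- as.vector(
    counts >= min_possess & counts / n_compounds <= max_possess_pct
  )
  names(counts)[keep]
}

selected <- select_classes(ppcp, min_possess = 2, max_possess_pct = 0.5)
```

In this toy case the class shared by every compound is dropped as uninformative, and the class with a single member is dropped as too rare.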

In addition, collate_ppcp sets the following global variables:

head(.MCn.ppcp_dataset, n = 1)
head(.MCn.nebula_class, n = 1) %>% 
  lapply(dplyr::as_tibble)

   The instance data prepared above is too small to show the following MCnebula functions in detail. Hence, we use another project that has already been collated with the functions above.

### the following datasets are loaded automatically by 'library(MCnebula)'
inst_formula_set
inst_structure_set
head(inst_ppcp_dataset, n = 1)

We assign these data to the global variables so that the MCnebula functions can recognize them without additional parameters.

.MCn.formula_set <- inst_formula_set
.MCn.structure_set <- inst_structure_set
.MCn.ppcp_dataset <- inst_ppcp_dataset

Re-execute the command:

MCnebula::collate_ppcp(
  min_possess = 20,
  max_possess_pct = 0.1
)

Generate network graph

Generate parent-nebula graph

Of note, this function computes spectral similarity (between fragmentation spectra) and reformats the similarity scores (cosine) as an edges file. This is performed via MSnbase::compareSpectra and usually takes some time. Here, we use a randomly generated edges file to avoid that computation.

edges <- .MCn.formula_set$.id %>% 
  combn(2) %>% 
  t()
set.seed(100)
### randomly sample 1000 node pairs
edges <- edges[sample(1:nrow(edges), 1000), ] %>% 
  data.frame() %>% 
  dplyr::as_tibble() %>%
  dplyr::rename(
    .id_1 = 1,
    .id_2 = 2
  ) %>% 
  dplyr::mutate(
    dotproduct = rnorm(1000, mean = 0.5, sd = 0.1), 
    mass_diff = rnorm(1000, mean = 200, sd = 50),
  )
tmp.edges <- tempfile()
### write the edges to a temporary file
MCnebula::write_tsv(edges, filename = tmp.edges)
### show details
edges
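
For orientation, the `dotproduct` column above stands in for a cosine score between two fragmentation spectra. The function below is a simplified base-R sketch of that idea using fixed-width m/z bins; it is not the MSnbase implementation, and the bin width and example peaks are arbitrary assumptions.

```r
# Cosine similarity between two spectra after binning peaks by m/z.
# Each spectrum is a list with numeric vectors `mz` and `intensity`.
cosine_similarity <- function(spec_a, spec_b, bin_width = 1) {
  all_mz <- c(spec_a$mz, spec_b$mz)
  breaks <- seq(floor(min(all_mz)), ceiling(max(all_mz)) + bin_width,
                by = bin_width)

  # Sum the intensities that fall into each m/z bin.
  bin_intensity <- function(spec) {
    binned <- numeric(length(breaks))
    idx <- findInterval(spec$mz, breaks)
    for (i in seq_along(idx)) {
      binned[idx[i]] <- binned[idx[i]] + spec$intensity[i]
    }
    binned
  }

  a <- bin_intensity(spec_a)
  b <- bin_intensity(spec_b)
  sum(a * b) / sqrt(sum(a^2) * sum(b^2))
}

# Toy spectra (invented peaks).
spec_1 <- list(mz = c(100.1, 150.2, 200.3), intensity = c(10, 50, 100))
spec_2 <- list(mz = c(100.4, 150.6, 250.9), intensity = c(20, 40, 80))
spec_3 <- list(mz = c(300.5, 400.7), intensity = c(30, 60))

score_same <- cosine_similarity(spec_1, spec_1)      # identical spectra
score_partial <- cosine_similarity(spec_1, spec_2)   # two shared bins
score_disjoint <- cosine_similarity(spec_1, spec_3)  # no shared bins
```

Identical spectra score 1 and spectra without shared bins score 0; everything else falls in between.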

Subsequently, we use this edges file to generate the parent-nebula graph.

MCnebula::generate_parent_nebula(
  rm_parent_isolate_nodes = T,
  ## specify the edges file
  edges_file = tmp.edges
)

This function generates the nodes data (chemical formula and structure annotation) and the edges data (computed on the fly if no edges file is specified). Most importantly, the parent-nebula graph is generated.
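
As an illustration of what removing isolated nodes means, here is a small base-R sketch. It assumes that `rm_parent_isolate_nodes` drops nodes that do not take part in any edge, which is an interpretation of the argument name rather than documented behavior. The toy tables are invented, with column names borrowed from the edges file used in this vignette.

```r
# Toy nodes and edges (invented values).
toy_nodes <- data.frame(
  .id = c("1", "2", "3", "4", "5"),
  stringsAsFactors = FALSE
)
toy_edges <- data.frame(
  .id_1 = c("1", "2"),
  .id_2 = c("2", "4"),
  dotproduct = c(0.8, 0.6),
  stringsAsFactors = FALSE
)

# A node is connected if it appears at either end of at least one edge.
connected_ids <- unique(c(toy_edges$.id_1, toy_edges$.id_2))
connected_nodes <- toy_nodes[toy_nodes$.id %in% connected_ids, , drop = FALSE]
```

Here nodes 3 and 5 have no edges, so only nodes 1, 2, and 4 remain.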

.MCn.parent_nodes
.MCn.parent_graph

Generate child-nebulae graph

This function generates multiple network graphs from the parent-nebula (nodes and edges) according to compound classification.

MCnebula::generate_child_nebulae(
  max_edges = 5,
  ## also write the graphs as .graphml files, which are supported by Cytoscape
  output_format = "graphml"
)
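
Conceptually, each child-nebula is the part of the parent network that belongs to one chemical class. The sketch below illustrates that idea in base R under simplifying assumptions: every node carries a single class label, and an edge is kept only when both of its endpoints are in the class. The `max_edges` handling is not reproduced, and all table and column names are hypothetical.

```r
# Toy parent network with one class label per node (invented values).
class_nodes <- data.frame(
  .id = c("1", "2", "3", "4", "5"),
  class = c("Flavonoids", "Flavonoids", "Lignans", "Flavonoids", "Lignans"),
  stringsAsFactors = FALSE
)
class_edges <- data.frame(
  .id_1 = c("1", "1", "3", "2"),
  .id_2 = c("2", "3", "5", "4"),
  stringsAsFactors = FALSE
)

# For each class, keep the edges whose two endpoints both belong to it.
child_edges <- lapply(
  split(class_nodes$.id, class_nodes$class),
  function(ids) {
    in_class <- class_edges$.id_1 %in% ids & class_edges$.id_2 %in% ids
    class_edges[in_class, , drop = FALSE]
  }
)
```

The result is a named list with one edge table per class, similar in spirit to a list of per-class graphs.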

These graphs are stored in a list.

head(.MCn.child_graph_list, n = 2)

Visualization of chemical-nebula

Visualization of parent-nebula

All visualizations in the MCnebula package are output as .svg images.

MCnebula::visualize_parent_nebula(
  layout = "kk",
  ## map nodes color with superclass
  nodes_color = c("hierarchy" = 3),
  width = 25,
  height = 20
)

rsvg::rsvg_png can be used to convert the .svg file to a .png file.

from.svg <- paste0(
  .MCn.output, "/", .MCn.results,
  "/parent_nebula/parent_nebula.svg"
)
to.png <- paste0(
  .MCn.output, "/", .MCn.results,
  "/parent_nebula/parent_nebula.png"
)
rsvg::rsvg_png(from.svg, to.png)

Visualization of child-nebula

This function draws the child-nebulae in a grid panel. The overview of child-nebulae shows the abundant classes in this LC-MS/MS dataset.

MCnebula::visualize_child_nebulae()

Information

All output files:

list.files(paste0(.MCn.output, "/", .MCn.results), recursive = T)

Other information:

sessionInfo()


chi-med-pro/MCnebula documentation built on March 28, 2023, 5:55 a.m.