BiocStyle::markdown()
library(BiocStyle)

Introduction

XXX (Description of MTBLS1586)

Setup

Load required libries

MetClassNetR integrates functions from different libraries. Install current version of MetClassNetR and additional MetNet functions from developmental branches. After Installation re-load R and load required libraries.

# installation of additional packages
#devtools::install_github("tnaake/MetNet") 
# note: Re-load R studio after installation
#devtools::install_github("MetClassNet/MetClassNetR", ref = "devel_liesa") 
#devtools::install_github("blosloos/nontarget") 
#devtools::install_github("blosloos/nontargetData") 


# load required libraries
library(MetClassNetR)
library(Spectra)
library(MsCoreUtils)
library(MsBackendMgf)
library(easyGgplot2)
library(QFeatures)
library(tidyverse)
library(nontarget)
library(visNetwork) # for visNetwork() and friends
library(networkD3)  # for saveNetwork()

Parallelization

Parallelization can be done to speed up calculations.

register(bpstart(SnowParam(4, progressbar = TRUE)))

Reading of MS1 data

MS1 data is read from Metabolights MAF or mzTab-M files into a QFeatures object. Information about the mass-to-charge (mz) ratio and retention times (RT) were extracted from the MAF/ mzTab-M file and stored into the QFeatures-object. The data from positive and negative ionization mode are stored in two different QFeatures-objects.

# Metabolights MAF files
pos_maf <- system.file("extdata/MTBLS1586/m_MTBLS1586_LC-MS_positive_reverse-phase_metabolite_profiling_v2_maf.tsv",
                       package = "MetClassNetR")
neg_maf <- system.file("extdata/MTBLS1586/m_MTBLS1586_LC-MS_positive_reverse-phase_metabolite_profiling_v2_maf.tsv",
                       package = "MetClassNetR")

# create QFeature objects
mtbls1586_qf_pos <- readMaf(file(pos_maf), ecol = 23, name = "mtbls1586_pos")
mtbls1586_qf_neg <- readMaf(file(neg_maf), ecol = 23, name = "mtbls1586_neg")

Reading of MS2 data

MS2 data is read from a .mgf or .msp file into a Spectra object.

# get MS2 data in .mgf files
pos_mgf <- system.file("extdata/MTBLS1586/ms2_MTBLS1586_LC-MS_positive_reverse-phase_metabolite_profiling.mgf",
                       package = "MetClassNetR")
neg_mgf <- system.file("extdata/MTBLS1586/ms2_MTBLS1586_LC-MS_negative_reverse-phase_metabolite_profiling.mgf",
                       package = "MetClassNetR")

# load MS2 data
ms2_spectra_pos <- Spectra::Spectra(pos_mgf,
                           source = MsBackendMgf(),
                           backend = MsBackendDataFrame())

ms2_spectra_neg <- Spectra::Spectra(neg_mgf,
                           source = MsBackendMgf(),
                           backend = MsBackendDataFrame())

Sanity checks

check data (namespace of ids etc...)

# check minimal requirements
checkQFeatures(mtbls1586_qf_neg)
checkQFeatures(mtbls1586_qf_pos)
checkSpectra(ms2_spectra_neg)
checkSpectra(ms2_spectra_pos)

checkIDNamespace sanity check will fail if Ids between QFeatures and Spectra are not matching.Therefore, MS2 spectra can be filled with empty spectra to have a complete list of spectra for all the features.

# # fill spectra
# ms2_spectra_neg <- fillSpectra(mtbls1586_qf_neg, ms2_spectra_neg)
# ms2_spectra_pos <- fillSpectra(mtbls1586_qf_pos, ms2_spectra_pos)
# 
# # check namespace 
# checkIdNamespace(mtbls1586_qf_neg, ms2_spectra_neg)
# checkIdNamespace(mtbls1586_qf_pos, ms2_spectra_pos)

Reading of transformation file

The mass-difference network is calculated using the MetNet package. Here, all calculated mass-differences between all pairs of features are compared to a transformation-list, containing different biochemical reactions and their corresponding mass difference. Required columns are "group" and "mass". Additional columns as "formula" can be added. The transformation-list needs to be provided by the user. An example can be found in the system files.

transformations_file <- system.file("extdata/MTBLS1586/transformations_MTBLS1586.csv", 
                                    package = "MetClassNetR")

transformations <- read_csv(transformations_file, col_names = TRUE)

transformations[1:5,]

Quality check of input

ZZZ

Setup for networks

The master-list will contain all types of experimental networks that have been created in this workflow.

master <- list()

Analysis

Mass difference Network

The mass-difference network is calculated using the MetNet package. All mass-differences between all pairs of features are calculated and are compared to the transformation-list. This transformation-list contains different biochemical reactions and their corresponding mass difference. Only if the mass-difference between two features matches to a mass-difference in the transformation file, within a certain ppm-range, it will be displayed. Different to the original MetNet-functions, not only the name of the mass-difference but also the value itself will be displayed as output adjacency matrix. The adjacency matrix will be than transformed to a data.frame.

# Create mass difference network
massdiff <- qfeat_structural(x = mtbls1586_qf_pos,
                             assay_name = "mtbls1586_pos",
                             transformation = transformations,
#                             var = c("Name", "Formula difference", "mass"),
                             var = c("name", "formula", "mass"),
                             ppm = 10)

# Create data.frame of adjacency matrix
massdiff_df <- as.data.frame(massdiff) |> 
  filter(binary == 1)


massdiff_df[5:10,]

A summary of the current data-set on the present mass-difference distribution is provided using the summary_mz-function. This function stores the summary as data.frame, which is also plotted. Depending on the size of the data, it might be useful to filter the number of determined mass-differences, e.g. by 1000 counts.

# Summary & filter for counts e.g. higher 1000
#sum_mz_f <- mz_summary(massdiff, filter = 1000, var = "group")

# some visualizations
#mz_vis(sum_mz_f, var = "group")

# append mass difference network
master <- c(master, massdiff)

Add annotatation

A QFeatures-object stores various information on LC-MS/MS data. Also, annotations and possible database identifiers are present in a QFeatures-object. In order to add the annotations to the adjacency-list, qfeat_annotation is applied. It extracts the annotations, if any present, and adds them to the desired adjacency-list, for example the list from the mass-difference network. In order to check if the annotations match the mass-difference, a filter is applied so that only features remains that have annotations in both features.

# Add annotations to mass-difference adjacency matrix
rowData(massdiff) <- as.data.frame(rowData(mtbls1586_qf_pos[["mtbls1586_pos"]]))

# display annotation for the feature "Cluster_3201"
rowData(massdiff)["Cluster_3201", ]

massdiff_anno <- as.data.frame(rowData(massdiff))

# Filter for present annotations
anno_filter <- filter(massdiff_anno, 
                      massdiff_anno$metabolite_identification != "")
anno_filter[25:30,]

Correlation network

Correlation networks will be calculated using functionality from the MetNet package. qfeat_statistical uses the QFeatures-Object as input and extracts the feature intensity in order to calculate correlations from the statistical-function from MetNet. MetNet implemented various models that can be applied to calculate correlations, e.g. pearson, spearman, partial-pearson, and many more which might also be selected. After generating the correlation adjacency-matrix, it will also be transformed to a data.frame

# Correlation network using partial pearson and GGM correlation
corr <- qfeat_statistical(x = mtbls1586_qf_pos, 
                                assay_name = "mtbls1586_pos", 
                                model = c("pearson_partial", "ggm"),
                                na.omit = TRUE,
                                p.adjust = "BH")

# Create data.frame from correlation adjacency matrix 
corr_df <- as.data.frame(corr)
corr_df[25:30,]

master <- c(master, corr)

## plots to view the distribution of correlations 
corr_fl <- data.frame(coef = c(corr_df$pearson_partial_coef,
                              corr_df$ggm_coef),
                     group = c(rep("pearson_partial", nrow(corr_df)),
                               rep("ggm", nrow(corr_df)))
                     )

plot <- ggplot2.histogram(data=corr_fl, xName='coef',
                  groupName='group', legendPosition="right",
                  alpha=0.5, binwidth=0.01,
                  brewerPalette="Paired",
                  addMeanLine=TRUE,  meanLineSize=1)

plot_custom <-ggplot2.customize(plot, xtitle="Correlation coefficient", 
                                ytitle="Count",
                        showLegend=TRUE, axisLine=c(0.5, "solid", "black"),
                        addDensity=TRUE, removePanelBorder=TRUE, 
                        backgroundColor="white", 
                        mainTitle="Correlation coefficient")   

plot_custom

Homologue Series

ABC

homol <- qfeat_homol(x = mtbls1586_qf_pos,
                             assay_name = "mtbls1586_pos",
                      elements=c("C","H","O"), use_C=TRUE,
                      minmz=5,  maxmz=120,
                      minrt=-30,  maxrt=30,
                      ppm=TRUE,
                      mztol= 5,  rttol=30,
                      minlength=5,
                      mzfilter=FALSE,
                      #spar=.45,    R2=.98,
                      plotit=FALSE)

MS2 similarity network

Spectral similarity network is created using positive MS2 spectra.

tolerance = 0.005
ppm = 0

# filter ms2 spectra 20 % of intensity
filter <- TRUE
my_filter <- function(x) {
  x > max(x, na.rm = TRUE) / 10
}


ms2_spectra_pos <- filterIntensity(ms2_spectra_pos, my_filter)

## to speed up calculations: only use 50 spectra for spectral similarity
spect <- spec_molNetwork(ms2_spectra_pos[1:50],
                             MAPFUN = joinPeaksGnps,
                             methods = "gnps",
                             tolerance = tolerance,
                             ppm = ppm, 
                             type = "inner")


spect_adj <- addSpectralSimilarity(am_structural = massdiff, 
                                   ms2_similarity = spect)

# Create data.frame of adjacency matrix
spect_df <- as.data.frame(spect_adj) |> 
  filter(!is.na(gnps))

master <- c(master, spect_adj)

Export networks to gml format

Created networks are exported and saved as .gml file in the current working-directory. Feature Ids will be stored as additional name in node attributes. All the other information from the adjacency-lists will be stores as edge-attributes. It might also be defined which columns of the adjacency-list should be stored, by using the select option in the exportNet2gml-function.

## Mass-difference network to gml
# note: only first 10 features are exported
exportNet2gml(x = massdiff_df[1:10,], file = "mzdiff")

## Correlation network to gml
# note: only first 10 features are exported
exportNet2gml(x = pears_df[1:10,], file = "pears")

## Spectral similarity network2gml
# note: only first 10 features are exported
exportNet2gml(x = spect_df[1:10,], file = "similarity", 
              select = c("gnps"))

Additionally, attributes may be exported as .gml attribute-file using exportAttributes2gml. Edge attributes are prepared by merging the adjacency-lists from mass-difference and correlation-networks. Node attributes are generated by extraction additional information as annotation, mz, and RT values from the QFeatures-object.

# Create and export attribute file 
attributes <- merge(corr_df, massdiff_df, by = c("Row", "Col"))


# Prepare annotation file
feat_anno <- as.data.frame(rowData(mtbls1586_qf_pos[["mtbls1586_pos"]]))
feat_anno <- feat_anno[,c("metabolite_identification", "mz", "rtime")]
colnames(feat_anno) <- c("Annotation", "mz", "RT")


# note: only first 10 features are exported
exportAttributes2gml(attributes[1:10,], file = "attributes_mtbls1586_pos", 
                     anno = feat_anno)

Export of homologous series

# 
# el <- as_tibble(homol[["Peaks in homologue series"]]) |>
#     filter(`to ID` != "0") |> select (c("peak ID", "HS cluster",
#                                          "m/z increment", "RT increment"))
# 
# g <- el |> select(c("peak ID", "to ID")) |> as.matrix() |> graph_from_edgelist() |> as_data_frame()
# 
# 
# ## Write to File
# write_graph(g, "/tmp/g.gml", "gml")


MetClassNet/MetClassNetR documentation built on June 30, 2023, 2:12 p.m.