In MetClassNet/MetClassNetR: Create and analize networks built from metabolomics Data

BiocStyle::markdown()

library(BiocStyle)

Introduction

XXX (Description of MTBLS1586)

Setup

Load required libries

MetClassNetR integrates functions from different libraries. Install current version of MetClassNetR and additional MetNet functions from developmental branches. After Installation re-load R and load required libraries.

# installation of additional packages
#devtools::install_github("tnaake/MetNet") 
# note: Re-load R studio after installation
#devtools::install_github("MetClassNet/MetClassNetR", ref = "devel_liesa") 
#devtools::install_github("blosloos/nontarget") 
#devtools::install_github("blosloos/nontargetData") 


# load required libraries
library(MetClassNetR)
library(Spectra)
library(MsCoreUtils)
library(MsBackendMgf)
library(easyGgplot2)
library(QFeatures)
library(tidyverse)
library(nontarget)
library(visNetwork) # for visNetwork() and friends
library(networkD3)  # for saveNetwork()

Parallelization

Parallelization can be done to speed up calculations.

register(bpstart(SnowParam(4, progressbar = TRUE)))

Reading of MS1 data

MS1 data is read from Metabolights MAF or mzTab-M files into a QFeatures object. Information about the mass-to-charge (mz) ratio and retention times (RT) were extracted from the MAF/ mzTab-M file and stored into the QFeatures-object. The data from positive and negative ionization mode are stored in two different QFeatures-objects.

# Metabolights MAF files
pos_maf <- system.file("extdata/MTBLS1586/m_MTBLS1586_LC-MS_positive_reverse-phase_metabolite_profiling_v2_maf.tsv",
                       package = "MetClassNetR")
neg_maf <- system.file("extdata/MTBLS1586/m_MTBLS1586_LC-MS_positive_reverse-phase_metabolite_profiling_v2_maf.tsv",
                       package = "MetClassNetR")

# create QFeature objects
mtbls1586_qf_pos <- readMaf(file(pos_maf), ecol = 23, name = "mtbls1586_pos")
mtbls1586_qf_neg <- readMaf(file(neg_maf), ecol = 23, name = "mtbls1586_neg")

Reading of MS2 data

MS2 data is read from a .mgf or .msp file into a Spectra object.

# get MS2 data in .mgf files
pos_mgf <- system.file("extdata/MTBLS1586/ms2_MTBLS1586_LC-MS_positive_reverse-phase_metabolite_profiling.mgf",
                       package = "MetClassNetR")
neg_mgf <- system.file("extdata/MTBLS1586/ms2_MTBLS1586_LC-MS_negative_reverse-phase_metabolite_profiling.mgf",
                       package = "MetClassNetR")

# load MS2 data
ms2_spectra_pos <- Spectra::Spectra(pos_mgf,
                           source = MsBackendMgf(),
                           backend = MsBackendDataFrame())

ms2_spectra_neg <- Spectra::Spectra(neg_mgf,
                           source = MsBackendMgf(),
                           backend = MsBackendDataFrame())

Sanity checks

check data (namespace of ids etc...)

# check minimal requirements
checkQFeatures(mtbls1586_qf_neg)
checkQFeatures(mtbls1586_qf_pos)
checkSpectra(ms2_spectra_neg)
checkSpectra(ms2_spectra_pos)

checkIDNamespace sanity check will fail if Ids between QFeatures and Spectra are not matching.Therefore, MS2 spectra can be filled with empty spectra to have a complete list of spectra for all the features.

# # fill spectra
# ms2_spectra_neg <- fillSpectra(mtbls1586_qf_neg, ms2_spectra_neg)
# ms2_spectra_pos <- fillSpectra(mtbls1586_qf_pos, ms2_spectra_pos)
# 
# # check namespace 
# checkIdNamespace(mtbls1586_qf_neg, ms2_spectra_neg)
# checkIdNamespace(mtbls1586_qf_pos, ms2_spectra_pos)

Reading of transformation file

The mass-difference network is calculated using the MetNet package. Here, all calculated mass-differences between all pairs of features are compared to a transformation-list, containing different biochemical reactions and their corresponding mass difference. Required columns are "group" and "mass". Additional columns as "formula" can be added. The transformation-list needs to be provided by the user. An example can be found in the system files.

transformations_file <- system.file("extdata/MTBLS1586/transformations_MTBLS1586.csv", 
                                    package = "MetClassNetR")

transformations <- read_csv(transformations_file, col_names = TRUE)

transformations[1:5,]

Quality check of input

ZZZ

Setup for networks

The master-list will contain all types of experimental networks that have been created in this workflow.

master <- list()

Analysis

Mass difference Network

The mass-difference network is calculated using the MetNet package. All mass-differences between all pairs of features are calculated and are compared to the transformation-list. This transformation-list contains different biochemical reactions and their corresponding mass difference. Only if the mass-difference between two features matches to a mass-difference in the transformation file, within a certain ppm-range, it will be displayed. Different to the original MetNet-functions, not only the name of the mass-difference but also the value itself will be displayed as output adjacency matrix. The adjacency matrix will be than transformed to a data.frame.

# Create mass difference network
massdiff <- qfeat_structural(x = mtbls1586_qf_pos,
                             assay_name = "mtbls1586_pos",
                             transformation = transformations,
#                             var = c("Name", "Formula difference", "mass"),
                             var = c("name", "formula", "mass"),
                             ppm = 10)

# Create data.frame of adjacency matrix
massdiff_df <- as.data.frame(massdiff) |> 
  filter(binary == 1)


massdiff_df[5:10,]

A summary of the current data-set on the present mass-difference distribution is provided using the summary_mz-function. This function stores the summary as data.frame, which is also plotted. Depending on the size of the data, it might be useful to filter the number of determined mass-differences, e.g. by 1000 counts.

# Summary & filter for counts e.g. higher 1000
#sum_mz_f <- mz_summary(massdiff, filter = 1000, var = "group")

# some visualizations
#mz_vis(sum_mz_f, var = "group")

# append mass difference network
master <- c(master, massdiff)

Add annotatation

A QFeatures-object stores various information on LC-MS/MS data. Also, annotations and possible database identifiers are present in a QFeatures-object. In order to add the annotations to the adjacency-list, qfeat_annotation is applied. It extracts the annotations, if any present, and adds them to the desired adjacency-list, for example the list from the mass-difference network. In order to check if the annotations match the mass-difference, a filter is applied so that only features remains that have annotations in both features.

# Add annotations to mass-difference adjacency matrix
rowData(massdiff) <- as.data.frame(rowData(mtbls1586_qf_pos[["mtbls1586_pos"]]))

# display annotation for the feature "Cluster_3201"
rowData(massdiff)["Cluster_3201", ]

massdiff_anno <- as.data.frame(rowData(massdiff))

# Filter for present annotations
anno_filter <- filter(massdiff_anno, 
                      massdiff_anno$metabolite_identification != "")
anno_filter[25:30,]

Correlation network

Correlation networks will be calculated using functionality from the MetNet package. qfeat_statistical uses the QFeatures-Object as input and extracts the feature intensity in order to calculate correlations from the statistical-function from MetNet. MetNet implemented various models that can be applied to calculate correlations, e.g. pearson, spearman, partial-pearson, and many more which might also be selected. After generating the correlation adjacency-matrix, it will also be transformed to a data.frame

# Correlation network using partial pearson and GGM correlation
corr <- qfeat_statistical(x = mtbls1586_qf_pos, 
                                assay_name = "mtbls1586_pos", 
                                model = c("pearson_partial", "ggm"),
                                na.omit = TRUE,
                                p.adjust = "BH")

# Create data.frame from correlation adjacency matrix 
corr_df <- as.data.frame(corr)
corr_df[25:30,]

master <- c(master, corr)

## plots to view the distribution of correlations 
corr_fl <- data.frame(coef = c(corr_df$pearson_partial_coef,
                              corr_df$ggm_coef),
                     group = c(rep("pearson_partial", nrow(corr_df)),
                               rep("ggm", nrow(corr_df)))
                     )

plot <- ggplot2.histogram(data=corr_fl, xName='coef',
                  groupName='group', legendPosition="right",
                  alpha=0.5, binwidth=0.01,
                  brewerPalette="Paired",
                  addMeanLine=TRUE,  meanLineSize=1)

plot_custom <-ggplot2.customize(plot, xtitle="Correlation coefficient", 
                                ytitle="Count",
                        showLegend=TRUE, axisLine=c(0.5, "solid", "black"),
                        addDensity=TRUE, removePanelBorder=TRUE, 
                        backgroundColor="white", 
                        mainTitle="Correlation coefficient")   

plot_custom

Homologue Series

ABC

homol <- qfeat_homol(x = mtbls1586_qf_pos,
                             assay_name = "mtbls1586_pos",
                      elements=c("C","H","O"), use_C=TRUE,
                      minmz=5,  maxmz=120,
                      minrt=-30,  maxrt=30,
                      ppm=TRUE,
                      mztol= 5,  rttol=30,
                      minlength=5,
                      mzfilter=FALSE,
                      #spar=.45,    R2=.98,
                      plotit=FALSE)

MS2 similarity network

Spectral similarity network is created using positive MS2 spectra.

tolerance = 0.005
ppm = 0

# filter ms2 spectra 20 % of intensity
filter <- TRUE
my_filter <- function(x) {
  x > max(x, na.rm = TRUE) / 10
}


ms2_spectra_pos <- filterIntensity(ms2_spectra_pos, my_filter)

## to speed up calculations: only use 50 spectra for spectral similarity
spect <- spec_molNetwork(ms2_spectra_pos[1:50],
                             MAPFUN = joinPeaksGnps,
                             methods = "gnps",
                             tolerance = tolerance,
                             ppm = ppm, 
                             type = "inner")


spect_adj <- addSpectralSimilarity(am_structural = massdiff, 
                                   ms2_similarity = spect)

# Create data.frame of adjacency matrix
spect_df <- as.data.frame(spect_adj) |> 
  filter(!is.na(gnps))

master <- c(master, spect_adj)

Export networks to gml format

Created networks are exported and saved as .gml file in the current working-directory. Feature Ids will be stored as additional name in node attributes. All the other information from the adjacency-lists will be stores as edge-attributes. It might also be defined which columns of the adjacency-list should be stored, by using the select option in the exportNet2gml-function.

## Mass-difference network to gml
# note: only first 10 features are exported
exportNet2gml(x = massdiff_df[1:10,], file = "mzdiff")

## Correlation network to gml
# note: only first 10 features are exported
exportNet2gml(x = pears_df[1:10,], file = "pears")

## Spectral similarity network2gml
# note: only first 10 features are exported
exportNet2gml(x = spect_df[1:10,], file = "similarity", 
              select = c("gnps"))

Additionally, attributes may be exported as .gml attribute-file using exportAttributes2gml. Edge attributes are prepared by merging the adjacency-lists from mass-difference and correlation-networks. Node attributes are generated by extraction additional information as annotation, mz, and RT values from the QFeatures-object.

# Create and export attribute file 
attributes <- merge(corr_df, massdiff_df, by = c("Row", "Col"))


# Prepare annotation file
feat_anno <- as.data.frame(rowData(mtbls1586_qf_pos[["mtbls1586_pos"]]))
feat_anno <- feat_anno[,c("metabolite_identification", "mz", "rtime")]
colnames(feat_anno) <- c("Annotation", "mz", "RT")


# note: only first 10 features are exported
exportAttributes2gml(attributes[1:10,], file = "attributes_mtbls1586_pos", 
                     anno = feat_anno)

Export of homologous series

# 
# el <- as_tibble(homol[["Peaks in homologue series"]]) |>
#     filter(`to ID` != "0") |> select (c("peak ID", "HS cluster",
#                                          "m/z increment", "RT increment"))
# 
# g <- el |> select(c("peak ID", "to ID")) |> as.matrix() |> graph_from_edgelist() |> as_data_frame()
# 
# 
# ## Write to File
# write_graph(g, "/tmp/g.gml", "gml")

MetClassNet/MetClassNetR documentation built on June 30, 2023, 2:12 p.m.

rdrr.io home R language documentation Run R code online

CRAN packages Bioconductor packages R-Forge packages GitHub packages

Note that we can't provide technical support on individual packages. You should contact the package authors for that.

MetClassNet/MetClassNetR
Create and analize networks built from metabolomics Data

In MetClassNet/MetClassNetR: Create and analize networks built from metabolomics Data

Introduction

Setup

Load required libries

Parallelization

Reading of MS1 data

Reading of MS2 data

Sanity checks

Reading of transformation file

Quality check of input

Setup for networks

Analysis

Mass difference Network

Add annotatation

Correlation network

Homologue Series

MS2 similarity network

Export networks to gml format

R Package Documentation

Browse R Packages

We want your feedback!

MetClassNet/MetClassNetR Create and analize networks built from metabolomics Data

In MetClassNet/MetClassNetR: Create and analize networks built from metabolomics Data

Introduction

Setup

Load required libries

Parallelization

Reading of MS1 data

Reading of MS2 data

Sanity checks

Reading of transformation file

Quality check of input

Setup for networks

Analysis

Mass difference Network

Add annotatation

Correlation network

Homologue Series

MS2 similarity network

Export networks to gml format

R Package Documentation

Browse R Packages

We want your feedback!

MetClassNet/MetClassNetR
Create and analize networks built from metabolomics Data