knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)
library(pmd)
data("spmeinvivo")

PMD based Reactomics tried to evaluate untargeted HRMS profiles at reaction level. Reactomics is the extension of structure/reaction directed analysis for HRMS data and PMDs are treated as relationship unit for further discussion as compounds for metabolomics. Reaction level evaluation could be found in PMD network analysis, Source appointment and biomarker reaction discovery.

PMD network analysis

In untargeted metabolites profiles from HRMS, two ions or peaks could be treated together as long as they could be connected by relationship. Regular untargeted workflow prefers intensity correlation between compounds. However, PMD could also be the chemical bridge between two compounds or ions. For example, oxidation would add oxygen atom to the parent compound and introduce a PMD 15.995 Da. Meanwhile, One peak or compound could be involved in multiple reactions. In this case, we could build PMD network for certain ion or compound.

However, we need to the PMDs list to build such network for either one compound or one sample. One way is using the high frequency PMDs list from previous reported reactions and another way is using the high frequency PMDs within certain data set such as KEGG or HMDB. The former way try to focus on known reactions such as Phase I reactions for exogenous compounds while the latter way is useful to explorer new reactions or unknown reaction patterns within the data set. The latter way is actually the structure/reaction directed analysis.

PMD network analysis is the analysis to check or explorer the PMD relationship among co-existed ions from one sample or multiple samples. The edge between ions in the network means certain PMD relationship are valid in this data set. Meanwhile, the intensity correlation between paired ions could also be considered to connect the vertices. Such network is build based on local recursive search for all possible connections.

PMD network analysis is useful to screen metabolites. Regular metabolites discovery method try to predict metabolites' MS2 spectra and then match the data. In PMD network analysis, metabolites are predicted by high frequency PMD or preferred PMD list within the MS1 data and such prediction could be extended to the metabolites of metabolites. Such PMD search will stop when no new metabolites could be connected to the network. Such searching method is much easier and quick to get the overview of metabolites networks. Identification could follow the discovery from MS1 data.

If you have a specific compound and want to check the metabolites of certain PMD, you could use getchain to extract the network of that compounds

library(igraph)
# check metabolites of C18H39NO
# Use common PMDs for biological reactions
chain <- getchain(spmeinvivo,diff = c(2.02,14.02,15.99,58.04,13.98),mass = 286.3101,digits = 2,corcutoff = 0)
# show as network
net <- graph_from_data_frame(chain$sdac,directed = F)
pal <- grDevices::rainbow(5)
plot(net,vertex.label=round(as.numeric(V(net)$name),2),vertex.size =5,edge.width = 3,edge.color = pal[as.numeric(as.factor(E(net)$diff2))],vertex.label.dist=1,vertex.color=ifelse(round(as.numeric(V(net)$name),4) %in% 286.3101,'red','black'), main = 'PMD network')
legend("topright",bty = "n",
       legend=unique(E(net)$diff2),
       fill=unique(pal[as.numeric(as.factor(E(net)$diff2))]), border=NA,horiz = F)
# Consider the correlation coefficient cutoff
chain <- getchain(spmeinvivo,diff = c(2.02,14.02,15.99,58.04,13.98),mass = 286.3101,digits = 2,corcutoff = 0.6)
# show as network
net <- graph_from_data_frame(chain$sdac,directed = F)
pal <- grDevices::rainbow(5)
plot(net,vertex.label=round(as.numeric(V(net)$name),2),vertex.size =5,edge.width = 3,edge.color = pal[as.numeric(as.factor(E(net)$diff2))],vertex.label.dist=1,vertex.color=ifelse(round(as.numeric(V(net)$name),4) %in% 286.3101,'red','black'), main = 'PMD network')
legend("topright",bty = "n",
       legend=unique(E(net)$diff2),
       fill=unique(pal[as.numeric(as.factor(E(net)$diff2))]), border=NA,horiz = F)

Here only three PMD relationship could be found for C18H39NO. The duplicate edges between two vertices or self-loop edges mean isomer related PMD reactions. If we consider the correlation, the network would be trimmed. Since reaction might not always involve correlation, PMD network analysis could found more potential metabolites.

If you want to see all the independent peaks' high frequency PMDs as networks for certain sample, the following code will help. This part will use the high frequency PMDs cutoff 12 from the data to build the networks for all the independent peaks.

std <- globalstd(spmeinvivo,sda = F)
sda <- getsda(std,freqcutoff = 12)
df <- sda$sda
net <- graph_from_data_frame(df,directed = F)
pal <- grDevices::rainbow(length(unique(E(net)$diff2)))
plot(net,vertex.label=NA,vertex.size = 5,edge.width = 3,edge.color = pal[as.numeric(as.factor(E(net)$diff2))],main = 'PMD network')
legend("topright",bty = "n",
       legend=unique(E(net)$diff2),
       fill=unique(pal[as.numeric(as.factor(E(net)$diff2))]), border=NA,horiz = F)

Here we could find clusters of metabolites. We could detect such network community structure.

# network community structure detection
ceb <- cluster_edge_betweenness(net,weights = abs(E(net)$cor), directed = F) 
plot(ceb, net,vertex.label=NA,vertex.size = 5,edge.width = 3,) 
# output membership
head(cbind(ceb$membership,ceb$names))

Such network could also be build on correlation directed analysis which use correlation between paired peaks to build network.

cbp <- enviGCMS::getfilter(std,rowindex = std$stdmassindex)
cda <- getcda(cbp)
df <- cda$cda
# filter based on retention time differences larger than 2 mins
df <- df[df$diffrt>120,]
netc <- graph_from_data_frame(df,directed = F)
plot(netc,vertex.label=NA,vertex.size = 5,edge.width = 3,main = 'Correlation network')

As shown above, correlation network without PMD might merge into one big network, which lose the details of chemical reactions.

Shiny application

The PMD network for certain compound could be generated by run runPMDnet().

Source appointment

Peaks from samples could be from endogenous compounds or exogenous compounds. However, it's hard to tell for untargeted analysis. In terms of PMD, if one peak belongs to a high frequency PMD network, it means a relatively high activity. If such sample belongs to a biological specimen, it might be endogenous compound. If a peak show no PMD network with other peaks, the biological system might not have enzyme to make reaction happen. Exogenous compounds will show a lower degree since they are xenobiotics. Since most of the peaks will show a low degree, the median of the degree could be used as cutoff. Then we could make source appointment if the assumption is hold.

deg <- degree(net, mode = 'all')
median(deg)
endogenous <- names(deg)[deg>median(deg)]
exogenous <- names(deg)[deg<=median(deg)]

In this case, we will have r length(endogenous) endogenous compounds while r length(exogenous) exogenous compounds. When you find a peak show differences between groups, you could check the degree to infer its sources.

Another parameter would be the average network distances. Endogenous compounds could form a larger network with long average network distances while exogenous compounds will connected to network with small average network distances. Such parameter could be used to determine the source of unknown compound by checking the average network distances of the compounds PMD network.

Be careful, one compound could be endogenous for one sample while exogenous for another sample. In this case, PMD network would give hints on the sources based on the context of the samples.

Biomarker reaction

Biomarker always means biomarker compounds. However, if we could quantify the reaction relationship, we could use biomarker reaction to trace certain biological process. You could use getreact to extract the ion pairs shared the same PMD and intensity ratio RSD% lower than certain cutoff. Then the sum of the intensity of all PMD pairs' ions could be used to compare the reaction level changes among samples.

pmd <- getreact(spmeinvivo,pmd=15.99)
# show the ions with the same PMD
head(pmd$pmd)
# show the corresponding quantitative PMD data across samples, each row show the sum of intensity of paired masses qualified for stable mass pairs
head(pmd$pmddata)

If your data don't have retention time, reaction level change can also be checked.

spmeinvivo$rt <- NULL
pmd <- getreact(spmeinvivo,pmd=15.99)
# show the ions with the same PMD
head(pmd$pmd)
# show the corresponding quantitative PMD data across samples, each row show the sum of intensity of paired masses qualified for stable mass pairs
head(pmd$pmddata)

Now we have two methods to compute the quantitative PMD responses and user should select method depending on research purposes. 'static' will only consider the stable mass pairs across samples and such reactions will be limited by the enzyme or other factors than substrates. 'dynamic' will consider the unstable paired masses by normalization the relatively unstable peak with stable peak between paired masses and such reactions will be limited by one or both peaks in the paired masses.

data("spmeinvivo")
pmd <- getreact(spmeinvivo,pmd=15.99,method = 'dynamic')
# show the ions with the same PMD
head(pmd$pmd)
# show the corresponding quantitative PMD data across samples, each row show the sum of intensity of paired masses qualified for stable mass pairs
head(pmd$pmddata)

You can also output the quantitative results of all high frequency PMDs existing in the data.

data("spmeinvivo")
# remove redundant peaks
list <- globalstd(spmeinvivo,sda = T)
newlist <- enviGCMS::getfilter(list,rowindex = list$stdmassindex)
# get high frequency pmd
hfpmd <- unique(newlist$sda$diff2)
# generate quantitative results
pmd <- getreact(newlist,pmd=hfpmd)
# output the kegg pmd in the data
table(pmd$pmd$diff2)
# output quantitative result for each PMD
head(pmd$pmddata)
# output quantitative result for unique PMD
upmd <- aggregate(pmd$pmddata, by=list(pmd$pmd$diff2),sum)
# column for samples and row for unique PMD
head(upmd)

You can also output the quantitative results of all PMDs existing in current KEGG database.

# output all existing PMD in KEGG
keggpmd <- unique(round(keggrall$pmd,2))
data("spmeinvivo")
# remove redundant peaks
list <- globalstd(spmeinvivo)
newlist <- enviGCMS::getfilter(list,rowindex = list$stdmassindex)
# generate quantitative results
pmd <- getreact(newlist,pmd=keggpmd)
# output the kegg pmd in the data
table(pmd$pmd$diff2)
# output quantitative result for each PMD
head(pmd$pmddata)
# output quantitative result for unique PMD
upmd <- aggregate(pmd$pmddata, by=list(pmd$pmd$diff2),sum)
# column for samples and row for unique PMD
head(upmd)

PMD Reaction Database

To check the pmd reaction database:

# all reaction
data("omics")
head(omics)
# kegg reaction
data("keggrall")
head(keggrall)
# literature reaction for mass spectrometry
data("sda")
head(sda)

To check the HMDB pmd database:

data("hmdb")
head(hmdb)

To extract any compound KEGG compound's pmd network with known PMD:

plotcn('C6H12O6','Glucose',c(2.016,14.016,15.995))


yufree/pmd documentation built on April 19, 2024, 9:31 a.m.