In robertness/pathflow: Converts biochemical reaction networks to pathway maps.

How do you visualize signal flow through a signal transduction pathway?

Take the well-studied MAPK/ERK pathway as an example:

The signal starts when a ligand binds to a receptor.
The receptor binding activates Ras
Ras activates Raf (aka MapKKK)
Raf activates Mek (MapKK)
Mek activates Erk (MapK)
Erk activates the transcription of a gene

Here the flow of signal is Signal -> Ras -> Mek -> Erk -> Response. I downloaded a Map/Erk biochemical model and visualized the reaction graph. The reaction graph connects species (proteins, compounds, etc) to the reactions that involve the species.

Steps to producing reaction graph:

Download SBML format from biomodels.org
Import into Cytoscape using the plug-in Petriscape
Export csv files of network information
Import csv files into R
Produce tables of vertices and edges. Download directly: here and here
Plot in igraph

library(PathFlow)
library(igraph)
#Original preprocessing code, kept here for easy reference.
#This code seems to have a bugin the "product.reactant.list" calculation onwards.

#Convert data to reaction graph edge list
#mapk.edges <- read.csv("/Users/robertness/Dropbox/works_in_progress/huang_mapk.csv", header = F, stringsAsFactors =F)
#colnames(mapk.edges) <- c("from", "to")
#edge.list <- mapk.edges[mapk.edges$to != "", ]
#mapk.edges[, 2] <- gsub("-", "", gsub("reaction", "", mapk.edges[, 2] ))
#product.reactant.list <- split(mapk.edges[, c(1, 3)], as.factor(mapk.edges[, 2]))
#names(product.reactant.list$product) <- c("from", "to")
#product.reactant.list$reactant <- product.reactant.list$reactant[, c(2, 1)]
#names(product.reactant.list$reactant) <- c("from", "to")
#edge.list <- as.matrix(do.call("rbind", product.reactant.list))
#rownames(edge.list) <- NULL
#edge.list <- as.data.frame(edge.list)
#save(edge.list, file="/Users/robertness/Dropbox/works_in_progress/huang_mapk_edgelist.Rdata")
#
#mapk.nodes <- read.csv("/Users/robertness/Dropbox/works_in_progress/mapknodes.csv")
#mapk.nodes <- mapk.nodes[-c(23:26), c("name", "description")] #remove "norm" nodes without edges
#rownames(mapk.nodes) <- mapk.nodes$name
#save(mapk.nodes, file="/Users/robertness/Dropbox/works_in_progress/huang_mapk_vertexlist.Rdata")
setwd("/Users/robertness/Dropbox/code/PathFlow")
load("data/huang_mapk_vertexlist.Rdata")
load("data/huang_mapk_edgelist.Rdata")
source("R/PathFlow.R")

#data(huang_mapk_edgelist)
#data(huang_mapk_vertexlist)
g <- graph.data.frame(edge.list, directed = T, vertices =  mapk.nodes)
## Prepare nodes for plotting
### Use the term "place" for chemical species and "transition" for reactions as is 
### consistent with Petri net analysis
V(g)$type <- "place" 
V(g)[grep("r", V(g)$name)]$type <- "transition" 
#grep the reaction nodes and call them "transitions"
V(g)$shape <- "circle"
V(g)[V(g)$type == "transition"]$shape <- "square"
V(g)$color <- "lightblue"
V(g)[V(g)$name == "E1"]$color <- "blue"
V(g)[V(g)$name == "PP_K"]$color <- "red"
V(g)$label <- NA
#The following is a force-directed layout.
#I generated the following coordinates using Fruchterman-Reingold layout,
#then passed them to plot as a fixed argument for consistency
nice.layout <- matrix(c(84, 112, 20, 181, 201, 244, 280, 271, 328, 
                        385, 376, 295, 85, 104, 165, 234, 241, 365, 
                        337, 283, 430, 326, 43, 40, 132, 145, 153, 
                        61, 164, 176, 199, 292, 278, 255, 220, 209, 
                        226, 309, 296, 305, 250, 262, 285, 345, 332, 
                        309, 313, 322, 335, 399, 408, 375, 296, 177, 
                        220, 259, 371, 337, 195, 81, 109, 161, 51, 
                        308, 265, 200, 368, 247, 97, 195, 289, 410, 
                        80, 0, 257, 281, 272, 225, 209, 181, 318, 319, 
                        320, 359, 357, 368, 290, 285, 212, 252, 251, 
                        334, 142, 141, 138, 52, 54, 38, 163, 170, 198, 
                        100, 115, 89), ncol=2)
plot.igraph(g, layout = nice.layout, vertex.size =3, vertex.label.dist=-.37, 
            edge.arrow.size=.4 )

The dark blue node E1 is the source of signal. PP_K (doubly-phosphorylated Erk) is the response.

This reaction graph does not clearly show how signal flows from E1 to PP_K. Generally, reaction graphs do not provide a clear representation of signal flow. In the following, I describe how to manipulate the reaction graph so signal flow becomes clear.

Shortest Path Approach

One approach to this problem is to find the shortest path from source to response through the reaction graph. igraph makes this easy:

sp <- get.shortest.paths(g, from = "E1", to = "PP_K", output="epath")$epath[[1]]
direct.graph <- suppressWarnings(subgraph.edges(g, sp))
V(direct.graph)$label <- V(direct.graph)$description
V(direct.graph)$label.cex <- .8
new.layout <- matrix(c(20, 146, 279, 430, 89, 205, 350, 53,
                       115, 176, 239, 313, 387, 410, 266, 126, 0, 
                       341, 191, 59, 375, 302, 229, 156, 92, 27), ncol = 2)
plot.igraph(direct.graph, layout = new.layout, vertex.size =4, 
            vertex.label.dist=-.37, edge.arrow.size=.4 )

This graph resembles the expected Signal -> Ras -> Mek -> Erk -> Response path. However, what about when there are multiple paths of signal flow? Longer paths can be just as important to overall signal flow as the shortest path. In such cases a different approach is needed.

Subnetwork Approach

Biologists consider the unphosphorylated form of a protein as "inactive", and the phosphorylated form as "active". Rather than having two graph nodes for the active and inactive forms of the protein, why not have one "super-node" representing a variable with two states; "active" and "inactive"?

To find which species can be combined into super-nodes, find the null space of the incidence matrix, the matrix of the reaction graph. This provides sets of species whose combined total is conserved, meaning the total does not change as the system changes. The positive elements from the null space vectors provide these sets, called place-invariants. The species in a place-invariant form a super-node.

Similarly, the null space of the transpose of the incidence matrix provides you with transition-invariants. These are sets of reactions that form repeating cycles. The state of the system right after one of these cycles occurs is the same as it was right before it occurred.

The reactions in a cycle (transition-invariant) connect to species, which are members of super-nodes. These connected super-nodes and cycles form subnetworks. Each subnetwork is a step in the flow of the signal.

Petri net theory calls this approach "invariant analysis". In the age of high-throughput bioinformatics, our knowledge about which proteins interact has grown far faster than our knowledge about the kinetics of those interactions. Invariant analysis provides insights into the workings reaction networks even when reaction rates are not available. The Sackman (2006) reference below has been my go-to source on this type of analysis in the context of signal transduction.

My code for finding invariants starts with finding the null space. Then I find minimal sets of nodes corresponding to positive elements in null space vectors. I use the Null function in the MASS package to calculate the null space. This is encoded in a function called getMinimalInvariants.

Now that I can find invariants, I start with the place-invariants.

S <- getPetriNetMatrices(g)$S
pInvariants <- getMinimalInvariants(S)

Next, I can apply biological interpretation to each place-invariant. I can see here that each invariant corresponds to different levels of activity (phosphorylations, indicated by "P") for a given species. With that knowledge I can give each place-invariant a convenient name.

names(pInvariants) <- c("Mek-Erk", "Erk-Phosphatase", "Raf", "Raf-Mek", 
                        "Mek-Phosphatase", "Mek", "Erk")
pInvariants

Moving on to the transition-invariants, see what happens when I plot both the place and transition-invariants together:

drawGraphWithInvariants(g, nice.layout)

Nodes not involved in any invariant are colored pink in the plot.

There is a problem with the transition-invariants in the plot. Many of the reactions belong to no invariant at all. This is wrong because signals only travel though reactions belonging to transition-invariants and species connected to those reactions.

The problem is the network contains reversible reactions. Many of the reactions in cell signaling are indeed reversible, for example two species can bind and then unbind again. The forward and reverse reactions cancel each other out, resulting in zero entries in the incidence matrix, affecting calculation of the null space.

In applying invariant analysis to signaling networks, this is remedied by removing the reverse reactions. These are conveniently labeled with the suffix "R" in the reaction graph. When the reverse reactions are removed, the transitions form transition-invariants:

reverse.nodes <- grepl("R", V(g)$name)
subg <- suppressWarnings(subgraph(g, V(g)[!reverse.nodes])) 
drawGraphWithInvariants(subg, nice.layout[!reverse.nodes, ])

Looking more closely at the transition-invariants in this case:

places <- V(g)[V(g)$type == "place"]$name
transitions <- V(g)[V(g)$type == "transition"]$name
transitions.sub <- transitions[!grepl("R", transitions)]
adjMat <- as.matrix(get.adjacency(subg))
pre <- adjMat[places, transitions.sub]
post <- t(adjMat[transitions.sub, places])
S <- post - pre
tInvariants <- getMinimalInvariants(t(S))
tInvariantsDescriptions <- lapply(tInvariants, function(tInv){
  mapk.nodes[tInv, "description"]})
tInvariantsDescriptions

In the first cycle, once-phosphorylated Mek (P-MapKK) binds with phosphorylated Raf (P-MAPKKK), which leads to Mek taking on another phosphorylation. Then Mek phosphatase binds to the twice-phosphorylated Mek, leading to a dephosphorylation that takes Mek back to where it started. The other cycles have similar interpretations.

Combining Shortest Path with Subnetwork Approach

Since all the reactions in the shortest path fall in a transition-invariant, I can remove the reactions and just look at species:

spMat <- get.edgelist(g)[sp, ]
transitions <- V(g)[V(g)$type == "transition"]$name
transition.outgoing.mat <- spMat[spMat[, 1] %in% transitions, ]
transition.outgoing.list <- as.list(transition.outgoing.mat[, 2])
names(transition.outgoing.list) <- transition.outgoing.mat[, 1]
transition.incoming.mat <- spMat[spMat[, 2] %in% transitions, ]
transition.incoming.list <- as.list(transition.incoming.mat[, 1])
names(transition.incoming.list) <- transition.incoming.mat[, 2]
place.only.edges <- do.call("rbind", sapply(transitions, function(tr){
   c(from = transition.incoming.list[[tr]], to = transition.outgoing.list[[tr]])
}))
place.only.graph <- graph.edgelist(place.only.edges)
simple.layout <- matrix(c(20, 82, 165, 228, 309, 381, 430, 
                          410, 329, 290, 208, 163, 95, 0), ncol = 2)
V(place.only.graph)$label.cex <- .8
plot.igraph(place.only.graph, layout=simple.layout,
            vertex.size =4, vertex.label.dist=-.37, 
            edge.arrow.size=.4 )

Finally, I go from referring to species to referring to super-nodes (place-invariants).

invariant.edges <- cbind(
  sapply(place.only.edges[, 1], function(place){
    out <- names(pInvariants)[sapply(pInvariants, function(inv) place %in% inv)]
    if(length(out) == 0) out <- place
    out
  }),
  sapply(place.only.edges[, 2], function(place){
    out <- names(pInvariants)[sapply(pInvariants, function(inv) place %in% inv)]
    if(length(out) == 0) out <- place
    out
  })
)
invariant.graph <- graph.edgelist(invariant.edges)
plot.igraph(invariant.graph, layout=simple.layout,
            vertex.size =4, vertex.label.dist=-.37, 
            edge.arrow.size=.4 )

Conclusion

The best way to view the signal is to plot paths through subnetworks. The subnetworks are derived from invariant analysis.

References

Huang, Chi-Ying and James E. Ferrell. "Ultrasensitivity in the mitogen-activated protein kinase cascade." Proceedings of the National Academy of Sciences 93.19 (1996): 10078-10083.

Sackmann, Andrea, Monika Heiner, and Ina Koch. "Application of Petri net based analysis techniques to signal transduction pathways." BMC bioinformatics 7.1 (2006): 482.

Hardy, Simon and Ravi Iyengar. Analysis of Dynamic Models of Signaling Networks with Petri Nets and Dynamic Graphs. In Ina Koch, Wolfgang Reisig, and Falk Schreiber. Modeling in systems biology: the Petri Net approach. Vol. 16. Springer, (2010) (pp. 225-250).

robertness/pathflow documentation built on May 27, 2019, 10:33 a.m.

rdrr.io home R language documentation Run R code online

CRAN packages Bioconductor packages R-Forge packages GitHub packages

Note that we can't provide technical support on individual packages. You should contact the package authors for that.

Tweet to @rdrrHQ

GitHub issue tracker

ian@mutexlabs.com