Take the well-studied MAPK/ERK pathway as an example:
Here the flow of signal is Signal -> Ras -> Mek -> Erk -> Response. I downloaded a Map/Erk biochemical model and visualized the reaction graph. The reaction graph connects species (proteins, compounds, etc) to the reactions that involve the species.
Steps to producing reaction graph:
library(PathFlow) library(igraph) #Original preprocessing code, kept here for easy reference. #This code seems to have a bugin the "product.reactant.list" calculation onwards. #Convert data to reaction graph edge list #mapk.edges <- read.csv("/Users/robertness/Dropbox/works_in_progress/huang_mapk.csv", header = F, stringsAsFactors =F) #colnames(mapk.edges) <- c("from", "to") #edge.list <- mapk.edges[mapk.edges$to != "", ] #mapk.edges[, 2] <- gsub("-", "", gsub("reaction", "", mapk.edges[, 2] )) #product.reactant.list <- split(mapk.edges[, c(1, 3)], as.factor(mapk.edges[, 2])) #names(product.reactant.list$product) <- c("from", "to") #product.reactant.list$reactant <- product.reactant.list$reactant[, c(2, 1)] #names(product.reactant.list$reactant) <- c("from", "to") #edge.list <- as.matrix(do.call("rbind", product.reactant.list)) #rownames(edge.list) <- NULL #edge.list <- as.data.frame(edge.list) #save(edge.list, file="/Users/robertness/Dropbox/works_in_progress/huang_mapk_edgelist.Rdata") # #mapk.nodes <- read.csv("/Users/robertness/Dropbox/works_in_progress/mapknodes.csv") #mapk.nodes <- mapk.nodes[-c(23:26), c("name", "description")] #remove "norm" nodes without edges #rownames(mapk.nodes) <- mapk.nodes$name #save(mapk.nodes, file="/Users/robertness/Dropbox/works_in_progress/huang_mapk_vertexlist.Rdata") setwd("/Users/robertness/Dropbox/code/PathFlow") load("data/huang_mapk_vertexlist.Rdata") load("data/huang_mapk_edgelist.Rdata") source("R/PathFlow.R")
#data(huang_mapk_edgelist) #data(huang_mapk_vertexlist) g <- graph.data.frame(edge.list, directed = T, vertices = mapk.nodes) ## Prepare nodes for plotting ### Use the term "place" for chemical species and "transition" for reactions as is ### consistent with Petri net analysis V(g)$type <- "place" V(g)[grep("r", V(g)$name)]$type <- "transition" #grep the reaction nodes and call them "transitions" V(g)$shape <- "circle" V(g)[V(g)$type == "transition"]$shape <- "square" V(g)$color <- "lightblue" V(g)[V(g)$name == "E1"]$color <- "blue" V(g)[V(g)$name == "PP_K"]$color <- "red" V(g)$label <- NA #The following is a force-directed layout. #I generated the following coordinates using Fruchterman-Reingold layout, #then passed them to plot as a fixed argument for consistency nice.layout <- matrix(c(84, 112, 20, 181, 201, 244, 280, 271, 328, 385, 376, 295, 85, 104, 165, 234, 241, 365, 337, 283, 430, 326, 43, 40, 132, 145, 153, 61, 164, 176, 199, 292, 278, 255, 220, 209, 226, 309, 296, 305, 250, 262, 285, 345, 332, 309, 313, 322, 335, 399, 408, 375, 296, 177, 220, 259, 371, 337, 195, 81, 109, 161, 51, 308, 265, 200, 368, 247, 97, 195, 289, 410, 80, 0, 257, 281, 272, 225, 209, 181, 318, 319, 320, 359, 357, 368, 290, 285, 212, 252, 251, 334, 142, 141, 138, 52, 54, 38, 163, 170, 198, 100, 115, 89), ncol=2) plot.igraph(g, layout = nice.layout, vertex.size =3, vertex.label.dist=-.37, edge.arrow.size=.4 )
The dark blue node E1 is the source of signal. PP_K (doubly-phosphorylated Erk) is the response.
This reaction graph does not clearly show how signal flows from E1 to PP_K. Generally, reaction graphs do not provide a clear representation of signal flow. In the following, I describe how to manipulate the reaction graph so signal flow becomes clear.
One approach to this problem is to find the shortest path from source to response through the reaction graph. igraph makes this easy:
sp <- get.shortest.paths(g, from = "E1", to = "PP_K", output="epath")$epath[[1]] direct.graph <- suppressWarnings(subgraph.edges(g, sp)) V(direct.graph)$label <- V(direct.graph)$description V(direct.graph)$label.cex <- .8 new.layout <- matrix(c(20, 146, 279, 430, 89, 205, 350, 53, 115, 176, 239, 313, 387, 410, 266, 126, 0, 341, 191, 59, 375, 302, 229, 156, 92, 27), ncol = 2) plot.igraph(direct.graph, layout = new.layout, vertex.size =4, vertex.label.dist=-.37, edge.arrow.size=.4 )
This graph resembles the expected Signal -> Ras -> Mek -> Erk -> Response path. However, what about when there are multiple paths of signal flow? Longer paths can be just as important to overall signal flow as the shortest path. In such cases a different approach is needed.
Biologists consider the unphosphorylated form of a protein as "inactive", and the phosphorylated form as "active". Rather than having two graph nodes for the active and inactive forms of the protein, why not have one "super-node" representing a variable with two states; "active" and "inactive"?
To find which species can be combined into super-nodes, find the null space of the incidence matrix, the matrix of the reaction graph. This provides sets of species whose combined total is conserved, meaning the total does not change as the system changes. The positive elements from the null space vectors provide these sets, called place-invariants. The species in a place-invariant form a super-node.
Similarly, the null space of the transpose of the incidence matrix provides you with transition-invariants. These are sets of reactions that form repeating cycles. The state of the system right after one of these cycles occurs is the same as it was right before it occurred.
The reactions in a cycle (transition-invariant) connect to species, which are members of super-nodes. These connected super-nodes and cycles form subnetworks. Each subnetwork is a step in the flow of the signal.
Petri net theory calls this approach "invariant analysis". In the age of high-throughput bioinformatics, our knowledge about which proteins interact has grown far faster than our knowledge about the kinetics of those interactions. Invariant analysis provides insights into the workings reaction networks even when reaction rates are not available. The Sackman (2006) reference below has been my go-to source on this type of analysis in the context of signal transduction.
My code for finding invariants starts with finding the null space. Then I find minimal sets of nodes corresponding to positive elements in null space vectors. I use the Null function in the MASS package to calculate the null space. This is encoded in a function called getMinimalInvariants.
Now that I can find invariants, I start with the place-invariants.
S <- getPetriNetMatrices(g)$S pInvariants <- getMinimalInvariants(S)
Next, I can apply biological interpretation to each place-invariant. I can see here that each invariant corresponds to different levels of activity (phosphorylations, indicated by "P") for a given species. With that knowledge I can give each place-invariant a convenient name.
names(pInvariants) <- c("Mek-Erk", "Erk-Phosphatase", "Raf", "Raf-Mek", "Mek-Phosphatase", "Mek", "Erk") pInvariants
Moving on to the transition-invariants, see what happens when I plot both the place and transition-invariants together:
drawGraphWithInvariants(g, nice.layout)
Nodes not involved in any invariant are colored pink in the plot.
There is a problem with the transition-invariants in the plot. Many of the reactions belong to no invariant at all. This is wrong because signals only travel though reactions belonging to transition-invariants and species connected to those reactions.
The problem is the network contains reversible reactions. Many of the reactions in cell signaling are indeed reversible, for example two species can bind and then unbind again. The forward and reverse reactions cancel each other out, resulting in zero entries in the incidence matrix, affecting calculation of the null space.
In applying invariant analysis to signaling networks, this is remedied by removing the reverse reactions. These are conveniently labeled with the suffix "R" in the reaction graph. When the reverse reactions are removed, the transitions form transition-invariants:
reverse.nodes <- grepl("R", V(g)$name) subg <- suppressWarnings(subgraph(g, V(g)[!reverse.nodes])) drawGraphWithInvariants(subg, nice.layout[!reverse.nodes, ])
Looking more closely at the transition-invariants in this case:
places <- V(g)[V(g)$type == "place"]$name transitions <- V(g)[V(g)$type == "transition"]$name transitions.sub <- transitions[!grepl("R", transitions)] adjMat <- as.matrix(get.adjacency(subg)) pre <- adjMat[places, transitions.sub] post <- t(adjMat[transitions.sub, places]) S <- post - pre tInvariants <- getMinimalInvariants(t(S)) tInvariantsDescriptions <- lapply(tInvariants, function(tInv){ mapk.nodes[tInv, "description"]}) tInvariantsDescriptions
In the first cycle, once-phosphorylated Mek (P-MapKK) binds with phosphorylated Raf (P-MAPKKK), which leads to Mek taking on another phosphorylation. Then Mek phosphatase binds to the twice-phosphorylated Mek, leading to a dephosphorylation that takes Mek back to where it started. The other cycles have similar interpretations.
Since all the reactions in the shortest path fall in a transition-invariant, I can remove the reactions and just look at species:
spMat <- get.edgelist(g)[sp, ] transitions <- V(g)[V(g)$type == "transition"]$name transition.outgoing.mat <- spMat[spMat[, 1] %in% transitions, ] transition.outgoing.list <- as.list(transition.outgoing.mat[, 2]) names(transition.outgoing.list) <- transition.outgoing.mat[, 1] transition.incoming.mat <- spMat[spMat[, 2] %in% transitions, ] transition.incoming.list <- as.list(transition.incoming.mat[, 1]) names(transition.incoming.list) <- transition.incoming.mat[, 2] place.only.edges <- do.call("rbind", sapply(transitions, function(tr){ c(from = transition.incoming.list[[tr]], to = transition.outgoing.list[[tr]]) })) place.only.graph <- graph.edgelist(place.only.edges) simple.layout <- matrix(c(20, 82, 165, 228, 309, 381, 430, 410, 329, 290, 208, 163, 95, 0), ncol = 2) V(place.only.graph)$label.cex <- .8 plot.igraph(place.only.graph, layout=simple.layout, vertex.size =4, vertex.label.dist=-.37, edge.arrow.size=.4 )
Finally, I go from referring to species to referring to super-nodes (place-invariants).
invariant.edges <- cbind( sapply(place.only.edges[, 1], function(place){ out <- names(pInvariants)[sapply(pInvariants, function(inv) place %in% inv)] if(length(out) == 0) out <- place out }), sapply(place.only.edges[, 2], function(place){ out <- names(pInvariants)[sapply(pInvariants, function(inv) place %in% inv)] if(length(out) == 0) out <- place out }) ) invariant.graph <- graph.edgelist(invariant.edges) plot.igraph(invariant.graph, layout=simple.layout, vertex.size =4, vertex.label.dist=-.37, edge.arrow.size=.4 )
The best way to view the signal is to plot paths through subnetworks. The subnetworks are derived from invariant analysis.
Huang, Chi-Ying and James E. Ferrell. "Ultrasensitivity in the mitogen-activated protein kinase cascade." Proceedings of the National Academy of Sciences 93.19 (1996): 10078-10083.
Sackmann, Andrea, Monika Heiner, and Ina Koch. "Application of Petri net based analysis techniques to signal transduction pathways." BMC bioinformatics 7.1 (2006): 482.
Hardy, Simon and Ravi Iyengar. Analysis of Dynamic Models of Signaling Networks with Petri Nets and Dynamic Graphs. In Ina Koch, Wolfgang Reisig, and Falk Schreiber. Modeling in systems biology: the Petri Net approach. Vol. 16. Springer, (2010) (pp. 225-250).
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.