In robertness/pathflow: Converts biochemical reaction networks to pathway maps.

Processing data from Computational Model of EGF Signaling

Background

Schoeberl et al. 2002 published a computational model of epidermal growth factor (EGF) receptor signal pathways. Represented as a biochemical network:

schoeberl_model

In subsequent analysis, I plan to convert network format to a causal influence diagram using Petri net analysis.

Before that can happen, the data must be preprocessed. Most biochemical network analysis packages in R fail because, even when the model is in a standard format such as SBML, there are too many exceptions to account for. For that reason, this preprocessing workflow is included here.

#devtools::install_github("osazuwa223/igraphr")
#devtools::install_github("osazuwa223/PathFlow")
library(PathFlow)
setwd("/Users/robertness/Dropbox/code/PathFlow")
raw.data <- read.csv("inst/extdata/schoeberl.csv",
                     row.names = NULL, stringsAsFactors = F)
library(dplyr)

This is a slightly modified variant of the dataset downloaded directed from Nature. The raw data contains only reactions with some annotations.

head(raw.data)

The goal here is to convert this to a bipartite graph of species (the reactants and products of a reaction) and the reactions that connect them.

For example, the reaction r1 is written as "A + B -> AB", this becomes edges "A -> r1", "B -> r1", and "r1 -> AB". However, instead of calling the reaction "r1", the original character string "A + B -> AB" is used to label the reaction. This enables sanity checks and easy interpretation of computation on the resulting graph.

Working with the Equation column, I first separate out the reversable reations.

reversable <- grepl("<-->", raw.data$Equation)
# Pull reversables out. ---------------------------
reversables <- raw.data[reversable, ]
raw.data <- raw.data[!reversable, ]
# Convert reversables into forward and backward versions ---------------------------
backwards <- forwards <- reversables 
forwards$Equation <- gsub("<-->", "->", forwards$Equation)
backwards$Equation <- gsub("<-->", "<-", backwards$Equation)
#Add them back to the data ---------------------------
raw.data <- rbind(raw.data, forwards, backwards)

Now that reversable reactions have be seperated out, we need to create an edge list. This involves mapping reactions to edges. Each reaction processes seperately into a data frame of edges and the final result will collapse all the data frames into one. The original equation value is preserved on each edge as an index.

# A function for an individual reaction ---------------------------
#use a function here to distingish "->" and "<-"
edge.list <- do.call("rbind", lapply(raw.data$Equation, parseBipartiteEdges))
rownames(edge.list) <- NULL

For the bipartite graph, the raw data can be seen as a table of vertex information for the reaction vertices. So I expand the table to include the reactants and products. The vertex information for the products will just be NA. Vertices are labeled as either species or reactions.

all.nodes <- unique(data.frame(v = c(edge.list$from, edge.list$to)))
all.nodes$type <- "species"
all.nodes$type[all.nodes$v %in% raw.data$Equation] <- "reaction"
vertex.list <- merge(all.nodes, raw.data,
      by.x = "v", 
      by.y = "Equation", all = T, stringsAsFactors = F) 
names(vertex.list) <- c("v", "type","reaction_number", "param_info1", 
                        "param_info2", "references_remarks")
vertex.list$v <- as.character(vertex.list$v)

Finally, the graph object is created.

# Turn the NA's in the vertex list to "NA" (characer) 
# Or else indexing becomes cumbersome. ---------------------------
for(item in names(vertex.list)){
  vertex.list[, item][is.na(vertex.list[, item])] <- "NA"
} 
schoeberl.graph <- graph.data.frame(edge.list, directed = T, vertices = vertex.list)
g <- schoeberl.graph
V(g)$label <- NA
plot.igraph(g, vertex.size = 5, edge.arrow.size=0.3)
#save(schoeberl.graph, file = "data/schoeberl.Rdata")

Removing the Reversable Reactions.

In subsequent analysis I use the incidence matrix generated from this biochemical network. Reversable transactions cancel out elements in the incidence matrix to 0.

Since my interest is in mapping signal flow through the network, I can remove the reactions corresponding to the reversing of transistions. Given the original notation was A <--> B, I have no way of knowing which is a reaction on the path of signal flow, and which is mearly the reversal of the transaction. I do this using graph traversal over an igraph object, using functionality in the lucy package.

The pseudocode breaks down as follows:

Walk backward from the output node.
When we arrive at reaction node.
If there is no node that is the reverse reaction of the reaction node, do nothing.
If the reaction node is not downstream of the input node, do nothing.
Else:
1. Get the shortest paths from each of the doppleganger nodes, to the output node.
2. Remove the node with the longest path. Tie-break by keeping the forward-facing reaction node.*
Get the set of upstream reaction nodes relative to the current reaction node.
Subset that set by those downstream of the input node.
If that subset is empty, stop, return the modified graph.
Else, apply the algorithm recursively to each node in the subset.

V(g)$updated <- FALSE
V(g)$is.input <- FALSE
V(g)[reaction_number == "v1"]$is.input <- TRUE
V(g)$is.output <- FALSE
V(g)["[ERK-PP]"]$is.output <- TRUE
V(g)[is.output]$updated <- TRUE

# We need to trace a path back to the input 
# callBack only affects reaction nodes.  so just returns the graph on non-reaction nodes 
# In the c


v <- "[ERK-P-MEK-PP]  ->  [ERK-PP]+[MEK-PP]"

library(lucy)
updateVertices(g, getDeterminers = getDownstreamReactions, callback = makeOneWay)

forward.params <- unlist(lapply(raw.data$param_info1, 
                                function(val) as.numeric(
                                  strsplit(
                                    strsplit(
                                      strsplit(
                                        val, "=")[[1]][2], " "
                                      )[[1]][1], ";"
                                    )[[1]][1]
                                  )
                                )
                         )
backward.params <- unlist(lapply(raw.data$param_info2, 
                                function(val) as.numeric(
                                  strsplit(
                                    strsplit(
                                      strsplit(
                                        val, "=")[[1]][2], " "
                                      )[[1]][1], ";"
                                    )[[1]][1]
                                  )
                                )
                         )