Schoeberl et al. 2002 published a computational model of epidermal growth factor (EGF) receptor signal pathways. Represented as a biochemical network:
In subsequent analysis, I plan to convert network format to a causal influence diagram using Petri net analysis.
Before that can happen, the data must be preprocessed. Most biochemical network analysis packages in R fail because, even when the model is in a standard format such as SBML, there are too many exceptions to account for. For that reason, this preprocessing workflow is included here.
#devtools::install_github("osazuwa223/igraphr") #devtools::install_github("osazuwa223/PathFlow") library(PathFlow) setwd("/Users/robertness/Dropbox/code/PathFlow") raw.data <- read.csv("inst/extdata/schoeberl.csv", row.names = NULL, stringsAsFactors = F) library(dplyr)
This is a slightly modified variant of the dataset downloaded directed from Nature. The raw data contains only reactions with some annotations.
head(raw.data)
The goal here is to convert this to a bipartite graph of species (the reactants and products of a reaction) and the reactions that connect them.
For example, the reaction r1 is written as "A + B -> AB", this becomes edges "A -> r1", "B -> r1", and "r1 -> AB". However, instead of calling the reaction "r1", the original character string "A + B -> AB" is used to label the reaction. This enables sanity checks and easy interpretation of computation on the resulting graph.
Working with the Equation column, I first separate out the reversable reations.
reversable <- grepl("<-->", raw.data$Equation) # Pull reversables out. --------------------------- reversables <- raw.data[reversable, ] raw.data <- raw.data[!reversable, ] # Convert reversables into forward and backward versions --------------------------- backwards <- forwards <- reversables forwards$Equation <- gsub("<-->", "->", forwards$Equation) backwards$Equation <- gsub("<-->", "<-", backwards$Equation) #Add them back to the data --------------------------- raw.data <- rbind(raw.data, forwards, backwards)
Now that reversable reactions have be seperated out, we need to create an edge list. This involves mapping reactions to edges. Each reaction processes seperately into a data frame of edges and the final result will collapse all the data frames into one. The original equation value is preserved on each edge as an index.
# A function for an individual reaction --------------------------- #use a function here to distingish "->" and "<-" edge.list <- do.call("rbind", lapply(raw.data$Equation, parseBipartiteEdges)) rownames(edge.list) <- NULL
For the bipartite graph, the raw data can be seen as a table of vertex information for the reaction vertices. So I expand the table to include the reactants and products. The vertex information for the products will just be NA. Vertices are labeled as either species or reactions.
all.nodes <- unique(data.frame(v = c(edge.list$from, edge.list$to))) all.nodes$type <- "species" all.nodes$type[all.nodes$v %in% raw.data$Equation] <- "reaction" vertex.list <- merge(all.nodes, raw.data, by.x = "v", by.y = "Equation", all = T, stringsAsFactors = F) names(vertex.list) <- c("v", "type","reaction_number", "param_info1", "param_info2", "references_remarks") vertex.list$v <- as.character(vertex.list$v)
Finally, the graph object is created.
# Turn the NA's in the vertex list to "NA" (characer) # Or else indexing becomes cumbersome. --------------------------- for(item in names(vertex.list)){ vertex.list[, item][is.na(vertex.list[, item])] <- "NA" } schoeberl.graph <- graph.data.frame(edge.list, directed = T, vertices = vertex.list) g <- schoeberl.graph V(g)$label <- NA plot.igraph(g, vertex.size = 5, edge.arrow.size=0.3) #save(schoeberl.graph, file = "data/schoeberl.Rdata")
In subsequent analysis I use the incidence matrix generated from this biochemical network. Reversable transactions cancel out elements in the incidence matrix to 0.
Since my interest is in mapping signal flow through the network, I can remove the reactions corresponding to the reversing of transistions. Given the original notation was A <--> B, I have no way of knowing which is a reaction on the path of signal flow, and which is mearly the reversal of the transaction. I do this using graph traversal over an igraph object, using functionality in the lucy package.
The pseudocode breaks down as follows:
V(g)$updated <- FALSE V(g)$is.input <- FALSE V(g)[reaction_number == "v1"]$is.input <- TRUE V(g)$is.output <- FALSE V(g)["[ERK-PP]"]$is.output <- TRUE V(g)[is.output]$updated <- TRUE
# We need to trace a path back to the input # callBack only affects reaction nodes. so just returns the graph on non-reaction nodes # In the c v <- "[ERK-P-MEK-PP] -> [ERK-PP]+[MEK-PP]" library(lucy) updateVertices(g, getDeterminers = getDownstreamReactions, callback = makeOneWay)
forward.params <- unlist(lapply(raw.data$param_info1, function(val) as.numeric( strsplit( strsplit( strsplit( val, "=")[[1]][2], " " )[[1]][1], ";" )[[1]][1] ) ) ) backward.params <- unlist(lapply(raw.data$param_info2, function(val) as.numeric( strsplit( strsplit( strsplit( val, "=")[[1]][2], " " )[[1]][1], ";" )[[1]][1] ) ) )
Schoeberl, Birgit, et al. "Computational modeling of the dynamics of the MAP kinase cascade activated by surface and internalized EGF receptors." Nature biotechnology 20.4 (2002): 370-375.
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.