The purpose of this package is to allow the user to find pairs of annotations that are enriched in a network. For instance, the network might be a gene regulatory network, where the nodes represent genes and the edges represent regulatory interactions between pairs of genes. Each gene might have one or more functional annotation labels (such as GO terms) associated with them. The user might be interested in learning whether a specific GO term is enriched upstream of a second GO term. This function works with directed networks, with and without edge weights. The results can be depicted as either a network or a heatmap.
knitr::opts_chunk$set( collapse = TRUE, comment = "#>" ) knitr::opts_chunk$set(fig.width=8, fig.height=8)
When no edge weights are considered, enrichment between pairs of genes is determined by the binomial test.
When a gene network contains edge weights, we calculate the sum of the edge weights of each edge type, and we would like to know whether this value is higher than would be expected by chance. For two functional annotations $a$ and $b$, let us define $z_{a,b}$ as the sum of the edge weights of edge type $(a,b)$ in the network. Let us say that $c_{a,b}$ is the count of the number of edges of that type. $P(c_{a,b}=i)$ is the probability of observing exactly $i$ edges of type $(a,b)$ and $P(x \geq z_{a,b} | c_{a,b}=i)$ is the probability of observing a sum of edge weights greater than $z_{a.b}$ given that $c_{a,b}=i$. The probability of observing at least $z_{a,b}$ is: % a weighted average of the probability of observing at least $z_{a,b}$ if there were exactly $c_{a,b}$ edges of edge type $(a,b)$ for all $c_{a,b}$ from $1$ to $N$ (where $N$ is the number of edges of the network). \begin{equation} \label{eq:summation} P(x \geq z_{a,b})=\sum_{i=1}^N P(c_{a,b}=i) P(x \geq z_{a,b} | c_{a,b}=i) \end{equation}
where $N$ is the number of edges in the network. Note that $P(x \geq z_{a,b})$ is the p-value.
First, let us construct a random network with 300 nodes, 1000 edges and 14 GO terms.
library(PAFway) set.seed(123) #Make 300 nodes nodes=paste("node", c(1:300)) #First 3 node names: print(nodes[1:3]) #Assign them random GO terms randomGO=c("A", "B", "C", "D", "E", "F", "G", "H", "I", "J", "K", "L", "M", "N")[sample(c(1:14), 300, replace=T)] names(randomGO)=nodes #First 3 nodes and associated GO terms: print(randomGO[1:3]) #Make 1000 edges edgesRandom=t(sapply(c(1:1000), function(i){ nodes[sample(300, 2)] })) #First 3 edges: print(edgesRandom[1:3,])
We can also consider a random network, where each gene can have more than one functional annotation:
#Assign each node a second GO term, separated by a '_' symbol randomGO2=c("A", "B", "C", "D", "E", "F", "G", "H", "I", "J", "K", "L", "M", "N")[sample(c(1:14), 300, replace=T)] randomGO_multiple=sapply(c(1:300), function(i){paste(randomGO[i], randomGO2[i], sep="_")}) names(randomGO_multiple)=nodes #print first 5 elements, as a demo print(randomGO_multiple[1:5])
We can also select a sub-set of GO terms which we consider 'interesting':
#Interesting GO terms: GO_interesting=c("B", "C", "D", "F")
PAFway works with and without edge weights. Here we generate random edge weights:
random_edge_weights=rnorm(length(edgesRandom[,1]), 1, 0.001) print(random_edge_weights[1:5]) print(length(random_edge_weights))
Now that we've set up some networks, we can see whether there are any enriched pairwise relationships between edges.
First, we will use the main pafway function to find p-values for each function annotation pair. We will start off with the simplest example, which doesn't use any optional parameters:
#This will run pafway, with no edge weights, for all the GO terms a=pafway(randomGO, edgesRandom, unique(randomGO)) print(a[1:5, 1:5])
This can be displayed as either a heatmap or a network:
draw_network(a) draw_heatmap(a)
However, in this example, we do not correct for multiple hypothesis testing, so we get quite a few false positives, even though we know that we have started with a random network. Let us re-draw these, after adjustment for multiple hypotheses. We now see that no edges have a p-value<0.05.
draw_network(a, adjMethod = "bonferroni") draw_heatmap(a, adjMethod = "bonferroni")
Next, we'll repeat the previous analysis, but with edge weights. We will start off with the simplest example, which doesn't use any optional parameters:
#This will run pafway, with no edge weights, for all the GO terms b=pafway_edge_weights(randomGO, cbind(edgesRandom, random_edge_weights), unique(randomGO)) print(b[1:5, 1:5])
This can be displayed as either a heatmap or a network:
draw_network(b) draw_heatmap(b)
However, in this example, we do not correct for multiple hypothesis testing, so we get quite a few false positives, even though we know that we have started with a random network. Let us re-draw these, after adjustment for multiple hypotheses. We now see that no edges have a p-value<0.05.
draw_network(b, adjMethod = "bonferroni") draw_heatmap(b, adjMethod = "bonferroni")
If you use a larger database of GO terms, there may be more than one annotation per gene. These can be appended together, separated by a "_" symbol:
#Multiple GO terms are in: randomGO_multiple b=pafway(randomGO_multiple, edgesRandom, unique(randomGO), exact=F) print(b[1:5], b[1:5])
It is usually recommended that you only perform this analysis on GO terms that are of particular interest to you, because otherwise you will lose a lot of statistical power by comparing pairs of GO terms that you don't care about.
#GO terms of interest are in GO_interesting b=pafway(randomGO, edgesRandom, GO_interesting) print(b[1:5], b[1:5])
If you are running pafway (specifically with edge weights) on a really big network, it can take a while. It might be worth decreasing accuracy and increasing speed. One way to do this is to change the threshold at which a value in a pdf is rounded to zero. Another way is to change the step size.
#This will run pafway, with no edge weights, for all the GO terms b=pafway_edge_weights(randomGO, cbind(edgesRandom, random_edge_weights), unique(randomGO), step=0.001, thresholdZero=0.0001) print(b[1:5, 1:5])
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.