Setup

library(BioPlex)
library(graph)
library(reticulate)

Data retrieval

Get the latest version of the 293T PPI network ...

bp.293t.3 <- BioPlex::getBioPlex(cell.line = "293T", version = "3.0")
head(bp.293t.3)

... and turn into a graph:

gr.293t.3 <- BioPlex::bioplex2graph(bp.293t.3)

Obtain the complete set of human protein complexes from CORUM ...

corum.df <- BioPlex::getCorum(set = "core", organism = "Human")

... and turn into a list:

corum.list <- BioPlex::corum2list(corum.df)
head(corum.list)

Identify complexes with at least three subunits:

hasSubunits <- function(s, gr)
{
    s <- intersect(s, graph::nodes(gr))
    length(s) > 2
}
has.subunits <- vapply(corum.list, hasSubunits, logical(1), gr = gr.293t.3)
table(has.subunits)

Identify complexes with at least one subunit targeted as bait:

hasBait <- function(s, df) any(s %in% df$UniprotA)
has.bait <- vapply(corum.list, hasBait, logical(1), df = bp.293t.3)
table(has.bait)

Identify complexes with at least one PPI between subunits:

hasEdge <- function(s, gr)
{
    s <- intersect(s, graph::nodes(gr))
    graph::numEdges(graph::subGraph(s, gr)) > 0
}
has.edge <- vapply(corum.list, hasEdge, logical(1), gr = gr.293t.3)
table(has.edge)

We then subset the CORUM complexes to those having (i) at least three subunits, (ii) at least one subunit targeted as bait, and (iii) at least one PPI between subunits.

ind <- has.subunits & has.bait & has.edge
table(ind)
sub.corum.list <- corum.list[ind]

Overlap analysis with protein complexes (random sampling)

The bioplexpy package implements a function for testing overlaps of PPIs with a complex of interest based on random sampling.

The function samples random subnetworks from the given PPI network, matching the number of subunits and the bait:prey ratio of the complex being tested. We then count the number of interactions in each replication, and compare against the observed number of interactions overlapping with the complex.

Here, we set up a python environment via reticulate that contains the bioplexpy package in order to invoke the function from within R, facilitating direct exchange of data and results between the BioPlex R and BioPlex Python packages:

reticulate::virtualenv_create("r-reticulate")
reticulate::virtualenv_install("r-reticulate", "bioplexpy")
reticulate::use_virtualenv("r-reticulate")
bp <- reticulate::import("bioplexpy")

Here, we use a functionality from the BioPlex Python package to turn the BioPlex data into a NetworkX graph:

gr <- bp$bioplex2graph(bp.293t.3)
gr

and also to obtain the subset of CORUM complexes satisfying the above criteria.

corum <- bp$getCorum(complex_set = "core", organism = "Human")
sub.corum <- corum[ind,] 

Now we carry out the overlap resampling test as implemented in the bioplexpy package for all complexes passing the filter criteria:

ps <- vapply(sub.corum$ComplexID,
             function(i) bp$permutation_test_for_CORUM_complex(gr, sub.corum, i, 100),
             numeric(1))
head(ps)
summary(ps)

We can now take a peak at the p-value distribution from the overlap permutation test, observing a strong concentration near zero, confirming that, as expected, many complexes are enriched for PPIs - when compared to random subsets of the overall PPIs network matching the complexes in size and composition.

hist(ps, breaks = 20, xlab = "p-value", ylab = "frequency", col = "#00AFBB")

Overlap analysis with protein complexes (network randomization)

The resampling test implemented in the BioPlex Python package addresses the question: given a specific network topology, how likely is it that n nodes are connected by at least m edges? In the spirit of a competitive enrichment test, we therefore assess whether the connectivity within the protein set of interest exceeds the connectivity outside of the protein set of interest (= overall / background connectivity).

It is also possible to test each complex for enrichment of PPIs based on network randomization as implemented in the BiRewire package. This addresses the question: in networks of different topology but the same node degree distribution, how likely is it that n nodes are connected by at least m edges? The network randomization test thus randomizes the overall network while keeping the protein set of interest fixed (in contrast to the resampling test, which randomizes the protein set of interest while keeping the overall network fixed).

This approach is incorporated in the testConnectivity function, which we apply here to test all CORUM complexes that were passing the filter criteria above. We therefore randomize the PPI network a defined number of times (here: 100 times), and calculate for each complex how often the number of edges in a randomized configuration exceeded the number of edges observed for the true PPI network.

ps2 <- BioPlexAnalysis::testConnectivity(sub.corum.list, 
                                         gr.293t.3,
                                         nr.reps = 10)
head(ps2)
summary(ps2)

As before we can take a peak at the p-value distribution for the network randomization test, which displays as for the resampling test a strong enrichment of PPIs across almost all CORUM complexes.

hist(ps2, breaks = 20, xlab = "p-value", ylab = "frequency", col = "#00AFBB")


ccb-hms/BioPlexAnalysis documentation built on Jan. 29, 2023, 6:55 p.m.