In ccb-hms/BioPlexAnalysis: Applications and downstream analysis of BioPlex PPI data

Setup

library(BioPlex)
library(BioNet)
library(EnrichmentBrowser)
library(graph)

Data retrieval

Get the latest version of the 293T PPI network:

bp.293t <- BioPlex::getBioPlex(cell.line = "293T", version = "3.0")

and turn into a graph object:

bp.gr <- BioPlex::bioplex2graph(bp.293t)

Get RNA-seq data for 293T cells from GEO: GSE122425

gse122425 <- BioPlex::getGSE122425()

Differential expression analysis

Differential expression analysis between NSUN2 knockout and wild type:

gse122425$GROUP <- rep(c(0, 1), each = 3)
gse122425 <- EnrichmentBrowser::deAna(gse122425, de.method = "DESeq2")

Inspect results:

rowData(gse122425)
not.na <- !is.na(rowData(gse122425)$PVAL)
gse122425 <- gse122425[not.na,]
EnrichmentBrowser::volcano(rowData(gse122425)$FC, rowData(gse122425)$ADJ.PVAL)

Change rownames to gene symbols for intersection with bioplex graph:

rowData(gse122425)$ENSEMBL <- rownames(gse122425)
rownames(gse122425) <- rowData(gse122425)$SYMBOL

Intersect bioplex graph and expression dataset:

syms <- graph::nodeData(bp.gr, graph::nodes(bp.gr), "SYMBOL")
syms <- unlist(syms)
isect <- intersect(syms, rownames(gse122425)) 
ind <- match(isect, syms) 
bp.sgr <- graph::subGraph(names(syms)[ind], bp.gr)
bp.sgr 
bp.sgr <- BioNet::rmSelfLoops(bp.sgr)
bp.sgr

Get the DE p-values for each node:

syms <- syms[ind]
ind <- match(isect, rownames(gse122425))
pvals <- rowData(gse122425)[ind, "PVAL"]
names(pvals) <- names(syms)
head(pvals)

To score each node of the network we fit a Beta-uniform mixture model (BUM) to the p-value distribution and subsequently use the parameters of the model for the scoring function:

fb <- BioNet::fitBumModel(pvals)
scores <- BioNet::scoreNodes(bp.sgr, fb, fdr = 0.05)
head(scores)
summary(scores)

Maximum scoring subnetwork analysis

First reduction step: unsupervised detection of maximum scoring subnetwork

Here we use a fast heuristic approach to calculate an approximation to theoptimal scoring subnetwork.
An optimal solution can be calculated using the heinz algorithm requiring a commercial CPLEX license.

module <- BioNet::runFastHeinz(bp.sgr, scores)

module <- BioPlexAnalysis:::.getResourceFromCache("heinz.module", update.value = NA)
if(is.null(module))
{
    module <- BioNet::runFastHeinz(bp.sgr, scores)
    BioPlexAnalysis:::.cacheResource(module, "heinz.module")
}

module

For some reason BioNet::runFastHeinz drops the edgeData of the input graph, so we need to re-annotate those to the module. There doesn't seem to be a convenient way to bulk transfer edgeData from one graph to another, but rather only for one edge and one attribute at a time.

With a little abuse, we help ourselves via accessing slots directly.

en <- names(edgeData(module))
edgeDataDefaults(module) <- edgeDataDefaults(bp.sgr)
module@edgeData@data[en] <- edgeData(bp.sgr)[en]

Second reduction step: supervised enrichment analysis of maximum scoring subnetwork

kegg.gs <- EnrichmentBrowser::getGenesets(org = "hsa", db = "kegg", gene.id.type = "UNIPROT")
head(kegg.gs)
go.gs <- EnrichmentBrowser::getGenesets(org = "hsa", db = "go", gene.id.type = "UNIPROT")
head(go.gs)

Given the maximum scoring subnetwork and gene sets of interest, we can now apply different set-based enrichment analysis (sbea) methods or network-based enrichment analysis (nbea) methods to identify biological themes within a module.

EnrichmentBrowser::sbeaMethods()
EnrichmentBrowser::nbeaMethods()

Those are conveniently wrapped in the EnrichmentBrowser package and provide for a prioritization of gene sets of interest within a maximum scoring subnetwork.

To illustrate the concept, we use information on gene set enrichment from the paper.

NSUN2 is a nucleolus RNA m5C methyltransferase, regulator of cell proliferation and cell differentiation, dysfunction associated with intellectual diseases and cancer; reported to be a downstream target gene of MYC oncogene, a master regulator of cell proliferation.

Among the pathways showing enrichment in the paper is Ribosome, so as a showcase we extract nodes annotated to the gene set "Ribosome" from the maximum scoring subnetwork.

ind <- grep("Ribosome$", names(kegg.gs))
ribo <- kegg.gs[[ind]]
ribo <- intersect(ribo, graph::nodes(module))
ribo.gr <- graph::subGraph(ribo, module)
ribo.gr

Such a gene set induced sub-graph can then be further explored in our BioPlex node data and edge data shiny viewer.

For that we might be interested in mapping additional data onto the nodes and the edges of the network.

For example, we might want to add the Bioplex proteome data onto the graph for inspection via the shiny graph viewer.

bp.prot <- BioPlex::getBioplexProteome()
bp.prot
ribo.gr <- BioPlex::mapSummarizedExperimentOntoGraph(ribo.gr, bp.prot,
    col.names = grep("HEK", colnames(bp.prot), value = TRUE), 
    rowdata.cols = c("nr.peptides", "log2ratio", "adj.pvalue"),
    prefix = "bpprot_")
head(graph::nodeData(ribo.gr), n = 2)

SessionInfo

sessionInfo()

ccb-hms/BioPlexAnalysis documentation built on Jan. 29, 2023, 6:55 p.m.

rdrr.io home R language documentation Run R code online

CRAN packages Bioconductor packages R-Forge packages GitHub packages

Note that we can't provide technical support on individual packages. You should contact the package authors for that.

ccb-hms/BioPlexAnalysis
Applications and downstream analysis of BioPlex PPI data

In ccb-hms/BioPlexAnalysis: Applications and downstream analysis of BioPlex PPI data

Setup

Data retrieval

Differential expression analysis

Maximum scoring subnetwork analysis

First reduction step: unsupervised detection of maximum scoring subnetwork

Second reduction step: supervised enrichment analysis of maximum scoring subnetwork

SessionInfo

R Package Documentation

Browse R Packages

We want your feedback!

ccb-hms/BioPlexAnalysis Applications and downstream analysis of BioPlex PPI data

In ccb-hms/BioPlexAnalysis: Applications and downstream analysis of BioPlex PPI data

Setup

Data retrieval

Differential expression analysis

Maximum scoring subnetwork analysis

First reduction step: unsupervised detection of maximum scoring subnetwork

Second reduction step: supervised enrichment analysis of maximum scoring subnetwork

SessionInfo

R Package Documentation

Browse R Packages

We want your feedback!

ccb-hms/BioPlexAnalysis
Applications and downstream analysis of BioPlex PPI data