suppressPackageStartupMessages({ 
    library(ToPASeq)
    library(EnrichmentBrowser)
    library(graphite)
})

Setup

Note: the r Biocpkg("ToPASeq") package currently undergoes a major rework due to the change of the package maintainer. It is recommended to use the topology-based methods implemented in the r Biocpkg("EnrichmentBrowser") or the r Biocpkg("graphite") package instead.

We start by loading the package.

library(ToPASeq)

Preparing the data

For RNA-seq data, we consider transcriptome profiles of four primary human airway smooth muscle cell lines in two conditions: control and treatment with dexamethasone Himes et al., 2014.

We load the r Biocpkg("airway") dataset

library(airway)
data(airway)

For further analysis, we only keep genes that are annotated to an ENSEMBL gene ID.

airSE <- airway[grep("^ENSG", rownames(airway)),]
dim(airSE)
assay(airSE)[1:4,1:4]

Differential expression

The r Biocpkg("EnrichmentBrowser") package incorporates established functionality from the r Biocpkg("limma") package for differential expression analysis. This involves the voom transformation when applied to RNA-seq data. Alternatively, differential expression analysis for RNA-seq data can also be carried out based on the negative binomial distribution with r Biocpkg("edgeR") and r Biocpkg("DESeq2").

This can be performed using the function EnrichmentBrowser::deAna and assumes some standardized variable names:

For more information on experimental design, see the limma user's guide, chapter 9.

For the airway dataset, the GROUP variable indicates whether the cell lines have been treated with dexamethasone (1) or not (0).

airSE$GROUP <- ifelse(airway$dex == "trt", 1, 0)
table(airSE$GROUP)

Paired samples, or in general sample batches/blocks, can be defined via a BLOCK column in the colData slot. For the airway dataset, the sample blocks correspond to the four different cell lines.

airSE$BLOCK <- airway$cell
table(airSE$BLOCK)

For RNA-seq data, the deAna function can be used to carry out differential expression analysis between the two groups either based on functionality from limma (that includes the voom transformation), or alternatively, the frequently used edgeR or DESeq2 package. Here, we use the analysis based on edgeR.

library(EnrichmentBrowser)
airSE <- deAna(airSE, de.method="edgeR")
rowData(airSE, use.names=TRUE)

Pathway analysis

Pathways are typically represented as graphs, where the nodes are genes and edges between the nodes represent interaction between genes.

The r Biocpkg("graphite") package provides pathway collections from major pathway databases such as KEGG, Biocarta, Reactome, and NCI.

Here, we retrieve human KEGG pathways.

library(graphite)
pwys <- pathways(species="hsapiens", database="kegg")
pwys

As the airway dataset uses ENSEMBL gene IDs, but the nodes of the pathways are based on NCBI Entrez Gene IDs,

nodes(pwys[[1]])

we first map the gene IDs in the airway dataset from ENSEMBL to ENTREZ IDs.

airSE <- idMap(airSE, org="hsa", from="ENSEMBL", to="ENTREZID")

Next, we define all genes with adjusted p-value below 0.01 as differentially expressed, and collect their log2 fold change for further analysis.

all <- names(airSE)
de.ind <- rowData(airSE)$ADJ.PVAL < 0.01
de <- rowData(airSE)$FC[de.ind]
names(de) <- all[de.ind]

This results in 2,426 DE genes - out of 11,780 genes in total.

length(all)
length(de)

Pathway Regulation Score (PRS)

The Pathway Regulation Score (PRS) incorporates the pathway topology by weighting the indiviudal gene-level log2 fold changes by the number of downstream DE genes. The weighted absolute fold changes are summed across the pathway and statistical significance is assessed by permutation of genes. Ibrahim et al., 2012

res <- prs(de, all, pwys[1:100], nperm=100)
head(res)

Corresponding gene weights (number of downstream DE genes) can be obtained for a pathway of choice via

ind <- grep("Ras signaling pathway", names(pwys))
weights <- prsWeights(pwys[[ind]], de, all)
weights

Inspecting the genes with maximum number of downstream DE genes

weights[weights == max(weights)]

reveals important upstream regulators including several G protein subunits such as subunit beta 2 (Entrez Gene ID 2783).



lgeistlinger/ToPASeq documentation built on May 25, 2019, 9:32 p.m.