Description

CellPlot is a GNU R package with plot methods for the integrated visualisation of functional term enrichment and expression data. It can be used to display commonly used gene ontology (GO) term enrichment analysis from differential expression of genes (DEG) studies.

The development platform of this software is hosted on github. In case of any questions, consider using the issues system.

Usage

Load package and data

After installing the package (see Installation section), load it in a running R session. Example data can be accessed with the data() function.

library(CellPlot)
data("leukemiasGO")

Plot methods

Example to create a subset of the leukemiasGO data set tables:

x <- subset(leukemiasGO$CLL, pvalCutOff <= 0.05 & Significant > 20)
x <- x[order(-x$LogEnrich),]

Cell Plot

The function cell.plot plots a horizontal barchart of strictly positive values in x. For each entry, a vector of additional values needs to be provided in a list. The additional values are plotted as cells subdividing the length of the corresponding bar. A typical usage scenario is plotting the enrichment of Gene Ontology terms, with individual cells reflecting the differential regulation of the constituent genes.

cell.plot(x = setNames(x$LogEnrich, x$Term), 
          cells = x$log2FoldChange, 
          main ="GO enrichment (NoT vs CLL)", 
          x.mar = c(.4, 0), 
          key.n = 7, 
          y.mar = c(.1, 0), 
          cex = 1.6, 
          cell.outer = 3, 
          bar.scale = .7, 
          space = .2)

Sym Plot

The function sym.plot plots a split barchart, showing the proportions of two mutually exclusive sets in relation to a set containing them both. E.g., Gene Ontology terms, showing the proportions of differentially down-regulated and up-regulated annotated genes from a perturbation experiment. The color of the central column elements maps to the value provided in x (e.g. GO term enrichment). Associated genes may be provided as a list of vectors of expression values, same as for cell.plot(), or as separate vectors x.up and x.down, providing the numbers of up- and down-regulated genes in the same order as x.

sym.plot(x = setNames(x$LogEnrich, x$Term), 
         cells = x$log2FoldChange, 
         x.annotated = x$Annotated, 
         main = "GO enrichment (NoT vs CLL)",
         x.mar = c(.47, 0), 
         key.n = 7, 
         cex = 1.6, 
         axis.cex = .8, 
         group.cex = .7) 

Arc Plot

An arc.plot represents enrichment bargraphs, showing abundance of positive and negative values (e.g. log2 fold change of gene expression comparisons). Units (e.g. gene names) are clustered and intersections are indicated by an arc.

x$up <- lapply(Map(setNames, x$log2FoldChange, x$GenesSignificant), function (i) { i[i>0] })
x$dwn <- lapply(Map(setNames, x$log2FoldChange, x$GenesSignificant), function (i) { i[i<0] })
arc.plot(x = setNames(x$LogEnrich, x$Term), 
         up.list = x$up, 
         down.list = x$dwn, 
         x.mar = c(.9, .5))

Go Histogram

A bargraph showing positive and negative value abundances per category. Supply a list of data.frames and compare each table, performing a one sided t-test on the absolute logfc.term values.

Data is prepared as follows:

y <- lapply(leukemiasGO, function (x) {
  x$Upregulated <- sapply(x$log2FoldChange, function (z) sum(z>0))
  x$Downregulated <- sapply(x$log2FoldChange, function (z) sum(z<0))
  x
})
yterms <- unique(unlist(lapply(y, function(x){
  x <- subset(x, pvalCutOff <= 0.05)
  x <- x[order(x$LogEnrich),]
  head(x, 9)$GO.ID
})))

An the plot generated:

par(mar = c(0,.5,2.5,8))
go.histogram(y, go.alpha.term = "pvalCutOff", gene.alpha.term = "padj", 
             min.genes = 5, max.genes = 1e10, go.selection = yterms, show.ttest = T,
             main = "GO enrichment\nin leukemia differential gene expression\ncompared to control samples", 
             axis.cex = 1, lab.cex = 1.5, main.cex = 1.5)

Visual parameters

Both functions provide a range of parameters to tweak the visual appearance of the plots.

Workflows

Gene ontology term enrichment in differential gene expression data can be performed using the Bioconductor/topGO package.

Dataset golubGO

This section provides code and comments on the workflow to perform differential expression of genes (DEG) analysis and the gene ontology term (GO) enrichment of from the results applied to the golub dataset from the Bioconductor/leuke package.

Dataset leukemiasGO

This section provides code and comments on the workflow to perform differential expression of genes analysis and the gene ontology term enrichment of from the results applied to the leukemiasEset dataset from the Bioconductor/leukemiasEset package. GO annotation was performed with the Bioconductor/annotate

# CRAN
library(parallel)
library(dplyr)
library(stringr)
# Bioconductor
library(BiocParallel)
library(DESeq2)
library(topGO)
library(annotate)
library(leukemiasEset)

data(leukemiasEset)

# M <- select(hu6800.db, featureNames(leukemiasEset), c("ENSEMBL","GO"), keytype = "ENSEMBL")
# M <- subset(M, ONTOLOGY == "BP", c("ENSEMBL","GO"))
# M <- M[!duplicated(M),]
# M <- dlply(M, "ENSEMBL", function (x) unique(x$GO))

# M <- as.character(unique(unlist(select(org.Hs.eg.db,featureNames(leukemiasEset),"ENSEMBL",keytype = "ENSEMBL"))))

A <- as(leukemiasEset,"data.frame")
A <- subset(A, LeukemiaType %in% c("NoL","ALL", "AML", "CLL"))
As <- subset(A, select = "LeukemiaType")
As$LeukemiaType <- relevel(factor(as.character(As$LeukemiaType)), "NoL")
A <- subset(A, select = grepl("^ENS", colnames(A)))
A <- sapply(A, as.integer)
A <- t(A)

DEG <- DESeqDataSetFromMatrix(countData = A, colData = As, design = ~ LeukemiaType)
DEG <- DESeq(DEG, fitType = "mean", parallel = T, BPPARAM = MulticoreParam(20))

n <- levels(As$LeukemiaType)[-1]
leukemiasGO <- lapply(setNames(n, n), function (n) {
  x <- results(DEG, c("LeukemiaType", n, "NoL"), "LeukemiaType", alpha = .05)
  x <- as.data.frame(x)
  x$padj[is.na(x$padj)] <- 1
  g <- new("topGOdata", ontology = "BP", description = 'Leukemia', 
           allGenes = setNames(x$padj, rownames(x)), 
           mapping = "org.Hs.eg.db",
           geneSelectionFun = function (allScore) { allScore <= 0.05 },
           annotationFun = annFUN.org, ID = "Ensembl")
  t <- new("elimCount", testStatistic = GOFisherTest, name = "Fisher test") # test definition
  s <- getSigGroups(g, t) # run F-test
  r <- GenTable(g, pvalCutOff = s, topNodes = length(g@graph@nodes)) # return data.frame
  r$pvalCutOff <- as.numeric(str_replace_all(r$pvalCutOff, "[^0-9e\\-\\.]*", ""))
  r$LogEnrich <- log2(r$Significant / r$Expected)
  ga <- genesInTerm(g) # GenesAnnotated | list of genes per go-terms
  ga <- ga[r$GO.ID] # eliminate missing terms
  names(ga) <- NULL
  r$GenesAnnotated <- ga
  xs <- x[,c("padj", "log2FoldChange")] # significant stats subset
  xs <- subset(xs, padj < 0.05)
  r$GenesSignificant <- lapply(r$GenesAnnotated, intersect, rownames(xs)) # extract genes
  ei.rows <- mclapply(r$GenesSignificant, function (y) {
    if (length(y)) as.list(xs[y,,drop=FALSE])
    else as.list(rep(NA_real_, length(xs)))
  }, mc.cores = 10)
  ei <- mclapply(names(xs), function(z) {
    lapply(ei.rows, "[[", z)
  }, mc.cores = 10)
  ei <- structure(ei, names = names(xs), row.names = seq(nrow(r)), class = "data.frame")
  row.names(ei) <- NULL
  r <- data.frame(r, ei, stringsAsFactors = FALSE, check.names = FALSE)
  return(r)
})

# lapply(GO, function(x) {list(all(r$Annotated == sapply(r$GenesAnnotated, length)),
#                              all(r$Significant == sapply(r$GenesSignificant, length)))})

leukemiasGO <- lapply(leukemiasGO, function(x) subset(x, pvalCutOff < 0.1))
save(leukemiasGO, file = "data/leukemiasGO.rdata")

Installation

CellPlot

Install the latest version from the repository on github.com/dieterich-lab/CellPlot. With the devtools package, it is an easy task:

library(devtools)
install_github('dieterich-lab/CellPlot', build_vignettes = TRUE)

Dependencies

To install the packages from Bioconducter, use the following steps:

# load bioconductor functions
source("https://bioconductor.org/biocLite.R")

# install packages
biocLite("BiocParallel")
biocLite("annotate")
biocLite("DESeq2")
biocLite("topGO")
biocLite("leukemiasEset")

Dependencies from CRAN can be installed via:

# install packages
install.packages(c("stringr","dplyr","knitr","rmarkdown","devtools"))

Session info

# session info from build machine
sessionInfo()


dieterich-lab/CellPlot documentation built on May 15, 2019, 8:29 a.m.