CellPlot is a GNU R package with plot methods for the integrated visualisation of functional term enrichment and expression data. It can be used to display commonly used gene ontology (GO) term enrichment analysis from differential expression of genes (DEG) studies.
The development platform of this software is hosted on github. In case of any questions, consider using the issues system.
After installing the package (see Installation section), load it in a running
R session. Example data can be accessed with the data()
function.
library(CellPlot) data("leukemiasGO")
Example to create a subset of the leukemiasGO
data set tables:
x <- subset(leukemiasGO$CLL, pvalCutOff <= 0.05 & Significant > 20) x <- x[order(-x$LogEnrich),]
The function cell.plot
plots a horizontal barchart of strictly positive values in x.
For each entry, a vector of additional values needs to be provided in a list.
The additional values are plotted as cells subdividing the length of the corresponding bar.
A typical usage scenario is plotting the enrichment of Gene Ontology terms,
with individual cells reflecting the differential regulation of the constituent genes.
cell.plot(x = setNames(x$LogEnrich, x$Term), cells = x$log2FoldChange, main ="GO enrichment (NoT vs CLL)", x.mar = c(.4, 0), key.n = 7, y.mar = c(.1, 0), cex = 1.6, cell.outer = 3, bar.scale = .7, space = .2)
The function sym.plot
plots a split barchart, showing the proportions of
two mutually exclusive sets in relation to a set containing them both.
E.g., Gene Ontology terms, showing the proportions of differentially down-regulated
and up-regulated annotated genes from a perturbation experiment.
The color of the central column elements maps to the value provided in x (e.g. GO term enrichment).
Associated genes may be provided as a list of vectors of expression values,
same as for cell.plot(), or as separate vectors x.up and x.down,
providing the numbers of up- and down-regulated genes in the same order as x.
sym.plot(x = setNames(x$LogEnrich, x$Term), cells = x$log2FoldChange, x.annotated = x$Annotated, main = "GO enrichment (NoT vs CLL)", x.mar = c(.47, 0), key.n = 7, cex = 1.6, axis.cex = .8, group.cex = .7)
An arc.plot
represents enrichment bargraphs, showing abundance of positive and negative
values (e.g. log2 fold change of gene expression comparisons). Units (e.g. gene names) are
clustered and intersections are indicated by an arc.
x$up <- lapply(Map(setNames, x$log2FoldChange, x$GenesSignificant), function (i) { i[i>0] }) x$dwn <- lapply(Map(setNames, x$log2FoldChange, x$GenesSignificant), function (i) { i[i<0] }) arc.plot(x = setNames(x$LogEnrich, x$Term), up.list = x$up, down.list = x$dwn, x.mar = c(.9, .5))
A bargraph showing positive and negative value abundances per category.
Supply a list of data.frames and compare each table, performing a one
sided t-test on the absolute logfc.term
values.
Data is prepared as follows:
y <- lapply(leukemiasGO, function (x) { x$Upregulated <- sapply(x$log2FoldChange, function (z) sum(z>0)) x$Downregulated <- sapply(x$log2FoldChange, function (z) sum(z<0)) x }) yterms <- unique(unlist(lapply(y, function(x){ x <- subset(x, pvalCutOff <= 0.05) x <- x[order(x$LogEnrich),] head(x, 9)$GO.ID })))
An the plot generated:
par(mar = c(0,.5,2.5,8)) go.histogram(y, go.alpha.term = "pvalCutOff", gene.alpha.term = "padj", min.genes = 5, max.genes = 1e10, go.selection = yterms, show.ttest = T, main = "GO enrichment\nin leukemia differential gene expression\ncompared to control samples", axis.cex = 1, lab.cex = 1.5, main.cex = 1.5)
Both functions provide a range of parameters to tweak the visual appearance of the plots.
cex
parameters.x.mar
and y.mar
. Used in particular to accomodate the label text to the left-hand side.bar.scale
parameter, which acts as a multiplier for a predetermined average bar height. This is meant to be used in conjunction with a graphics device of constant size, to ensure visual consistency among multiple plots that vary in the number of terms displayed.elem.bounds
Require the cardinality of a term or significant subset to be in a specific range, e.g. c(10,100)
-- between 10 and 100 genes.x.bound
Upper limit of the value displayed on the x-axis. For sym.plot
this is must be a value between 0 and 100. The lower limit is always 0.cell.bounds
(only cell.plot
) Sets the colour scale for cell data to a fixed range. If the plot contains values outside of that range, an indicator is added to the colour legend. This is particularly useful in the event of a few extreme outliers.mid.bounds
(only sym.plot
) Similar to cell.bounds
, this sets the colour range of the central column cells.sym
(only cell.plot
), when set to TRUE
, makes the scale of cell values symmetrical.ColorRampPalette()
is used.cell.col
(only cell.plot
) Specify three valid colour names for cell data visualisation (lower extreme, mid point, upper extreme).mid.col
(only sym.plot
) Specify two valid colour names to define the range of the central column cell colours. key.n
parameter.gridlines
parameter.Gene ontology term enrichment in differential gene expression data can be performed using the Bioconductor/topGO package.
golubGO
This section provides code and comments on the workflow to perform differential
expression of genes (DEG) analysis and the gene ontology term (GO) enrichment
of from the results applied to the golub
dataset from the
Bioconductor/leuke
package.
leukemiasGO
This section provides code and comments on the workflow to perform differential
expression of genes analysis and the gene ontology term enrichment
of from the results applied to the leukemiasEset
dataset from the
Bioconductor/leukemiasEset package.
GO annotation was performed with the
Bioconductor/annotate
# CRAN library(parallel) library(dplyr) library(stringr) # Bioconductor library(BiocParallel) library(DESeq2) library(topGO) library(annotate) library(leukemiasEset) data(leukemiasEset) # M <- select(hu6800.db, featureNames(leukemiasEset), c("ENSEMBL","GO"), keytype = "ENSEMBL") # M <- subset(M, ONTOLOGY == "BP", c("ENSEMBL","GO")) # M <- M[!duplicated(M),] # M <- dlply(M, "ENSEMBL", function (x) unique(x$GO)) # M <- as.character(unique(unlist(select(org.Hs.eg.db,featureNames(leukemiasEset),"ENSEMBL",keytype = "ENSEMBL")))) A <- as(leukemiasEset,"data.frame") A <- subset(A, LeukemiaType %in% c("NoL","ALL", "AML", "CLL")) As <- subset(A, select = "LeukemiaType") As$LeukemiaType <- relevel(factor(as.character(As$LeukemiaType)), "NoL") A <- subset(A, select = grepl("^ENS", colnames(A))) A <- sapply(A, as.integer) A <- t(A) DEG <- DESeqDataSetFromMatrix(countData = A, colData = As, design = ~ LeukemiaType) DEG <- DESeq(DEG, fitType = "mean", parallel = T, BPPARAM = MulticoreParam(20)) n <- levels(As$LeukemiaType)[-1] leukemiasGO <- lapply(setNames(n, n), function (n) { x <- results(DEG, c("LeukemiaType", n, "NoL"), "LeukemiaType", alpha = .05) x <- as.data.frame(x) x$padj[is.na(x$padj)] <- 1 g <- new("topGOdata", ontology = "BP", description = 'Leukemia', allGenes = setNames(x$padj, rownames(x)), mapping = "org.Hs.eg.db", geneSelectionFun = function (allScore) { allScore <= 0.05 }, annotationFun = annFUN.org, ID = "Ensembl") t <- new("elimCount", testStatistic = GOFisherTest, name = "Fisher test") # test definition s <- getSigGroups(g, t) # run F-test r <- GenTable(g, pvalCutOff = s, topNodes = length(g@graph@nodes)) # return data.frame r$pvalCutOff <- as.numeric(str_replace_all(r$pvalCutOff, "[^0-9e\\-\\.]*", "")) r$LogEnrich <- log2(r$Significant / r$Expected) ga <- genesInTerm(g) # GenesAnnotated | list of genes per go-terms ga <- ga[r$GO.ID] # eliminate missing terms names(ga) <- NULL r$GenesAnnotated <- ga xs <- x[,c("padj", "log2FoldChange")] # significant stats subset xs <- subset(xs, padj < 0.05) r$GenesSignificant <- lapply(r$GenesAnnotated, intersect, rownames(xs)) # extract genes ei.rows <- mclapply(r$GenesSignificant, function (y) { if (length(y)) as.list(xs[y,,drop=FALSE]) else as.list(rep(NA_real_, length(xs))) }, mc.cores = 10) ei <- mclapply(names(xs), function(z) { lapply(ei.rows, "[[", z) }, mc.cores = 10) ei <- structure(ei, names = names(xs), row.names = seq(nrow(r)), class = "data.frame") row.names(ei) <- NULL r <- data.frame(r, ei, stringsAsFactors = FALSE, check.names = FALSE) return(r) }) # lapply(GO, function(x) {list(all(r$Annotated == sapply(r$GenesAnnotated, length)), # all(r$Significant == sapply(r$GenesSignificant, length)))}) leukemiasGO <- lapply(leukemiasGO, function(x) subset(x, pvalCutOff < 0.1)) save(leukemiasGO, file = "data/leukemiasGO.rdata")
Install the latest version from the repository on
github.com/dieterich-lab/CellPlot.
With the devtools
package, it is an easy task:
library(devtools)
install_github('dieterich-lab/CellPlot', build_vignettes = TRUE)
To install the packages from Bioconducter, use the following steps:
# load bioconductor functions source("https://bioconductor.org/biocLite.R") # install packages biocLite("BiocParallel") biocLite("annotate") biocLite("DESeq2") biocLite("topGO") biocLite("leukemiasEset")
Dependencies from CRAN can be installed via:
# install packages install.packages(c("stringr","dplyr","knitr","rmarkdown","devtools"))
# session info from build machine sessionInfo()
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.