In dieterich-lab/CellPlot: Segmented Bar Chart Plot Methods

Description

CellPlot is a GNU R package with plot methods for the integrated visualisation of functional term enrichment and expression data. It can be used to display commonly used gene ontology (GO) term enrichment analysis from differential expression of genes (DEG) studies.

The development platform of this software is hosted on github. In case of any questions, consider using the issues system.

Usage

Load package and data

After installing the package (see Installation section), load it in a running R session. Example data can be accessed with the data() function.

library(CellPlot)
data("leukemiasGO")

Plot methods

Example to create a subset of the leukemiasGO data set tables:

x <- subset(leukemiasGO$CLL, pvalCutOff <= 0.05 & Significant > 20)
x <- x[order(-x$LogEnrich),]

Cell Plot

The function cell.plot plots a horizontal barchart of strictly positive values in x. For each entry, a vector of additional values needs to be provided in a list. The additional values are plotted as cells subdividing the length of the corresponding bar. A typical usage scenario is plotting the enrichment of Gene Ontology terms, with individual cells reflecting the differential regulation of the constituent genes.

cell.plot(x = setNames(x$LogEnrich, x$Term), 
          cells = x$log2FoldChange, 
          main ="GO enrichment (NoT vs CLL)", 
          x.mar = c(.4, 0), 
          key.n = 7, 
          y.mar = c(.1, 0), 
          cex = 1.6, 
          cell.outer = 3, 
          bar.scale = .7, 
          space = .2)

Sym Plot

The function sym.plot plots a split barchart, showing the proportions of two mutually exclusive sets in relation to a set containing them both. E.g., Gene Ontology terms, showing the proportions of differentially down-regulated and up-regulated annotated genes from a perturbation experiment. The color of the central column elements maps to the value provided in x (e.g. GO term enrichment). Associated genes may be provided as a list of vectors of expression values, same as for cell.plot(), or as separate vectors x.up and x.down, providing the numbers of up- and down-regulated genes in the same order as x.

sym.plot(x = setNames(x$LogEnrich, x$Term), 
         cells = x$log2FoldChange, 
         x.annotated = x$Annotated, 
         main = "GO enrichment (NoT vs CLL)",
         x.mar = c(.47, 0), 
         key.n = 7, 
         cex = 1.6, 
         axis.cex = .8, 
         group.cex = .7)

Arc Plot

An arc.plot represents enrichment bargraphs, showing abundance of positive and negative values (e.g. log2 fold change of gene expression comparisons). Units (e.g. gene names) are clustered and intersections are indicated by an arc.

x$up <- lapply(Map(setNames, x$log2FoldChange, x$GenesSignificant), function (i) { i[i>0] })
x$dwn <- lapply(Map(setNames, x$log2FoldChange, x$GenesSignificant), function (i) { i[i<0] })
arc.plot(x = setNames(x$LogEnrich, x$Term), 
         up.list = x$up, 
         down.list = x$dwn, 
         x.mar = c(.9, .5))

Go Histogram

A bargraph showing positive and negative value abundances per category. Supply a list of data.frames and compare each table, performing a one sided t-test on the absolute logfc.term values.

Data is prepared as follows:

y <- lapply(leukemiasGO, function (x) {
  x$Upregulated <- sapply(x$log2FoldChange, function (z) sum(z>0))
  x$Downregulated <- sapply(x$log2FoldChange, function (z) sum(z<0))
  x
})
yterms <- unique(unlist(lapply(y, function(x){
  x <- subset(x, pvalCutOff <= 0.05)
  x <- x[order(x$LogEnrich),]
  head(x, 9)$GO.ID
})))

An the plot generated:

par(mar = c(0,.5,2.5,8))
go.histogram(y, go.alpha.term = "pvalCutOff", gene.alpha.term = "padj", 
             min.genes = 5, max.genes = 1e10, go.selection = yterms, show.ttest = T,
             main = "GO enrichment\nin leukemia differential gene expression\ncompared to control samples", 
             axis.cex = 1, lab.cex = 1.5, main.cex = 1.5)

Visual parameters

Both functions provide a range of parameters to tweak the visual appearance of the plots.

Font sizes of the various elements can be adjusted using the cex parameters.
Internal margins are adjusted through x.mar and y.mar. Used in particular to accomodate the label text to the left-hand side.
Absolute scaling can be achieved through the bar.scale parameter, which acts as a multiplier for a predetermined average bar height. This is meant to be used in conjunction with a graphics device of constant size, to ensure visual consistency among multiple plots that vary in the number of terms displayed.
Data ranges for all output may be fixed:
- elem.bounds Require the cardinality of a term or significant subset to be in a specific range, e.g. c(10,100) -- between 10 and 100 genes.
- x.bound Upper limit of the value displayed on the x-axis. For sym.plot this is must be a value between 0 and 100. The lower limit is always 0.
- cell.bounds (only cell.plot) Sets the colour scale for cell data to a fixed range. If the plot contains values outside of that range, an indicator is added to the colour legend. This is particularly useful in the event of a few extreme outliers.
- mid.bounds (only sym.plot) Similar to cell.bounds, this sets the colour range of the central column cells.
- sym (only cell.plot), when set to TRUE, makes the scale of cell values symmetrical.
Colour schemes may be controlled through character vectors provided to the following parameters. Internally, the R native function ColorRampPalette() is used.
- cell.col (only cell.plot) Specify three valid colour names for cell data visualisation (lower extreme, mid point, upper extreme).
- mid.col (only sym.plot) Specify two valid colour names to define the range of the central column cell colours.
Colour keys can be adjusted in their resolution (number of boxes) using the key.n parameter.
Gridlines may be hidden via the gridlines parameter.

Workflows

Gene ontology term enrichment in differential gene expression data can be performed using the Bioconductor/topGO package.

Dataset `golubGO`

This section provides code and comments on the workflow to perform differential expression of genes (DEG) analysis and the gene ontology term (GO) enrichment of from the results applied to the golub dataset from the Bioconductor/leuke package.

Dataset `leukemiasGO`

This section provides code and comments on the workflow to perform differential expression of genes analysis and the gene ontology term enrichment of from the results applied to the leukemiasEset dataset from the Bioconductor/leukemiasEset package. GO annotation was performed with the Bioconductor/annotate

# CRAN
library(parallel)
library(dplyr)
library(stringr)
# Bioconductor
library(BiocParallel)
library(DESeq2)
library(topGO)
library(annotate)
library(leukemiasEset)

data(leukemiasEset)

# M <- select(hu6800.db, featureNames(leukemiasEset), c("ENSEMBL","GO"), keytype = "ENSEMBL")
# M <- subset(M, ONTOLOGY == "BP", c("ENSEMBL","GO"))
# M <- M[!duplicated(M),]
# M <- dlply(M, "ENSEMBL", function (x) unique(x$GO))

# M <- as.character(unique(unlist(select(org.Hs.eg.db,featureNames(leukemiasEset),"ENSEMBL",keytype = "ENSEMBL"))))

A <- as(leukemiasEset,"data.frame")
A <- subset(A, LeukemiaType %in% c("NoL","ALL", "AML", "CLL"))
As <- subset(A, select = "LeukemiaType")
As$LeukemiaType <- relevel(factor(as.character(As$LeukemiaType)), "NoL")
A <- subset(A, select = grepl("^ENS", colnames(A)))
A <- sapply(A, as.integer)
A <- t(A)

DEG <- DESeqDataSetFromMatrix(countData = A, colData = As, design = ~ LeukemiaType)
DEG <- DESeq(DEG, fitType = "mean", parallel = T, BPPARAM = MulticoreParam(20))

n <- levels(As$LeukemiaType)[-1]
leukemiasGO <- lapply(setNames(n, n), function (n) {
  x <- results(DEG, c("LeukemiaType", n, "NoL"), "LeukemiaType", alpha = .05)
  x <- as.data.frame(x)
  x$padj[is.na(x$padj)] <- 1
  g <- new("topGOdata", ontology = "BP", description = 'Leukemia', 
           allGenes = setNames(x$padj, rownames(x)), 
           mapping = "org.Hs.eg.db",
           geneSelectionFun = function (allScore) { allScore <= 0.05 },
           annotationFun = annFUN.org, ID = "Ensembl")
  t <- new("elimCount", testStatistic = GOFisherTest, name = "Fisher test") # test definition
  s <- getSigGroups(g, t) # run F-test
  r <- GenTable(g, pvalCutOff = s, topNodes = length(g@graph@nodes)) # return data.frame
  r$pvalCutOff <- as.numeric(str_replace_all(r$pvalCutOff, "[^0-9e\\-\\.]*", ""))
  r$LogEnrich <- log2(r$Significant / r$Expected)
  ga <- genesInTerm(g) # GenesAnnotated | list of genes per go-terms
  ga <- ga[r$GO.ID] # eliminate missing terms
  names(ga) <- NULL
  r$GenesAnnotated <- ga
  xs <- x[,c("padj", "log2FoldChange")] # significant stats subset
  xs <- subset(xs, padj < 0.05)
  r$GenesSignificant <- lapply(r$GenesAnnotated, intersect, rownames(xs)) # extract genes
  ei.rows <- mclapply(r$GenesSignificant, function (y) {
    if (length(y)) as.list(xs[y,,drop=FALSE])
    else as.list(rep(NA_real_, length(xs)))
  }, mc.cores = 10)
  ei <- mclapply(names(xs), function(z) {
    lapply(ei.rows, "[[", z)
  }, mc.cores = 10)
  ei <- structure(ei, names = names(xs), row.names = seq(nrow(r)), class = "data.frame")
  row.names(ei) <- NULL
  r <- data.frame(r, ei, stringsAsFactors = FALSE, check.names = FALSE)
  return(r)
})

# lapply(GO, function(x) {list(all(r$Annotated == sapply(r$GenesAnnotated, length)),
#                              all(r$Significant == sapply(r$GenesSignificant, length)))})

leukemiasGO <- lapply(leukemiasGO, function(x) subset(x, pvalCutOff < 0.1))
save(leukemiasGO, file = "data/leukemiasGO.rdata")

Installation

CellPlot

Install the latest version from the repository on github.com/dieterich-lab/CellPlot. With the devtools package, it is an easy task:

library(devtools)

install_github('dieterich-lab/CellPlot', build_vignettes = TRUE)

Dependencies

To install the packages from Bioconducter, use the following steps:

# load bioconductor functions
source("https://bioconductor.org/biocLite.R")

# install packages
biocLite("BiocParallel")
biocLite("annotate")
biocLite("DESeq2")
biocLite("topGO")
biocLite("leukemiasEset")

Dependencies from CRAN can be installed via:

# install packages
install.packages(c("stringr","dplyr","knitr","rmarkdown","devtools"))

Session info

# session info from build machine
sessionInfo()

dieterich-lab/CellPlot documentation built on May 15, 2019, 8:29 a.m.

rdrr.io home R language documentation Run R code online

CRAN packages Bioconductor packages R-Forge packages GitHub packages

Note that we can't provide technical support on individual packages. You should contact the package authors for that.

dieterich-lab/CellPlot
Segmented Bar Chart Plot Methods

In dieterich-lab/CellPlot: Segmented Bar Chart Plot Methods

Description

Usage

Load package and data

Plot methods

Cell Plot

Sym Plot

Arc Plot

Go Histogram

Visual parameters

Workflows

Dataset `golubGO`

Dataset `leukemiasGO`

Installation

CellPlot

Dependencies

Session info

R Package Documentation

Browse R Packages

We want your feedback!

dieterich-lab/CellPlot Segmented Bar Chart Plot Methods

In dieterich-lab/CellPlot: Segmented Bar Chart Plot Methods

Description

Usage

Load package and data

Plot methods

Cell Plot

Sym Plot

Arc Plot

Go Histogram

Visual parameters

Workflows

Dataset golubGO

Dataset leukemiasGO

Installation

CellPlot

Dependencies

Session info

R Package Documentation

Browse R Packages

We want your feedback!

dieterich-lab/CellPlot
Segmented Bar Chart Plot Methods

Dataset `golubGO`

Dataset `leukemiasGO`