Using dittoSeq to visualize (sc)RNAseq data

knitr::opts_chunk$set(error=FALSE, message=FALSE, warning=FALSE,
    dev="jpeg", dpi = 72, fig.width = 4.5, fig.height = 3.5)


dittoSeq is a tool built to enable analysis and visualization of single-cell and bulk RNA-sequencing data by novice, experienced, and color-blind coders. Thus, it provides many useful visualizations, which all utilize red-green color-blindness optimized colors by default, and which allow sufficient customization, via discrete inputs, for out-of-the-box creation of publication-ready figures.

For single-cell data, dittoSeq works directly with data pre-processed in other popular packages (Seurat, scater, scran, ...). For bulk RNAseq data, dittoSeq's import functions will convert bulk RNAseq data of various different structures into a set structure that dittoSeq helper and visualization functions can work with. So ultimately, dittoSeq includes universal plotting and helper functions for working with (sc)RNAseq data processed and stored in these formats:



For bulk data, or if your data is currently not analyzed, or simply not in one of these structures, you can still pull it in to the SingleCellExperiment structure that dittoSeq works with using the importDittoBulk function.

Color-blindness friendliness:

The default colors of this package are red-green color-blindness friendly. To make it so, I used the suggested colors from [@wong_points_2011] and adapted them slightly by appending darker and lighter versions to create a 24 color vector. All plotting functions use these colors, stored in dittoColors(), by default.



Code used here for dataset processing and normalization should not be seen as a suggestion of the proper methods for performing such steps. dittoSeq is a visualization tool, and my focus while developing this vignette has been simply creating values required for providing visualization examples.


dittoSeq is available through Bioconductor.

# Install BiocManager if needed
if (!requireNamespace("BiocManager", quietly = TRUE))

# Install dittoSeq

Some setup for this vignette that can be ignored

Here, we will need to do some prep as the dataset we will use from @baron_single-cell_2016 is not normalized nor dimensionality reduced.

# Download data
sce <- BaronPancreasData()
# Trim to only 5 of the cell types for simplicity of vignette
sce <- sce[,meta("label",sce) %in% c(
    "acinar", "beta", "gamma", "delta", "ductal")]

Now that we have a single-cell dataset loaded, we are ready to go. All functions work for either Seurat or SCE encapsulated single-cell data.

But to make full use of dittoSeq, we should really have this data log-normalized, and we should run dimensionality reduction and clustering.

# Make Seurat and grab metadata
seurat <- CreateSeuratObject(counts(sce))
seurat <- AddMetaData(seurat, sce$label, = "celltype")
seurat <- AddMetaData(seurat, sce$donor, = "Sample")
seurat <- AddMetaData(seurat,
                      PercentageFeatureSet(seurat, pattern = "^MT"),
             = "")
# Basic Seurat workflow (possibly outdated, but fine for this vignette)
seurat <- NormalizeData(seurat, verbose = FALSE)
seurat <- FindVariableFeatures(object = seurat, verbose = FALSE)
seurat <- ScaleData(object = seurat, verbose = FALSE)
seurat <- RunPCA(object = seurat, verbose = FALSE)
seurat <- RunTSNE(object = seurat)
seurat <- FindNeighbors(object = seurat, verbose = FALSE)
seurat <- FindClusters(object = seurat, verbose = FALSE)
# Grab PCA, TSNE, clustering, log-norm data, and metadata to sce

# sce <- as.SingleCellExperiment(seurat)
# At the time this part of the vignette was made, the above function gave
# warnings. So... manual method
sce <- addDimReduction(
  sce, embeddings = Embeddings(seurat, reduction = "pca"), name = "PCA")
sce <- addDimReduction(
  sce, embeddings = Embeddings(seurat, reduction = "tsne"), name = "TSNE")
sce$idents <- seurat$seurat_clusters
assay(sce, "logcounts") <- GetAssayData(seurat)
sce$ <- seurat$
sce$celltype <- seurat$celltype
sce$Sample <- seurat$Sample

Now that we have a single-cell dataset loaded and analyzed in Seurat, let's convert it to an SCE for examples purposes.

All functions will work the same for either the Seurat or SCE version.

Getting started

Single-cell RNAseq data

dittoSeq works natively with Seurat and SingleCellExperiment objects. Nothing special is needed. Just load in your data if it isn't already loaded, then go!

dittoDimPlot(seurat, "Sample")
dittoPlot(seurat, "ENO1", = "celltype")
dittoBarPlot(sce, "celltype", = "Sample")

Bulk RNAseq data

dittoSeq works natively with bulk RNAseq data stored as a SummarizedExperiment object. For bulk data stored in other forms, namely as a DGEList or as raw matrices, one can use the importDittoBulk() function to convert it into the SingleCellExperiment structure.

Some brief details on this structure: The SingleCellEExperiment class is very similar to the SummarizedExperiment class, just with room added for storing pre-calculated dimensionality reductions.

# First, let's make some mock expression and conditions data
exp <- matrix(rpois(20000, 5), ncol=20)
colnames(exp) <- paste0("sample", seq_len(ncol(exp)))
rownames(exp) <- paste0("gene", seq_len(nrow(exp)))
logexp <- logexp <- log2(exp + 1)

conditions <- factor(rep(1:4, 5))
sex <- c(rep("M", 9), rep("F", 11))

Standard bulk data import workflow, importDittoBulk()

Importing bulk data can be accomplished with just the importDittoBulk() function. The function converts various common storage structures for bulk data into the SingleCellExperiment structure.

# Import
myRNA <- importDittoBulk(
    # x can be a DGEList, a DESeqDataSet, a SummarizedExperiment,
    #   or a list of data matrices
    x = list(counts = exp,
             logcounts = logexp),
    # Optional inputs:
    #   For adding metadata
    metadata = data.frame(conditions = conditions,
                          sex = sex),
    #   For adding dimensionality reductions
    reductions = list(pca = matrix(rnorm(20000), nrow=20)))

Metadata and dimensionality reductions can be added either directly within the importDittoBulk() function via the metadata and reductions inputs, respectively, or separately afterwards:

# Add metadata (metadata can alternatively be added in this way)
myRNA$conditions <- conditions
myRNA$sex <- sex

# Add dimensionality reductions (can alternatively be added this way)
# (We aren't actually calculating PCA here.)
myRNA <- addDimReduction(
    object = myRNA,
    embeddings = matrix(rnorm(20000), nrow=20),
    name = "pca",
    key = "PC")

Making plots for bulk data then operates the exact same way as for single-cell.

dittoDimPlot(myRNA, "sex", size = 3, do.ellipse = TRUE)
dittoBarPlot(myRNA, "sex", = "conditions")
dittoBoxPlot(myRNA, "gene1", = "sex")
dittoHeatmap(myRNA, getGenes(myRNA)[1:10], = c("conditions", "sex"))

Additional details on bulk data import:

By default, sample-associated data from original objects are retained. But metadata provided to the metadata input will replace any similarly named slots from the original object. The combine_metadata input can additionally be used to turn retention of previous metadata slots off.

DGEList note: The import function attempts to pull in all information stored in common DGEList slots (\$counts, \$samples, \$genes, \$AveLogCPM, \$common.dispersion, \$trended.dispersion, \$tagwise.dispersion, and \$offset), but any other slots are ignored.

When providing x a list of a single or multiple matrices, it is recommended that matrices containing raw feature counts data be named counts, log-normalized counts data be named logcounts, and otherwise normalized data, be named normcounts. Then you can give the assay input of dittoSeq functions "counts" to point towards the raw data for example. This is not a requirement, but the default assay used in dittoSeq functions will be one of: 1) "logcounts" if it exists, 2) "normcounts" if it exists, 3) "counts" if it exists, or 4) whatever the first assay is in the object.

The SCE object created by importDittoBulk() will contain an internal metadata slot which tells dittoSeq that the object holds bulk data. Knowledge of whether a dataset is single-cell versus bulk is used to aadjust parameter defaults for in few functions; "samples" vs "cells" in the y-axis label of dittoBarPlot(), and whether cells (no) versus samples (yes) should be clustered by default for dittoHeatmap().

Helper Functions

dittoSeq's helper functions make it easy to determine the metadata, gene, and dimensionality reduction options for plotting.


# Retrieve all metadata slot names
# Query for the presence of a metadata slot
isMeta("nCount_RNA", seurat)
# Retrieve metadata values:
meta("celltype", seurat)[1:10]
# Retrieve unique values of a metadata
metaLevels("celltype", seurat)


# Retrieve all gene names
# Query for the presence of a gene(s)
isGene("CD3E", seurat)
isGene(c("CD3E","ENO1","INS","non-gene"), seurat, return.values = TRUE)
# Retrieve gene expression values:
gene("ENO1", seurat)[1:10]


# Retrieve all dimensionality reductions

These are what can be provided to reduction.use for dittoDimPlot().

Characteristic: Bulk versus single-cell

Because dittoSeq utilizes the SingleCellExperiment structure to handle some bulk RNAseq data, there is a getter and setter for the internal metadata which tells dittoSeq functions which resolution of data a target SCE holds.

# Getter

# Setter
mock_bulk <- setBulk(sce) # to bulk
mock_sc <- setBulk(myRNA, set = FALSE) # to single-cell


There are many different types of dittoSeq visualizations. Each has intuitive defaults which allow creation of immediately usable plots. Each also has many additional tweaks available through discrete inputs that can help ensure you can create precisely-tuned, deliberately-labeled, publication-quality plots out-of-the-box.

dittoDimPlot & dittoScatterPlot

These show cells/samples data overlaid on a scatter plot, with the axes of dittoScatterPlot() being gene expression or metadata data and with the axes of dittoDimPlot() being dimensionality reductions like tsne, pca, umap or similar.

dittoDimPlot(seurat, "celltype", reduction.use = "pca")
dittoDimPlot(sce, "ENO1")
    object = sce,
    x.var = "PPY", y.var = "INS",
    color.var = "celltype")
    object = seurat,
    x.var = "nCount_RNA", y.var = "nFeature_RNA",
    color.var = "")

Additional features

Various additional features can be overlaid on top of these plots. Adding each is controlled by an input that starts with add. or do. such as:

Additional inputs that apply to and adjust these features will then start with the XXXX part that comes after add.XXXX or do.XXXX, as exemplified below. (Tab-completion friendly!)

A few examples:

dittoDimPlot(seurat, "ident",

             do.label = TRUE,
             labels.repel = FALSE,

             add.trajectory.lineages = list(
             trajectory.cluster.meta = "ident")

dittoDimHex & dittoScatterHex

Similar to the "Plot" versions, these show cells/samples data overlaid on a scatter plot, with the axes of dittoScatterHex() being gene expression or metadata or some other data, and with the axes of dittoDimHex() being dimensionality reductions like tsne, pca, umap or similar.

The plot area is then broken into hexagonal bins and data is presented as summaries of cells/samples within each of those bins.

The minimal functions will summarize density of cells/samples only using color.

    x.var = "PPY", y.var = "INS")

An additional feature can be provided to have that data be summarized in addition to density. Density will then be represented with opacity, while color is used for the additional feature. The color.method input then controls how data within the bins are represented.

NOTE: It is important to note that as soon as differing opacity is added, the color-blindness friendliness of dittoSeq's default colors is no longer guaranteed.

dittoDimHex(seurat, "INS")
    object = seurat,
    x.var = "PPY", y.var = "INS",
    color.var = "celltype",
    colors = c(1:4,7), max.density = 15)

Summary function control

color.method controls how data within the bins are represented in colors. It is provided a string, but how that string is utilized depends on the type of target data.

For discrete data, you can provide either "max" (the default) to display the predominant grouping of the bins, or "max.prop" to display the proportion of cells in the bins that belong to the maximal grouping.

For continuous data, any string signifying a function [that summarizes a numeric vector input into with a single numeric value] can be provided. The default is "median", but other useful options are "sum", "mean", "sd", or "mad".

Additional features

Similar to dittoDimPlot and dittoScatterPlot, various additional layers are built in and their addition is controlled by inputs that starts with add. or do. such as:

Additional inputs that apply to and adjust these features will then start with the XXXX part that comes after add.XXXX or do.XXXX, as exemplified below. (Tab-completion friendly!)

dittoPlot (and dittoRidgePlot + dittoBoxPlot wrappers)

These display continuous cells/samples' data on a y-axis (or x-axis for ridgeplots) grouped on the x-axis by sample, age, condition, or any discrete grouping metadata. Data can be represented with violin plots, box plots, individual points for each cell/sample, and/or ridge plots. The plots input controls which data representations are used. The input controls how the data are grouped in the x-axis. And the input controls the colors that fill in violin, box, and ridge plots.

dittoPlot() is the main function, but dittoRidgePlot() and dittoBoxPlot() are wrappers which essentially just adjust the default for the plots input from c("jitter", "vlnplot") to c("ridgeplot") or c("boxplot","jitter"), respectively.

dittoPlot(seurat, "ENO1", = "celltype",
    plots = c("vlnplot", "jitter"))
dittoRidgePlot(sce, "ENO1", = "celltype")
dittoBoxPlot(seurat, "ENO1", = "celltype")

Adjustments to data representations

Tweaks to the individual data representation types can be made with discrete inputs, all of which start with the representation types' name. For example...

dittoPlot(seurat, "ENO1", = "celltype",
    plots = c("jitter", "vlnplot", "boxplot"), # <- order matters

    # change the color and size of jitter points
    jitter.color = "blue", jitter.size = 0.5,

    # change the outline color and width, and remove the fill of boxplots
    boxplot.color = "white", boxplot.width = 0.1,
    boxplot.fill = FALSE,

    # change how the violin plot widths are normalized across groups
    vlnplot.scaling = "count"


This function displays discrete cells/samples' data on a y-axis, grouped on the x-axis by sample, age, condition, or any discrete grouping metadata. Data can be represented as percentages or counts, and this is controlled by the scale input.

dittoBarPlot(seurat, "celltype", = "Sample")
dittoBarPlot(seurat, "ident", = "Sample",
    scale = "count")


This function is essentially a wrapper for generating heatmaps with pheatmap, but with the same automatic, user-friendly, data extraction, (subsetting,) and metadata integration common to other dittoSeq functions.

For large, many cell, single-cell datasets, it can be necessary to turn off clustering by cells in generating the heatmap because the process is very memory intensive. As an alternative, dittoHeatmap offers the ability to order columns in functional ways using the input. This input will default to the first annotation provided to for single cell datasets, but can also be controlled separately.

# Pick Genes
genes <- c("SST", "REG1A", "PPY", "INS", "CELA3A", "PRSS2", "CTRB1",
    "CPA1", "CTRB2" , "REG3A", "REG1B", "PRSS1", "GCG", "CPB1",
    "SPINK1", "CELA3B", "CLPS", "OLFM4", "ACTG1", "FTL")

# Annotating and ordering cells by some meaningful feature(s):
dittoHeatmap(seurat, genes, = c("celltype", "Sample"))
dittoHeatmap(seurat, genes, = c("celltype", "Sample"), = "Sample") = TRUE will normalize all expression data to the max expression of each gene [0,1], which is often useful for zero-enriched single-cell data.

show_colnames/show_rownames control whether cell/gene names will be shown. (show_colnames default is TRUE for bulk, and FALSE for single-cell.)

# Add annotations
dittoHeatmap(seurat, genes, = c("celltype", "Sample"), = TRUE,
    show_colnames = FALSE,
    show_rownames = FALSE)

A subset of the supplied genes can be given to the highlight.features input to have names shown for just these genes.

The heatmap can also be rendered by the ComplexHeatmap package, rather than by the pheatmap package (default), by setting complex to TRUE. This package offers a wide variety of distinct plot customization, including rasterization when the heatmap would be too complex for editing software like Illustrator.

# Highlight certain genes
dittoHeatmap(seurat, genes, = c("celltype", "Sample"),
    highlight.features = genes[1:3],
    complex = TRUE)

Additional tweaks can be added through other built in inputs or by providing additional inputs that get passed along to pheatmap::pheatmap (see ?pheatmap) or to ComplexHeatmap::pheatmap (see ?ComplexHeatmap::pheatmap and ?ComplexHeatmap::Heatmap on which the former function relies.)


These create either multiple plots or create plots that summarize data for multiple variables all in one plot. They make it easier to create summaries for many genes or many cell types without the need for writing loops.

Some setup for these, let's roughly pick out the markers of delta cells in this data set

# Idents(seurat) <- "celltype"
# delta.marker.table <- FindMarkers(seurat, ident.1 = "delta")
# delta.genes <- rownames(delta.marker.table)[1:20]
# Idents(seurat) <- "seurat_clusters"

delta.genes <- c(
    "SST", "RBP4", "LEPR", "PAPPA2", "LY6H",
    "CBLN4", "GPX3", "BCHE", "HHEX", "DPYSL3",
    "SERPINA1", "SEC11C", "ANXA2", "CHGB", "RGS2",
    "FXYD6", "KCNIP1", "SMOC1", "RPL10", "LRFN5")


A very succinct representation that is useful for showing differences between groups. The plot uses differently colored and sized dots to summarizes both expression level (color) and percent of cells/samples with non-zero expression (size) for multiple genes (or values of metadata) within different groups of cells/samples.

By default, expression values for all groups are centered and scaled to ensure a similar range of values for all vars displayed and to emphasize differences between groups.

dittoDotPlot(seurat, vars = delta.genes, = "celltype")
dittoDotPlot(seurat, vars = delta.genes, = "celltype",
    scale = FALSE)

multi_dittoPlot & dittoPlotVarsAcrossGroups

multi_dittoPlot() creates dittoPlots for multiple genes or metadata, one plot each.

dittoPlotVarsAcrossGroups() creates a dittoPlot-like representation where instead of representing samples/cells as in typical dittoPlots, each data point instead represents a gene (or metadata). More specifically, the average expression, within each x-grouping, of a gene (or value of a metadata).

multi_dittoPlot(seurat, delta.genes[1:6], = "celltype",
    vlnplot.lineweight = 0.2, jitter.size = 0.3)
dittoPlotVarsAcrossGroups(seurat, delta.genes, = "celltype",
    main = "Delta-cell Markers")

multi_dittoDimPlot & multi_dittoDimPlotVaryCells

multi_dittoDimPlot() creates dittoDimPlots for multiple genes or metadata, one plot each.

multi_dittoDimPlotVaryCells() creates dittoDimPlots for a single gene or metadata, but where distinct cells are highlighted in each plot. The vary.cells.meta input sets the discrete metadata to be used for breaking up cells/samples over distinct plots. This can be useful for checking/highlighting when a gene may be differentially expressed within multiple cell types or across all samples.

multi_dittoDimPlot(seurat, delta.genes[1:6])
multi_dittoDimPlotVaryCells(seurat, delta.genes[1],
    vary.cells.meta = "celltype")

Customization via Simple Inputs

Many adjustments can be made with simple additional inputs. Here, we go through a few that are consistent across most dittoSeq functions, but there are many more. Be sure to check the function documentation (e.g. ?dittoDimPlot) to explore more!

Subsetting to certain cells/samples

The cells/samples shown in a given plot can be adjusted with the cells.use input. This can be provided as either a list of cells' / samples' names to include, as an integer vector with the indices of cells to keep, or as a logical vector that states whether each cell / sample should be included.

# Original
dittoBarPlot(seurat, "celltype", = "Sample", scale = "count")

# First 10 cells
dittoBarPlot(seurat, "celltype", = "Sample", scale = "count",
    # String method
    cells.use = colnames(seurat)[1:10]
    # Index method, which would achieve the same effect
    # cells.use = 1:10

# Acinar cells only
dittoBarPlot(seurat, "celltype", = "Sample", scale = "count",
    # Logical method
    cells.use = meta("celltype", seurat) == "acinar")

Faceting with

dittoPlot, dittoDimPlot, and dittoScatterPlots can be split into separate plots for distinct groups of cells with the input.

dittoDimPlot(seurat, "celltype", = "Sample")
dittoDimPlot(seurat, "ENO1", = c("Sample", "celltype"))

All titles are adjustable.

Relevant inputs are generally main, sub, xlab, ylab, x.labels, and legend.title.

dittoBarPlot(seurat, "celltype", = "Sample",
    main = "Encounters",
    sub = "By Type",
    xlab = NULL, # NULL = remove
    ylab = "Generation 1",
    x.labels = c("Ash", "Misty", "Jessie", "James"),
    legend.title = "Types",
    var.labels.rename = c("Fire", "Water", "Grass", "Electric", "Psychic"),
    x.labels.rotate = FALSE)

As exemplified above, in some functions, the displayed data can be renamed too.

Colors can be adjusted easily.

Colors are normally set with color.panel or max.color and min.color. When color.panel is used (discrete data), an additional input called colors sets the order in which those are actually used to make swapping around colors easy when nearby clusters appear too similar in tSNE/umap plots!

# original - discrete
dittoDimPlot(seurat, "celltype")
# swapped colors
dittoDimPlot(seurat, "celltype",
    colors = 5:1)
# different colors
dittoDimPlot(seurat, "celltype",
    color.panel = c("red", "orange", "purple", "yellow", "skyblue"))
# original - expression
dittoDimPlot(seurat, "INS")
# different colors
dittoDimPlot(seurat, "INS",
    max.color = "red", min.color = "gray90")

Underlying data can be output.

Simply add data.out = TRUE to any of the individual plotters and a representation of the underlying data will be output.

dittoBarPlot(seurat, "celltype", = "Sample",
    data.out = TRUE)

For dittoHeatmap, a list of all the arguments that would be supplied to pheatmap are output. This allows users to make their own tweaks to how the expression matrix is represented before plotting, or even to use a different heatmap creator from pheatmap altogether.

dittoHeatmap(seurat, c("SST","CPE","GPX3"), cells.use = colnames(seurat)[1:5],
    data.out = TRUE)

plotly hovering can be added.

Many dittoSeq functions can be supplied do.hover = TRUE to have them convert the output into an interactive plotly object that will display additional data about each data point when the user hovers their cursor on top.

Generally, a second input,, is used to tell dittoSeq what extra data to display. This input takes in a vector of gene or metadata names (or "ident" for Seurat object clustering) in the order you wish for them to be displayed. However, when the types of underlying data possible to be shown are constrained because the plot pieces represent summary data (dittoBarPlot and dittoPlotVarsAcrossGroups), the input is not used.

# These can be finicky to render in knitting, but still, example code:
dittoDimPlot(seurat, "INS",
    do.hover = TRUE, = c("celltype", "Sample", "ENO1", "ident", "nCount_RNA"))
dittoBarPlot(seurat, "celltype", = "Sample",
    do.hover = TRUE)

Rasterization / flattening to pixels

Often, single-cell datasets have so many cells that working with plots that show data points for every cell in a vector-based graphics editor, such as Illustrator, becomes prohibitively computationally intensive. In such instances, it can be helpful to have the per-cell graphics layers flattened to a pixel representation. Generally, dittoSeq offers this capability for via do.raster and raster.dpi inputs.

# Note: dpi gets re-set by the styling code of this vignette, so this is
#   just a code example, but the plot won't be quite matched.
dittoDimPlot(sce, "celltype",
    do.raster = TRUE,
    raster.dpi = 300)

For dittoHeatmap(), where the plotting itself is handled externally, the control is a bit different and we rely on ?ComplexHeatmap::Heatmap's input for this. First, set complex = TRUE to have the heatmap rendered by ComplexHeatmap, then rasterization should be turned on by default when needed, but it can also be turned on manually with use_raster = TRUE.

dittoHeatmap(seurat, genes, = TRUE,
    complex = TRUE,
    use_raster = TRUE)

Session information



Try the dittoSeq package in your browser

Any scripts or data that you put into this service are public.

dittoSeq documentation built on April 17, 2021, 6:01 p.m.