This vignette demonstrates how to run schex, a package that aims to provide better plots, on Seurat objects. If you use schex, please cite:
Single cell transcriptomics reveals spatial and temporal dynamics of gene expression in the developing mouse spinal cord
Delile, Julien, Teresa Rayon, Manuela Melchionda, Amelia Edwards, James Briscoe, and Andreas Sagner.
doi: 10.1242/dev.173807
GitHub: https://github.com/SaskiaFreytag/schex
knitr::opts_chunk$set(
  tidy = TRUE,
  tidy.opts = list(width.cutoff = 95),
  message = FALSE,
  warning = FALSE,
  fig.height = 10,
  fig.width = 16
)
Reduced dimension plotting is one of the essential tools for the analysis of
single cell data. However, as the number of cells/nuclei in these plots
increases, their usefulness decreases. Many cells are plotted on top of each
other, obscuring information even when transparency settings are used. This
package provides strategies for binning cells/nuclei into hexagon cells.
Plotting summarized information of all cells/nuclei in their respective hexagon
cells presents information without obstructions. The package works seamlessly
with the two most common object classes for the storage of single cell data:
SingleCellExperiment from the SingleCellExperiment package and Seurat from the
Seurat package. In this vignette I present the use of schex for Seurat objects.
Prerequisites to install that are not available via install.packages:
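A minimal sketch of installing such prerequisites, assuming they are SeuratData (hosted on GitHub) and schex (hosted on Bioconductor), and assuming remotes and BiocManager as the installers; none of these commands appear in this vignette:

```r
# Assumption: remotes and BiocManager are used as installers.
if (!requireNamespace("remotes", quietly = TRUE)) install.packages("remotes")
if (!requireNamespace("BiocManager", quietly = TRUE)) install.packages("BiocManager")

# Assumption: SeuratData is installed from its GitHub repository
remotes::install_github("satijalab/seurat-data")

# Assumption: schex is installed from Bioconductor
BiocManager::install("schex")
```

The remaining packages loaded below are available from CRAN via install.packages.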
library(Seurat)
library(SeuratData)
library(ggplot2)
library(ggrepel)
library(dplyr)
theme_set(theme_classic())
library(schex)
In order to demonstrate the capabilities of the schex package, I will use a dataset of Peripheral Blood Mononuclear Cells (PBMC) freely available from 10x Genomics. There are 2,700 single cells that were sequenced on the Illumina NextSeq 500. You can download the data from the Seurat website.
InstallData("pbmc3k")
pbmc <- pbmc3k
In the next section, I will perform some simple quality control steps outlined in the Seurat vignette. I will then calculate various dimension reductions and cluster the data, as also outlined in the vignette.
# Percentage of counts from mitochondrial genes, used for quality control
pbmc[["percent.mt"]] <- PercentageFeatureSet(pbmc, pattern = "^MT-")

# Filter cells, normalize, reduce dimensions and cluster
pbmc %>%
  subset(subset = nFeature_RNA > 200 & nFeature_RNA < 2500 & percent.mt < 5) %>%
  NormalizeData() %>%
  FindVariableFeatures() %>%
  ScaleData() %>%
  RunPCA(verbose = FALSE) %>%
  RunUMAP(dims = 1:10) %>%
  FindNeighbors(dims = 1:10) %>%
  FindClusters(resolution = 0.5, verbose = FALSE) -> pbmc
At this stage in the workflow we usually would like to plot aspects of our data in one of the reduced dimension representations. Instead of an ordinary scatter plot, in which cells are drawn on top of each other, I will demonstrate how schex summarizes them in hexagon cells.
First, I will calculate the hexagon cell representation for each cell for
a specified dimension reduction representation. I decide to use nbins=40,
which specifies that I divide my x range into 40 bins. Note that this might be a
parameter that you want to play around with depending on the number of
cells/nuclei in your dataset. Generally, for more cells/nuclei, nbins should be
increased.
pbmc <- make_hexbin(pbmc, nbins = 40, dimension_reduction = "UMAP")
First I plot how many cells are in each hexagon cell. This should be
relatively even; otherwise change the nbins parameter in the previous
calculation.
plot_hexbin_density(pbmc)
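To judge evenness numerically rather than by eye, the number of cells per hexagon can be summarised for several bin numbers. The sketch below is an illustration on simulated coordinates, not schex itself: it assumes the hexbin package as a stand-in for the binning step, and the simulated data, the candidate values and the helper name `summarise_bins` are hypothetical.

```r
library(hexbin)  # assumption: used here only as a stand-in for the binning step

set.seed(1)
# Simulated 2D coordinates standing in for a UMAP embedding of 2,700 cells
x <- rnorm(2700)
y <- rnorm(2700)

# Hypothetical helper: cells-per-hexagon summary for a given number of x bins
summarise_bins <- function(nb) {
  hb <- hexbin(x, y, xbins = nb)
  c(nbins = nb,
    hexagons = length(hb@count),
    median_cells = median(hb@count),
    max_cells = max(hb@count))
}

# One row per candidate number of bins
bin_summary <- t(sapply(c(10, 40, 80), summarise_bins))
print(bin_summary)
```

With more bins, each hexagon holds fewer cells; a very small median suggests that the number of bins is too large for the dataset.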
Next I colour the hexagon cells by some meta information, such as the median total count or cluster membership in each hexagon cell.
plot_hexbin_meta(pbmc, col="nCount_RNA", action="median")
plot_hexbin_meta(pbmc, col="RNA_snn_res.0.5", action="majority")
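What the two actions compute can be sketched in base R. This is an illustration of the idea, not the schex implementation; the toy data frame and its column names are hypothetical.

```r
# Hypothetical toy data: hexagon ID, total count and cluster label for eight cells
cells <- data.frame(
  hexagon = c(1, 1, 1, 2, 2, 3, 3, 3),
  nCount  = c(1200, 1500, 900, 3000, 2800, 700, 650, 800),
  cluster = c("0", "0", "1", "2", "2", "1", "1", "0")
)

# Idea behind action = "median": median of a numeric variable within each hexagon
median_count <- tapply(cells$nCount, cells$hexagon, median)

# Idea behind action = "majority": most frequent label within each hexagon
majority_cluster <- tapply(cells$cluster, cells$hexagon, function(v) {
  names(which.max(table(v)))
})

print(median_count)
print(majority_cluster)
```

Each hexagon is then coloured by its single summary value, which is why numeric variables pair naturally with the median and factor variables with the majority.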
For convenience there is also a function that allows the calculation of label
positions for factor variables. These can be overlaid with the package ggrepel.
# Label positions for each cluster
label_df <- make_hexbin_label(pbmc, col="RNA_snn_res.0.5")

# Majority cluster per hexagon, with the cluster labels overlaid
pp <- plot_hexbin_meta(pbmc, col="RNA_snn_res.0.5", action="majority")
pp + ggrepel::geom_label_repel(data = label_df, aes(x=x, y=y, label = label),
    colour="black", label.size = NA, fill = NA)
Finally, I will visualize the gene expression of the CD19 gene in the hexagon cell representation.
gene_id <-"CD19"
plot_hexbin_gene(pbmc, type="logcounts", gene=gene_id,
    action="mean", xlab="UMAP1", ylab="UMAP2",
    title=paste0("Mean of ", gene_id))