In KwameForbes/vizWithSCE: Visualization of Bulk Differential Analysis Results for DESeq2 with Single-Cell Datasets

Introduction

vizWithSCE is an R package that assists with the visualization of the integration of bulk DE results tables with pre-processed scRNA-seq datasets available on Bioconductor. vizWithSCE currently takes two arguments, a list containing res, dds, and a SingleCellExperiment (sce) as selected by the user and a second argument which. The which argument is the gene to use for the violin plot, represented as a number that corresponds to the gene with the n-th smallest p-value, e.g., 1 for the gene with lowest p-value, etc.

Run DESeq2 to obtain bulk DE results

We first load gene-summarized RNA-seq counts.

library(airway)

For our workflow, we used the bulk dataset "airway", which contains four human airway smooth muscle cell lines treated with dexamethasone, a steroid used to treat inflammation. This pre-processed dataset is available on Bioconductor. Here we filter to at least 3 samples with a count of 10 or higher, and then perform differential expression analysis.

Here, we changed the rownames to match the format of the sce.

library(DESeq2)
data(gse)
levels(gse$condition) <- c("untrt", "trt")
dds <- DESeqDataSet(gse, design = ~donor + condition)
keep <- rowSums(counts(dds) >= 10) >= 3
dds <- dds[keep,]
dds <- DESeq(dds)
res <- results(dds)
rownames(res) <- sub("\\..*","",rownames(res))

# the following is evaluated but hidden,
# because the above un-evaluated code chunk
# is slow to run
library(vizWithSCE)
data(dds)
dds <- DESeq2::estimateSizeFactors(dds)
data(res)

Run `integrateWithSingleCell`

We run integrateWithSingleCell and pick number 8 from the dataset selection as our single-cell dataset. Within PBMC there are nine different scRNA-seq datasets. We chose the dataset pbmc4k, which contains roughly 4 thousand cells. This pre-processed dataset is available on Bioconductor.

dat <- integrateWithSingleCell(res, dds)

# the following is evaluated but hidden,
# because the above un-evaluated code chunk
# involves user input.
library(TENxPBMCData)
library(scater)
tenx_pbmc4k <- TENxPBMCData(dataset = "pbmc4k")
sce <- tenx_pbmc4k
dat <- list(res=res, dds=dds, sce=sce)
dat$sce <- logNormCounts(dat$sce)

Label cell types or clusters

We use scater's logNormCounts function to add normalized data to the sce. We create a copy of the sce object in order to perform cell type labeling.

library(scater)
dat$sce <- scater::logNormCounts(dat$sce)
sce2 <- dat$sce

We convert the rownames from Ensembl to gene symbols, to match with the cell annotation reference dataset, described below.

library(org.Hs.eg.db)
rownames(sce2) <- mapIds(org.Hs.eg.db, rownames(sce2), "SYMBOL", "ENSEMBL")

We use celldex which is a database of reference expression datasets along with SingleR to provide cell type recognition. Now dat contains labels of different cell types.

library(celldex)
library(SingleR)
ref <- celldex::BlueprintEncodeData()
pred <- SingleR::SingleR(test=sce2, ref=ref, labels=ref$label.main)
table(pred$labels)
colLabels(dat$sce) <- pred$labels

# the following is evaluated but hidden,
# because the above un-evaluated code chunk
# is slow to run
data(labels)
colLabels(dat$sce) <- labels

Use vizWithSCE for visualization

We use vizWithSCE to pick the second gene by smallest p-value, and make a violin plot over the cell labels from SingleR. DUSP1 is known to play a role in response to environmental stress, and is most highly expressed in monocytes according to the single cell data.

library(vizWithSCE)
vizWithSCE(dat, which=2)

Future direction

We are currently planning to add more features, including other types of plots, allowing users to further explore the intersection of the two datasets. We seek to add multiple gene plots, e.g. dotplots or heatmaps of sets of DE genes.
We are considering statistical tests of the distribution of expression across cell types for sets of DE genes, e.g. up-regulated vs down-regulated DE genes.
We will explore launching iSEE from our worklow, a well-developed package for interactive exploration of SCE datasets.