Introduction

vizWithSCE is an R package that assists with the visualization of the integration of bulk DE results tables with pre-processed scRNA-seq datasets available on Bioconductor. vizWithSCE currently takes two arguments, a list containing res, dds, and a SingleCellExperiment (sce) as selected by the user and a second argument which. The which argument is the gene to use for the violin plot, represented as a number that corresponds to the gene with the n-th smallest p-value, e.g., 1 for the gene with lowest p-value, etc.

Run DESeq2 to obtain bulk DE results

We first load gene-summarized RNA-seq counts.

library(airway)

For our workflow, we used the bulk dataset "airway", which contains four human airway smooth muscle cell lines treated with dexamethasone, a steroid used to treat inflammation. This pre-processed dataset is available on Bioconductor. Here we filter to at least 3 samples with a count of 10 or higher, and then perform differential expression analysis.

Here, we changed the rownames to match the format of the sce.

library(DESeq2)
data(gse)
levels(gse$condition) <- c("untrt", "trt")
dds <- DESeqDataSet(gse, design = ~donor + condition)
keep <- rowSums(counts(dds) >= 10) >= 3
dds <- dds[keep,]
dds <- DESeq(dds)
res <- results(dds)
rownames(res) <- sub("\\..*","",rownames(res))
# the following is evaluated but hidden,
# because the above un-evaluated code chunk
# is slow to run
library(vizWithSCE)
data(dds)
dds <- DESeq2::estimateSizeFactors(dds)
data(res)

Run integrateWithSingleCell

We run integrateWithSingleCell and pick number 8 from the dataset selection as our single-cell dataset. Within PBMC there are nine different scRNA-seq datasets. We chose the dataset pbmc4k, which contains roughly 4 thousand cells. This pre-processed dataset is available on Bioconductor.

dat <- integrateWithSingleCell(res, dds)
# the following is evaluated but hidden,
# because the above un-evaluated code chunk
# involves user input.
library(TENxPBMCData)
library(scater)
tenx_pbmc4k <- TENxPBMCData(dataset = "pbmc4k")
sce <- tenx_pbmc4k
dat <- list(res=res, dds=dds, sce=sce)
dat$sce <- logNormCounts(dat$sce)

Label cell types or clusters

We use scater's logNormCounts function to add normalized data to the sce. We create a copy of the sce object in order to perform cell type labeling.

library(scater)
dat$sce <- scater::logNormCounts(dat$sce)
sce2 <- dat$sce

We convert the rownames from Ensembl to gene symbols, to match with the cell annotation reference dataset, described below.

library(org.Hs.eg.db)
rownames(sce2) <- mapIds(org.Hs.eg.db, rownames(sce2), "SYMBOL", "ENSEMBL")

We use celldex which is a database of reference expression datasets along with SingleR to provide cell type recognition. Now dat contains labels of different cell types.

library(celldex)
library(SingleR)
ref <- celldex::BlueprintEncodeData()
pred <- SingleR::SingleR(test=sce2, ref=ref, labels=ref$label.main)
table(pred$labels)
colLabels(dat$sce) <- pred$labels
# the following is evaluated but hidden,
# because the above un-evaluated code chunk
# is slow to run
data(labels)
colLabels(dat$sce) <- labels

Use vizWithSCE for visualization

We use vizWithSCE to pick the second gene by smallest p-value, and make a violin plot over the cell labels from SingleR. DUSP1 is known to play a role in response to environmental stress, and is most highly expressed in monocytes according to the single cell data.

library(vizWithSCE)
vizWithSCE(dat, which=2)

Future direction

Session info

sessionInfo()


KwameForbes/vizWithSCE documentation built on May 16, 2021, 4:41 a.m.