Quasi-UMI normalization is a technique for converting single-cell RNA-Seq read counts that lack unique molecular identifiers (UMIs) to a Poisson-lognormal target distribution that approximates a true UMI count distribution.
An example of a non-UMI count protocol is Smart-seq2. An example of a UMI count protocol is Drop-seq. Suppose we have a non-UMI count dataset ("A"). We recommend the following pipeline:
mc.cores
parameter.suppressPackageStartupMessages(library(SingleCellExperiment)) suppressPackageStartupMessages(library(EDASeq)) suppressPackageStartupMessages(library(Matrix)) library(scRNAseq) require(quminorm)
The Segerstolpe dataset is from Smart-seq2 of human pancreas tissue. We remove low-quality cells and genes and convert read counts to TPMs.
sce<-SegerstolpePancreasData(ensembl=TRUE, location=FALSE) sce<-sce[, colData(sce)$`single cell well quality`=="OK"] gl<-as.data.frame(getGeneLengthAndGCContent(rownames(sce),"hg19","org.db")) rowData(sce)<-cbind(rowData(sce),gl) sce<-sce[rowSums(counts(sce))>0 & !is.na(gl$length), ] counts(sce)<-Matrix(counts(sce)) tpm<-counts(sce)/rowData(sce)$length #recycling assay(sce,"tpm")<-t(t(tpm)/colSums(tpm))*1e6
The Grun dataset is from CEL-seq of human pancreas tissue. We use it as a reference to determine a reasonable shape parameter for quasi-UMI normalization.
This step is optional. If we didn't have time to look for a matching UMI counts dataset, we could just use the default of 2.0 for the shape parameter and skip to the next section.
set.seed(101) grun<-GrunPancreasData(ensembl=TRUE,location=FALSE) m<-round(counts(grun)) m<-m[rowSums(m)>0,] keep<-which(colSums(m)>5000) #subsample of cells to save time, in real life we would use all cells fit<-poilog_mle(m[,sample(keep,10)],mc.cores=2) summary(fit$sig)
A shape parameter of about 1.9 appears to be appropriate for this tissue type.
Visualize the UMI count distribution
#plot(table(m[,1]),main="Grun UMI counts") hist(log1p(m[,1]),main="Grun log(1+UMI counts)")
Going back to the Segerstolpe data, we can now obtain QUMI counts using our custom shape parameter of 1.9.
system.time(sce<-quminorm(sce,assayName="tpm",shape=1.9,mc.cores=2)) qumi<-assay(sce,"qumi_poilog_1.9") #plot(table(qumi[,1]),main="Segerstolpe QUMI counts") hist(log1p(qumi[,1]),main="Segerstolpe log(1+QUMI counts)")
Compare the QUMI count distribution to the original read count distribution.
hist(log1p(counts(sce)[,1]),main="Segerstolpe log(1+read counts)")
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.