This vignette shows annotation of the PBMC3k dataset from the Seurat tutorial.



First, read the data. Set the path to your copy of the data here:

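library(Seurat)
library(dplyr)          # provides %>%
library(CellAnnotatoR)  # annotation functions used later in this vignette

pbmc_data <- Read10X("~/mh/Data/10x/pbmc3k/filtered_gene_bc_matrices/hg19/")
pbmc <- CreateSeuratObject(pbmc_data, project="pbmc3k", min.cells=3, min.features=200)
pbmc[["percent.mt"]] <- PercentageFeatureSet(pbmc, pattern = "^MT-")  # column name as in the Seurat tutorial

pbmc <- subset(pbmc, subset = nFeature_RNA > 200 & nFeature_RNA < 2500 & percent.mt < 5) %>% 
  NormalizeData(verbose=F) %>% FindVariableFeatures(selection.method = "vst", nfeatures=2000, verbose=F) %>% 
  ScaleData(verbose=F) %>% RunPCA(features=VariableFeatures(.), verbose=F) %>% 
  RunTSNE(dims=1:10)  # t-SNE embedding used by the plots below (step and dims assumed from context)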
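
To make the mitochondrial filter above more concrete, here is a minimal sketch of the quantity that `PercentageFeatureSet` with `pattern = "^MT-"` is expected to return: the share of each cell's counts that falls on genes whose names match the pattern. The helper `percentMito` and the toy matrix are hypothetical and only illustrate the arithmetic.

```r
library(Matrix)

# Toy counts matrix: genes in rows, cells in columns (hypothetical data)
toy_counts <- Matrix(c(
   5,  0,  1,   # MT-CO1
   0,  5,  1,   # MT-ND1
  15,  5,  8,   # CD3E
   0, 10, 10    # MS4A1
), nrow=4, byrow=TRUE, sparse=TRUE,
dimnames=list(c("MT-CO1", "MT-ND1", "CD3E", "MS4A1"), c("cell1", "cell2", "cell3")))

# Hypothetical helper: percentage of each cell's counts coming from genes matching `pattern`
percentMito <- function(counts, pattern="^MT-") {
  is_mito <- grepl(pattern, rownames(counts))
  100 * Matrix::colSums(counts[is_mito, , drop=FALSE]) / Matrix::colSums(counts)
}

pct <- percentMito(toy_counts)
pct  # cell1 and cell2: 25, cell3: 10
```

With a threshold of 5, all three toy cells would be filtered out.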

To run the annotation, we also need a cell neighbor graph and, to improve quality, a clustering. In general, a high clustering resolution is required to detect small populations.

pbmc <- FindNeighbors(pbmc, dims=1:10, verbose=F) %>% FindClusters(resolution=5, verbose=F)
DimPlot(pbmc, reduction = "tsne", label=T) + NoLegend()
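
At a resolution this high, some clusters may contain only a handful of cells, so it can be worth looking at the cluster size distribution before going further. The helper below is a hypothetical sketch in base R (the name `clusterSizeSummary` and the `min.size` threshold are not part of any package).

```r
# Hypothetical helper: summarize how many cells each cluster contains
clusterSizeSummary <- function(clusters, min.size=10) {
  sizes <- table(clusters)
  list(
    n.clusters=length(sizes),               # number of clusters
    smallest=min(sizes),                    # size of the smallest cluster
    median=median(as.integer(sizes)),       # median cluster size
    n.below.min=sum(sizes < min.size)       # clusters smaller than min.size
  )
}

# Toy example: three clusters of 50, 20 and 5 cells
toy_sizes_input <- factor(rep(c("0", "1", "2"), times=c(50, 20, 5)))
size_summary <- clusterSizeSummary(toy_sizes_input)
str(size_summary)
```

On the object from this vignette, the same call would be `clusterSizeSummary(pbmc$seurat_clusters)`.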

Next, extract the required information from the Seurat object. The exact fields may vary depending on the preprocessing options you used.

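cm <- pbmc@assays$RNA@counts                      # raw counts, genes x cells
cm_norm <- Matrix::t(pbmc@assays$RNA@data)        # normalized expression, cells x genes
graph <- pbmc@graphs$RNA_snn                      # shared nearest neighbor graph from FindNeighbors
emb <- pbmc@reductions$tsne@cell.embeddings       # t-SNE coordinates, one row per cell
clusters <- setNames(pbmc@meta.data$seurat_clusters, rownames(pbmc@meta.data))  # cluster per cell, named by cell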
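
Because the fields can differ between setups, it may help to confirm that all extracted objects describe the same set of cells before running the annotation. The following is a minimal sketch, assuming cells are matched by name; `checkAnnotationInputs` is a hypothetical helper, not a package function.

```r
# Hypothetical helper: check that every input refers to the same cells as the counts matrix
checkAnnotationInputs <- function(cm, cm_norm, graph, emb, clusters) {
  cells <- colnames(cm)  # counts matrix is genes x cells
  c(
    cm_norm=setequal(rownames(cm_norm), cells),   # normalized matrix is cells x genes
    graph=setequal(rownames(graph), cells) && setequal(colnames(graph), cells),
    emb=setequal(rownames(emb), cells),           # one embedding row per cell
    clusters=setequal(names(clusters), cells)     # clusters named by cell
  )
}

# Toy inputs with three cells (hypothetical data)
toy_cells <- c("c1", "c2", "c3")
toy_cm <- matrix(1:6, nrow=2, dimnames=list(c("g1", "g2"), toy_cells))
toy_graph <- matrix(1, 3, 3, dimnames=list(toy_cells, toy_cells))
toy_emb <- matrix(0, 3, 2, dimnames=list(toy_cells, c("tSNE_1", "tSNE_2")))
toy_cl <- setNames(factor(c("0", "0", "1")), toy_cells)

checks <- checkAnnotationInputs(toy_cm, t(toy_cm), toy_graph, toy_emb, toy_cl)
checks  # all TRUE for consistent inputs
```

For the objects in this vignette, the call would be `checkAnnotationInputs(cm, cm_norm, graph, emb, clusters)`; any `FALSE` entry points to the object whose cell names do not match.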

Now we can run annotation:

marker_path <- "../markers/"

clf_data <- getClassificationData(cm, marker_path)
ann_by_level <- assignCellsByScores(graph, clf_data, clusters=clusters)

And plot the annotation:

Idents(pbmc) <- ann_by_level$annotation$l1
DimPlot(pbmc, reduction = "tsne", label=T) + NoLegend()
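
In addition to the plot, a cross-tabulation of clusters against assigned types can show whether each cluster received a mostly uniform label. The sketch below assumes that both `clusters` and the annotation vector are named by cell; `majorityLabelPerCluster` is a hypothetical helper written in base R.

```r
# Hypothetical helper: most frequent annotation per cluster and the
# fraction of the cluster's cells that carry it
majorityLabelPerCluster <- function(clusters, annotation) {
  cells <- intersect(names(clusters), names(annotation))
  tab <- table(cluster=as.character(clusters[cells]),
               type=as.character(annotation[cells]))
  data.frame(
    cluster=rownames(tab),
    type=colnames(tab)[apply(tab, 1, which.max)],
    fraction=apply(tab, 1, max) / rowSums(tab),
    row.names=NULL
  )
}

# Toy example: cluster "0" is mostly T cells, cluster "1" is all B cells
toy_ids <- paste0("c", 1:6)
toy_cluster_vec <- setNames(factor(c("0", "0", "0", "1", "1", "1")), toy_ids)
toy_annotation <- setNames(c("T", "T", "B", "B", "B", "B"), toy_ids)

majority <- majorityLabelPerCluster(toy_cluster_vec, toy_annotation)
majority
```

Applied to this vignette, the call would be `majorityLabelPerCluster(clusters, ann_by_level$annotation$l1)`. Clusters with a low `fraction` are the ones worth inspecting more closely.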

Let's also plot marker expression per cluster to ensure that everything is correct:

# clf.data supplies the marker definitions (argument assumed from context)
plotSubtypeMarkers(emb, cm_norm, parent.type="root", clf.data=clf_data, n.col=3)

