In satijalab/seurat: Tools for Single Cell Genomics

```r
all_times <- list()  # store the time for each chunk
knitr::knit_hooks$set(time_it = local({
  now <- NULL
  function(before, options) {
    if (before) {
      now <<- Sys.time()
    } else {
      res <- difftime(Sys.time(), now, units = "secs")
      all_times[[options$label]] <<- res
    }
  }
}))
knitr::opts_chunk$set(
  tidy = TRUE,
  tidy.opts = list(width.cutoff = 95),
  message = FALSE,
  warning = FALSE,
  time_it = TRUE,
  error = TRUE
)
```

Developed in collaboration with the Technology Innovation Group at NYGC, Cell Hashing uses oligo-tagged antibodies against ubiquitously expressed surface proteins to place a "sample barcode" on each single cell, enabling different samples to be multiplexed together and run in a single experiment. For more information, please refer to this paper.

This vignette briefly demonstrates how to work with Cell Hashing data in Seurat. Using two datasets, we show that cells can be demultiplexed to their sample of origin and that cross-sample doublets can be identified.

The demultiplexing function `HTODemux()` implements the following procedure:

* We perform a k-medoid clustering on the normalized HTO values, which initially separates cells into K+1 clusters, where K is the number of samples.
* We calculate a 'negative' distribution for each HTO. For each HTO, we use the cluster with the lowest average value as the negative group.
* For each HTO, we fit a negative binomial distribution to the negative cluster. We use the 0.99 quantile of this distribution as a threshold.
* Based on these thresholds, each cell is classified as positive or negative for each HTO.
* Cells that are positive for more than one HTO are annotated as doublets.
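
The procedure above can be illustrated with a small base R sketch. This is not Seurat's implementation: `sketch_hto_demux()` and the simulated `toy` matrix are hypothetical names introduced here, `kmeans()` stands in for k-medoid clustering, `log1p()` stands in for Seurat's normalization, and the negative binomial is fitted by the method of moments rather than by maximum likelihood. It is meant only to make the thresholding logic concrete.

```r
# Illustrative sketch only (hypothetical helper, not Seurat's code).
# `counts` is assumed to be a raw HTO-by-cell count matrix.
sketch_hto_demux <- function(counts, positive_quantile = 0.99, seed = 1) {
  stopifnot(is.matrix(counts), nrow(counts) >= 2)
  set.seed(seed)

  # Assumption: a simple log1p transform replaces Seurat's normalization
  norm <- log1p(counts)

  # Step 1: cluster cells into K + 1 groups (kmeans used in place of k-medoids)
  clusters <- kmeans(t(norm), centers = nrow(counts) + 1, nstart = 10)$cluster

  # Steps 2-3: per HTO, take the cluster with the lowest average signal as the
  # negative group and derive a threshold from a negative binomial fit
  thresholds <- vapply(seq_len(nrow(counts)), function(h) {
    cluster_means <- tapply(norm[h, ], clusters, mean)
    neg_cluster <- as.integer(names(cluster_means)[which.min(cluster_means)])
    x <- counts[h, clusters == neg_cluster]
    m <- mean(x)
    v <- var(x)
    if (is.na(v) || v <= m) {
      # No overdispersion detected: fall back to the Poisson limit
      qpois(positive_quantile, lambda = m)
    } else {
      # Method-of-moments estimate of the dispersion parameter
      qnbinom(positive_quantile, size = m^2 / (v - m), mu = m)
    }
  }, numeric(1))
  names(thresholds) <- rownames(counts)

  # Step 4: positive if the raw count exceeds that HTO's threshold
  # (thresholds recycles down the rows because its length equals nrow(counts))
  positive <- counts > thresholds

  # Step 5: zero positive HTOs = Negative, one = Singlet, several = Doublet
  n_positive <- colSums(positive)
  classification <- ifelse(n_positive == 0, "Negative",
                           ifelse(n_positive == 1, "Singlet", "Doublet"))

  list(thresholds = thresholds, positive = positive,
       classification = unname(classification))
}

# Simulated toy data: 3 HTOs, 90 labeled cells each, 30 unlabeled cells
set.seed(7)
truth <- c(rep(1:3, each = 90), rep(0, 30))  # 0 means no hashtag signal
toy <- matrix(rnbinom(3 * length(truth), mu = 5, size = 2), nrow = 3,
              dimnames = list(paste0("HTO-", 1:3), NULL))
for (j in which(truth > 0)) {
  toy[truth[j], j] <- rnbinom(1, mu = 500, size = 10)
}

res <- sketch_hto_demux(toy)
table(res$classification)
```

In Seurat itself, all of this is handled by a single call such as `HTODemux(object, assay = "HTO", positive.quantile = 0.99)`, which is what the rest of this vignette relies on.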

# 8-HTO dataset from human PBMCs

Dataset description:

* Data represent peripheral blood mononuclear cells (PBMCs) from eight different donors.
* Cells from each donor are uniquely labeled, using CD45 as a hashing antibody.
* Samples were subsequently pooled and run on a single lane of the 10X Chromium v2 system.
* You can download the count matrices for RNA and HTO [here](https://www.dropbox.com/sh/ntc33ium7cg1za1/AAD_8XIDmu4F7lJ-5sp-rGFYa?dl=0), or the FASTQ files from [GEO](https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE108313).

## Basic setup

Load packages

Read in data

Setup Seurat object and add in the HTO data

## Adding HTO data as an independent assay

You can read more about working with multi-modal data [here](multimodal_vignette.html)

## Demultiplex cells based on HTO enrichment

Here we use the Seurat function `HTODemux()` to assign single cells back to their sample origins.

## Visualize demultiplexing results

Output from running `HTODemux()` is saved in the object metadata. We can visualize how many cells are classified as singlets, doublets and negative/ambiguous cells.

Visualize enrichment for selected HTOs with ridge plots

Visualize pairs of HTO signals to confirm mutual exclusivity in singlets

Compare number of UMIs for singlets, doublets and negative cells

Generate a two dimensional tSNE embedding for HTOs. Here we are grouping cells by singlets and doublets for simplicity.

Create an HTO heatmap, based on Figure 1C in the Cell Hashing paper.

Cluster and visualize cells using the usual scRNA-seq workflow, and examine for the potential presence of batch effects.

# 12-HTO dataset from four human cell lines

Dataset description:

* Data represent single cells collected from four cell lines: HEK, K562, KG1 and THP1.
* Each cell line was further split into three samples (12 samples in total).
* Each sample was labeled with a hashing antibody mixture (CD29 and CD45), pooled, and run on a single lane of 10X.
* Based on this design, we should be able to detect doublets both across and within cell types.
* The count matrices for RNA and HTO can be downloaded [here](https://www.dropbox.com/sh/c5gcjm35nglmvcv/AABGz9VO6gX9bVr5R2qahTZha?dl=0) and are also available on GEO [here](https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE108313).

## Create Seurat object, add HTO data and perform normalization

## Demultiplex data

## Visualize demultiplexing results

Distribution of selected HTOs grouped by classification, displayed by ridge plots

Visualize HTO signals in a heatmap

## Visualize RNA clustering

Below, we cluster the cells using our standard scRNA-seq workflow. As expected, we see four major clusters, corresponding to the cell lines.

In addition, we see small clusters in between, representing mixed transcriptomes that are correctly annotated as doublets.

We also see within-cell-type doublets that are (perhaps unsurprisingly) intermixed with singlets of the same cell type.
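
The workflow described above could look roughly like the following. This is a hedged sketch rather than the vignette's original code: `cluster_hashed_cells()` is a hypothetical helper, and it assumes `obj` is a Seurat object whose RNA assay is already normalized and on which `HTODemux()` was run with an assay named `HTO`, so that the `HTO_classification.global` metadata column exists. The `dims` default is illustrative only.

```r
# Hypothetical helper, not part of Seurat. Assumes HTODemux() has already
# been run on `obj` with assay = "HTO" and that the RNA assay is normalized.
cluster_hashed_cells <- function(obj, dims = 1:10) {
  if (!requireNamespace("Seurat", quietly = TRUE)) {
    stop("This sketch needs the Seurat package.")
  }
  # Drop cells called Negative, keeping both singlets and doublets
  obj <- subset(obj, subset = HTO_classification.global != "Negative")

  # Standard scRNA-seq steps: variable features, scaling, PCA, tSNE
  obj <- Seurat::FindVariableFeatures(obj, selection.method = "mean.var.plot")
  obj <- Seurat::ScaleData(obj, features = Seurat::VariableFeatures(obj))
  obj <- Seurat::RunPCA(obj, features = Seurat::VariableFeatures(obj),
                        verbose = FALSE)
  obj <- Seurat::RunTSNE(obj, reduction = "pca", dims = dims)
  obj
}
```

Coloring the resulting embedding by the demultiplexing calls, for example with `Seurat::DimPlot(obj, group.by = "HTO_classification.global")`, is one way to check whether the small in-between clusters are indeed the cells annotated as doublets.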



satijalab/seurat documentation built on April 11, 2025, 4:32 p.m.
