knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)
# remove the minimum number of characters to display character columns (default: 0)
options(pillar.min_chars = Inf)

Introduction

Assignment of cell type labels to scRNA-seq clusters is particularly difficult when unexpected or poorly described populations are present. There are fully automated algorithms for cell type annotation, but sometimes a more in-depth analysis is helpful in understanding the captured cells. This is an example of exploratory cell type analysis using clustermole, starting with a Seurat object.

The dataset used in this example contains hematopoietic and stromal bone marrow populations (Baccin et al.). This experiment was selected because it includes both well-known as well as rare cell types.

Load data

Load relevant packages.

library(Seurat)
library(dplyr)
library(ggplot2)
library(ggsci)
library(clustermole)

Download the dataset, which is stored as a Seurat object. It was subset for this tutorial to reduce the size and speed up processing.

so <- readRDS(url("https://osf.io/cvnqb/download"))
so

Check the experiment labels on a tSNE visualization, as shown in the original publication (original figure).

DimPlot(so, reduction = "tsne", group.by = "experiment", shuffle = TRUE) +
  theme(aspect.ratio = 1, legend.text = element_text(size = rel(0.7))) +
  scale_color_nejm()

Check the cell type labels on a tSNE visualization.

DimPlot(so, reduction = "tsne", group.by = "celltype", shuffle = TRUE) +
  theme(aspect.ratio = 1, legend.text = element_text(size = rel(0.8))) +
  scale_color_igv()

Set the Seurat object cell identities to the predefined cell type labels for the next steps.

Idents(so) <- "celltype"
levels(Idents(so))

Marker gene overlaps

One type of analysis facilitated by clustermole is based on comparison of marker genes.

We can start with the B-cells, which is a well-defined population used in many studies.

Find markers for the B-cell cluster.

b_markers_df <- FindMarkers(so, ident.1 = "B-cell", min.pct = 0.2, only.pos = TRUE, verbose = FALSE)
nrow(b_markers_df)

This gives us a data frame with hundreds of genes. We can subset to just the best 25 markers.

b_markers <- head(rownames(b_markers_df), 25)
b_markers

Check the overlap of B-cell markers with all clustermole cell type signatures.

overlaps_tbl <- clustermole_overlaps(genes = b_markers, species = "mm")

Check the top scoring cell types corresponding to the B-cell cluster markers.

head(overlaps_tbl, 15)

As would be expected for a well-defined population, the top results are various B-cell populations. We can repeat this process for other populations that are more obscure.

Find markers for the Adipo-CAR cluster. These are Cxcl12-abundant reticular (CAR) cells expressing adipocyte-lineage genes.

acar_markers_df <- FindMarkers(so, ident.1 = "Adipo-CAR", min.pct = 0.2, only.pos = TRUE, verbose = FALSE)
acar_markers <- head(rownames(acar_markers_df), 25)
acar_markers

Check the overlap of Adipo-CAR markers with all cell type signatures.

overlaps_tbl <- clustermole_overlaps(genes = acar_markers, species = "mm")

Check the top scoring cell types for the Adipo-CAR cluster.

head(overlaps_tbl, 15)

The top results are more diverse than for B-cells, but related populations are among the top candidates.

Find markers for the Osteoblasts cluster.

o_markers_df <- FindMarkers(so, ident.1 = "Osteoblasts", min.pct = 0.2, only.pos = TRUE, verbose = FALSE)
o_markers <- head(rownames(o_markers_df), 25)
o_markers

Check overlap of Osteoblasts markers with all cell type signatures.

overlaps_tbl <- clustermole_overlaps(genes = o_markers, species = "mm")

Check the top scoring cell types for the Osteoblasts cluster.

head(overlaps_tbl, 15)

The top results are again more diverse than for B-cells, but the appropriate populations are listed.

Enrichment of markers

Rather than comparing marker genes, it's also possible to run enrichment of cell type signatures across all genes. This avoids having to define an optimal set of markers.

Calculate the average expression levels for each cell type.

avg_exp_mat <- AverageExpression(so)

Convert to a regular matrix and log-transform.

avg_exp_mat <- as.matrix(avg_exp_mat$RNA)
avg_exp_mat <- log1p(avg_exp_mat)

Preview the expression matrix.

avg_exp_mat[1:5, 1:5]

Run enrichment of all cell type signatures across all clusters.

enrich_tbl <- clustermole_enrichment(expr_mat = avg_exp_mat, species = "mm")

Check the most enriched cell types for the B-cell cluster.

enrich_tbl %>%
  filter(cluster == "B-cell") %>%
  head(15)

As with the previous analysis, the top results are various B-cell populations.

Check the most enriched cell types for the Adipo-CAR cluster.

enrich_tbl %>%
  filter(cluster == "Adipo-CAR") %>%
  head(15)

Check the most enriched cell types for the Osteoblasts cluster.

enrich_tbl %>%
  filter(cluster == "Osteoblasts") %>%
  head(15)


igordot/clustermole documentation built on May 9, 2024, 9:31 p.m.