In ddiez/celltype: Prediction and analysis of cell types from single cell experiments

knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)
library(dplyr)
library(celltype)
library(SingleCellExperiment)
library(DT)

Introduction

Here we introduce the usage of the celltype package. The main function uses a dictionary of immune cell markers that consists of expression level measurements of genes in different immune cell types. We use this dictionary to predict the likely cell type of each cell in an experimental dataset of single cell transcriptome measurements. The assignment is made using the correlation between the expression levels of the markers in the dictionary and the expression of the same genes in the cells.
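The correlation-based assignment described above can be illustrated with a minimal base R sketch. The toy dictionary `toy_db`, the toy expression matrix `toy_expr`, and the use of Pearson correlation are assumptions made for illustration only; the package's own implementation may use a different correlation method or preprocessing.

```r
# Toy dictionary: markers in rows, cell types in columns (hypothetical values).
toy_db <- matrix(
  c(5, 1, 1,
    1, 5, 1,
    1, 1, 5,
    4, 4, 1),
  nrow = 4, byrow = TRUE,
  dimnames = list(c("Cd4", "Cd8a", "Cd19", "Cd3e"), c("T4", "T8", "B"))
)

# Toy expression data: genes in rows, cells in columns (hypothetical values).
toy_expr <- matrix(
  c(6, 0, 0,
    0, 7, 1,
    1, 0, 8,
    5, 4, 0,
    2, 2, 2),
  nrow = 5, byrow = TRUE,
  dimnames = list(c("Cd4", "Cd8a", "Cd19", "Cd3e", "Actb"),
                  c("cell1", "cell2", "cell3"))
)

# Keep only the markers present in both the dictionary and the data.
shared <- intersect(rownames(toy_db), rownames(toy_expr))

# cor() correlates the columns of its two arguments, giving a
# cells x cell types matrix of correlations.
toy_cor <- cor(toy_expr[shared, , drop = FALSE],
               toy_db[shared, , drop = FALSE])
round(toy_cor, 2)
```

Each row of `toy_cor` corresponds to one cell and each column to one cell type in the toy dictionary, so the largest value in a row points to the best-matching cell type.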

Dataset

Immune cell dictionary

As immune gene markers we use a set defined by the ImmGen project. Gene symbols are available for both human and mouse.

head(markers)

The dictionary contains the expression of these gene markers in different cell types from the ImmGen project.

head(immgen.db)

Experimental data

As the experimental dataset we use a test SingleCellExperiment object containing 500 murine splenic cells.

sce1

Analysis

Using the ImmGen dictionary

We can use predict_celltype() to obtain a matrix of cell type correlations. For each cell in the dataset we obtain its correlation with each cell type in the dictionary, computed over the markers that are present in the data. By default the ImmGen dictionary is used.

celltype1 <- predict_celltype(sce1, tissue = "SP")
celltype1[1:5, 1:3]

To assign a single cell type to each cell we can use choose_celltype(), which picks the cell type with the maximum correlation.

celltype1 <- choose_celltype(celltype1)
head(celltype1)
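The maximum-correlation rule can be sketched in base R as follows. This is an illustration under stated assumptions, not the package's code: the matrix `toy_cor` and the `cell` column are hypothetical, while the `celltype` and `correlation` column names mirror those used later in this vignette.

```r
# Hypothetical correlation matrix: cells in rows, cell types in columns.
toy_cor <- matrix(
  c( 0.9, 0.1, -0.3,
     0.2, 0.8,  0.1,
    -0.4, 0.0,  0.7),
  nrow = 3, byrow = TRUE,
  dimnames = list(c("cell1", "cell2", "cell3"), c("T4", "T8", "B"))
)

# Column index of the best-matching cell type for each cell
# (which.max() returns the first maximum if there are ties).
best <- unname(apply(toy_cor, 1, which.max))

# One row per cell with the chosen cell type and its correlation.
toy_choice <- data.frame(
  cell = rownames(toy_cor),
  celltype = colnames(toy_cor)[best],
  correlation = toy_cor[cbind(seq_len(nrow(toy_cor)), best)],
  stringsAsFactors = FALSE
)
toy_choice
```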

The ImmGen hierarchy of cells is very specific. The function simplify_immgen_celltype() lets us focus on the top-level cell type in the hierarchy.

celltype1 <- celltype1 %>%
  mutate(celltype_simple = simplify_immgen_celltype(celltype))

head(celltype1)
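As a rough idea of what collapsing a hierarchy to its top level can look like, the sketch below assumes dot-separated labels in the style of ImmGen population names and keeps the part before the first dot. The helper `top_level()` and the example labels are hypothetical; simplify_immgen_celltype() may apply different rules.

```r
# Hypothetical helper (not the package function): keep the text
# before the first dot of a dot-separated label.
top_level <- function(x) sub("\\..*$", "", x)

# Example labels, assumed for illustration.
labels <- c("T.4Nve.Sp", "T.8Mem.Sp", "B.Fo.Sp", "NK.Sp")
simple <- top_level(labels)
simple
```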

Using the Immuno-Navigator dictionary

We can use alternative dictionaries. A dictionary based on the Immuno-Navigator database is also available in this package. Note that the cell hierarchies do not need to be identical across dictionaries, nor do the two predictions need to be consistent.

celltype2 <- predict_celltype(sce1, name = "immnav")
celltype2[1:5, 1:2]

Again, we choose the cell type based on maximum correlation.

celltype2 <- choose_celltype(celltype2)
head(celltype2)

Using the MCA dictionary

celltype3 <- predict_celltype(sce1, name = "mca", tissue = "spleen")
celltype3[1:5, 1:2]

celltype3 <- choose_celltype(celltype3)
head(celltype3)

Using custom dictionaries

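It is also possible to define custom dictionaries. Here we make a dictionary for CD4 and CD8 T cells, together with double-positive (DP) and double-negative (DN) T cells and B cells. A marker is set to 1 in the cell types expected to express it and to -1 in the others.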

cells <- c("CD4", "CD8", "DP", "DN", "B")
my.db <- matrix(0, nrow = nrow(markers), ncol = length(cells), dimnames = list(markers[["mouse"]], cells))

my.db["Cd69", ] <- c(-1, -1, -1, -1, 1)
my.db["Cd19", ] <- c(-1, -1, -1, -1, 1)
my.db["Cd4", ] <- c(1, -1, 1, -1, -1)
my.db["Cd8a", ] <- c(-1, 1, 1, -1, -1)

my.db[c("Cd69", "Cd19", "Cd4", "Cd8a"), ]

celltype4 <- predict_celltype(sce1, db = my.db)
celltype4[1:5, 1:5]

celltype4 <- choose_celltype(celltype4)
head(celltype4)

Compare all predictions

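We can compare the different predictions side by side. We first rename the cell type and correlation columns of each result so they do not clash, then join the results into a single interactive table.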

celltype1 <- celltype1 %>%
  select(-celltype_simple) %>%
  dplyr::rename(cell_img = celltype, cor_img = correlation) %>%
  mutate(cor_img = format(cor_img, digits = 3))

celltype2 <- celltype2 %>%
  dplyr::rename(cell_nav = celltype, cor_nav = correlation) %>%
  mutate(cor_nav = format(cor_nav, digits = 3))

celltype3 <- celltype3 %>%
  dplyr::rename(cell_mca = celltype, cor_mca = correlation) %>%
  mutate(cor_mca = format(cor_mca, digits = 3))

celltype4 <- celltype4 %>%
  dplyr::rename(cell_cus = celltype, cor_cus = correlation) %>%
  mutate(cor_cus = format(cor_cus, digits = 3))


Reduce(left_join, list(celltype1, celltype2, celltype3, celltype4)) %>%
  datatable(rownames = FALSE)
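The chained join above relies on whichever columns the tables have in common. A minimal sketch with toy tables makes the key explicit and tabulates how two sets of predictions agree. It assumes a hypothetical key column `cell_index` (the actual key column returned by choose_celltype() is not shown in this vignette), and base R merge() stands in for left_join() so that the sketch runs without extra packages.

```r
# Toy prediction tables sharing a hypothetical key column `cell_index`.
pred_img <- data.frame(cell_index = c("c1", "c2", "c3"),
                       cell_img = c("T", "B", "T"),
                       stringsAsFactors = FALSE)
pred_nav <- data.frame(cell_index = c("c1", "c2", "c3"),
                       cell_nav = c("T cell", "B cell", "NK cell"),
                       stringsAsFactors = FALSE)
pred_cus <- data.frame(cell_index = c("c1", "c2", "c3"),
                       cell_cus = c("CD4", "B", "CD8"),
                       stringsAsFactors = FALSE)

# Left-join all tables on the explicit key, one after another.
combined <- Reduce(
  function(x, y) merge(x, y, by = "cell_index", all.x = TRUE),
  list(pred_img, pred_nav, pred_cus)
)

# Cross-tabulate two of the predictions to see where they agree.
agreement <- table(ImmGen = combined$cell_img, Custom = combined$cell_cus)
agreement
```

Naming the key explicitly avoids accidental joins on columns that happen to share a name, such as a formatted correlation column.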