In grst/immunedeconv: Methods for immune cell deconvolution

devtools::load_all(".")

library(dplyr)
library(ggplot2)
library(tidyr)
library(immunedeconv)
library(tibble)

As for the human deconvolution methods, immunedeconv includes an example dataset for mouse: blood and spleen samples from [@Petitprez2020], available as immunedeconv::dataset_petitprez. It contains a gene expression matrix (dataset_petitprez$expr_mat) generated with bulk RNA-seq and 'gold standard' estimates of immune cell content profiled with FACS (dataset_petitprez$ref). We use the bulk RNA-seq data to run the deconvolution methods and later compare the results to the FACS data.

# show the first 5 rows (genes) of the gene expression matrix
knitr::kable(dataset_petitprez$expr_mat[1:5, ])

Deconvolution with mouse-based methods

To estimate immune cell fractions, we call the deconvolute_mouse function with the expression matrix and the name of one of the following deconvolution methods:

deconvolution_methods_mouse

For this example, we use mMCP-counter. The result is a data frame with one row per cell type and one column per sample (plus a cell_type column), holding a score for each cell type in each sample.

res_mMCPcounter <- deconvolute_mouse(dataset_petitprez$expr_mat, "mmcp_counter")

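Like its human counterpart, mMCP-counter provides scores in arbitrary units that are comparable between samples but not between cell types. To prepare the comparison with FACS, we keep only the cell types that are also present in the reference data and plot their scores per sample.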

res_mMCPcounter <- res_mMCPcounter[res_mMCPcounter$cell_type %in% colnames(dataset_petitprez$ref), ]


res_mMCPcounter %>%
  gather(sample, score, -cell_type) %>%
  ggplot(aes(x = sample, y = score, color = cell_type)) +
  geom_point(size = 4) +
  facet_wrap(~cell_type, scales = "free_x", ncol = 3) +
  scale_color_brewer(palette = "Paired", guide = "none") +
  coord_flip() +
  theme_bw() +
  theme(axis.text.x = element_text(angle = 90, vjust = 0.5, hjust = 1))
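
Because the scores are only comparable between samples, one way to put several cell types on a common visual scale is to standardise each cell type across samples. The following is a minimal sketch on made-up numbers, not real mMCP-counter output; the object names toy_scores, score_mat and scaled_scores are hypothetical. The scaled values still do not say which cell type is more abundant.

```r
# Toy scores in arbitrary units (made-up numbers, not real mMCP-counter output).
toy_scores <- data.frame(
  cell_type = c("B cell", "NK cell"),
  sample_1 = c(10, 0.2),
  sample_2 = c(20, 0.4),
  sample_3 = c(30, 0.6)
)

# Move the cell types into row names so that only numeric columns remain.
score_mat <- as.matrix(toy_scores[, -1])
rownames(score_mat) <- toy_scores$cell_type

# scale() standardises columns, so transpose to z-score each cell type
# across samples, then transpose back to cell_type x sample.
scaled_scores <- t(scale(t(score_mat)))

print(round(scaled_scores, 2))
```

After scaling, each row has mean 0 and standard deviation 1, so only the relative pattern across samples remains.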

Deconvolution with human-based methods

Human-based methods can also be applied to mouse data by mapping mouse genes to their human orthologs. The function convert_human_mouse_genes does this by retrieving the corresponding gene names with biomaRt. Because the gene names are retrieved from the Ensembl database, the command may have to be run with a different Ensembl mirror (see the documentation).

dataset_petitprez_humanGenes <- convert_human_mouse_genes(dataset_petitprez$expr_mat, convert_to = 'human')
res_MCPcounter <- deconvolute(dataset_petitprez_humanGenes, 'mcp_counter')
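
To illustrate the idea of ortholog mapping without querying Ensembl, here is a minimal sketch on a toy matrix. It is not the implementation of convert_human_mouse_genes: the lookup table ortholog_map is a small hand-written assumption, NoOrthologGene is a hypothetical placeholder, and the sketch ignores one-to-many and many-to-one orthologs, which a real conversion has to handle.

```r
# Toy mouse expression matrix (genes x samples); values are made up.
mouse_expr <- matrix(
  c(5, 7,
    1, 3,
    2, 2,
    9, 4),
  nrow = 4, byrow = TRUE,
  dimnames = list(c("Cd8a", "Cd19", "Ptprc", "NoOrthologGene"), c("s1", "s2"))
)

# Hand-written mouse -> human lookup table (assumption for this sketch;
# the package retrieves such a mapping with biomaRt instead).
ortholog_map <- c(Cd8a = "CD8A", Cd19 = "CD19", Ptprc = "PTPRC")

# Keep only the genes that have an entry in the lookup table ...
has_ortholog <- rownames(mouse_expr) %in% names(ortholog_map)
human_expr <- mouse_expr[has_ortholog, , drop = FALSE]

# ... and rename the remaining rows to the human gene symbols.
rownames(human_expr) <- unname(ortholog_map[rownames(human_expr)])

print(human_expr)
```

Genes without an ortholog are dropped, so the converted matrix can have fewer rows than the input.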

Comparison with FACS data

Let's now compare the results with 'gold standard' FACS data obtained for the four samples. This is, of course, not a representative benchmark, but it gives an idea of the accuracy we can expect.

# construct a single dataframe containing all data
#
# re-map the cell-types to common names.
# only include the cell-types that are measured using FACS
cell_types <- c("B cell", "T cell CD8+", "T cell", "NK cell", "Monocyte")

tmp_res <- res_mMCPcounter %>%
  gather("sample", "estimate", -cell_type)

reference_facs <- dataset_petitprez$ref %>%
  gather("cell_type", "true_fraction", -"Sample Name") %>%
  # rename with dplyr; set_colnames() would require magrittr to be attached
  rename(sample = "Sample Name")

result <- tmp_res %>%
  inner_join(reference_facs, by = c("sample", "cell_type"))

Plot the true vs. estimated values:

result %>%
  ggplot(aes(x = true_fraction, y = estimate)) +
  geom_point(aes(color = cell_type)) +
  facet_wrap(. ~ cell_type, scales = "free_y", ncol = 2) +
  theme_bw()
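
Since the scores are in arbitrary units, agreement with FACS is more naturally summarised per cell type than across cell types. A minimal sketch, assuming a long data frame with the same columns as result above (cell_type, sample, true_fraction, estimate); the numbers in toy_result are made up and do not come from the dataset.

```r
# Toy data in the same long format as `result` (made-up numbers).
toy_result <- data.frame(
  cell_type = rep(c("B cell", "T cell"), each = 4),
  sample = rep(paste0("s", 1:4), times = 2),
  true_fraction = c(0.10, 0.20, 0.30, 0.40, 0.50, 0.40, 0.30, 0.20),
  estimate = c(1.0, 2.1, 2.9, 4.2, 8.0, 7.5, 5.0, 4.0)
)

# Pearson correlation between FACS fractions and estimated scores,
# computed separately for each cell type.
cor_by_cell_type <- sapply(
  split(toy_result, toy_result$cell_type),
  function(d) cor(d$true_fraction, d$estimate, method = "pearson")
)

print(round(cor_by_cell_type, 2))
```

With the real data, toy_result would be replaced by result. With only four samples per cell type, such correlations are very uncertain and should be read as a rough indication only.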