# devtools::load_all(".") library(immunedeconv) library(data.tree) library(tibble) library(dplyr)
Methods like fluorescence activated cell sorting (FACS) or Immunohistochemistry (IHC)-staining have been used as a gold standard to estimate the immune cell content within a sample, however these methods are limited in their scalability and by the availability of good antibodies against the cell type markers. High throughput transcriptomic methods allow to get a transcriptional landscape in the sample with a relatively small amount of material that can be extremely limited in clinical settings (e.g. tumor biopsies), which led to high utility of methods like RNA-seq and microarrays to characterize patient tumor samples. However, RNA-seq does not provide a detailed information on a cellular composition of a sample, which then has to be inferred using computational techniques.
Such methods can, in general, be classified in two categories:
knitr::include_graphics("img/concepts_deconvolution.gif")
Marker gene based approaches (a) are based on a list of genes (signature), that are characteristic for a cell type. By looking at the expression values of signature genes, every cell type is quantified independently, either using the gene expression values directly (MCP-counter) or by performing a statistical test for enrichment of the signatures (xCell).
Deconvolution methods (b) formulate the problem as a system of equations that describe the gene expression of a sample as the weighted sum of the contributions of the different cell types. By solving the inverse problem, cell type fractions can be inferred given a signature matrix and the mixed gene expression. This can be accomplished using $\nu$-Support Vector Regression (SVR) (CIBERSORT) constrained least square regression (quanTIseq, EPIC) or linear least square regression (TIMER).
For more information, check out the review by [@Finotello2018].
The input data is a gene
$\times$ sample
gene expression
matrix. In general values should be
For xCell and MCP-counter this is not so important. xCell works on the ranks of the gene expression only and MCP-counter sums up the gene expression values.
Rownames are expected to be HGNC gene symbols. Instead of a matrix, immunedeconv also supports ExpressionSets (see below).
This package gives you easy access to these methods. To run a method with default options, simply invoke
immunedeconv::deconvolute(gene_expression_matrix, method)
where gene_expression_matrix
is a matrix with genes in rows and samples in
columns. The rownames must be HGNC symbols and the colnames must be sample
names. The method can be one of
quantiseq timer cibersort cibersort_abs mcp_counter xcell epic abis consensus_tme estimate
The ESTIMATE algorithm, which computes a score for the tumoral, immune and stromal components and the fraction of tumor purity of a sample, has been implemented.
immunedeconv::deconvolute_estimate(gene_expression_matrix)
Imunedeconv has been extended to include methods aimed at the deconvolution of mouse data.
The format of the input gene_expression_matrix
is the same.
immunedeconv::deconvolute_mouse(gene_expression_matrix, method)
The method can be one of
mmcp_counter seqimmucc dcq base
In addition, human-based methods can be used to deconvolute mouse data through the conversion of mouse gene names to the corresponding human orthologues
gene_expression_matrix <- immunedeconv::mouse_genes_to_human(gene_expression_matrix) immunedeconv::deconvolute(gene_expression_matrix, "quantiseq")
Finally, certain methods can be used with custom signatures, consisting of either a signature matrix or signature genes for the cell types of interest. Since the information used to deconvolute the bulk is user-provided, these functions can be used for different tissues and organisms. The functions may require different input data formats, related to the requirements of each method. Please refer to their documentation. The available methods are
base: deconvolute_base_custom() cibersort norm/abs: deconvolute_cibersort_custom() epic: deconvolute_epic_custom() consensus_tme: deconvolute_consensus_tme_custom()
For this example, we use a dataset of four melanoma patients from [@EPIC2017].
res <- deconvolute(immunedeconv::dataset_racle$expr_mat, "quantiseq") knitr::kable(res, digits = 2)
CIBERSORT is only freely available for academic users and could not be directly included in this package. To use CIBERSORT with this package, you need to register on the cibersort website, obtain a license, and download the CIBERSORT source code.
The source code package contains two files, that are required:
CIBERSORT.R LM22.txt
Note the storage location of these files. When using immunedeconv
, you need
to tell the package where it can find those files:
library(immunedeconv) set_cibersort_binary("/path/to/CIBERSORT.R") set_cibersort_mat("/path/to/LM22.txt")
Afterwards, you can call
deconvolute(your_mixture_matrix, "cibersort") # or 'cibersort_abs'
as for any other method.
TIMER and ConsensusTME uses indication-specific reference profiles. Therefore, you must specify the tumor type when running TIMER or ConsensusTME:
deconvolute(your_mixture_matrix, "timer", indications=c("SKCM", "SKCM", "BLCA"))
indications
needs to be a vector that specifies an indication for each sample
(=column) in the mixture matrix. The indications supported by TIMER are
immunedeconv::timer_available_cancers
What the abbreviations stand for is documented on the TCGA wiki.
seqImmuCC is a method that can deconvolute using two regression approaches, SVR or LLSR. If the SVR approach is chosen, then the CIBERSORT script needs to be provided as described above.
The Bioconductor ExpressionSet is a convenient way to store a gene expression matrix with metadata for both samples and genes in a single object.
immunedeconv
supports the use of an ExpressionSet instead of a gene
expression matrix. In that case, pData
requires a column that contains gene
symbols. Which one needs to be specified in the deconvolute()
call:
deconvolute(my_expression_set, "quantiseq", column = "<column name>")
To provide consistently named results independent of the method, we defined a controlled vocabulary (CV) of cell-types and arranged them in a tree.
cell_type_hierarchy <- new.env() with(cell_type_hierarchy, { tree <- immunedeconv:::.get_cell_type_tree() SetGraphStyle(tree, rankdir = "LR") SetEdgeStyle(tree, arrowhead = "vee", color = "grey35", penwidth = 2) SetNodeStyle(tree, style = "filled,rounded", shape = "box", fillcolor = "GreenYellow", fontname = "helvetica", tooltip = GetDefaultTooltip, fontcolor = "black" ) plot(tree) })
For each method, each cell-type is mapped to a node in the tree. If you are curious, it's all defined in this excel sheet.
This tree can be used to summarize scores along the tree. For instance,
quanTIseq provides scores for regulatory and non-regulatory CD4+ T cells independently, but you are
interested in the fraction of overall CD4+ T cells. In that case you can use
map_result_to_celltypes
to sum up the scores:
res <- deconvolute(immunedeconv::dataset_racle$expr_mat, "quantiseq") %>% map_result_to_celltypes(c("T cell CD4+"), "quantiseq") knitr::kable(res, digits = 2)
The algorithm is explained in detail in the methods section of [@sturm2019].
In general, cell-type scores allow for the comparison (1) between samples, (2) between cell-types or (3) both. Between-sample comparisons allow to make statements such as "In patient A, there are more CD8+ T cells than in patient B". Between-cell-type comparisons allow to make statements such as "In a certain patient, there are more B cells than T cells". For more information, see our Benchmark paper ([@sturm2019]).
EPIC and quanTIseq are currently the only methods providing an absolute score, i.e. a score that can be interpreted as a cell fraction. These methods also provide an estimate for the amount of uncharacterized cells, i.e. cells for that no signature exists. This measure often corresponds to the fraction of cancer cells in the sample.
CIBERSORT abs., while allowing both between- and within-sample comparisons, generates a score in arbitrary units.
No, currently not. The reason is that the methods are conceptually different. Some are marker gene based and others deconvolution-based. CIBERSORT performs feature-selection on the matrix while EPIC and quanTIseq don't. EPIC uses all genes to estimate the inter-sample variance while quanTIseq uses marker genes only. This is also being discussed in #15.
You can, however, provide custom signatures for most individual methods (see next question).
deconvolute
function.You can access each method individually through the deconvolute_xxx
function.
Through these functions you can access all native features. See the function
reference for details.
If you believe that the feature is available across multiple methods and
should be added to the deconvolute
interface, feel free to open an
issue or pull request.
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.