R package implementing the EPIC method to estimate the proportions of immune, stromal, endothelial and cancer (or other) cells from bulk gene expression data. It is based on reference gene expression profiles for the main non-malignant cell types. It predicts the proportions of these cells and of the remaining "other cells" (mostly cancer cells), for which no reference profile is given.
This method is described in Racle et al., 2017, available at https://elifesciences.org/articles/26476.
EPIC is also available as a web application: http://epic.gfellerlab.org.
The main function in this package is EPIC. As input, it needs a matrix of TPM (or RPKM) gene expression values from the samples whose cell proportions are to be estimated. The reference cells to use can also be defined:
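```r
library(EPIC)  # or call the functions with the package prefix, e.g. EPIC::EPIC
# bulkSamplesMatrix: TPM (or RPKM) expression matrix, genes in rows, samples in columns.
out <- EPIC(bulk = bulkSamplesMatrix)  # uses the default reference profiles
# referenceCellsList: custom reference profiles and signature genes.
out <- EPIC(bulk = bulkSamplesMatrix, reference = referenceCellsList)
```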
out is a list containing the mRNA proportions (mRNAProportions) and the cell fractions (cellFractions) estimated for each sample, as well as a data.frame of goodness-of-fit statistics (fit.gof).
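Because EPIC expects TPM (or RPKM) values rather than raw read counts, counts from an RNA-seq quantification usually need to be normalized first. The sketch below is not part of the EPIC package: it is a minimal base-R TPM conversion, assuming hypothetical counts and gene lengths (in kilobases); the gene and sample names are placeholders.

```r
# Hypothetical raw read counts: genes in rows, samples in columns.
counts <- matrix(c(100, 200, 300,
                   50,  50, 100,
                   10,   0,  40),
                 nrow = 3, byrow = TRUE,
                 dimnames = list(c("GENE_A", "GENE_B", "GENE_C"),
                                 c("sample1", "sample2", "sample3")))
# Hypothetical gene lengths in kilobases (e.g. effective transcript lengths).
gene_length_kb <- c(GENE_A = 2, GENE_B = 1, GENE_C = 0.5)

counts_to_tpm <- function(counts, length_kb) {
  # Reads per kilobase: divide each gene (row) by its length.
  rpk <- counts / length_kb[rownames(counts)]
  # Rescale each sample (column) so that its values sum to one million.
  t(t(rpk) / colSums(rpk)) * 1e6
}

bulkSamplesMatrix <- counts_to_tpm(counts, gene_length_kb)
colSums(bulkSamplesMatrix)  # each column sums to 1e6 (up to rounding error)
```

Which gene lengths to use (e.g. effective transcript lengths from the quantifier) is a choice left to the user; the key property is that each sample ends up on the same per-million scale.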
The mRNA per cell values and the signature genes to use can also be changed:
```r
# Replace the mRNA-per-cell values and the signature genes:
out <- EPIC(bulk = bulkSamplesMatrix, reference = referenceCellsList,
            mRNA_cell = mRNA_cell_vector, sigGenes = sigGenes_vector)
# Replace the mRNA-per-cell values of only some cell types:
out <- EPIC(bulk = bulkSamplesMatrix, reference = referenceCellsList,
            mRNA_cell_sub = mRNA_cell_sub_vector)
```
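The difference between mRNA_cell and mRNA_cell_sub can be pictured as a merge of named vectors: mRNA_cell_sub overrides the values of the cell types it names and leaves the others at their defaults. The sketch below only illustrates that merge in base R under this assumption (see ?EPIC::EPIC for the exact behaviour); the helper override_mRNA_cell is hypothetical, and the values are placeholders, not EPIC's actual defaults.

```r
# Placeholder default mRNA-per-cell values (NOT EPIC's real defaults).
mRNA_cell_defaults <- c(Tcells = 1, Bcells = 1, Macrophages = 1,
                        otherCells = 1, default = 1)
# Hypothetical user-supplied values for a subset of cell types.
mRNA_cell_sub_vector <- c(Macrophages = 2, otherCells = 3)

# Hypothetical helper: override only the named entries, keep the rest.
override_mRNA_cell <- function(defaults, sub) {
  merged <- defaults
  # Replace matching entries; names absent from `defaults` are appended.
  merged[names(sub)] <- sub
  merged
}

mRNA_cell_vector <- override_mRNA_cell(mRNA_cell_defaults, mRNA_cell_sub_vector)
mRNA_cell_vector
```

Passing a partial vector is convenient when only a few cell types (e.g. one with a measured mRNA content) differ from the defaults.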
Various other options are available; they are documented in the EPIC help pages:
```r
?EPIC::EPIC
?EPIC::EPIC.package
```
The package can be installed from GitHub with devtools (building its vignettes):
```r
install.packages("devtools")
devtools::install_github("GfellerLab/EPIC", build_vignettes = TRUE)
```
A Python wrapper has been written by Stephen C. Van Nostrand from MIT and is available at https://github.com/scvannost/epicpy.
EPIC can be used freely by academic groups for non-commercial purposes. The product is provided free of charge, and, therefore, on an “as is” basis, without warranty of any kind. Please read the file “LICENSE” for details.
If you plan to use EPIC (version 1.1) in any for-profit application, you are required to obtain a separate license. To do so, please contact Nadette Bulgin (nbulgin@lcr.org) at the Ludwig Institute for Cancer Research Ltd.
Contact: Julien Racle (julien.racle@unil.ch) and David Gfeller (david.gfeller@unil.ch).
Note, however, that when the goal is to benchmark EPIC predictions and the 'bulk samples' are in fact in silico samples (reconstructed, for example, from single-cell RNA-seq data), it is usually better to compare the 'true' proportions against the mRNAProportions from EPIC. Indeed, when building such in silico samples, the fact that different cell types express different amounts of mRNA is usually not taken into account. Conversely, when working with true bulk samples, you should compare the true cell proportions (measured, e.g., by FACS) against the cellFractions.
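The distinction between the two outputs comes from the mRNA content per cell: a cell type with more mRNA per cell contributes more to the bulk signal than its share of cells. The sketch below illustrates that relationship in base R, assuming cell fractions are obtained by dividing each mRNA proportion by the cell type's mRNA per cell and renormalizing each sample to sum to 1; it is a simplified illustration, not EPIC's internal code, and all numbers are hypothetical.

```r
# Hypothetical mRNA proportions for two samples (each row sums to 1).
mRNAProportions <- rbind(
  sample1 = c(Tcells = 0.20, Macrophages = 0.30, otherCells = 0.50),
  sample2 = c(Tcells = 0.40, Macrophages = 0.10, otherCells = 0.50)
)
# Hypothetical relative mRNA content per cell for each cell type.
mRNA_cell <- c(Tcells = 1, Macrophages = 2, otherCells = 2)

mrna_to_cell_fractions <- function(mrna_prop, mrna_per_cell) {
  # Divide each column (cell type) by that cell type's mRNA per cell ...
  scaled <- sweep(mrna_prop, 2, mrna_per_cell[colnames(mrna_prop)], "/")
  # ... then renormalize each sample (row) so its fractions sum to 1.
  scaled / rowSums(scaled)
}

cellFractions <- mrna_to_cell_fractions(mRNAProportions, mRNA_cell)
round(cellFractions, 3)
```

In sample1, T cells make up 20% of the mRNA but one third of the cells, because the other cell types carry twice as much mRNA per cell in this toy setting. An in silico mixture built by pooling reads ignores this correction, which is why its ground truth matches mRNAProportions better.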
If the mRNA proportions of these cell types are low, not correcting the results with their true mRNA per cell values will have little impact on the results. If, on the other hand, your bulk samples contain many of these cells, the results may be somewhat biased, but the bias should be similar for all samples and thus matter less. You might not be able to tell reliably whether a sample contains more CAFs than T cells, but you should still get a good estimate of which samples contain more CAFs (or T cells) than others.
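The claim about comparisons between samples can be checked on a toy example. Under the same simplifying assumption as before (cell fraction proportional to mRNA proportion divided by mRNA per cell, then renormalized), changing only the CAFs' mRNA-per-cell value shifts every sample's CAF fraction but keeps its ordering across samples, because each CAF fraction stays a monotone function of the same per-sample ratio. All values below are hypothetical.

```r
# Hypothetical mRNA proportions for four samples.
mrna <- rbind(
  s1 = c(CAFs = 0.10, Tcells = 0.30, otherCells = 0.60),
  s2 = c(CAFs = 0.25, Tcells = 0.15, otherCells = 0.60),
  s3 = c(CAFs = 0.05, Tcells = 0.45, otherCells = 0.50),
  s4 = c(CAFs = 0.40, Tcells = 0.20, otherCells = 0.40)
)

to_fractions <- function(mrna_prop, mrna_per_cell) {
  # Divide by mRNA per cell, then renormalize each sample to sum to 1.
  scaled <- sweep(mrna_prop, 2, mrna_per_cell[colnames(mrna_prop)], "/")
  scaled / rowSums(scaled)
}

# "True" versus mis-specified mRNA per cell for CAFs (hypothetical values).
true_fr  <- to_fractions(mrna, c(CAFs = 2, Tcells = 1, otherCells = 1))
wrong_fr <- to_fractions(mrna, c(CAFs = 1, Tcells = 1, otherCells = 1))

# The absolute CAF fractions differ between the two settings ...
round(cbind(true = true_fr[, "CAFs"], wrong = wrong_fr[, "CAFs"]), 3)
# ... but the ordering of the samples by CAF fraction is identical.
identical(order(true_fr[, "CAFs"]), order(wrong_fr[, "CAFs"]))
```

For the cell type whose value is mis-specified the ranking is preserved exactly; the fractions of the other cell types are rescaled by a sample-dependent factor, so their rankings are only approximately preserved.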
When such a warning message appears, it means that the optimization did not fully converge for some of the samples. You can then check fit.gof$convergeCode (and possibly also fit.gof$convergeMessage), which EPIC returns alongside the cell proportions. It tells you which samples had convergence issues: a value of 0 means the optimization converged, while other values indicate errors or warnings. Their meaning can be found in the help of the optim (or constrOptim) function from R's stats package, which is used for the optimization; EPIC simply forwards the message it returns.
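Listing the affected samples is a one-line filter on the convergence codes. The sketch below uses a small hypothetical data.frame standing in for out$fit.gof, assuming its rows are named by sample (the real data.frame contains further goodness-of-fit columns).

```r
# Hypothetical stand-in for out$fit.gof; with real EPIC output you would use
#   fit.gof <- out$fit.gof
fit.gof <- data.frame(
  convergeCode = c(0L, 1L, 0L, 0L),
  row.names = c("sample1", "sample2", "sample3", "sample4")
)

# Samples whose optimization did not converge (any code other than 0).
not_converged <- rownames(fit.gof)[fit.gof$convergeCode != 0]
not_converged
```

With the real output, fit.gof[not_converged, "convergeMessage"] would then show the corresponding messages forwarded from optim or constrOptim.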
The error code that usually appears is "1", which means that the maximum number of iterations was reached during the optimization. This could indicate an issue with the bulk gene expression data, which might not completely follow the assumption of equation (1) from our manuscript. In our experience, however, the proportions were usually predicted well even when this warning appeared. Possibly the optimization simply tries to be too precise, or a few of the signature genes did not match well while the remaining signature genes still gave a good estimate of the proportions.
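What code 1 means can be reproduced with base R's optim on a standard test function. The example below is unrelated to EPIC's actual objective function: it minimizes the Rosenbrock function and shows that capping the number of iterations (control argument maxit) yields convergence code 1, while the default limit lets the same problem converge with code 0.

```r
# Rosenbrock function, a classic hard test problem for optimizers.
rosenbrock <- function(p) (1 - p[1])^2 + 100 * (p[2] - p[1]^2)^2

# Too few iterations: optim stops early and reports convergence code 1.
short_run <- optim(c(-1.2, 1), rosenbrock, method = "Nelder-Mead",
                   control = list(maxit = 10))
short_run$convergence   # 1: the iteration limit 'maxit' was reached

# Same problem with the default limit (500 for Nelder-Mead).
full_run <- optim(c(-1.2, 1), rosenbrock, method = "Nelder-Mead")
full_run$convergence    # 0: converged
```

A code of 1 therefore says that the optimizer ran out of iterations, not that the estimate is necessarily poor, which is consistent with the experience described above.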
If some samples give strange results, it can nevertheless be useful to check whether the issue is that these samples did not converge well. To be more conservative, you could also remove all the samples that did not converge, as these may be outliers, provided they are only a small fraction of your original samples. Another possibility would be to change the parameters of the optim/constrOptim function to allow for more iterations or a looser convergence tolerance, but this requires modifying the code of EPIC directly, as I didn't implement such an option in EPIC.
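The conservative option of dropping non-converged samples can be sketched as follows. The cell fractions and convergence codes are hypothetical placeholders; with real output they would come from out$cellFractions and out$fit.gof$convergeCode, assuming both are indexed by the same sample names.

```r
# Hypothetical EPIC-like results for four samples (placeholder numbers).
cellFractions <- rbind(
  sample1 = c(Tcells = 0.30, otherCells = 0.70),
  sample2 = c(Tcells = 0.55, otherCells = 0.45),
  sample3 = c(Tcells = 0.20, otherCells = 0.80),
  sample4 = c(Tcells = 0.25, otherCells = 0.75)
)
convergeCode <- c(sample1 = 0L, sample2 = 1L, sample3 = 0L, sample4 = 0L)

# Check first that the non-converged samples are only a small fraction.
frac_failed <- mean(convergeCode != 0)
message(sprintf("%.0f%% of samples did not converge", 100 * frac_failed))

# Keep only the samples whose optimization converged (code 0).
converged <- names(convergeCode)[convergeCode == 0]
cellFractions_kept <- cellFractions[converged, , drop = FALSE]
cellFractions_kept
```

Whether a given share of failures is "small" is a judgment call; reporting it explicitly makes that decision visible in the analysis.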