EPIC | R Documentation |
EPIC takes as input bulk gene expression data (RNA-seq) and returns the proportion of mRNA and cells composing the various samples.
EPIC(
bulk,
reference = NULL,
mRNA_cell = NULL,
mRNA_cell_sub = NULL,
sigGenes = NULL,
scaleExprs = TRUE,
withOtherCells = TRUE,
constrainedSum = TRUE,
rangeBasedOptim = FALSE
)
bulk |
A matrix ( |
reference |
(optional): A string or a list defining the reference cells. It can take multiple formats, either:
|
mRNA_cell |
(optional): A named numeric vector giving (in arbitrary units) the amount of mRNA per cell for each of the reference cell types and for the other uncharacterized (cancer) cells. Two names have a special meaning: "otherCells" is used for the mRNA/cell value of the "other cells" from the sample (i.e. the cell types that don't have any reference gene expression profile); and "default" is used for the mRNA/cell value of the cells from the reference profiles for which no specific value is given in mRNA_cell. For example, if mRNA_cell=c(Bcells=2, NKcells=2.1, otherCells=3.5, default=1) and the refProfiles describe Bcells, NKcells and Tcells, a value of 3.5 is used for the "otherCells" without any reference profile and the default value of 1 is used for the Tcells when computing the cell fractions. Note: if the data is in TPM, this mRNA per cell value would ideally correspond to some number of transcripts per cell. |
mRNA_cell_sub |
(optional): This can be given instead of |
sigGenes |
(optional): a character vector of the gene names to use as signature for the deconvolution. In principle this is given with the reference as "reference$sigGenes", but if a value is given for this input variable, these signature genes are used instead of the ones given with the reference profile. |
scaleExprs |
(optional, default is TRUE): boolean telling if the bulk samples and reference gene expression profiles should be rescaled based on the list of genes in common between them (such a rescaling is recommended). |
withOtherCells |
(optional, default is TRUE): whether EPIC should allow for an additional cell type for which no gene expression reference profile is available, or whether the bulk is assumed to be composed only of the cells with reference profiles. |
constrainedSum |
(optional, default is TRUE): tells if the sum of all
cell types should be constrained to be < 1. When
|
rangeBasedOptim |
(optional): when this is FALSE (the default), the least-squares optimization is performed as described in Racle et al., 2017, eLife, which is recommended. When this variable is TRUE, EPIC uses the variability of each gene from the reference profiles in another way: instead of defining weights (based on the variability) for the fit of each gene, it defines a range of values accessible for each gene (based on the gene expression value in the reference profile +/- the variability values). The error that the optimization tries to minimize is then by how much the predicted gene expression is outside of this allowed range of values. |
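As a rough illustration of the range-based error described for rangeBasedOptim, the base R sketch below measures how far a predicted expression value lies outside the interval given by the reference value +/- its variability. The helper name out_of_range_error and the toy numbers are hypothetical, and how EPIC combines these distances across genes is not shown here; this is only a sketch of the idea, not the package's implementation.

```r
# Hypothetical helper (not part of the EPIC package): distance by which each
# predicted value falls outside the allowed range [ref - variability, ref + variability].
out_of_range_error <- function(predicted, ref, variability) {
  lower <- ref - variability
  upper <- ref + variability
  # Zero inside the range, positive distance to the nearest bound outside it.
  pmax(lower - predicted, 0) + pmax(predicted - upper, 0)
}

# Toy values, made up for illustration only.
ref         <- c(geneA = 10, geneB = 5, geneC = 2)
variability <- c(geneA = 2,  geneB = 1, geneC = 0.5)
predicted   <- c(geneA = 11, geneB = 7, geneC = 1)

err <- out_of_range_error(predicted, ref, variability)
print(err)
```

In this toy case geneA is inside its range and contributes no error, while geneB and geneC fall above and below their ranges respectively.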
This function uses a constrained least-squares minimization to estimate the proportion of each cell type with a reference profile and another uncharacterized cell type in bulk gene expression samples.
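To make the idea of a constrained least-squares fit concrete, here is a minimal base R sketch using stats::constrOptim on toy data. It is unweighted and assumes that the uncharacterized cells contribute nothing to the signature genes; the reference values, the starting point and all variable names are made up for illustration, and the optimizer actually used inside EPIC may differ.

```r
# Toy reference profiles: 4 signature genes (rows) x 2 cell types (columns).
ref <- cbind(Bcells = c(10, 1, 2, 5), Tcells = c(1, 8, 3, 2))

# Simulated bulk sample: 30% B cells, 50% T cells, the remaining 20% being
# "other cells" assumed not to express these signature genes.
true_prop <- c(Bcells = 0.3, Tcells = 0.5)
bulk <- as.vector(ref %*% true_prop)

# Sum of squared differences between measured and predicted expression.
sq_error <- function(p) sum((bulk - ref %*% p)^2)

# constrOptim enforces ui %*% p - ci >= 0, here: p >= 0 and sum(p) <= 1.
ui <- rbind(diag(2), c(-1, -1))
ci <- c(0, 0, -1)

# The starting point must lie strictly inside the feasible region.
fit <- constrOptim(theta = c(0.2, 0.2), f = sq_error, grad = NULL,
                   ui = ui, ci = ci)

est <- setNames(fit$par, colnames(ref))
other <- 1 - sum(est)  # share left for the uncharacterized cell type
print(round(c(est, otherCells = other), 3))
```

The share left over after fitting the reference cell types is what this sketch attributes to the uncharacterized cell type.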
The names of the genes in the bulk samples, the reference samples and the gene signature list need to be in the same format (gene symbols are used in the predefined reference profiles). The full list of gene names doesn't need to be exactly the same between the reference and bulk samples: EPIC will use the intersection of the genes. In case of duplicate gene names, EPIC will use the median value per duplicate. If you want to handle these cases differently, you can remove the duplicates before calling EPIC.
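The gene handling described above can be pictured with the following base R sketch, which collapses duplicated gene names to their per-sample median and then keeps the genes shared with a reference. The helper collapse_duplicates, the toy matrix and the reference gene list are hypothetical; EPIC performs this step internally, so the code is only useful if you want to inspect or change how duplicates are treated before calling EPIC.

```r
# Toy bulk matrix (genes x samples) with a duplicated gene name (CD3E).
bulk <- matrix(c(1, 3, 5, 7,
                 2, 4, 6, 8),
               ncol = 2,
               dimnames = list(c("CD3E", "CD3E", "CD19", "ACTB"),
                               c("s1", "s2")))

# Hypothetical helper: one row per unique gene, using the per-sample median
# over all rows that share the same gene name.
collapse_duplicates <- function(m) {
  genes <- unique(rownames(m))
  rows <- lapply(genes, function(g) {
    apply(m[rownames(m) == g, , drop = FALSE], 2, median)
  })
  names(rows) <- genes
  do.call(rbind, rows)
}

bulk_unique <- collapse_duplicates(bulk)

# Keep only the genes also present in a (made-up) reference gene list.
ref_genes <- c("CD19", "CD3E", "MS4A1")
common <- intersect(rownames(bulk_unique), ref_genes)
bulk_common <- bulk_unique[common, , drop = FALSE]
print(bulk_common)
```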
A list of 3 matrices:
mRNAProportions
(nSamples x (nCellTypes+1)) the proportion of mRNA coming from all cell types with a ref profile + the uncharacterized other cell. Please note that if working with reconstructed in silico bulk samples built for example from single-cell RNA-seq data, then you should compare the 'true' proportions against these 'mRNAProportions', while if working with true bulk samples, then you should compare the cell proportions against the 'cellFractions'.
cellFractions
(nSamples x (nCellTypes+1)) this gives the proportion of cells from each cell type after accounting for the mRNA / cell value.
fit.gof
(nSamples x 12) a matrix telling the quality for the fit of the signature genes in each sample. It tells if the minimization converged, and other info about this fit comparing the measured gene expression in the sigGenes vs predicted gene expression in the sigGenes.
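The relation between mRNAProportions and cellFractions can be sketched in base R as follows. The sketch resolves the "default" and "otherCells" entries of an mRNA_cell vector (using the example values from the mRNA_cell argument), divides the mRNA proportions by the mRNA per cell values, and renormalizes each sample to sum to 1. The helper resolve_mRNA_cell and the toy proportions are hypothetical, and the renormalization step is an assumption about how "accounting for the mRNA / cell value" is carried out.

```r
# mRNA per cell values as in the mRNA_cell argument example.
mRNA_cell <- c(Bcells = 2, NKcells = 2.1, otherCells = 3.5, default = 1)
cell_types <- c("Bcells", "NKcells", "Tcells", "otherCells")

# Hypothetical helper: look up each cell type, falling back to "default"
# for cell types without a specific value (here: Tcells).
resolve_mRNA_cell <- function(mRNA_cell, cell_types) {
  vals <- mRNA_cell[cell_types]
  vals[is.na(vals)] <- mRNA_cell["default"]
  names(vals) <- cell_types
  vals
}
per_cell <- resolve_mRNA_cell(mRNA_cell, cell_types)

# Toy mRNA proportions for one sample (made up; they sum to 1).
mRNAProportions <- matrix(c(0.2, 0.21, 0.1, 0.49), nrow = 1,
                          dimnames = list("s1", cell_types))

# Divide by mRNA per cell, then renormalize each sample (assumed step).
cellFractions <- sweep(mRNAProportions, 2, per_cell, "/")
cellFractions <- cellFractions / rowSums(cellFractions)
print(round(cellFractions, 3))
```

Cell types with a larger mRNA content per cell end up with a smaller cell fraction than their mRNA proportion alone would suggest.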
res1 <- EPIC(melanoma_data$counts)
res1$cellFractions
res2 <- EPIC(melanoma_data$counts, TRef)
res3 <- EPIC(bulk=melanoma_data$counts, reference=TRef)
res4 <- EPIC(melanoma_data$counts, reference="TRef")
res5 <- EPIC(melanoma_data$counts, mRNA_cell_sub=c(Bcells=1, otherCells=5))
# Various possible ways of calling the EPIC function. res1 to res4 should
# give exactly the same outputs, and the elements of res1$cellFractions
# should be equal to the example predictions found in
# melanoma_data$cellFractions.pred for these first 4 results.
# The values of cellFractions for res5 will be different due to the use of
# other mRNA per cell values for the B cells and other cells.