| deconvolute | R Documentation |
Deconvolution of bulk RNA-Seq using vector projection method with adjustable compensation for spillover.
deconvolute(
mk,
test,
logged_bulk = FALSE,
count_space = TRUE,
comp_amount = 1,
group_comp_amount = 0,
weights = NULL,
weight_method = "equal",
adjust_comp = TRUE,
use_filter = TRUE,
arith_mean = FALSE,
convert_bulk = FALSE,
lambda = NULL,
cv_lambda = FALSE,
nfolds = 10,
check_comp = FALSE,
npass = 1,
outlier_method = c("var.e", "cooks", "rstudent"),
outlier_cutoff = switch(outlier_method, var.e = 4, cooks = 1, rstudent = 10),
outlier_quantile = 0.9,
verbose = TRUE
)
mk |
object of class 'cellMarkers'. See |
test |
matrix of bulk RNA-Seq to be deconvoluted with genes in rows and
samples in columns. We recommend raw counts as input, but normalised data
can be provided, in which case set |
logged_bulk |
Logical, whether log2 transformed bulk RNA-Seq data is
used as input in |
count_space |
Logical, whether deconvolution is performed in count space (as opposed to log2 space). Signature and test revert to count scale by 2^ exponentiation during deconvolution. |
comp_amount |
either a single value from 0-1 for the amount of compensation or a numeric vector with the same length as the number of cell subclasses to deconvolute. |
group_comp_amount |
either a single value from 0-1 for the amount of compensation for cell group analysis or a numeric vector with the same length as the number of cell groups to deconvolute. |
weights |
Optional vector of weights which affects how much each gene in the gene signature matrix affects the deconvolution. |
weight_method |
Optional. Choices include "none" or "equal" in which
gene weights are calculated so that each gene has equal weighting in the
vector projection; "equal" overrules any vector supplied by |
adjust_comp |
logical, whether to optimise |
use_filter |
logical, whether to use denoised signature matrix. |
arith_mean |
logical, whether to use arithmetic means (if available) for signature matrix. Mainly useful with pseudo-bulk simulation. |
convert_bulk |
either "ref" to convert bulk RNA-Seq to scRNA-Seq scaling
using reference data or "qqmap" using quantile mapping of the bulk to
scRNA-Seq datasets, or "none" (or |
lambda |
single numeric value of ridge parameter lambda. Only applied to
subclass deconvolution, not applied to cell group analysis. If
|
cv_lambda |
logical, whether to tune lambda using cross-validation
(experimental). If |
nfolds |
Number of folds for cross-validation of lambda. |
check_comp |
logical, whether to analyse compensation values across
subclasses. See |
npass |
Number of passes. If |
outlier_method |
Method for identifying outlying genes. Options are to use the variance of the residuals for each gene, Cook's distance or absolute Studentized residuals (see details). |
outlier_cutoff |
Cutoff for removing genes which are outliers based on
method selected by |
outlier_quantile |
Controls quantile for the cutoff for identifying
outliers for |
verbose |
logical, whether to show messages. |
Equal weighting of genes by setting weight_method = "equal" can help
deconvolution of subclusters whose signature genes have low expression. It is
enabled by default.
If a normalised (i.e. logged) bulk matrix is provided instead of raw counts, then it is important that zero expression is true zero. For this reason we do not recommend the use of VST (variance stabilising transformation) counts, which have a variable offset.
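The point about true zeros can be illustrated with a small base R sketch. It assumes the logged data were produced as log2(count + 1), which is an assumption made for this example and is not stated above; the fixed offset of 4.2 is a hypothetical stand-in for a transform whose floor is not zero.

```r
# Toy counts for one gene across five samples, including true zeros.
counts <- c(0, 3, 10, 0, 250)

# Assumed transform: log2 with a fixed pseudocount of 1, so that 0 maps to 0.
logged <- log2(counts + 1)

# Reverting to count scale by 2^ exponentiation recovers the original counts.
reverted <- 2^logged - 1

# Hypothetical transform with a non-zero floor (the offset of 4.2 is made up
# for illustration): zero counts no longer map to zero.
offset_logged <- log2(counts + 1) + 4.2

print(logged[counts == 0])         # zero expression stays at zero
print(offset_logged[counts == 0])  # zero expression sits at the offset
```

With the first transform a gene that is not expressed contributes nothing after reverting to count scale, whereas with an offset it appears to be expressed in every sample.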
Multipass deconvolution can be activated by setting npass to 2 or higher.
This is designed to remove genes which behave inconsistently due to noise in
either the sc or bulk datasets, which becomes increasingly likely with a
larger signature gene set, i.e. if nsubclass is large. It may also be worth
trying if you receive the warning message "Detected genes with extreme
residuals". Three methods are available for identifying outlier genes (i.e.
genes whose residuals are too noisy), controlled by outlier_method:
var.e, this calculates the variance of the residuals across samples for
each gene. Genes whose residual variance is an outlier based on Z-score
standardisation are removed during successive passes.
cooks, this considers the deconvolution as if it were a regression and
applies Cook's distance to the residuals and the hat matrix. This seems to be
the most stringent method (removes fewest genes).
rstudent, externally Studentized residuals are used.
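As a rough illustration of the var.e idea described above, the following base R sketch flags genes by the Z-score of their residual variance. The residual matrix is simulated and the planted gene is hypothetical; whether the package transforms the variances before standardising them is not stated here, so this should be read as a sketch of the principle and not as the package's implementation.

```r
set.seed(1)
n_genes <- 200
n_samples <- 20

# Simulated residual matrix (genes in rows, samples in columns).
res <- matrix(rnorm(n_genes * n_samples), nrow = n_genes,
              dimnames = list(paste0("gene", seq_len(n_genes)), NULL))

# Plant one noisy gene whose residuals are 50 times larger than the rest.
res["gene7", ] <- res["gene7", ] * 50

# Variance of the residuals across samples for each gene.
var_e <- apply(res, 1, var)

# Z-score standardisation of the variances, then apply the cutoff.
z <- (var_e - mean(var_e)) / sd(var_e)
outlier_cutoff <- 4
outliers <- names(z)[z > outlier_cutoff]
print(outliers)
```

In a multipass scheme the flagged genes would be dropped from the signature and the deconvolution repeated on the remaining genes.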
The appropriate value of outlier_cutoff, which determines which genes are
treated as outliers, depends strongly on the outlier method, and the defaults
differ accordingly (4 for var.e, 1 for cooks, 10 for rstudent). With var.e
the variances are Z-score scaled. With Cook's distance it is typical to
consider a value of >1 as a fairly strong indication of an outlier, while 0.5
is considered a possible outlier. Studentized residuals are expected to be on
a t distribution scale. However, since gene expression itself does not derive
from a normal distribution, the errors and residuals are not normally
distributed either, which probably explains the need for a very high
cut-off. In practice the choice of settings seems to be dataset dependent.
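To give a feel for the scales involved, the sketch below applies the base R functions cooks.distance() and rstudent() to an ordinary linear regression with one planted outlier. It uses a toy lm() fit and not a 'deconv' object, so it only illustrates how the two statistics behave against the cutoffs of 1 and 10; the data are simulated.

```r
set.seed(2)
x <- seq(1, 10, length.out = 30)
y <- 2 * x + rnorm(30, sd = 0.5)

# Plant a large outlier at the highest-leverage point.
y[30] <- y[30] + 15

fit <- lm(y ~ x)

# Cook's distance combines residual size with leverage (hat values).
cd <- cooks.distance(fit)

# Externally Studentized residuals, taken as absolute values.
rs <- abs(rstudent(fit))

print(which(cd > 1))   # observations exceeding the Cook's distance cutoff
print(which(rs > 10))  # observations exceeding the Studentized cutoff
```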
The ridge parameter lambda, which adds L2 regularisation to the compensation
(moment) matrix, is provided for experimental purposes. Lambda can either be
set manually, or cross-validation of lambda is performed by splitting
signature genes into folds and performing CV based on minimising sum of
squared residual gene expression error. The CV curve can be plotted using
plot_cv().
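The general idea of ridge regularisation with cross-validation over genes can be sketched in base R as follows. This is a minimal sketch under stated assumptions: the moment matrix is taken to be the cross-product of a signature matrix S, the helper names ridge_solve() and cv_ridge() are hypothetical, and the data are simulated. It is not the package's implementation.

```r
set.seed(3)
n_genes <- 300
n_types <- 4

# Hypothetical signature matrix (genes x cell types) and true cell outputs.
S <- matrix(rexp(n_genes * n_types), nrow = n_genes)
truth <- c(5, 1, 3, 2)
y <- as.vector(S %*% truth + rnorm(n_genes, sd = 0.5))

# Ridge-regularised solve: lambda is added to the diagonal of the
# (assumed) moment matrix t(S) %*% S before the system is solved.
ridge_solve <- function(S, y, lambda) {
  M <- crossprod(S)
  as.vector(solve(M + diag(lambda, ncol(S)), crossprod(S, y)))
}

# Cross-validation over genes: fit on the training genes, then sum the
# squared residual gene expression error on the held-out genes.
cv_ridge <- function(S, y, lambdas, nfolds = 10) {
  fold <- sample(rep(seq_len(nfolds), length.out = nrow(S)))
  sapply(lambdas, function(l) {
    sum(sapply(seq_len(nfolds), function(k) {
      test <- fold == k
      est <- ridge_solve(S[!test, , drop = FALSE], y[!test], l)
      sum((y[test] - S[test, , drop = FALSE] %*% est)^2)
    }))
  })
}

lambdas <- 10^seq(-3, 2, length.out = 11)
cv_err <- cv_ridge(S, y, lambdas)
best_lambda <- lambdas[which.min(cv_err)]
print(best_lambda)
```

Splitting by genes and not by samples means each sample is still deconvoluted in full, while the held-out genes act as an independent check on how well the estimated cell outputs reproduce expression.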
In simple simulations, altering lambda or using CV of lambda provides a small
benefit, but it is almost always outweighed by varying other parameters,
especially increasing nsubclass. Where CV of lambda appears to be more
useful is in simulations where different types of noise are added. Here, CV
of lambda provides benefit against added noise, especially in large datasets
with many subclasses, or if collinearity (similarity) is high between cell
subclasses. It has not been tested extensively on real world bulk data.
A list object of S3 class 'deconv' containing:
call |
the matched call |
mk |
the original 'cellMarkers' class object |
subclass |
list object containing:
|
group |
similar list object to |
nest_output |
alternative matrix of cell output results for each subclass adjusted so that the cell outputs across subclasses are nested as a proportion of cell group outputs. |
nest_percent |
alternative matrix of cell proportion results for each subclass adjusted so that the percentages across subclasses are nested within cell group percentages. The total percentage still adds to 100%. |
comp_amount |
original argument |
opt |
list of original arguments |
comp_check |
optional list element returned when |
Myles Lewis
cellMarkers() updateMarkers() se() residuals.deconv()
rstudent.deconv() cooks.distance.deconv() kappa.deconv() plot_cv()