deconvolute: Deconvolute bulk RNA-Seq using single-cell RNA-Seq signature
In cellGeometry: Geometric Single Cell Deconvolution

deconvolute

R Documentation

Deconvolute bulk RNA-Seq using single-cell RNA-Seq signature

Description

Deconvolution of bulk RNA-Seq using vector projection method with adjustable compensation for spillover.

Usage

deconvolute(
  mk,
  test,
  logged_bulk = FALSE,
  count_space = TRUE,
  comp_amount = 1,
  group_comp_amount = 0,
  weights = NULL,
  weight_method = "equal",
  adjust_comp = TRUE,
  use_filter = TRUE,
  arith_mean = FALSE,
  convert_bulk = FALSE,
  lambda = NULL,
  cv_lambda = FALSE,
  nfolds = 10,
  check_comp = FALSE,
  npass = 1,
  outlier_method = c("var.e", "cooks", "rstudent"),
  outlier_cutoff = switch(outlier_method, var.e = 4, cooks = 1, rstudent = 10),
  outlier_quantile = 0.9,
  verbose = TRUE
)

Arguments

`mk`	object of class 'cellMarkers'. See `cellMarkers()`.
`test`	matrix of bulk RNA-Seq to be deconvoluted, with genes in rows and samples in columns. We recommend raw counts as input, but normalised (log2 transformed) data can be provided, in which case set `logged_bulk = TRUE`.
`logged_bulk`	Logical, whether log2 transformed bulk RNA-Seq data is used as input in `test`.
`count_space`	Logical, whether deconvolution is performed in count space (as opposed to log2 space). Signature and test revert to count scale by 2^ exponentiation during deconvolution.
`comp_amount`	either a single value from 0-1 for the amount of compensation or a numeric vector with the same length as the number of cell subclasses to deconvolute.
`group_comp_amount`	either a single value from 0-1 for the amount of compensation for cell group analysis or a numeric vector with the same length as the number of cell groups to deconvolute.
`weights`	Optional vector of weights which affects how much each gene in the gene signature matrix affects the deconvolution.
`weight_method`	Optional. Either "none", or "equal", in which case gene weights are calculated so that each gene has equal weighting in the vector projection. "equal" overrules any vector supplied by `weights`.
`adjust_comp`	logical, whether to optimise `comp_amount` to prevent negative cell proportion projections.
`use_filter`	logical, whether to use denoised signature matrix.
`arith_mean`	logical, whether to use arithmetic means (if available) for signature matrix. Mainly useful with pseudo-bulk simulation.
`convert_bulk`	either "ref" to convert bulk RNA-Seq to scRNA-Seq scaling using reference data or "qqmap" using quantile mapping of the bulk to scRNA-Seq datasets, or "none" (or `FALSE`) for no conversion.
`lambda`	single numeric value of ridge parameter lambda. Only applied to subclass deconvolution, not applied to cell group analysis. If `cv_lambda = TRUE` then a sequence of lambda values can be supplied.
`cv_lambda`	logical, whether to tune lambda using cross-validation (experimental). If `lambda` is not supplied, a default sequence is used.
`nfolds`	Number of folds for cross-validation of lambda.
`check_comp`	logical, whether to analyse compensation values across subclasses. See `plot_comp()`.
`npass`	Number of passes. If `npass` is set to 2 or more, this activates removal of genes with excess variance of the residuals.
`outlier_method`	Method for identifying outlying genes. Options are the variance of the residuals for each gene, Cook's distance, or absolute Studentized residuals (see Details).
`outlier_cutoff`	Cutoff for removing genes which are outliers based on method selected by `outlier_method`.
`outlier_quantile`	Controls the quantile used for the cutoff when identifying outliers with `outlier_method = "cooks"` or `"rstudent"`.
`verbose`	logical, whether to show messages.
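
The relationship between vector projection, spillover and `comp_amount` can be illustrated with a small base R sketch. This is a textbook-style illustration on simulated data, not the package's internal code. In particular, the linear blend between the inverse spillover matrix and the identity in `mix_comp()` is an assumption about how partial compensation might work, and all object names here are hypothetical.

```r
set.seed(42)
ngenes <- 50; ncell <- 3; nsamp <- 4

# Toy signature (genes x subclasses) and noise-free bulk (genes x samples)
S <- matrix(runif(ngenes * ncell, 0.5, 2), ngenes, ncell,
            dimnames = list(paste0("g", 1:ngenes), paste0("c", 1:ncell)))
truth <- matrix(runif(ncell * nsamp, 1, 10), ncell, nsamp,
                dimnames = list(colnames(S), paste0("s", 1:nsamp)))
bulk <- S %*% truth

# Scalar projection of each bulk sample onto each signature vector
proj <- crossprod(S, bulk) / colSums(S^2)

# Spillover: projection of every signature vector onto every other one
spill <- crossprod(S) / colSums(S^2)
rawcomp <- solve(spill)   # full compensation is the inverse of the spillover

# ASSUMPTION: partial compensation as a linear blend with the identity matrix
mix_comp <- function(rawcomp, comp_amount) {
  comp_amount * rawcomp + (1 - comp_amount) * diag(nrow(rawcomp))
}

out_full <- mix_comp(rawcomp, 1) %*% proj   # fully compensated
out_none <- mix_comp(rawcomp, 0) %*% proj   # raw projections only
```

In this sketch, full compensation recovers the simulated cell amounts exactly because the bulk data is noise-free, while zero compensation returns the raw projections, which are inflated by overlap between signatures.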

Details

Equal weighting of genes by setting `weight_method = "equal"` can help deconvolution of subclusters whose signature genes have low expression. It is enabled by default.

If a normalised (i.e. logged) bulk matrix is provided instead of raw counts, then it is important that zero expression is true zero. For this reason we do not recommend use of VST (variance stabilised transformed counts) which has a variable offset.
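
As a minimal illustration of a log transform that keeps zero expression at true zero, the sketch below uses `log2(counts + 1)` on a toy count matrix. The choice of a fixed +1 offset is an assumption made for this example; the documentation above does not state which offset the package expects.

```r
# Toy raw counts: genes in rows, samples in columns
counts <- matrix(c(0, 10, 250, 0, 3, 1200), nrow = 3,
                 dimnames = list(paste0("gene", 1:3), c("s1", "s2")))

# Fixed offset of 1, so a zero count maps exactly to zero
logged <- log2(counts + 1)

# Reverting to count scale; the "- 1" undoes the assumed +1 offset
back <- 2^logged - 1
```

A matrix prepared like `logged` would then be passed as `test` with `logged_bulk = TRUE`.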

Multipass deconvolution can be activated by setting `npass` to 2 or higher. This is designed to remove genes which behave inconsistently due to noise in either the sc or bulk datasets, which is increasingly likely with a larger signature gene set, i.e. if `nsubclass` is large. It is also worth considering if you receive the warning message "Detected genes with extreme residuals". Three methods are available for identifying outlier genes (i.e. those whose residuals are too noisy), controlled by `outlier_method`:

`var.e`: calculates the variance of the residuals across samples for each gene. Genes whose residual variance is an outlier after Z-score standardisation are removed during successive passes.
`cooks`: treats the deconvolution as if it were a regression and applies Cook's distance to the residuals and the hat matrix. This seems to be the most stringent method (removes fewest genes).
`rstudent`: uses externally Studentized residuals.

The cutoff specified by `outlier_cutoff`, which determines which genes are treated as outliers, is very sensitive to the outlier method. With `var.e` the variances are Z-score scaled. With Cook's distance, a value of >1 is typically considered a fairly strong indication of an outlier, while 0.5 is considered a possible outlier. Studentized residuals are expected to be on a t distribution scale. However, since gene expression itself does not derive from a normal distribution, the errors and residuals are not normally distributed either, which probably explains the need for a very high cut-off (the default is 10). In practice the choice of settings seems to be dataset dependent.
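
The three outlier measures can be compared on simulated data using base R only. The sketch below is a simplified stand-in that treats deconvolution as ordinary least squares; it is not the package's implementation, and the matrix names, the noisy gene `g7` and the simulation settings are all hypothetical.

```r
set.seed(1)
ngenes <- 60; ncell <- 3; nsamp <- 8

# Toy signature (genes x subclasses) and bulk (genes x samples) matrices
S <- matrix(runif(ngenes * ncell, 0.5, 2), ngenes, ncell,
            dimnames = list(paste0("g", 1:ngenes), paste0("c", 1:ncell)))
S["g7", ] <- c(2, 0.5, 1)
P <- matrix(runif(ncell * nsamp, 1, 5), ncell, nsamp)
bulk <- S %*% P + matrix(rnorm(ngenes * nsamp, sd = 0.05), ngenes, nsamp)

# Make gene g7 behave inconsistently across samples
bulk["g7", ] <- bulk["g7", ] + 3 * rep(c(-1, 1), length.out = nsamp)

# Least-squares fit for every sample; residuals are genes x samples
coefs <- solve(crossprod(S), crossprod(S, bulk))
res <- bulk - S %*% coefs

# var.e style: variance of residuals per gene, then Z-score scaling
var_e <- apply(res, 1, var)
z <- (var_e - mean(var_e)) / sd(var_e)
flag_var <- names(z)[z > 4]

# Regression diagnostics for the first sample only
fit1 <- lm(bulk[, 1] ~ 0 + S)
cd <- cooks.distance(fit1)   # Cook's distance per gene
rs <- rstudent(fit1)         # externally Studentized residuals per gene
```

The three measures live on very different scales, which is why `outlier_cutoff` has a separate default for each method. In the package itself, the corresponding call would be along the lines of `deconvolute(mk, test, npass = 2, outlier_method = "cooks")`.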

The ridge parameter lambda, which adds L2 regularisation to the compensation (moment) matrix, is provided for experimental purposes. Lambda can either be set manually, or tuned by cross-validation (`cv_lambda = TRUE`), which splits the signature genes into folds and selects the value that minimises the sum of squared residual gene expression error. The CV curve can be plotted using `plot_cv()`.
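
The idea of tuning lambda by cross-validation over genes can be sketched with a generic ridge regression in base R. This is an illustration of the general technique under simplifying assumptions, not the package's code: `ridge_coef()` and `cv_ridge()` are hypothetical helpers, and exactly where lambda enters the package's compensation matrix may differ.

```r
set.seed(2)
ngenes <- 80; ncell <- 4

# Toy signature matrix (genes x subclasses) and one bulk sample
X <- matrix(runif(ngenes * ncell, 0.5, 2), ngenes, ncell)
beta <- c(5, 3, 1, 0.5)
y <- drop(X %*% beta) + rnorm(ngenes, sd = 0.5)

# Ridge estimate: lambda is added to the diagonal of the moment matrix X'X
ridge_coef <- function(X, y, lambda) {
  drop(solve(crossprod(X) + lambda * diag(ncol(X)), crossprod(X, y)))
}

# Split genes into folds; score each lambda by held-out squared error
cv_ridge <- function(X, y, lambdas, nfolds = 10) {
  fold <- sample(rep(seq_len(nfolds), length.out = nrow(X)))
  sse <- sapply(lambdas, function(l) {
    sum(sapply(seq_len(nfolds), function(k) {
      held <- fold == k
      b <- ridge_coef(X[!held, , drop = FALSE], y[!held], l)
      sum((y[held] - X[held, , drop = FALSE] %*% b)^2)
    }))
  })
  list(lambda = lambdas, sse = sse, best = lambdas[which.min(sse)])
}

lambdas <- c(0, 10^seq(-3, 2, length.out = 11))
cv <- cv_ridge(X, y, lambdas)
```

With `lambda = 0` the estimate reduces to ordinary least squares, and larger values shrink the coefficients towards zero.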

In simple simulations, altering lambda or using CV of lambda provides a small benefit, but it is almost always outweighed by varying other parameters, especially increasing `nsubclass`. Where CV of lambda appears to be more useful is in simulations where different types of noise are added. Here, CV of lambda provides benefit against added noise, especially in large datasets with many subclasses, or if collinearity (similarity) is high between cell subclasses. It has not been tested extensively on real world bulk data.

Value

A list object of S3 class 'deconv' containing:

`call`	the matched call
`mk`	the original 'cellMarkers' class object
`subclass`	list object containing:
  `output`, the amount of each subclass based purely on projected gene expression
  `percent`, the proportion of each subclass scaled as a percentage so that the total amount across all subclasses adds to 100%
  `spillover`, the spillover matrix
  `compensation`, the mixed final compensation matrix which incorporates `comp_amount`
  `rawcomp`, the original unadjusted compensation matrix
  `comp_amount`, the final values for the amount of compensation across each cell subclass after adjustment to prevent negative values
  `X`, the (weighted) model matrix
  `residuals`, residuals, that is gene expression minus fitted values
  `var.e`, variance of weighted residuals for each gene
  `weights`, vector of weights
  `resvar`, `s^2` the estimate of the gene expression variance for each sample
  `removed`, optional vector of outlying genes removed during successive passes
  `cv`, optional list included when `cv_lambda = TRUE`, containing lambda CV results
`group`	similar list object to `subclass`, but with results for the cell group analysis.
`nest_output`	alternative matrix of cell output results for each subclass adjusted so that the cell outputs across subclasses are nested as a proportion of cell group outputs.
`nest_percent`	alternative matrix of cell proportion results for each subclass adjusted so that the percentages across subclasses are nested within cell group percentages. The total percentage still adds to 100%.
`comp_amount`	original argument `comp_amount`
`opt`	list of original arguments
`comp_check`	optional list element returned when `check_comp = TRUE`
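
The difference between `percent` and `nest_percent` can be illustrated with toy numbers. The sketch below assumes samples in rows and subclasses in columns, and assumes that nesting means rescaling the subclasses within each cell group so that they sum to that group's percentage. Both are assumptions made for illustration, and every object name and value is hypothetical.

```r
# Hypothetical subclass outputs (samples x subclasses)
output <- matrix(c(10, 30, 20, 40,
                    5, 15, 60, 20), nrow = 2, byrow = TRUE,
                 dimnames = list(c("s1", "s2"), c("CD4", "CD8", "Mono", "DC")))

# Hypothetical mapping of subclasses to cell groups, and group-level percentages
group_of <- c(CD4 = "T", CD8 = "T", Mono = "Myeloid", DC = "Myeloid")
group_percent <- matrix(c(50, 50,
                          30, 70), nrow = 2, byrow = TRUE,
                        dimnames = list(c("s1", "s2"), c("T", "Myeloid")))

# percent: scale each sample so that subclasses sum to 100
percent <- output / rowSums(output) * 100

# Total subclass percentage falling within each group (samples x groups)
within <- t(rowsum(t(percent), group_of))

# ASSUMPTION: rescale subclasses so each group's total matches group_percent
nest_percent <- percent / within[, group_of] * group_percent[, group_of]
```

Each row of `nest_percent` still sums to 100, but the subclass totals within each group now agree with the group-level result. A real implementation would also need to handle groups whose subclass total is zero.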

Author(s)

Myles Lewis
