devianceFeatureSelection: Feature selection by approximate multinomial deviance

devianceFeatureSelection    R Documentation

Feature selection by approximate multinomial deviance

Description

Computes a deviance statistic for each feature (row), such as a gene, in count data. The statistic is based on a multinomial null model in which each feature has a constant rate across observations. Features with large deviance are likely to be informative. Uninformative, low-deviance features can be discarded to speed up downstream analyses and reduce memory use.

Usage

devianceFeatureSelection(object, ...)

## S4 method for signature 'SummarizedExperiment'
devianceFeatureSelection(
  object,
  assay = "counts",
  fam = c("binomial", "poisson"),
  batch = NULL,
  nkeep = NULL,
  sorted = FALSE
)

## S4 method for signature 'matrix'
devianceFeatureSelection(object, fam = c("binomial", "poisson"), batch = NULL)

## S4 method for signature 'Matrix'
devianceFeatureSelection(object, fam = c("binomial", "poisson"), batch = NULL)

## S4 method for signature 'DelayedArray'
devianceFeatureSelection(object, fam = c("binomial", "poisson"), batch = NULL)

Arguments

object

an object inheriting from SummarizedExperiment (such as SingleCellExperiment). Alternatively, a matrix or matrix-like object (such as a sparse Matrix) of non-negative integer counts.

...

for the generic, additional arguments to pass to object-specific methods.

assay

a string or integer specifying which assay contains the count data (default = 'counts'). Ignored if object is a matrix-like object.

fam

a string specifying the model used to calculate the deviance. Binomial (the default) is the closest approximation to the multinomial; Poisson may be faster to compute and often gives very similar results.

batch

an optional factor indicating batch membership of observations. If provided, the null model is computed within each batch separately to regress out the batch effect from the resulting deviance statistics.

nkeep

integer, the number of most informative (highest-deviance) features to retain. If NULL (the default), all features are retained. Ignored if object is a matrix-like object.

sorted

logical, should the object be returned with rows sorted in decreasing order of deviance? Default: FALSE, unless nkeep is specified, in which case it is forced to be TRUE. Ignored for matrix-like inputs.

Details

In a typical single-cell analysis, many of the features (genes) may not be informative about differences between observations (cells). Feature selection seeks to identify which genes are the most informative. We define an informative gene as one that is poorly fit by a multinomial model of constant expression across cells within each batch. We compute a deviance statistic for each gene. Genes with high deviance are more informative.
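The deviance described above can be sketched in base R. This is an illustration under stated assumptions, not the package's implementation: the helper names deviance_sketch and deviance_by_batch_sketch are hypothetical, the input is assumed to be a dense count matrix with features in rows, and batches are assumed to be handled by fitting the null separately in each batch and adding the per-batch deviances. For feature j with count y_ij in observation i, total count n_i, and pooled rate pi_j = sum_i(y_ij) / sum_i(n_i), the binomial deviance is 2 * sum_i[ y_ij * log(y_ij / (n_i * pi_j)) + (n_i - y_ij) * log((n_i - y_ij) / (n_i * (1 - pi_j))) ]; the Poisson version keeps only the first term.

```r
# Hypothetical helper (not part of scry): per-feature deviance under a
# constant-rate null, for a dense count matrix with features in rows.
deviance_sketch <- function(counts, fam = c("binomial", "poisson")) {
  fam <- match.arg(fam)
  n <- colSums(counts)                  # total count per observation
  pi_hat <- rowSums(counts) / sum(n)    # pooled null rate per feature
  n_mat <- matrix(n, nrow(counts), ncol(counts), byrow = TRUE)
  mu <- n_mat * pi_hat                  # null expected counts, n_i * pi_j
  # y * log(y / mu), with the convention 0 * log(0) = 0
  ylogy <- function(y, mu) ifelse(y > 0, y * log(y / mu), 0)
  dev <- ylogy(counts, mu)
  if (fam == "binomial") {
    dev <- dev + ylogy(n_mat - counts, n_mat - mu)
  }
  # The Poisson term -(y - mu) sums to zero per row, so it is omitted.
  2 * rowSums(dev)
}

# Assumed batch handling: fit the null within each batch, then add the
# per-batch deviances (every factor level must contain observations).
deviance_by_batch_sketch <- function(counts, batch, fam = "binomial") {
  parts <- lapply(levels(batch), function(b) {
    deviance_sketch(counts[, batch == b, drop = FALSE], fam)
  })
  Reduce(`+`, parts)
}

set.seed(1)
y <- matrix(rpois(2000, 5), ncol = 100)        # 20 features, 100 observations
batch <- factor(rep(c("a", "b"), each = 50))
d_binom <- deviance_sketch(y, "binomial")
d_pois <- deviance_sketch(y, "poisson")
d_batch <- deviance_by_batch_sketch(y, batch)
```

For a single feature, the binomial value should agree with the residual deviance of an intercept-only glm fit with family = binomial, which gives a convenient way to check the sketch.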

Value

The original SingleCellExperiment or SummarizedExperiment object with the deviance statistics for each feature appended to the rowData. The new column is named binomial_deviance or poisson_deviance, according to fam. If the input was a matrix-like object, the output is a numeric vector containing the deviance statistic for each row.
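A short usage sketch of the two return types, assuming the scry and SingleCellExperiment packages are installed. The simulated data and the variable names (u, sce, dev, top) are illustrative only; the behavior of nkeep follows the argument descriptions on this page.

```r
library(scry)
library(SingleCellExperiment)

set.seed(1)
u <- matrix(rpois(20000, 5), ncol = 100)   # 200 features x 100 cells
sce <- SingleCellExperiment(assays = list(counts = u))

# SummarizedExperiment input: deviances are appended to the rowData
sce <- devianceFeatureSelection(sce, fam = "binomial")
head(rowData(sce)$binomial_deviance)

# Matrix input: a numeric vector with one deviance per row
dev <- devianceFeatureSelection(u, fam = "poisson")

# Retain the 50 highest-deviance features; sorted is forced to TRUE
top <- devianceFeatureSelection(sce, nkeep = 50)
dim(top)
```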

References

Townes FW, Hicks SC, Aryee MJ, and Irizarry RA (2019). Feature Selection and Dimension Reduction for Single Cell RNA-Seq based on a Multinomial Model. Genome Biology 20:295. https://doi.org/10.1186/s13059-019-1861-6

Examples

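library(scry)
set.seed(1)  # make the simulated counts reproducible
ncells <- 100
# 200 features x 100 cells of simulated Poisson counts
u <- matrix(rpois(20000, 5), ncol=ncells)
sce <- SingleCellExperiment::SingleCellExperiment(assays=list(counts=u))
# Returns sce with a binomial_deviance column added to its rowData
devianceFeatureSelection(sce)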


kstreet13/scry documentation built on July 13, 2024, 8:32 p.m.