devianceFeatureSelection | R Documentation
Description
Computes a deviance statistic for each row feature (such as a gene) in count data, based on a multinomial null model that assumes each feature has a constant rate. Features with large deviance are likely to be informative. Uninformative, low-deviance features can be discarded to speed up downstream analyses and reduce the memory footprint.
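For reference, the binomial and Poisson deviances can be written as below. This is a sketch based on our reading of Townes et al. (2019), not on the package source. The notation is an assumption for illustration: $y_{ij}$ is the count of feature $j$ in observation $i$, $n_i = \sum_j y_{ij}$ is the total count of observation $i$, and $\hat{\pi}_j$ is the constant rate of feature $j$ estimated under the null model. Terms with a zero numerator use the convention $0 \log 0 = 0$. When `batch` is supplied, we assume $\hat{\pi}_j$ is estimated within each batch and the per-batch deviances are summed.

```latex
\begin{aligned}
\hat{\pi}_j &= \frac{\sum_i y_{ij}}{\sum_i n_i}, \qquad \hat{\mu}_{ij} = n_i \hat{\pi}_j \\
D_j^{\mathrm{binomial}} &= 2 \sum_i \left[ y_{ij} \log\frac{y_{ij}}{\hat{\mu}_{ij}}
  + (n_i - y_{ij}) \log\frac{n_i - y_{ij}}{n_i - \hat{\mu}_{ij}} \right] \\
D_j^{\mathrm{Poisson}} &= 2 \sum_i \left[ y_{ij} \log\frac{y_{ij}}{\hat{\mu}_{ij}}
  - (y_{ij} - \hat{\mu}_{ij}) \right]
\end{aligned}
```

A feature whose counts are exactly proportional to the totals $n_i$ has $y_{ij} = \hat{\mu}_{ij}$ for every $i$ and therefore zero deviance under either family.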
Usage
devianceFeatureSelection(object, ...)
## S4 method for signature 'SummarizedExperiment'
devianceFeatureSelection(
object,
assay = "counts",
fam = c("binomial", "poisson"),
batch = NULL,
nkeep = NULL,
sorted = FALSE
)
## S4 method for signature 'matrix'
devianceFeatureSelection(object, fam = c("binomial", "poisson"), batch = NULL)
## S4 method for signature 'Matrix'
devianceFeatureSelection(object, fam = c("binomial", "poisson"), batch = NULL)
## S4 method for signature 'DelayedArray'
devianceFeatureSelection(object, fam = c("binomial", "poisson"), batch = NULL)
Arguments
object: an object inheriting from SummarizedExperiment (such as a SingleCellExperiment), or a matrix-like object (matrix, Matrix, or DelayedArray) of counts with features in rows and observations in columns.
...: for the generic, additional arguments to pass to object-specific methods.
assay: a string or integer specifying which assay contains the count data (default = "counts"). Ignored if object is a matrix-like object.
fam: a string specifying the model type used to calculate the deviance. Binomial (the default) is the closest approximation to the multinomial, but Poisson may be faster to compute and often gives very similar results.
batch: an optional factor indicating the batch membership of each observation. If provided, the null model is fit within each batch separately, which removes the batch effect from the resulting deviance statistics.
nkeep: integer, how many informative features should be retained? Default: NULL, in which case all features are retained. Ignored if object is a matrix-like object.
sorted: logical, should the features of the returned object be sorted in decreasing order of deviance? Default: FALSE.
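The effect of the fam and batch arguments can be illustrated with a small base-R sketch. The helper deviance_sketch() is hypothetical, written for this page and not part of scry. It follows the deviance formulas of Townes et al. (2019) as we understand them, and the package's own implementation may differ in numerical details (for example, in how it handles sparse or DelayedArray input).

```r
# Hypothetical helper (not the scry implementation): per-feature deviance
# under a constant-rate null model. Rows of m are features, columns are
# observations. If batch is given, the null model is fit within each batch
# and the per-batch deviances are summed.
deviance_sketch <- function(m, fam = c("binomial", "poisson"), batch = NULL) {
  fam <- match.arg(fam)
  m <- as.matrix(m)
  if (is.null(batch)) batch <- rep(1L, ncol(m))
  batch <- as.factor(batch)
  stopifnot(length(batch) == ncol(m))
  # x * log(x / mu), using the convention 0 * log(0) = 0
  xlogx <- function(x, mu) ifelse(x > 0, x * log(x / mu), 0)
  dev <- numeric(nrow(m))
  for (b in levels(batch)) {
    mb <- m[, batch == b, drop = FALSE]
    n <- colSums(mb)              # total counts of each observation
    p <- rowSums(mb) / sum(n)     # constant rate of each feature in this batch
    mu <- outer(p, n)             # expected counts: mu[j, i] = p[j] * n[i]
    if (fam == "poisson") {
      d <- xlogx(mb, mu) - (mb - mu)
    } else {
      nn <- matrix(n, nrow(mb), ncol(mb), byrow = TRUE)  # n[i] in every row
      d <- xlogx(mb, mu) + xlogx(nn - mb, nn - mu)
    }
    dev <- dev + 2 * rowSums(d)
  }
  names(dev) <- rownames(m)
  dev
}
```

Looping over batch levels gives each batch its own feature rates, so a feature whose rate differs between batches but is constant within them is not flagged as informative.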
Details
In a typical single-cell analysis, many of the features (genes) may not be informative about differences between observations (cells). Feature selection seeks to identify the genes that are most informative. We define an informative gene as one that is poorly fit by a multinomial model of constant expression across cells within each batch, that is, a model in which the gene accounts for the same fraction of every cell's total counts. We compute a deviance statistic for each gene; genes with high deviance are more informative.
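As a self-contained illustration of this idea, the base-R sketch below simulates genes with a constant rate plus one hypothetical "marker" gene expressed in only half of the cells, computes a Poisson deviance under the constant-rate null, and keeps the highest-deviance genes. The simulated data, gene names, and the cutoff of 5 genes are assumptions for the example; the code does not call scry.

```r
set.seed(42)
ncells <- 60
ngenes <- 50
# Constant-rate genes: every cell drawn from the same Poisson(5)
m <- matrix(rpois(ngenes * ncells, 5), nrow = ngenes,
            dimnames = list(paste0("gene", seq_len(ngenes)), NULL))
# Hypothetical marker: high in the first half of the cells, absent in the rest
m["gene1", ] <- c(rpois(ncells / 2, 20), rep(0L, ncells / 2))

# Poisson deviance under a constant-rate null (0 * log(0) treated as 0)
n <- colSums(m)                        # total counts per cell
mu <- outer(rowSums(m) / sum(n), n)    # expected counts per gene and cell
term <- ifelse(m > 0, m * log(m / mu), 0)
dev <- 2 * rowSums(term - (m - mu))

# Keep the 5 most informative genes, in decreasing order of deviance
nkeep <- 5
top <- names(sort(dev, decreasing = TRUE))[seq_len(nkeep)]
print(top)
```

The marker gene is poorly fit by a constant fraction of each cell's total and so ranks first, while the constant-rate genes have deviances close to what the null model predicts.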
Value
The original SingleCellExperiment or SummarizedExperiment object with the deviance statistics for each feature appended to the rowData. The new column is named either binomial_deviance or poisson_deviance. If the input was a matrix-like object, the output is a numeric vector containing the deviance statistic for each row.
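Assuming the scry and SingleCellExperiment Bioconductor packages are installed, the sketch below shows how each kind of return value can be accessed. It uses only the signatures and column names documented on this page; the simulated counts and the choice of 50 retained features are illustrative assumptions.

```r
library(scry)
library(SingleCellExperiment)

set.seed(1)
counts <- matrix(rpois(20000, 5), ncol = 100)  # 200 features x 100 cells
sce <- SingleCellExperiment(assays = list(counts = counts))

# SummarizedExperiment input: the statistics are added to rowData()
sce <- devianceFeatureSelection(sce, assay = "counts", fam = "poisson")
dev_se <- rowData(sce)$poisson_deviance

# Matrix input: a plain numeric vector with one value per row
dev_mat <- devianceFeatureSelection(counts, fam = "poisson")

# Retain only the 50 highest-deviance features
sce_top <- devianceFeatureSelection(sce, fam = "binomial", nkeep = 50)
```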
References
Townes FW, Hicks SC, Aryee MJ, and Irizarry RA (2019). Feature Selection and Dimension Reduction for Single Cell RNA-Seq based on a Multinomial Model. Genome Biology. https://doi.org/10.1186/s13059-019-1861-6
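Examples
library(scry)
set.seed(1)  # for reproducibility
ncells <- 100
u <- matrix(rpois(20000, 5), ncol=ncells)  # 200 features x 100 cells
sce <- SingleCellExperiment::SingleCellExperiment(assays=list(counts=u))
devianceFeatureSelection(sce)  # adds a binomial_deviance column to rowData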