perCellQCFilters: Compute filters for low-quality cells

perCellQCFilters R Documentation

Compute filters for low-quality cells

Description

Identifies low-quality cells as outliers for frequently used QC metrics.

Usage

perCellQCFilters(
  x,
  sum.field = "sum",
  detected.field = "detected",
  sub.fields = NULL,
  ...
)

Arguments

x

A DataFrame containing per-cell QC statistics, as computed by perCellQCMetrics.

sum.field

String specifying the column of x containing the library size for each cell.

detected.field

String specifying the column of x containing the number of detected features per cell.

sub.fields

Character vector specifying the column(s) of x containing the percentage of counts in subsets of “control features”, usually mitochondrial genes or spike-in transcripts.

If set to TRUE, all columns in x whose names match the patterns "subsets_.*_percent" and "altexps_.*_percent" are used.

...

Further arguments to pass to isOutlier.
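
As a rough illustration of the column selection described for sub.fields=TRUE, the sketch below applies the two patterns to a set of column names using base R only. The column names are illustrative, and matching with grepl is an assumption about how the selection could be done, not a statement of how the package does it.

```r
# Illustrative column names; a real perCellQCMetrics() result may differ.
cols <- c("sum", "detected", "subsets_Mito_sum", "subsets_Mito_percent",
          "altexps_Spikes_sum", "altexps_Spikes_percent", "total")

# The two patterns quoted in the description of sub.fields.
patterns <- c("subsets_.*_percent", "altexps_.*_percent")

# Keep every column name that matches at least one pattern.
sub_cols <- cols[grepl(paste(patterns, collapse = "|"), cols)]
print(sub_cols)
```

Only the two percentage columns are selected; the corresponding "_sum" columns do not match either pattern.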

Details

This function simply calls isOutlier on the various QC metrics in x.

  • For sum.field, small outliers are detected. These are considered to represent low-quality cells that have not been sufficiently sequenced. Detection is performed on the log-scale to adjust for a heavy right tail and to improve resolution at zero.

  • For detected.field, small outliers are detected. These are considered to represent low-quality cells with low-complexity libraries. Detection is performed on the log-scale to adjust for a heavy right tail and to improve resolution at zero.

  • For each column specified by sub.fields, large outliers are detected. This aims to remove cells with high spike-in or mitochondrial content, usually corresponding to damaged cells. While these distributions often have heavy right tails, the putative low-quality cells are often present in this tail; thus, no transformation is performed, to maintain the resolution of the filter.

Users can control the outlier detection (e.g., change the number of MADs or specify batches) by passing the appropriate isOutlier arguments via ....
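
The three rules above can be sketched in base R as a simplified MAD-based check. The helper mad_outlier is hypothetical and is not part of the package; it only mimics the general idea of flagging values more than a set number of MADs from the median in one direction, and it omits features such as batch handling. The simulated library sizes and mitochondrial percentages are made up for illustration.

```r
# Hypothetical helper (not the package's implementation): flag values lying
# more than `nmads` MADs below or above the median.
mad_outlier <- function(metric, nmads = 3, type = c("lower", "higher"), log = FALSE) {
    type <- match.arg(type)
    if (log) metric <- log2(metric)   # small outliers are assessed on the log-scale
    med <- median(metric)
    dev <- mad(metric, center = med)  # stats::mad applies the 1.4826 scaling by default
    if (type == "lower") {
        metric < med - nmads * dev
    } else {
        metric > med + nmads * dev
    }
}

set.seed(1)
# 50 ordinary cells plus one cell with a very small library.
lib_sizes <- c(round(rlnorm(50, meanlog = 10, sdlog = 0.3)), 50)
# 50 ordinary cells plus one cell with very high mitochondrial content.
mito_pct <- c(runif(50, min = 1, max = 5), 60)

low_lib <- mad_outlier(lib_sizes, type = "lower", log = TRUE)  # small outliers, log-scale
high_mito <- mad_outlier(mito_pct, type = "higher")            # large outliers, untransformed
discard <- low_lib | high_mito
```

In the real function, the equivalent of nmads is one of the settings that can be changed by passing isOutlier arguments through ....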

Value

A DataFrame with one row per cell, containing logical columns. Each column specifies a reason why a cell was considered to be low quality, with the final discard column indicating whether the cell should be discarded.
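
The shape of the return value can be mimicked with a small base R data frame. This is a minimal sketch under two assumptions: the reason column names are invented for illustration, and discard is taken to be TRUE whenever any reason column is TRUE.

```r
# Stand-in for the returned DataFrame; reason column names are illustrative.
filters <- data.frame(
    low_lib_size = c(FALSE, TRUE, FALSE, FALSE),
    low_n_features = c(FALSE, TRUE, FALSE, FALSE),
    high_mito = c(FALSE, FALSE, TRUE, FALSE)
)

# Assumed rule: discard a cell if it fails any of the individual filters.
filters$discard <- Reduce(`|`, filters)

# Number of cells flagged per reason, and the indices of the retained cells.
flagged <- colSums(filters)
kept <- which(!filters$discard)
```

Here the second cell fails two filters but is counted only once in discard, so the per-reason counts can sum to more than the number of discarded cells.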

Author(s)

Aaron Lun

See Also

perCellQCMetrics, for calculation of these metrics.

isOutlier, to identify outliers with a MAD-based approach.

Examples

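library(scuttle)

# Mock dataset; the example assumes it carries a "Spikes" alternative experiment.
example_sce <- mockSCE()
# Treat the first 100 genes as a mitochondrial subset for illustration.
x <- perCellQCMetrics(example_sce, subsets=list(Mito=1:100))

discarded <- perCellQCFilters(x,
    sub.fields=c("subsets_Mito_percent", "altexps_Spikes_percent"))
# Number of cells flagged for each reason.
colSums(as.data.frame(discarded))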


LTLA/scuttle documentation built on Oct. 28, 2024, 9:45 a.m.