perFeatureQCMetrics: Per-feature quality control metrics
In scuttle: Single-Cell RNA-Seq Analysis Utilities

Description Usage Arguments Details Value Author(s) See Also Examples

Compute per-feature quality control metrics for a count matrix or a SummarizedExperiment.

perFeatureQCMetrics(x, ...)

## S4 method for signature 'ANY'
perFeatureQCMetrics(
  x,
  subsets = NULL,
  threshold = 0,
  BPPARAM = SerialParam(),
  flatten = TRUE,
  detection_limit = NULL
)

## S4 method for signature 'SummarizedExperiment'
perFeatureQCMetrics(x, ..., assay.type = "counts", exprs_values = NULL)

`x`	A numeric matrix of counts with cells in columns and features in rows. Alternatively, a SummarizedExperiment or SingleCellExperiment object containing such a matrix.
`...`	For the generic, further arguments to pass to specific methods. For the SummarizedExperiment and SingleCellExperiment methods, further arguments to pass to the ANY method.
`subsets`	A named list containing one or more vectors (a character vector of cell names, a logical vector, or a numeric vector of indices), used to identify interesting sample subsets such as negative control wells.
`threshold`	A numeric scalar specifying the threshold above which a gene is considered to be detected.
`BPPARAM`	A BiocParallelParam object specifying how parallelization should be performed.
`flatten`	Logical scalar indicating whether the nested DataFrames in the output should be flattened.
`detection_limit, exprs_values`	Soft deprecated equivalents to the arguments described above.
`assay.type`	A string or integer scalar indicating which `assays` in the `x` contains the count matrix.

This function calculates useful QC metrics for features, including the mean across all cells and the number of expressed features (i.e., counts above the detection limit).

If subsets is specified, the same statistics are computed for each subset of cells. This is useful for obtaining statistics for cell sets of interest, e.g., negative control wells. These statistics are stored as nested DataFrames in the output. For example, if subsets contained "empty" and "cellpool", the output would look like:

  output 
  |-- mean 
  |-- detected
  +-- subsets
      |-- empty
      |   |-- mean 
      |   |-- detected
      |   +-- ratio
      +-- cellpool 
          |-- mean
          |-- detected
          +-- ratio

The ratio field contains the ratio of the mean within each subset to the mean across all cells.

If flatten=TRUE, the nested DataFrames are flattened by concatenating the column names with underscores. This means that, say, the subsets$empty$mean nested field becomes the top-level subsets_empty_mean field. A flattened structure is more convenient for end-users performing interactive analyses, but less convenient for programmatic access as artificial construction of strings is required.

A DataFrame of QC statistics where each row corresponds to a row in x. This contains the following fields:

mean: numeric, the mean counts for each feature.
detected: numeric, the percentage of observations above threshold.

If flatten=FALSE, the output DataFrame also contains the subsets field. This a nested DataFrame containing per-feature QC statistics for each subset of columns.

If flatten=TRUE, subsets is flattened to remove the hierarchical structure.

Aaron Lun

addPerFeatureQC, to add the QC metrics to the row metadata.

example_sce <- mockSCE()
stats <- perFeatureQCMetrics(example_sce)
stats

# With subsets.
stats2 <- perFeatureQCMetrics(example_sce, subsets=list(Empty=1:10))
stats2