
View source: R/sample_filtering.R

metric_sample_filter    R Documentation

Metric-based Sample Filtering: Function to filter single-cell RNA-Seq libraries.

Description

This function returns a sample-filtering report for each cell in the input expression matrix, describing which filtering criteria are satisfied.

Usage

metric_sample_filter(
  expr,
  nreads = colSums(expr),
  ralign = NULL,
  gene_filter = NULL,
  pos_controls = NULL,
  scale. = FALSE,
  glen = NULL,
  AUC_range = c(0, 15),
  zcut = 1,
  mixture = TRUE,
  dip_thresh = 0.05,
  hard_nreads = 25000,
  hard_ralign = 15,
  hard_breadth = 0.2,
  hard_auc = 10,
  suff_nreads = NULL,
  suff_ralign = NULL,
  suff_breadth = NULL,
  suff_auc = NULL,
  plot = FALSE,
  hist_breaks = 10,
  ...
)

Arguments

expr

matrix. The data matrix (genes in rows, cells in columns).

nreads

A numeric vector giving the number of reads in each library. Defaults to colSums(expr).

ralign

A numeric vector giving the percentage of reads aligned to the reference genome in each library. If NULL, filtered_ralign is returned as NA.

gene_filter

A logical vector selecting the genes used to compute library transcriptome breadth. If NULL, filtered_breadth is returned as NA.
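To make "transcriptome breadth" concrete, here is a minimal sketch assuming breadth is the fraction of the genes selected by gene_filter that are detected (count > 0) in each library. That definition and the helper name library_breadth are assumptions for illustration, not scone's code; the gene_filter rule in the usage lines is likewise an arbitrary example.

```r
# Hypothetical helper (not part of scone): fraction of the genes selected
# by gene_filter that are detected (count > 0) in each cell.
library_breadth <- function(expr, gene_filter) {
  stopifnot(is.logical(gene_filter), length(gene_filter) == nrow(expr))
  colMeans(expr[gene_filter, , drop = FALSE] > 0)
}

# Usage on simulated counts: 100 genes x 10 cells
set.seed(1)
mat <- matrix(rpois(1000, lambda = 1), ncol = 10)
gene_filter <- rowSums(mat > 0) >= 5  # example rule: detected in >= 5 cells
breadth <- library_breadth(mat, gene_filter)
```

Each entry of breadth lies between 0 and 1, so a hard lower bound such as hard_breadth = 0.2 reads naturally on this scale.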

pos_controls

A logical, numeric, or character vector indicating positive control genes that will be used to compute false-negative rate characteristics. If NULL, filtered_fnr is returned as NA.
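In base R, a logical, numeric, or character vector can each serve as a row index. Assuming pos_controls is used as an ordinary row index into expr (the page does not say how it is applied internally), the three forms below pick out the same control genes. The gene names g1 to g5 are made up for the example.

```r
# Toy matrix with named genes (rows) and cells (columns)
expr <- matrix(1:20, nrow = 5,
               dimnames = list(paste0("g", 1:5), paste0("c", 1:4)))

# Three equivalent ways to mark genes g1 and g3 as positive controls
ctrl_logical   <- c(TRUE, FALSE, TRUE, FALSE, FALSE)
ctrl_numeric   <- c(1, 3)
ctrl_character <- c("g1", "g3")

sub_logical   <- expr[ctrl_logical, , drop = FALSE]
sub_numeric   <- expr[ctrl_numeric, , drop = FALSE]
sub_character <- expr[ctrl_character, , drop = FALSE]
```

Character indexing requires rownames(expr) to be set; logical indexing requires one entry per gene.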

scale.

logical. Should expression be scaled by total expression for the FNR computation? Default FALSE.

glen

Gene lengths used for gene-length normalization; the normalized data are used in the FNR computation.

AUC_range

A numeric vector of length two giving the range, in log(expr_units), over which the FNR AUC is computed. Default c(0, 15).
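The page does not spell out how the FNR AUC is computed. As a hedged illustration only, the sketch below assumes one simple approach: for each cell, fit a logistic regression of "control gene undetected" on each control gene's mean log expression, then integrate the fitted FNR curve over AUC_range. The function name fnr_auc, the use of log1p, and the logistic model are all assumptions, not scone's implementation; scale. and glen are ignored here.

```r
# Hypothetical sketch, NOT scone's implementation: per-cell FNR AUC.
# Assumption: FNR(x) = P(a positive-control gene with mean log expression x
# is undetected in this cell), fit by a per-cell logistic regression;
# the AUC integrates FNR(x) over AUC_range.
fnr_auc <- function(expr, pos_controls, AUC_range = c(0, 15)) {
  ctrl <- expr[pos_controls, , drop = FALSE]
  # Reference expression level of each control gene across all cells
  mu <- rowMeans(log1p(ctrl))
  vapply(seq_len(ncol(ctrl)), function(j) {
    undetected <- as.numeric(ctrl[, j] == 0)
    # Warnings (e.g. fitted probabilities of 0 or 1) are expected for deep cells
    fit <- suppressWarnings(glm(undetected ~ mu, family = binomial()))
    b <- unname(coef(fit))
    fnr <- function(x) plogis(b[1] + b[2] * x)
    integrate(fnr, AUC_range[1], AUC_range[2])$value
  }, numeric(1))
}

# Usage on simulated counts: cells differ in sequencing depth
set.seed(42)
gene_mean <- exp(runif(200, 0, 4))
depth <- c(0.2, 0.5, 1, 1, 2, 4)
mat <- matrix(rpois(200 * 6, outer(gene_mean, depth)), nrow = 200)
auc <- fnr_auc(mat, pos_controls = 1:100)
```

Because the fitted FNR lies between 0 and 1, each AUC lies between 0 and the width of AUC_range; under this sketch, larger values mean more dropout among well-expressed controls, which is why hard_auc acts as an upper bound.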

zcut

A numeric value giving the threshold Z-score for the sd, mad, and mixture sub-criteria. Default 1. If NULL, only the hard-threshold sub-criteria are applied.

mixture

A logical value determining whether the mixture-modeling sub-criterion is applied per primary criterion (metric). If TRUE, a dip test is applied to each metric. If a metric is multimodal, it is fit to a two-component normal mixture model. Samples deviating more than zcut standard deviations from the optimal mean (in the inferior direction) fail this sub-criterion.
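A minimal base-R sketch of the mixture sub-criterion for a metric where higher values are better. It hand-rolls a two-component normal EM fit so it runs without extra packages, and it assumes the "optimal" component is the one with the higher mean. The dip-test gate is left out; presumably a test such as diptest::dip.test() would decide multimodality by comparing its p-value with dip_thresh, but that wiring is an assumption. The names fit_two_normals and mixture_fail are hypothetical.

```r
# Hypothetical sketch, not scone's code: mixture sub-criterion for a
# higher-is-better metric. The dip-test gate is omitted.
fit_two_normals <- function(x, iters = 200) {
  # Initialise components from the lower and upper halves of the data
  med <- median(x)
  mu <- c(mean(x[x <= med]), mean(x[x > med]))
  sigma <- rep(sd(x), 2)
  w <- c(0.5, 0.5)
  for (i in seq_len(iters)) {
    # E-step: responsibility of each component for each point
    d1 <- w[1] * dnorm(x, mu[1], sigma[1])
    d2 <- w[2] * dnorm(x, mu[2], sigma[2])
    r2 <- d2 / pmax(d1 + d2, .Machine$double.xmin)  # guard against 0/0
    r1 <- 1 - r2
    # M-step: update mixing weights, means and standard deviations
    w <- c(mean(r1), mean(r2))
    mu <- c(sum(r1 * x) / sum(r1), sum(r2 * x) / sum(r2))
    sigma <- sqrt(c(sum(r1 * (x - mu[1])^2) / sum(r1),
                    sum(r2 * (x - mu[2])^2) / sum(r2)))
  }
  list(w = w, mu = mu, sigma = sigma)
}

# A sample fails if it lies more than zcut sds below the mean of the
# optimal (here: higher-mean) component
mixture_fail <- function(x, zcut = 1) {
  fit <- fit_two_normals(x)
  best <- which.max(fit$mu)
  x < fit$mu[best] - zcut * fit$sigma[best]
}

# Usage: a bimodal metric with a poor (low) and a good (high) population
set.seed(7)
x <- c(rnorm(50, mean = 4, sd = 0.2), rnorm(50, mean = 6, sd = 0.2))
failed <- mixture_fail(x)
```

For a lower-is-better metric such as FNR AUC, the optimal component would be the one with the lower mean and the inequality would flip.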

dip_thresh

A numeric value giving the dip test p-value threshold. Default 0.05.

hard_nreads

numeric. Hard (lower bound on) nreads threshold. Default 25000.

hard_ralign

numeric. Hard (lower bound on) ralign threshold. Default 15.

hard_breadth

numeric. Hard (lower bound on) breadth threshold. Default 0.2.

hard_auc

numeric. Hard (upper bound on) FNR AUC threshold. Default 10.

suff_nreads

numeric. If not NULL, serves as an overriding upper bound on the nreads threshold.

suff_ralign

numeric. If not NULL, serves as an overriding upper bound on the ralign threshold.

suff_breadth

numeric. If not NULL, serves as an overriding upper bound on the breadth threshold.

suff_auc

numeric. If not NULL, serves as an overriding lower bound on the FNR AUC threshold.
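One way to read the suff_* arguments, sketched under that assumption: the computed threshold is capped at suff_* for metrics where higher is better (nreads, ralign, breadth) and floored at suff_auc for FNR AUC, where lower is better. The helper apply_suff is hypothetical and not part of scone. Under this reading, a sample whose nreads is at least suff_nreads cannot fail because a threshold was set higher than suff_nreads.

```r
# Hypothetical helper (not part of scone): apply a suff_* override to a
# computed threshold. Higher-is-better metrics: suff_* caps the threshold.
# FNR AUC (lower is better): suff_auc floors the threshold.
apply_suff <- function(threshold, suff = NULL, higher_is_better = TRUE) {
  if (is.null(suff)) return(threshold)
  if (higher_is_better) min(threshold, suff) else max(threshold, suff)
}

# Usage
apply_suff(50000, suff = 30000)                    # 30000: capped
apply_suff(50000)                                  # 50000: no override
apply_suff(4, suff = 6, higher_is_better = FALSE)  # 6: AUC threshold raised
```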

plot

logical. Should a plot be produced?

hist_breaks

Passed to hist() as its breaks argument. Ignored if plot = FALSE.

...

Arguments to be passed to methods.

Details

For each primary criterion (metric), a sample is evaluated against four sub-criteria:

  1. Hard threshold (set by the corresponding hard_* argument).
  2. Adaptive threshold: zcut standard deviations from the mean.
  3. Adaptive threshold: zcut MADs from the median.
  4. Adaptive threshold: zcut standard deviations from the mean after mixture modeling (only if mixture = TRUE).

A sample must pass all sub-criteria to pass the primary criterion.
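A minimal sketch of sub-criteria 1 to 3 for a metric where higher is better, computed directly on the values given; the mixture sub-criterion is left out. The function metric_fail and the read counts are hypothetical. For FNR AUC, where lower is better, each inequality would flip (values above the threshold fail).

```r
# Hypothetical sketch (not scone's code) of sub-criteria 1-3 for a
# higher-is-better metric. TRUE means the sample fails the criterion.
metric_fail <- function(x, hard, zcut = 1) {
  fail_hard <- x < hard                          # 1) hard threshold
  if (is.null(zcut)) return(fail_hard)           # zcut = NULL: hard only
  fail_sd  <- x < mean(x) - zcut * sd(x)         # 2) sds from the mean
  fail_mad <- x < median(x) - zcut * mad(x)      # 3) MADs from the median
  fail_hard | fail_sd | fail_mad                 # failing any one fails all
}

# Usage: six libraries, one of them shallow
nreads <- c(120000, 95000, 110000, 20000, 105000, 98000)
failed <- metric_fail(nreads, hard = 25000)
```

Here only the fourth library fails: it is below the hard bound, below mean - sd (about 55,300), and below median - MAD (about 90,400), whereas 95,000 clears both adaptive thresholds.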

Value

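A list with the following elements, each indicating for every sample whether it is filtered out by that criterion (NA if the criterion was not computed):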

  • filtered_nreads Logical. Sample has too few reads.

  • filtered_ralign Logical. Sample has too few reads aligned.

  • filtered_breadth Logical. Sample has too few genes detected (low breadth).

  • filtered_fnr Logical. Sample has a high FNR AUC.
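A common follow-up is to combine the four elements into one vector of cells to keep. The sketch below assumes that a criterion that was not computed comes back as a single NA and that TRUE means "filtered out"; the helper keep_cells and the mock list are hypothetical.

```r
# Hypothetical helper (not part of scone): combine filtered_* elements into
# one logical "keep" vector. Criteria returned as NA are skipped; NA entries
# inside a vector are treated as "not filtered".
keep_cells <- function(filt_list, n_cells) {
  drop <- rep(FALSE, n_cells)
  for (f in filt_list) {
    if (all(is.na(f))) next          # criterion was not evaluated
    drop <- drop | (f & !is.na(f))   # NA & FALSE is FALSE, so NAs never drop
  }
  !drop
}

# Usage with a mock result for 4 cells (ralign and pos_controls were NULL)
mfilt_mock <- list(
  filtered_nreads  = c(FALSE, TRUE, FALSE, FALSE),
  filtered_ralign  = NA,
  filtered_breadth = c(FALSE, FALSE, TRUE, FALSE),
  filtered_fnr     = NA
)
keep <- keep_cells(mfilt_mock, n_cells = 4)
```

With real output one could then subset the expression matrix as expr[, keep].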

Examples

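library(scone)
# Simulate counts: 100 genes (rows) x 10 cells (columns)
mat <- matrix(rpois(1000, lambda = 5), ncol = 10)
colnames(mat) <- paste0("X", seq_len(ncol(mat)))
# Per-cell QC metrics: total counts and number of detected genes
qc <- cbind(NCOUNTS = colSums(mat), NGENES = colSums(mat > 0))
rownames(qc) <- colnames(mat)
# Only nreads is supplied, so filtered_ralign, filtered_breadth and
# filtered_fnr are returned as NA; hard_nreads = 0 disables the hard cutoff
mfilt <- metric_sample_filter(expr = mat, nreads = qc[, "NCOUNTS"],
                              plot = TRUE, hard_nreads = 0)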


YosefLab/scone documentation built on March 12, 2024, 10:48 p.m.