check_outlier: Check for outlier clusters

View source: R/human_functions.R

check_outlier    R Documentation

Check for outlier clusters

Description

This function looks for unexpected combinations of marker gene expression (e.g., GAD1 + SLC17A7) and for particularly high or low values of the indicated QC metrics, and flags any clusters meeting those criteria as potential outliers. This should (in theory) find things like poor-quality clusters and clusters of doublets. Specific genes and thresholds are currently hard-coded, but might be updated in later iterations.
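
As a rough illustration of the co-expression check described above, the sketch below computes, for each cluster, the proportion of cells in which a gene exceeds an expression threshold, then flags clusters where both GAD1 and SLC17A7 are detected in a large share of cells. The toy matrix, the helper name prop_expressed, and the use of 3 and 0.4 as thresholds are assumptions for illustration; this is not the package's internal implementation.

```r
# Toy expression matrix: genes in rows, cells in columns (values are made up).
norm.dat <- matrix(
  c(5, 6, 0, 0, 5, 6,   # GAD1
    0, 0, 7, 8, 6, 5),  # SLC17A7
  nrow = 2, byrow = TRUE,
  dimnames = list(c("GAD1", "SLC17A7"), paste0("cell", 1:6))
)
cluster <- setNames(c("1", "1", "2", "2", "3", "3"), colnames(norm.dat))

expr.th <- 3    # expression threshold for calling a gene detected
prop.th <- 0.4  # minimum proportion of cells in a cluster

# Hypothetical helper: per-cluster proportion of cells expressing `gene` above `th`.
prop_expressed <- function(gene, dat, cl, th) {
  tapply(dat[gene, names(cl)] > th, cl, mean)
}

gad1  <- prop_expressed("GAD1", norm.dat, cluster, expr.th)
slc17 <- prop_expressed("SLC17A7", norm.dat, cluster, expr.th)

# Clusters where both an inhibitory and an excitatory marker are common.
mixed.cl <- names(gad1)[gad1 > prop.th & slc17 > prop.th]
print(mixed.cl)
```

In this toy data only cluster "3" expresses both markers, which is the pattern a cluster of doublets might show.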

Usage

check_outlier(
  anno,
  cluster,
  norm.dat,
  select.cells = colnames(norm.dat),
  keep.cl = NULL,
  neun.thresh = 0.5,
  neun.colname = "facs_population_plan",
  neun.val = "NeuN-pos",
  qc.metrics = c("Genes.Detected.CPM", "percent_reads_aligned_total", "complexity_cg"),
  test.genes = c("SNAP25", "GAD1", "GAD2", "SLC17A7", "SLC17A6"),
  expr.th = 3,
  prop.th = c(0.4, 0.4, 0.4, 0.4, 0.4),
  min.prop.th = 0.8,
  plot = TRUE,
  plot.path = "output/"
)
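
The call expects three objects that agree on cell identifiers. The sketch below builds toy inputs in the shapes the argument descriptions imply and checks that they line up before the function is called. All values are invented, the sample_id column in anno is an assumption, and the final call is left commented out because it requires scrattch.hicat to be installed and realistic data.

```r
set.seed(1)
cells <- paste0("sample", 1:8)
genes <- c("SNAP25", "GAD1", "GAD2", "SLC17A7", "SLC17A6")

# Expression: genes in rows, cells in columns (random placeholder values).
norm.dat <- matrix(runif(length(genes) * length(cells), 0, 10),
                   nrow = length(genes),
                   dimnames = list(genes, cells))

# Cluster labels named by sample_id.
cluster <- setNames(rep(c("1", "2"), each = 4), cells)

# Annotation with the NeuN column and the three default QC metric columns.
# The sample_id column is an assumption made for this sketch.
anno <- data.frame(
  sample_id = cells,
  facs_population_plan = rep(c("NeuN-pos", "NeuN-neg"), times = 4),
  Genes.Detected.CPM = round(runif(8, 2000, 9000)),
  percent_reads_aligned_total = runif(8, 60, 95),
  complexity_cg = runif(8, 0.4, 0.6),
  stringsAsFactors = FALSE
)

# Sanity checks on the inputs before handing them to check_outlier().
required <- c("facs_population_plan", "Genes.Detected.CPM",
              "percent_reads_aligned_total", "complexity_cg")
inputs.ok <- all(required %in% colnames(anno)) &&
  identical(names(cluster), colnames(norm.dat)) &&
  all(anno$sample_id %in% colnames(norm.dat))

# outliers <- check_outlier(anno, cluster, norm.dat, plot = FALSE)
```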

Arguments

anno

annotation dataframe, which must include the columns named in 'neun.colname' and 'qc.metrics'. A "cluster" column is added from the 'cluster' parameter below.

cluster

cluster labels for all cells along with sample_id as their names

norm.dat

expression dataframe with cells as columns and genes as rows, CPM normalized

select.cells

column names of norm.dat

keep.cl

clusters to definitely keep in the analysis (i.e., to exclude from consideration as outlier clusters); default is NULL

neun.thresh

fraction of cells expressing NeuN to be considered NeuN positive (default is 0.5)

neun.colname

column name in anno with the NeuN information

neun.val

value in neun.colname of anno that marks NeuN-positive cells (default is "NeuN-pos")

qc.metrics

required columns from anno dataframe. default is "Genes.Detected.CPM", "percent_reads_aligned_total", "complexity_cg"

test.genes

CURRENTLY NOT USED. This function will eventually allow for a pre-defined set of genes to be entered. default is "SNAP25", "GAD1", "GAD2", "SLC17A7", "SLC17A6"

expr.th

expression threshold for detecting test genes

prop.th

proportion threshold of detected genes by cluster (default is 0.4, 0.4, 0.4, 0.4, 0.4)

min.prop.th

at least one of the last 4 test genes should be detected in at least this proportion of cells (default is 0.8)

plot

whether to generate the exploratory plots; default is TRUE

plot.path

path of plot, default is "output/"
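
The neun.thresh, neun.colname, and neun.val arguments can be read together as a per-cluster fraction test. The sketch below shows one way such a fraction could be computed, assuming neun.val marks NeuN-positive cells (as its default "NeuN-pos" suggests), exact string matching, and a "greater than or equal" comparison. The toy annotation and these comparison choices are assumptions, not the package's confirmed behavior.

```r
# Toy annotation table with a cluster column and a NeuN sorting column.
anno <- data.frame(
  cluster = c("1", "1", "1", "2", "2", "2"),
  facs_population_plan = c("NeuN-pos", "NeuN-pos", "NeuN-neg",
                           "NeuN-neg", "NeuN-neg", "NeuN-pos"),
  stringsAsFactors = FALSE
)
neun.colname <- "facs_population_plan"
neun.val <- "NeuN-pos"
neun.thresh <- 0.5

# Fraction of cells per cluster whose annotation equals neun.val.
neun.frac <- tapply(anno[[neun.colname]] == neun.val, anno$cluster, mean)

# Clusters at or above the threshold are treated as NeuN positive here.
neun.pos.cl <- names(neun.frac)[neun.frac >= neun.thresh]
print(neun.pos.cl)
```

Here cluster "1" has two of three NeuN-positive cells and passes the 0.5 threshold, while cluster "2" has one of three and does not.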

Value

Returns the clusters flagged as potential outliers and, when plot is TRUE, produces exploratory plots.
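
To make the idea of flagging clusters with particularly high or low QC metrics concrete, the sketch below summarizes one metric per cluster and flags clusters far from the rest. The package's actual thresholds are hard-coded and are not reproduced here; the rule of three median absolute deviations and the toy values are assumptions chosen only for illustration.

```r
# Toy annotation: four clusters, one of which has very few genes detected.
anno <- data.frame(
  cluster = rep(c("1", "2", "3", "4"), each = 3),
  Genes.Detected.CPM = c(6000, 6200, 5900, 6100, 6300, 6000,
                         5800, 6050, 6150, 1500, 1700, 1600)
)

# Median of the QC metric within each cluster.
cl.med <- tapply(anno$Genes.Detected.CPM, anno$cluster, median)

# Assumed rule (not the package's): flag clusters whose median lies more
# than 3 median absolute deviations from the median of cluster medians.
center <- median(cl.med)
spread <- mad(cl.med)
qc.outlier.cl <- names(cl.med)[abs(cl.med - center) > 3 * spread]
print(qc.outlier.cl)
```

With these values only cluster "4" is flagged, since its median of 1600 detected genes sits far below the other clusters.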


AllenInstitute/scrattch.hicat documentation built on June 6, 2024, 5:31 a.m.