findVariableGenes: Find variable genes
In farrellja/URD: URD


Single-cell RNA-seq data are noisy, so analyses are restricted to genes that vary more than other genes with similar mean expression. In theory, the excess variability of these genes reflects biological differences across cells in addition to technical noise.

findVariableGenes(
  object,
  cells.fit = NULL,
  set.object.var.genes = T,
  diffCV.cutoff = 0.5,
  mean.min = 0.005,
  mean.max = 100,
  main.use = "",
  do.plot = T
)

`object`	An URD object
`cells.fit`	(Character Vector) Cells to use for finding variable genes (if `NULL`, uses all cells.)
`set.object.var.genes`	(Logical) Return an object with `@var.genes` set (if `TRUE`) or return a character vector of variable genes (if `FALSE`)
`diffCV.cutoff`	(Numeric) Minimum difference in coefficient of variation (CV) between a gene and the null distribution for the gene to be considered variable (the difference is in log space, so it amounts to a fold-change)
`mean.min`	(Numeric) Genes must have this minimum mean expression to be selected (Use to eliminate noisy lowly expressed genes)
`mean.max`	(Numeric) Genes must have less than this maximum mean expression to be selected (Use to eliminate the high end if the null distribution fits very poorly there)
`main.use`	(Character) Title to display for the overall three-panel plot
`do.plot`	(Logical) Whether or not to display plots
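Because `diffCV.cutoff` is a difference in log space, it can be read as a fold-change of observed CV over null CV. The short sketch below converts two cutoffs, assuming the natural logarithm; the base is not stated on this page, so treat the exact numbers as illustrative.

```r
# Assumption: the log-space difference uses the natural logarithm.
diffCV.cutoff <- 0.5
fold.change <- exp(diffCV.cutoff)  # observed CV / null CV required for selection

# The stricter default versus the looser cutoff used in the Examples.
fold.changes <- exp(c(default = 0.5, example = 0.3))
```

Under that assumption, the default of 0.5 asks for a CV roughly 1.65 times the null, and 0.3 for roughly 1.35 times.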

A null model of the relationship between each gene's average UMI count and its coefficient of variation (CV) is built across all genes, based on a negative binomial distribution that incorporates sampling noise and relative library size. Genes whose CV exceeds the null model's prediction by more than `diffCV.cutoff` are chosen as variable.
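The selection logic can be illustrated on simulated counts. This is a simplified sketch, not URD's internal code: it assumes a known negative binomial size parameter (the hypothetical `nb.size`) rather than fitting relative library sizes, and uses the standard negative binomial identity CV^2 = 1/mean + 1/size as the null. All variable names are made up for illustration.

```r
# Illustrative only: a simplified stand-in for the null model, not URD's code.
set.seed(1)
n.genes <- 200
n.cells <- 500
nb.size <- 4  # assumed (not fitted) NB size parameter for null genes

# Simulate genes x cells counts; the first 10 genes are made over-dispersed.
gene.means <- exp(runif(n.genes, log(1), log(20)))
gene.sizes <- rep(nb.size, n.genes)
extra <- 1:10
gene.sizes[extra] <- 0.1
counts <- t(mapply(function(m, s) rnbinom(n.cells, mu = m, size = s),
                   gene.means, gene.sizes))

# Observed mean and coefficient of variation for each gene.
mean.obs <- rowMeans(counts)
cv.obs <- apply(counts, 1, sd) / mean.obs

# Null CV for a negative binomial: CV^2 = 1/mean + 1/size.
cv.null <- sqrt(1 / mean.obs + 1 / nb.size)

# Difference in (natural) log space, then the cutoff and mean filters.
diff.cv <- log(cv.obs) - log(cv.null)
variable <- which(diff.cv > 0.5 & mean.obs >= 0.005 & mean.obs < 100)
```

In this toy setup the over-dispersed genes should sit well above the null curve, while genes drawn from the null should rarely pass the cutoff.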

If `do.plot=T`, three plots are produced. The first shows the relative library sizes and the gamma distribution fit to them. The second shows a histogram of each gene's CV relative to the null for its mean expression level, along with the chosen `diffCV.cutoff` threshold. The third shows each gene's mean expression and CV, the fitted null model (in pink), and which genes were selected as variable (in green).

Either an URD object with `@var.genes` set (if `set.object.var.genes=T`) or a character vector of variable genes (if `set.object.var.genes=F`)

Pandey S, Shekhar K, Regev A, and Schier AF. Comprehensive Identification and Spatial Mapping of Habenular Neuronal Types Using Single-Cell RNA-Seq. 2018. Current Biology 28(7):1052-1065. DOI: https://doi.org/10.1016/j.cub.2018.02.040

# Find a list of cells from each stage.
stages <- unique(object@meta$STAGE)
cells.each.stage <- lapply(stages, function(stage) rownames(object@meta)[which(object@meta$STAGE == stage)])
# Compute variable genes for each stage.
var.genes.by.stage <- lapply(seq_along(stages), function(n) findVariableGenes(
  object, cells.fit = cells.each.stage[[n]], set.object.var.genes = FALSE,
  diffCV.cutoff = 0.3, mean.min = 0.005, mean.max = 100,
  main.use = stages[[n]], do.plot = TRUE
))
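Since the example returns one character vector per stage, a natural follow-up is to merge them into a single gene list. The sketch below uses a small made-up list in place of real output so that it runs on its own; the gene names are placeholders.

```r
# Hypothetical stand-in for the per-stage output of the example above.
var.genes.by.stage <- list(
  c("geneA", "geneB"),
  c("geneB", "geneC"),
  c("geneD")
)

# Union of variable genes across stages, with duplicates removed.
var.genes <- sort(unique(unlist(var.genes.by.stage)))
```

With a real URD object, the merged vector could then presumably be stored in the `@var.genes` slot mentioned above, for example `object@var.genes <- var.genes`.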