getTopHVGs: Identify HVGs

View source: R/getTopHVGs.R

Description

Define a set of highly variable genes, based on variance modelling statistics from modelGeneVar or related functions.

Usage

getTopHVGs(
  stats,
  var.field = "bio",
  n = NULL,
  prop = NULL,
  var.threshold = 0,
  fdr.field = "FDR",
  fdr.threshold = NULL,
  row.names = !is.null(rownames(stats))
)

Arguments

stats

A DataFrame of variance modelling statistics with one row per gene.

var.field

String specifying the column of stats containing the relevant metric of variation.

n

Integer scalar specifying the number of top HVGs to report.

prop

Numeric scalar specifying the proportion of genes to report as HVGs.

var.threshold

Numeric scalar specifying the minimum threshold on the metric of variation.

fdr.field

String specifying the column of stats containing the adjusted p-values. If NULL, no filtering is performed on the FDR.

fdr.threshold

Numeric scalar specifying the FDR threshold.

row.names

Logical scalar indicating whether row names should be reported.

Details

This function will identify all genes where the relevant metric of variation is greater than var.threshold. By default, this means that we retain all genes with positive values in the var.field column of stats. If var.threshold=NULL, the minimum threshold on the value of the metric is not applied.
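
As a rough illustration of this thresholding step, the sketch below compares the default var.threshold=0 with var.threshold=NULL on the mock dataset from the Examples section. The object names kept.default and kept.all are chosen here for illustration only, and the exact counts will differ between runs because mockSCE() simulates random data.

```r
library(scran)
library(scuttle)

# Mock dataset, as in the Examples section; values are simulated at random.
sce <- logNormCounts(mockSCE())
stats <- modelGeneVar(sce)

# Default: keep only genes whose 'bio' value is greater than 0.
kept.default <- getTopHVGs(stats)

# No minimum threshold on the metric of variation.
kept.all <- getTopHVGs(stats, var.threshold=NULL)

# The unthresholded set cannot be smaller than the thresholded one.
length(kept.default)
length(kept.all)
```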

If fdr.threshold is specified, we further subset to genes that have FDR less than or equal to fdr.threshold. By default, FDR thresholding is turned off as modelGeneVar and related functions determine significance of large variances relative to other genes. This can be overly conservative if many genes are highly variable.

If n=NULL and prop=NULL, the resulting subset of genes is directly returned. Otherwise, the genes with the largest values of the variance metric are returned, where the number of genes is the larger of n and prop*nrow(stats). If only one of n or prop is specified, that value alone determines the size of the set.
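
The following sketch shows how n and prop might be combined in practice, again using the mock dataset from the Examples section. The values 100 and 0.1 and the object names are arbitrary choices for illustration, and the resulting lengths depend on the simulated data.

```r
library(scran)
library(scuttle)

# Mock dataset, as in the Examples section; values are simulated at random.
sce <- logNormCounts(mockSCE())
stats <- modelGeneVar(sce)

# At most 100 genes with the largest 'bio' values.
top.n <- getTopHVGs(stats, n=100)

# Set size determined by a proportion of genes rather than a fixed count.
top.prop <- getTopHVGs(stats, prop=0.1)

# When both are given, the larger of the two set sizes is used.
top.both <- getTopHVGs(stats, n=100, prop=0.1)

length(top.n)
length(top.prop)
length(top.both)
```

Fewer genes than requested can be returned if not enough genes pass the var.threshold and fdr.threshold filters.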

Value

A character vector containing the names of the most variable genes, if row.names=TRUE.

Otherwise, an integer vector specifying the indices of stats containing the most variable genes.
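
A minimal sketch of the two return types, assuming the mock dataset from the Examples section (whose stats carry row names). The names hvg.names and hvg.idx and the choice of n=50 are illustrative only.

```r
library(scran)
library(scuttle)

# Mock dataset, as in the Examples section; values are simulated at random.
sce <- logNormCounts(mockSCE())
stats <- modelGeneVar(sce)

# row.names defaults to TRUE here because 'stats' has row names.
hvg.names <- getTopHVGs(stats, n=50)

# Request row indices into 'stats' instead of gene names.
hvg.idx <- getTopHVGs(stats, n=50, row.names=FALSE)

# Either form can be used to subset 'stats' or the original object.
stats[hvg.idx, ]
sce[hvg.names, ]
```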

Author(s)

Aaron Lun

See Also

modelGeneVar and friends, to generate stats.

modelGeneCV2 and friends, to also generate stats.

Examples

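library(scran)   # provides modelGeneVar, modelGeneCV2 and getTopHVGs
library(scuttle) # provides mockSCE and logNormCounts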
sce <- mockSCE()
sce <- logNormCounts(sce)

stats <- modelGeneVar(sce)
str(getTopHVGs(stats))
str(getTopHVGs(stats, fdr.threshold=0.05)) # more stringent

stats2 <- modelGeneCV2(sce)
str(getTopHVGs(stats2, var.field="ratio"))

scran documentation built on April 17, 2021, 6:09 p.m.