getTopHVGs: Identify HVGs
In MarioniLab/scran: Methods for Single-Cell RNA-Seq Data Analysis

Description

Define a set of highly variable genes, based on variance modelling statistics from `modelGeneVar` or related functions.

Usage

getTopHVGs(
  stats,
  var.field = "bio",
  n = NULL,
  prop = NULL,
  var.threshold = 0,
  fdr.field = "FDR",
  fdr.threshold = NULL,
  row.names = !is.null(rownames(stats))
)

Arguments

`stats`	A DataFrame of variance modelling statistics with one row per gene. Alternatively, a SummarizedExperiment object, in which case it is supplied to `modelGeneVar` to generate the required DataFrame.
`var.field`	String specifying the column of `stats` containing the relevant metric of variation.
`n`	Integer scalar specifying the number of top HVGs to report.
`prop`	Numeric scalar specifying the proportion of genes to report as HVGs.
`var.threshold`	Numeric scalar specifying the minimum threshold on the metric of variation. If `NULL`, no minimum threshold is applied.
`fdr.field`	String specifying the column of `stats` containing the adjusted p-values. If `NULL`, no filtering is performed on the FDR.
`fdr.threshold`	Numeric scalar specifying the FDR threshold. If `NULL` (the default), no FDR filtering is performed.
`row.names`	Logical scalar indicating whether the row names of `stats` (i.e., gene names) should be reported instead of row indices.

Details

This function identifies all genes for which the relevant metric of variation is greater than `var.threshold`. By default (`var.threshold=0`), this means that we retain all genes with positive values in the `var.field` column of `stats`. If `var.threshold=NULL`, no minimum threshold is applied to the metric.
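As an illustration only, the `var.threshold` rule can be sketched in base R on a plain `data.frame`. This is not the scran implementation; the helper name `filter_by_var` and the toy values are hypothetical.

```r
# Hypothetical helper illustrating the var.threshold rule; not scran's internal code.
filter_by_var <- function(stats, var.field = "bio", var.threshold = 0) {
  # A NULL threshold disables the minimum-value filter entirely.
  if (is.null(var.threshold)) {
    return(stats)
  }
  # Strictly greater than the threshold, so a gene at exactly 0 is dropped by default.
  stats[stats[[var.field]] > var.threshold, , drop = FALSE]
}

# Toy statistics with one row per gene (made-up values).
toy <- data.frame(bio = c(0.5, -0.2, 0, 1.3),
                  row.names = c("GeneA", "GeneB", "GeneC", "GeneD"))

rownames(filter_by_var(toy))                        # "GeneA" "GeneD"
rownames(filter_by_var(toy, var.threshold = NULL))  # all four genes
```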

If `fdr.threshold` is specified, we further subset to genes with an FDR less than or equal to `fdr.threshold`. By default, FDR thresholding is turned off because `modelGeneVar` and related functions test whether a gene's variance is significantly large relative to that of other genes. Such a test can be overly conservative when many genes are highly variable.
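A similar base R sketch of the optional FDR filter is given here. Again, `filter_by_fdr` and the toy table are hypothetical, and the code only mirrors the rule stated above rather than scran's internals.

```r
# Hypothetical helper sketching the optional FDR filter; not scran's internal code.
filter_by_fdr <- function(stats, fdr.field = "FDR", fdr.threshold = NULL) {
  # Filtering is skipped when either the column name or the threshold is NULL.
  if (is.null(fdr.field) || is.null(fdr.threshold)) {
    return(stats)
  }
  # Keep genes whose FDR is less than or equal to the threshold.
  stats[stats[[fdr.field]] <= fdr.threshold, , drop = FALSE]
}

# Toy statistics with made-up values.
toy <- data.frame(bio = c(0.5, 0.8, 1.3),
                  FDR = c(0.20, 0.05, 0.001),
                  row.names = c("GeneA", "GeneB", "GeneC"))

rownames(filter_by_fdr(toy))                        # default: no filtering, all three genes
rownames(filter_by_fdr(toy, fdr.threshold = 0.05))  # "GeneB" "GeneC"
```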

If `n=NULL` and `prop=NULL`, the resulting subset of genes is returned directly. Otherwise, the top set of genes with the largest values of the variance metric is returned, where the size of the set is defined as the larger of `n` and `prop*nrow(stats)`.
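The size rule for `n` and `prop` can be sketched in base R, assuming the statistics are in a plain `data.frame`. The helper `take_top` is hypothetical, it skips the thresholding steps for brevity, and rounding `prop*nrow(stats)` up to a whole number is an assumption of this sketch rather than documented behaviour.

```r
# Hypothetical helper sketching the n/prop size rule; not scran's internal code.
# Rounding prop * nrow(stats) up with ceiling() is an assumption.
take_top <- function(stats, var.field = "bio", n = NULL, prop = NULL) {
  # With neither n nor prop, the table is returned as-is.
  if (is.null(n) && is.null(prop)) {
    return(stats)
  }
  # Set size is the larger of n and prop * nrow(stats); max() ignores a NULL n,
  # and NULL * nrow(stats) is numeric(0), which max() also ignores.
  size <- ceiling(max(n, prop * nrow(stats)))
  size <- min(size, nrow(stats))
  # Order genes by decreasing value of the variance metric and keep the top rows.
  ordered <- stats[order(stats[[var.field]], decreasing = TRUE), , drop = FALSE]
  ordered[seq_len(size), , drop = FALSE]
}

# Toy statistics for five genes (made-up values).
toy <- data.frame(bio = c(0.1, 0.9, 0.4, 0.7, 0.2),
                  row.names = paste0("Gene", 1:5))

rownames(take_top(toy, n = 2))              # "Gene2" "Gene4"
rownames(take_top(toy, n = 1, prop = 0.6))  # larger of 1 and 3: "Gene2" "Gene4" "Gene3"
```

Taking the larger of the two means that supplying both `n` and `prop` acts as a floor rather than a cap on the number of genes.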

Value

A character vector containing the names of the most variable genes, if `row.names=TRUE`.

Otherwise, an integer vector containing the row indices of `stats` that correspond to the most variable genes.
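To show how the two return types differ, here is a hypothetical base R sketch using a one-column matrix, since a matrix (unlike a base `data.frame`) can have `NULL` row names. The helper `report_hvgs` is an assumption for illustration and applies only the default positive-value threshold.

```r
# Hypothetical helper sketching the two return types; not scran's internal code.
report_hvgs <- function(stats, var.field = "bio",
                        row.names = !is.null(rownames(stats))) {
  # Row indices of genes with a positive metric (the default threshold of 0).
  selected <- which(stats[, var.field] > 0)
  if (row.names) {
    rownames(stats)[selected]  # character vector of gene names
  } else {
    unname(selected)           # plain integer vector of row indices
  }
}

named <- matrix(c(0.5, -0.1, 0.8), ncol = 1,
                dimnames = list(c("GeneA", "GeneB", "GeneC"), "bio"))
unnamed <- named
rownames(unnamed) <- NULL

report_hvgs(named)    # "GeneA" "GeneC"
report_hvgs(unnamed)  # 1 3
```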

Author(s)

Aaron Lun

Examples

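library(scran)
library(scuttle)
sce <- mockSCE()
sce <- logNormCounts(sce)

# By default, keep genes with a positive biological component ("bio"):
stats <- modelGeneVar(sce)
str(getTopHVGs(stats))
str(getTopHVGs(stats, fdr.threshold=0.05)) # more stringent

# Or directly pass in the SingleCellExperiment:
str(getTopHVGs(sce))

# Alternatively, use the ratio of the squared coefficient of variation to its trend.
# Ratios are non-negative, so the default var.threshold=0 retains almost every gene;
# var.threshold=1 keeps only genes above the trend.
stats2 <- modelGeneCV2(sce)
str(getTopHVGs(stats2, var.field="ratio"))
str(getTopHVGs(stats2, var.field="ratio", var.threshold=1))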

MarioniLab/scran documentation built on Sept. 7, 2024, 6:25 a.m.