getTopHVGs | R Documentation |
Define a set of highly variable genes, based on variance modelling statistics from modelGeneVar or related functions.
getTopHVGs(
stats,
var.field = "bio",
n = NULL,
prop = NULL,
var.threshold = 0,
fdr.field = "FDR",
fdr.threshold = NULL,
row.names = !is.null(rownames(stats))
)
stats | A DataFrame of variance modelling statistics with one row per gene. Alternatively, a SummarizedExperiment object, in which case it is supplied to |
var.field | String specifying the column of |
n | Integer scalar specifying the number of top HVGs to report. |
prop | Numeric scalar specifying the proportion of genes to report as HVGs. |
var.threshold | Numeric scalar specifying the minimum threshold on the metric of variation. |
fdr.field | String specifying the column of |
fdr.threshold | Numeric scalar specifying the FDR threshold. |
row.names | Logical scalar indicating whether row names should be reported. |
This function will identify all genes where the relevant metric of variation is greater than var.threshold.
By default, this means that we retain all genes with positive values in the var.field column of stats.
If var.threshold=NULL, the minimum threshold on the value of the metric is not applied.
If fdr.threshold is specified, we further subset to genes that have FDR less than or equal to fdr.threshold.
By default, FDR thresholding is turned off as modelGeneVar and related functions determine significance of large variances relative to other genes.
This can be overly conservative if many genes are highly variable.
If n=NULL and prop=NULL, the resulting subset of genes is directly returned.
Otherwise, the top set of genes with the largest values of the variance metric is returned, where the size of the set is defined as the larger of n and prop*nrow(stats).
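The selection steps described above can be sketched in base R on a toy table. This is an illustration only, not the scran implementation: the helper name sketch_top_hvgs and the toy values are hypothetical, and two choices are assumptions made for the sketch (survivors are returned in order of decreasing metric, and prop is applied to the full table as the text reads, rounded to a whole number of genes).

```r
# Toy table standing in for modelGeneVar() output; the values are made up.
stats <- data.frame(
    bio = c(2.0, -0.5, 0.8, 1.5, 0.1, 0.0),
    FDR = c(0.01, 0.90, 0.20, 0.03, 0.60, 1.00),
    row.names = paste0("gene", 1:6)
)

# Hypothetical helper mirroring the documented steps; not scran's own code.
sketch_top_hvgs <- function(stats, var.field = "bio", n = NULL, prop = NULL,
                            var.threshold = 0, fdr.field = "FDR",
                            fdr.threshold = NULL) {
    keep <- rep(TRUE, nrow(stats))

    # Step 1: keep genes whose metric is strictly greater than the threshold.
    if (!is.null(var.threshold)) {
        v <- stats[[var.field]]
        keep <- keep & !is.na(v) & v > var.threshold
    }

    # Step 2: optionally keep genes with FDR <= fdr.threshold.
    if (!is.null(fdr.threshold)) {
        f <- stats[[fdr.field]]
        keep <- keep & !is.na(f) & f <= fdr.threshold
    }

    # Assumption: order survivors by decreasing metric.
    idx <- which(keep)
    idx <- idx[order(stats[[var.field]][idx], decreasing = TRUE)]

    # Step 3: if n or prop is given, take the larger of the two set sizes.
    # max() ignores a NULL n and the zero-length result of a NULL prop.
    if (!is.null(n) || !is.null(prop)) {
        top <- max(n, round(prop * nrow(stats)))
        idx <- head(idx, top)
    }

    rownames(stats)[idx]
}

sketch_top_hvgs(stats)                        # genes with bio > 0
sketch_top_hvgs(stats, fdr.threshold = 0.05)  # additionally FDR <= 0.05
sketch_top_hvgs(stats, n = 1, prop = 0.5)     # larger of 1 and round(0.5 * 6)
sketch_top_hvgs(stats, var.threshold = NULL)  # no minimum on the metric
```

With n = 1 and prop = 0.5 on six genes, the set size is the larger of 1 and 3, so three genes are reported.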
A character vector containing the names of the most variable genes, if row.names=TRUE.
Otherwise, an integer vector specifying the indices of stats containing the most variable genes.
Aaron Lun
modelGeneVar and friends, to generate stats.
modelGeneCV2 and friends, to also generate stats.
library(scran)   # provides modelGeneVar, modelGeneCV2 and getTopHVGs
library(scuttle)
sce <- mockSCE()
sce <- logNormCounts(sce)
stats <- modelGeneVar(sce)
str(getTopHVGs(stats))
str(getTopHVGs(stats, fdr.threshold=0.05)) # more stringent
# Or directly pass in the SingleCellExperiment:
str(getTopHVGs(sce))
# Alternatively, use with the coefficient of variation:
stats2 <- modelGeneCV2(sce)
str(getTopHVGs(stats2, var.field="ratio"))
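The n, prop and row.names arguments are not exercised in the examples above, so the following is a small additional sketch of how they might be used. It assumes the scran and scuttle packages are installed; the variable names (top.n, top.prop, top.idx, sce.hvg) are illustrative, and the exact genes selected depend on the simulated data.

```r
library(scran)
library(scuttle)

set.seed(100)  # mockSCE() simulates random counts
sce <- logNormCounts(mockSCE())
stats <- modelGeneVar(sce)

# Report at most 100 genes, ranked by the 'bio' column.
top.n <- getTopHVGs(stats, n = 100)

# Define the size of the set as a proportion of genes instead.
top.prop <- getTopHVGs(stats, prop = 0.1)

# Request row indices of 'stats' rather than gene names.
top.idx <- getTopHVGs(stats, n = 100, row.names = FALSE)

# The returned names can be used to subset the original object.
sce.hvg <- sce[top.n, ]
dim(sce.hvg)
```

Fewer than n genes may be returned if fewer genes pass var.threshold, since the thresholds are applied as well.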