getClassAUC: getClassAUC


View source: R/getClassAUC.R

Description

getClassAUC implements one way to investigate clustering quality. For each cell cluster, it processes the output of sortGenes to build a curve of the gene specificity scores against their rank within the cluster. The Area Under the Curve (AUC) can be used as a measure of clustering quality, reflecting how readily cell clusters can be identified using a few marker genes. See Details.

Usage

getClassAUC(gs, markers = NULL, plotCurves = TRUE, colors = NULL)

Arguments

gs

A list containing the sparse matrix $specScore. Typically the output of sortGenes().

markers

A character vector of gene names to restrict this analysis to. See Details.

plotCurves

Logical. Should a plot be drawn? The default is TRUE.

colors

Color palette for the plot.

Details

Given the specificity scores of all genes in a cell cluster, we can assume that a well-separated, easily identified cluster will have a relatively small number of genes with very high specificity scores. In contrast, the top marker genes of a cluster that is poorly separated from other clusters will have average or low specificity scores. Sorting the genes of each cluster by specificity score and plotting the scaled scores in order yields a curve that should lie far from the diagonal for well-separated clusters and close to the diagonal for poorly separated clusters. The AUC of this curve quantifies this intuition and serves as a clustering quality metric.
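The idea can be illustrated with a minimal base R sketch. This is not the implementation used by getClassAUC: the helper scaledCurveAUC, the min-max scaling, the ascending sort order, and the toy score vectors are all assumptions made for illustration, and the package may orient or scale the curve differently. Under the convention assumed here, evenly spread scores trace the diagonal, while a cluster with a few very high-scoring genes gives a curve that stays far below it.

```r
# Hypothetical helper (not part of genesorteR): area under the curve of
# sorted, min-max scaled scores plotted against their scaled rank.
scaledCurveAUC <- function(scores) {
  s <- sort(scores)                       # ascending order (assumed convention)
  rng <- max(s) - min(s)
  if (rng == 0) return(NA_real_)          # flat scores: curve is undefined
  y <- (s - min(s)) / rng                 # scale scores to [0, 1]
  x <- seq(0, 1, length.out = length(y))  # scale ranks to [0, 1]
  # trapezoidal rule
  sum(diff(x) * (y[-length(y)] + y[-1]) / 2)
}

# Toy scores (made up): a few genes with very high scores vs. evenly spread scores
wellSeparated   <- c(rep(0.01, 95), 0.6, 0.7, 0.8, 0.9, 1)
poorlySeparated <- seq(0.01, 1, length.out = 100)

aucWell <- scaledCurveAUC(wellSeparated)
aucPoor <- scaledCurveAUC(poorlySeparated)
```

In this sketch the evenly spread scores lie on the diagonal, so their area is one half, and the area for the toy well-separated cluster is much smaller. Which direction counts as "better" in the values returned by getClassAUC depends on the package's own convention.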

Value

getClassAUC returns a numeric vector of length ncol(gs$specScore) that contains the AUC for each cell cluster.

Author(s)

Mahmoud M Ibrahim <mmibrahim@pm.me>

See Also

getMarkers returns a cell cluster Shannon index that tends to correlate well with the AUC metric returned by getClassAUC.

Examples

library(genesorteR)

#randomly generated expression matrix and cell clusters
set.seed(1234)
exp = matrix(sample(0:20,1000,replace=TRUE), ncol = 20)
rownames(exp) = sapply(1:50, function(x) paste0("g", x))
cellType = sample(c("cell type 1","cell type 2"),20,replace=TRUE)
sg = sortGenes(exp, cellType)
classAUC = getClassAUC(sg)

#"reasonably" separated clusters
data(sim)
sg = sortGenes(sim$exp, sim$cellType)
classAUC = getClassAUC(sg)

#real data with three well separated clusters
data(kidneyTabulaMuris)
sg = sortGenes(kidneyTabulaMuris$exp, kidneyTabulaMuris$cellType)
classAUC = getClassAUC(sg)

mahmoudibrahim/genesorteR documentation built on April 20, 2021, 4:07 p.m.