AUCell_exploreThresholds: AUCell_exploreThresholds

View source: R/03_exploreThresholds.R

AUCell_exploreThresholdsR Documentation

AUCell_exploreThresholds

Description

Plots all the AUC histograms (per gene-set) and calculates several likely thresholds for each gene-set

Usage

AUCell_exploreThresholds(
  cellsAUC,
  thrP = 0.01,
  nCores = 1,
  smallestPopPercent = 0.25,
  plotHist = TRUE,
  densAdjust = 2,
  assignCells = FALSE,
  nBreaks = 100,
  verbose = TRUE
)

getThresholdSelected(aucellThresholds)

getAssignments(aucellThresholds)

Arguments

cellsAUC

AUC object returned by AUCell_calcAUC.

thrP

Probability to determine outliers in some of the distributions (see 'details' section).

By default it is set to 1% (thrP): if there are 3000 cells in the dataset, it is expected that approximately 30 cells are over this threshold if the AUC is normally distributed.

nCores

Number of cores to use for computation.

smallestPopPercent

Size (percentage) of the smallest population of cells expected. Used to calculate some of the thresholds.

plotHist

Whether to plot the AUC histograms. (TRUE / FALSE)

densAdjust

Parameter for the density curve. (See density for details).

assignCells

Return the list of cells that pass the automatically selected threshold? (TRUE/FALSE)

nBreaks

Number of bars to plot in the histograms.

verbose

Should the function show progress messages? (TRUE / FALSE)

aucellThresholds

For aux functions: Output from AUCell_exploreThresholds

Details

To ease the selection of an assignment theshold, this function adjusts the AUCs of each gene-set to several distributions and calculates possible thresholds:

  • minimumDens (plot in Blue): Inflection point of the density curve. This is usually a good option for the ideal situation with bimodal distributions.

    To avoid false positives, by default this threshold will not be chosen if the second distribution is higher (i.e. the majority of cells have the gene-set "active").

  • L_k2 (plot in Red): Left distribution, after adjusting the AUC to a mixture of two distributions. The threshold is set to the right (prob: 1-(thrP/nCells)). Only available if 'mixtools' package is installed.

  • R_k3 (plot in Pink): Right distribution, after adjusting the AUC to a mixture of three distributions. The threshold is set to the left (prob: thrP). Only available if 'mixtools' package is installed.

  • Global_k1 (plot in Grey): "global" distribution (i.e. mean and standard deviations of all cells). The threshold is set to the right (prob: 1-(thrP/nCells)).

    The threshold based on the global distribution is ignored from the automatic selection unless the mixed models are overlapping.

Note: If assignCells=TRUE, the highest threshold is used to select cells. However, keep in mind that this function is only meant to ease the selection of the threshold, and we highly recommend to look at the AUC histograms and adjust the threshold manually if needed. We recommend to be specially aware on gene-sets with few genes (10-15) and thresholds that are set extremely low.

Value

List with the following elements for each gene-set:

  • 'aucThr' Thresholds calculated with each method (see 'details' section), and the number of cells that would be assigned using that threshold.

    If assignCells=TRUE, the threshold selected automatically is the highest value (in most cases, excluding the global distribution).

  • 'assignment' List of cells that pass the selected AUC threshold (if assignCells=TRUE)

If plotHist=TRUE the AUC histogram is also plot, including the distributions calculated and the corresponding thresholds in the same color (dashed vertical lines). The threshold that is automatically selected is shown as a thicker non-dashed vertical line.

See Also

Previous step in the workflow: AUCell_calcAUC.

See the package vignette for examples and more details: vignette("AUCell")

Examples

# This example is run using a fake expression matrix.
# Therefore, the output will be meaningless.

############# Fake expression matrix #############
set.seed(123)
exprMatrix <- matrix(data=sample(c(rep(0, 5000), sample(1:3, 5000, replace=TRUE))),
                     nrow=20, 
                     dimnames=list(paste("Gene", 1:20, sep=""), 
                                   paste("Cell", 1:500, sep="")))
dim(exprMatrix)
##################################################

######### Previous steps in the workflow #########
# Step 1.
cells_rankings <- AUCell_buildRankings(exprMatrix, plotStats=FALSE)

# Step 2.
# (Gene sets: random genes)
geneSets <- list(geneSet1=sample(rownames(exprMatrix), 10),
                 geneSet2=sample(rownames(exprMatrix), 5))
cells_AUC <- AUCell_calcAUC(geneSets, cells_rankings, aucMaxRank=5)
##################################################

############## Step 3: Assign cells ##############

# 1. Plot histograms and obtain some pre-computed thresholds
# (this example is only meant to show the interface/arguments of the function,
# see the vignette for meaningful examples)
set.seed(123)
par(mfrow=c(1,2)) # Plot is divided into one row and two columns
thresholds <- AUCell_exploreThresholds(cells_AUC, plotHist=TRUE)
thresholds$geneSet1$aucThr

# 2. Obtain cells over a given threshold:
names(which(getAUC(cells_AUC)["geneSet1",] > 0.19))

# Alternative 1: assign cells according to the 'automatic' threshold
cells_assignment <- AUCell_exploreThresholds(cells_AUC,
                                             plotHist=FALSE, assignCells=TRUE)
# Cells assigned:
getAssignments(cells_assignment)
# Threshold applied:
getThresholdSelected(cells_assignment)

# Alternative 2: choose a threshold manually and assign cells
newThresholds = getThresholdSelected(cells_assignment)
newThresholds['geneSet1'] = 0.8
newAssignments = AUCell_assignCells(cells_AUC, newThresholds)
getAssignments(newAssignments)




aertslab/AUCell documentation built on March 12, 2024, 11:40 p.m.