emptyDropsCellRanger | R Documentation |
An approximate implementation of the --soloCellFilter EmptyDrops_CR filtering approach, which itself was reverse-engineered from the behavior of CellRanger 3.
emptyDropsCellRanger(m, ...)
## S4 method for signature 'ANY'
emptyDropsCellRanger(
m,
n.expected.cells = 3000,
max.percentile = 0.99,
max.min.ratio = 10,
umi.min = 500,
umi.min.frac.median = 0.01,
cand.max.n = 20000,
ind.min = 45000,
ind.max = 90000,
round = TRUE,
niters = 10000,
BPPARAM = SerialParam()
)
## S4 method for signature 'SummarizedExperiment'
emptyDropsCellRanger(m, ..., assay.type = "counts")
m |
A numeric matrix-like object containing counts, where columns represent barcoded droplets and rows represent features. The matrix should only contain barcodes for an individual sample, prior to any filtering for barcodes. Alternatively, a SummarizedExperiment containing such an object. |
... |
Further arguments to pass to individual methods. Specifically, for the SummarizedExperiment method, further arguments to pass to the ANY method. |
n.expected.cells |
An integer scalar specifying the number of expected cells in a sample.
Corresponds to the |
max.percentile |
A numeric scalar between 0 and 1 used to define the maximum UMI count in the simple filtering algorithm.
Corresponds to the |
max.min.ratio |
An integer scalar specifying the ratio of the maximum and minimum UMI count in the simple filtering algorithm.
Corresponds to the |
umi.min |
An integer scalar specifying the minimum UMI count for inclusion of a barcode in the cell candidate pool.
Corresponds to the |
umi.min.frac.median |
A numeric scalar between 0 and 1 used to define the minimum UMI count for inclusion of a barcode in the cell candidate pool.
Specifically, the minimum is defined as |
cand.max.n |
An integer scalar specifying the maximum number of barcodes that can be included in the cell candidate pool.
In effect, this applies a minimum threshold that is defined as the |
ind.min |
An integer scalar specifying the lowest UMI count ranking for inclusion of a barcode in the ambient profile.
Corresponds to the |
ind.max |
An integer scalar specifying the highest UMI count ranking for inclusion of a barcode in the ambient profile.
Corresponds to the |
round |
A logical scalar indicating whether to check for non-integer values in |
niters |
An integer scalar specifying the number of iterations to use for the Monte Carlo p-value calculations. |
BPPARAM |
A BiocParallelParam object indicating whether parallelization should be used. |
assay.type |
String or integer specifying the assay of interest. |
emptyDropsCellRanger splits each sample's barcodes into three subsets.
The first subset contains barcodes that are selected by the “simple filtering algorithm”, which are regarded as high quality cells without any further filtering.
The minimum threshold T for this subset is defined by taking the max.percentile percentile of the top n.expected.cells barcodes, and then dividing by the max.min.ratio to obtain a minimum UMI count.
(This is closely related to the algorithm used by defaultDrops.)
All barcodes identified in this manner will have an FDR of zero.
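The threshold T described above can be sketched in base R as follows. This is only an illustration, not the package's code: simple_threshold is a hypothetical helper, the use of quantile() with its default interpolation is an assumption about how the percentile is picked, and treating barcodes at or above T as the first subset is inferred from the candidate pool being defined as "below T".

```r
# Hypothetical helper (not part of DropletUtils): a sketch of the
# "simple filtering" threshold on per-barcode total UMI counts.
simple_threshold <- function(totals, n.expected.cells = 3000,
                             max.percentile = 0.99, max.min.ratio = 10) {
    # Keep only the n.expected.cells largest totals.
    top <- head(sort(totals, decreasing = TRUE), n.expected.cells)
    # Assumption: quantile() stands in for the percentile calculation.
    max.count <- unname(quantile(top, probs = max.percentile))
    # Dividing by the ratio gives the minimum UMI count T.
    max.count / max.min.ratio
}

# Toy totals: 100 large barcodes and 500 near-empty ones.
totals <- c(1000:901, rep(10, 500))
T.simple <- simple_threshold(totals, n.expected.cells = 100)

# Assumption: barcodes at or above T form the first subset.
retained <- totals >= T.simple
sum(retained)
```

With real data, totals would typically be the column sums of the count matrix m.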
The second subset contains the ambient pool and is defined as all barcodes with rankings between ind.min and ind.max.
The barcodes that fall in this category will be used to compute the ambient profile.
None of these barcodes are considered to be potential cells.
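A minimal sketch of selecting the ambient pool by rank, assuming rank 1 is the barcode with the largest total count and that the range is open at ind.min and closed at ind.max. The helper ambient_pool, the tie-breaking rule and the toy matrix are illustrative assumptions; the profile shown is a raw sum of counts, and any smoothing the package may apply is omitted.

```r
# Hypothetical helper (not part of DropletUtils): flag barcodes whose
# total-count rank falls in (ind.min, ind.max].
ambient_pool <- function(totals, ind.min = 45000, ind.max = 90000) {
    # Rank 1 = largest total; ties broken by order of appearance (assumption).
    rk <- rank(-totals, ties.method = "first")
    rk > ind.min & rk <= ind.max
}

# Toy count matrix: 2 features (rows) by 7 barcodes (columns).
counts <- rbind(
    geneA = c(60, 30, 20, 15, 4, 5, 1),
    geneB = c(40, 20, 30,  5, 6, 0, 0)
)
totals <- colSums(counts)

in.ambient <- ambient_pool(totals, ind.min = 3, ind.max = 6)

# Unsmoothed ambient profile: summed counts per feature across the pool.
ambient <- rowSums(counts[, in.ambient, drop = FALSE])
ambient / sum(ambient)
```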
The third subset contains the pool of barcodes that are potential cells, i.e., cell candidates.
This is defined as all barcodes with total counts below T and higher than all of the thresholds defined by umi.min, umi.min.frac.median and cand.max.n.
Only the barcodes within this subset will be tested for significant deviations from the ambient profile, i.e., their FDR is not NaN.
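The candidate pool can be sketched in the same spirit. The helper candidate_pool is hypothetical. It takes the reference median as an explicit argument (median.ref) because this sketch makes no claim about which barcodes that median is computed over, and it interprets cand.max.n as keeping the largest eligible barcodes up to that number, which is an assumption.

```r
# Hypothetical helper (not part of DropletUtils): flag cell candidates.
candidate_pool <- function(totals, T.simple, median.ref,
                           umi.min = 500, umi.min.frac.median = 0.01,
                           cand.max.n = 20000) {
    # Lower bound: the stricter of the absolute and median-based minimums.
    floor.count <- max(umi.min, umi.min.frac.median * median.ref)
    # Below the simple filtering threshold, above the lower bound.
    eligible <- which(totals < T.simple & totals > floor.count)
    # Assumption: cap the pool at the cand.max.n largest eligible barcodes.
    ord <- order(totals[eligible], decreasing = TRUE)
    keep <- head(eligible[ord], cand.max.n)
    seq_along(totals) %in% keep
}

totals <- c(9000, 8000, 900, 800, 700, 600, 400, 30)
is.cand <- candidate_pool(totals, T.simple = 1000, median.ref = 8500,
                          cand.max.n = 3)
which(is.cand)
```

Here the cap of three candidates acts as an extra minimum threshold: the barcode with 600 UMIs passes umi.min but is still excluded.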
At the time of writing, the arguments in STARsolo have a one-to-one correspondence with the arguments in emptyDropsCellRanger.
All parameter defaults match those used in STARsolo 2.7.9a.
The main differences between emptyDropsCellRanger and emptyDrops are:
emptyDropsCellRanger does not use the knee point to identify “presumed real” cells, instead relying on a threshold based on the expected number of cells.
emptyDropsCellRanger takes barcodes whose total count ranks within a certain range, by default (45,000, 90,000], to compute the ambient profile. In contrast, emptyDrops only defines the upper bound using lower or by.rank.
emptyDropsCellRanger defines a cell candidate pool according to three parameters, umi.min, umi.min.frac.median and cand.max.n. In emptyDrops, this is only defined by lower.
A DataFrame with the same fields as that returned by emptyDrops.
Dongze He, Rob Patro
Kaminow B, Yunusov D, Dobin A (2021). STARsolo: accurate, fast and versatile mapping/quantification of single-cell and single-nucleus RNA-seq data. https://www.biorxiv.org/content/10.1101/2021.05.05.442755v1
emptyDrops, for the original implementation.
# Mocking up some data:
set.seed(0)
my.counts <- DropletUtils:::simCounts(nempty=100000, nlarge=2000, nsmall=1000)
# Identify likely cell-containing droplets.
out <- emptyDropsCellRanger(my.counts)
out
is.cell <- out$FDR <= 0.01
sum(is.cell, na.rm=TRUE)
# Subsetting the matrix to the cell-containing droplets.
# (using 'which()' to handle NAs smoothly).
cell.counts <- my.counts[,which(is.cell),drop=FALSE]
dim(cell.counts)