Recover intrasample doublets that are neighbors to known intersample doublets in a multiplexed experiment.
This function is now deprecated; use recoverDoublets from scDblFinder instead.
doubletRecovery(x, ...)
## S4 method for signature 'ANY'
doubletRecovery(
x,
doublets,
samples,
k = 50,
transposed = FALSE,
subset.row = NULL,
BNPARAM = KmknnParam(),
BPPARAM = SerialParam()
)
## S4 method for signature 'SummarizedExperiment'
doubletRecovery(x, ..., assay.type = "logcounts")
## S4 method for signature 'SingleCellExperiment'
doubletRecovery(x, ..., use.dimred = NULL)

x 
A log-expression matrix for all cells (including doublets) in columns and genes in rows. Alternatively, a SummarizedExperiment or SingleCellExperiment containing such a matrix. If
... 
For the generic, additional arguments to pass to specific methods. For the SummarizedExperiment method, additional arguments to pass to the ANY method. For the SingleCellExperiment method, additional arguments to pass to the SummarizedExperiment method. 
doublets 
A logical, integer or character vector specifying which cells in 
samples 
A numeric vector containing the relative proportions of cells from each sample, used to determine how many cells are to be considered as intrasample doublets. 
k 
Integer scalar specifying the number of nearest neighbors to use for computing the local doublet proportions. 
transposed 
Logical scalar indicating whether 
subset.row 
See 
BNPARAM 
A BiocNeighborParam object specifying the algorithm to use for the nearest neighbor search. 
BPPARAM 
A BiocParallelParam object specifying the parallelization to use for the nearest neighbor search. 
assay.type 
A string specifying which assay values contain the log-expression matrix.
use.dimred 
A string specifying whether existing values in 
In multiplexed single-cell experiments, we can detect doublets as libraries with labels for multiple samples. However, this approach fails to identify doublets consisting of two cells with the same label. Such doublets may be problematic if they are still sufficiently abundant to drive formation of spurious clusters.
This function identifies intrasample doublets based on the similarity in expression profiles to known intersample doublets.
For each cell, we compute the proportion of the k neighbors that are known doublets.
Of the “unmarked” cells that are not known doublets, those with the top X largest proportions are considered to be intrasample doublets.
To compute X, we assume that the formation of doublets is random with respect to their originating samples.
This allows us to use samples to estimate the expected percentage of doublets that should occur within samples.
We then convert this into an absolute number X based on the number of known doublets in doublets.
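The steps above can be sketched in base R. This is only an illustration under stated assumptions, not the package's implementation: `sketchRecovery` is a hypothetical helper, the neighbor search is brute force over Euclidean distances rather than the BNPARAM algorithm, a cell is not counted as its own neighbor, and X is taken as the number of known doublets times the intra-to-inter ratio implied by random pairing of samples, rounded to the nearest integer.

```r
# Hypothetical helper (not part of scran): brute-force sketch of the top-X approach.
sketchRecovery <- function(lcounts, known, samples, k) {
    ncells <- ncol(lcounts)
    is.known <- seq_len(ncells) %in% known

    # Euclidean distances between cells (cells are columns, so transpose).
    d <- as.matrix(dist(t(lcounts)))
    diag(d) <- Inf  # assumption: a cell is not its own neighbor

    # Proportion of the k nearest neighbors that are known doublets.
    proportion <- unname(apply(d, 1, function(row) {
        mean(is.known[order(row)[seq_len(k)]])
    }))

    # Under random pairing, a doublet is intrasample with probability sum(p^2).
    p <- samples / sum(samples)
    intra.frac <- sum(p^2)
    X <- sum(is.known) * intra.frac / (1 - intra.frac)

    # Mark the top X unmarked cells by proportion as predicted doublets.
    predicted <- logical(ncells)
    unmarked <- which(!is.known)
    ranked <- unmarked[order(proportion[unmarked], decreasing=TRUE)]
    predicted[head(ranked, round(X))] <- TRUE

    list(proportion=proportion, known=is.known, predicted=predicted, intra=X)
}

# Toy data: one "gene", two well-separated groups of four cells each.
# Cells 5 and 6 are treated as known doublets; two samples of equal size.
toy <- matrix(c(0, 1, 2.5, 4.5, 100, 101, 102.5, 104.5), nrow=1)
res <- sketchRecovery(toy, known=c(5, 6), samples=c(1, 1), k=2)
which(res$predicted)
```

With two equally sized samples, half of all doublets are expected to be intrasample, so X equals the number of known doublets; in the toy data this selects the two unmarked cells that sit next to the known doublets.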
A larger value of k provides more stable estimates of the doublet proportion in each cell.
However, this comes at the cost of assuming that each cell actually has k neighboring cells of the same state.
For example, if a doublet cluster has fewer than k members, its doublet proportions will be “diluted” by inclusion of unmarked cells in the next-closest cluster.
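The dilution effect can be made concrete with a little arithmetic. The sketch below uses a hypothetical helper, `maxProportion`, and assumes the most favorable case: every other member of a doublet cluster of size m is a known doublet, a cell is not its own neighbor, and any neighbors drawn from outside the cluster are unmarked.

```r
# Hypothetical illustration: largest doublet proportion a cell can reach when
# its cluster has m members and all other members are known doublets.
maxProportion <- function(m, k) pmin(m - 1, k) / k

# A cluster of 20 cells evaluated at several values of k.
bound <- maxProportion(m=20, k=c(10, 19, 50, 100))
bound
```

Under these assumptions the bound stays at 1 while k is at most m - 1 and then falls as (m - 1)/k, which is why a k much larger than the smallest doublet cluster can hide that cluster.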
In principle, it is also possible to identify intrasample doublets by applying a hard threshold on the doublet proportion.
This threshold can be set close to the expected percentage from samples (i.e., the same one used to derive X).
Unfortunately, in practice, the observed proportions are generally lower than expected,
possibly due to contamination of doublet subpopulations by unmarked cells in noisy expression data.
This motivates the use of a top X approach instead.
A DataFrame containing one row per cell and the following fields:
proportion, a numeric field containing the proportion of neighbors that are doublets.
known, a logical field indicating whether this cell is a known intersample doublet.
predicted, a logical field indicating whether this cell is a predicted intrasample doublet.
The metadata contains intra, a numeric scalar containing the expected number of intrasample doublets.
Aaron Lun
doubletCells and doubletCluster, for alternative methods of doublet detection when no prior doublet information is available.
hashedDrops from the DropletUtils package, to identify doublets from cell hashing experiments.
# Mocking up an example.
library(scran) # provides doubletRecovery()
set.seed(100)
ngenes <- 1000
mu1 <- 2^rnorm(ngenes, sd=2)
mu2 <- 2^rnorm(ngenes, sd=2)
counts.1 <- matrix(rpois(ngenes*100, mu1), nrow=ngenes) # Pure type 1
counts.2 <- matrix(rpois(ngenes*100, mu2), nrow=ngenes) # Pure type 2
counts.m <- matrix(rpois(ngenes*20, mu1+mu2), nrow=ngenes) # Doublets (1 & 2)
all.counts <- cbind(counts.1, counts.2, counts.m)
lcounts <- scuttle::normalizeCounts(all.counts)
# Pretending that half of the doublets are known. Also pretending that
# the experiment involved two samples of equal size.
known <- 200 + seq_len(10)
out <- doubletRecovery(lcounts, doublets=known, k=10, samples=c(1, 1))
out
