ambientContribSparse | R Documentation |
Estimate the contribution of the ambient solution to each droplet by assuming that no more than a certain percentage of features are actually present/expressed in the cell.
ambientContribSparse(y, ...)
## S4 method for signature 'ANY'
ambientContribSparse(
  y,
  ambient,
  prop = 0.5,
  mode = c("scale", "profile", "proportion"),
  BPPARAM = SerialParam()
)
## S4 method for signature 'SummarizedExperiment'
ambientContribSparse(y, ..., assay.type = "counts")
y |
A numeric matrix-like object containing counts, where each row represents a feature (usually a conjugated tag) and each column represents a cell or group of cells. Alternatively, a SummarizedExperiment object containing such a matrix. |
... |
For the generic, further arguments to pass to individual methods. For the SummarizedExperiment method, further arguments to pass to the ANY method. |
ambient |
A numeric vector of length equal to |
prop |
Numeric scalar specifying the maximum proportion of features that are expected to be present for any cell. |
mode |
String indicating the output to return, see Value. |
BPPARAM |
A BiocParallelParam object specifying how parallelization should be performed. |
assay.type |
Integer or string specifying the assay containing the count matrix. |
The assumption here is that of sparsity, i.e., no more than prop * nrow(y) features should actually be present in each cell with a non-zero number of molecules.
This is reasonable for most tag-based applications where we would expect only 1-2 tags (for cell hashing) or a minority of tags (for general CITE-seq) to be present per cell.
Thus, counts for all other features must be driven by ambient contamination, allowing us to estimate a scaling factor for each cell based on the ratio to the ambient profile.
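The reasoning above can be illustrated with a minimal base R sketch. It is not the package's implementation: the helper sparse_scale_sketch is hypothetical, and the choice to take the median ratio among the lowest (1 - prop) fraction of features is an assumption made purely for illustration.

```r
# Hypothetical helper, NOT part of DropletUtils: one scaling factor per column,
# taken as the median count-to-ambient ratio among the features that are
# presumed to be ambient-only under the sparsity assumption.
sparse_scale_sketch <- function(y, ambient, prop = 0.5) {
    # 'ambient' recycles down the rows because length(ambient) == nrow(y).
    ratios <- y / ambient
    apply(ratios, 2, function(r) {
        # Number of features assumed to be absent from the cell (an assumption).
        n_ambient <- max(1L, floor((1 - prop) * length(r)))
        median(sort(r)[seq_len(n_ambient)])
    })
}

set.seed(100)
amb <- 1:10
# Ambient-only counts with a true scaling factor of 5 for every column.
y <- matrix(rpois(10000, lambda = amb * 5), nrow = 10)
# Spike in one genuinely present tag per column.
present <- sample(nrow(y), ncol(y), replace = TRUE)
y[cbind(present, seq_len(ncol(y)))] <- 1000

est <- sparse_scale_sketch(y, amb)
summary(est)
```

The spiked-in tags have very large ratios and so never fall among the lowest ratios used for the estimate. Because only the lowest ratios are used, the estimates in this simulation would be expected to sit near, and typically a little below, the true value of 5.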
For gene expression, the sparsity assumption is less justifiable.
Each cell could feasibly express a majority of the transcriptome (once we ignore constitutively silent features in the annotation, like pseudogenes).
The sparsity of gene expression data also yields less precise scale estimates, reducing their utility in downstream applications.
See ambientContribNegative or ambientContribMaximum instead, which operate from different assumptions.
If mode="scale", a numeric vector is returned quantifying the estimated “contribution” of the ambient solution to each column of y.
Scaling ambient by each entry yields the estimated ambient profile for the corresponding column of y.
If mode="profile", a numeric matrix is returned containing the estimated ambient profile for each column of y.
This is computed by scaling as described above; if ambient is a matrix, each column is scaled by the corresponding entry of the scaling vector.
If mode="proportion", a numeric matrix is returned containing the proportion of counts in y that are attributable to ambient contamination.
This is computed by simply dividing the output of mode="profile" by y and capping all values at 1.
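The relationship between the three outputs can be sketched in base R as follows. This only mirrors the description above and is not taken from the package source. The values of ambient, scaling and y are made up for illustration, with scaling standing in for the result of mode="scale".

```r
ambient <- c(1, 2, 3, 4)
scaling <- c(2, 0.5, 10)   # illustrative stand-in for the mode="scale" output
y <- matrix(c( 2,  4, 60,   8,
               1,  1,  2, 100,
              10, 20, 30,  40), nrow = 4)

# mode="profile": scale the ambient vector by each column's scaling factor.
profile <- outer(ambient, scaling)

# If ambient were a matrix (one profile per column), scale column by column.
ambient_mat <- matrix(ambient, nrow = 4, ncol = 3)
profile_mat <- sweep(ambient_mat, 2, scaling, "*")

# mode="proportion": divide the profile by the counts and cap at 1.
proportion <- pmin(profile / y, 1)
proportion
```

In this toy input, the third feature of the first column has a count of 60 against an estimated ambient count of 6, so only a proportion of 0.1 is attributed to ambient contamination. This simple sketch does not treat zero counts in y specially.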
Aaron Lun
ambientProfileBimodal, to estimate the ambient profile for use in ambient.
cleanTagCounts, where this function is used to estimate ambient scaling factors.
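library(DropletUtils)

# Simulate ambient-only counts for 10 tags, with ambient profile 'amb'.
amb <- 1:10
y <- matrix(rpois(10000, lambda=amb), nrow=10)
# Overwrite a random subset of entries to mimic genuinely present tags.
y[sample(length(y), 1000, replace=TRUE)] <- 1000

scaling <- ambientContribSparse(y, ambient=amb)
hist(scaling)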