ambientContribNegative | R Documentation |
Estimate the contribution of the ambient solution to a particular expression profile, based on the abundance of negative control features that should not be expressed in the latter.
controlAmbience(...)
ambientContribNegative(y, ...)
## S4 method for signature 'ANY'
ambientContribNegative(
y,
ambient,
features,
mode = c("scale", "profile", "proportion")
)
## S4 method for signature 'SummarizedExperiment'
ambientContribNegative(y, ..., assay.type = "counts")
... |
For the generic, further arguments to pass to individual methods. For the SummarizedExperiment method, further arguments to pass to the ANY method. For |
y |
A numeric matrix-like object containing counts, where each row represents a feature (e.g., a gene or a conjugated tag) and each column represents either a cell or group of cells. Alternatively, a SummarizedExperiment object containing such a matrix.
|
ambient |
A numeric vector of length equal to the number of rows of y, or a matrix with one column per column of y (see Value). |
features |
A logical, integer or character vector specifying the negative control features in y. Alternatively, a list of vectors specifying mutually exclusive sets of features. |
mode |
String indicating the output to return, see Value. |
assay.type |
Integer or string specifying the assay containing the count matrix. |
Negative control features should be those that cannot be expressed and are thus fully attributable to ambient contamination.
This is most commonly determined a priori from the biological context and experimental system.
For example, if spike-ins were introduced into the solution prior to cell capture,
these would serve as a gold standard for ambient contamination in y.
For single-nuclei sequencing, mitochondrial transcripts can serve a similar role
under the assumption that all high-quality libraries are stripped nuclei.
If features is a list, it is expected to contain multiple sets of mutually exclusive features.
Each cell should express features from at most one set.
Any expression of multiple sets in the same cell can thus be attributed to ambient contamination.
For this mode, an archetypal pairing is that of hemoglobins with immunoglobulins (Young and Behjati, 2018),
which should not be co-expressed in any (known) cell type.
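The reasoning behind mutually exclusive sets can be illustrated with a small base R sketch. The count matrix below is made up, and the check shown here (counting how many sets have non-zero counts in each cell) is only an illustration of the idea, not the computation performed by ambientContribNegative.

```r
# Toy count matrix: 4 features (rows) by 3 cells (columns). All values are made up.
y <- matrix(
    c(20,  0, 1,   # HBB
      15,  0, 0,   # HBA1
       0, 30, 2,   # IGKC
       1, 25, 0),  # IGHM
    nrow = 4, byrow = TRUE,
    dimnames = list(c("HBB", "HBA1", "IGKC", "IGHM"), c("cell1", "cell2", "cell3"))
)

# A list of mutually exclusive sets, in the form accepted by 'features'.
features <- list(
    hemoglobin = c("HBB", "HBA1"),
    immunoglobulin = c("IGKC", "IGHM")
)

# Total count for each set (rows) in each cell (columns).
set.totals <- t(vapply(
    features,
    function(f) colSums(y[f, , drop = FALSE]),
    numeric(ncol(y))
))

# Number of sets with non-zero counts in each cell.
# A value above 1 means that at least one set must come from the ambient solution.
n.sets <- colSums(set.totals > 0)
n.sets
```

In this toy data, cell1 and cell3 have counts in both sets, so under the mutual exclusivity assumption the counts from at least one set in those cells are attributable to contamination.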
controlAmbience is soft-deprecated; use ambientContribNegative instead.
If mode="scale", a numeric vector is returned quantifying the estimated “contribution” of the ambient solution to each column of y.
Scaling ambient by each entry yields the maximum ambient profile for the corresponding column of y.
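As a rough intuition for the scaling factor, the base R sketch below uses one natural estimator: the total count of the negative control features in each column, divided by their total abundance in the ambient profile. This estimator is an assumption made for illustration and is not confirmed to be what ambientContribNegative does internally; all values are made up.

```r
# Made-up ambient profile over 4 features.
ambient <- c(0.125, 0.375, 0.25, 0.25)

# Made-up counts for two columns (cells or groups of cells).
y <- cbind(a = c(2, 4, 10, 50), b = c(1, 2, 0, 7))

# The first two features are treated as negative controls.
controls <- 1:2

# Assumed estimator: control counts in each column, relative to
# the abundance of the same controls in the ambient profile.
scaling <- colSums(y[controls, , drop = FALSE]) / sum(ambient[controls])
scaling
```

Because the controls are assumed to be entirely ambient-derived, their counts fix how much of the ambient profile could be present in each column.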
If mode="profile", a numeric matrix is returned containing the estimated ambient profile for each column of y.
This is computed by scaling as described above; if ambient is a matrix, each column is scaled by the corresponding entry of the scaling vector.
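The scaling step described above can be sketched in base R. The scaling vector here is a hypothetical stand-in for the output of mode="scale", and the ambient values are made up; the sketch shows only the arithmetic, for both the vector and matrix forms of ambient.

```r
# Made-up ambient profile and a hypothetical scaling vector for two columns.
ambient <- c(0.125, 0.375, 0.25, 0.25)
scaling <- c(a = 12, b = 6)

# Vector case: every column of the result is 'ambient' times one scaling factor.
profile <- outer(ambient, scaling)

# Matrix case: each column of 'ambient.mat' has its own ambient profile,
# which is multiplied by the corresponding entry of 'scaling'.
ambient.mat <- cbind(a = ambient, b = rev(ambient))
profile.mat <- sweep(ambient.mat, 2, scaling, "*")

profile
profile.mat
```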
If mode="proportion", a numeric matrix is returned containing the estimated proportion of counts in y that are attributable to ambient contamination.
This is computed by simply dividing the output of mode="profile" by y and capping all values at 1.
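The division and capping can be written out in base R as follows. The profile matrix is a hypothetical stand-in for the output of mode="profile" and the counts are made up. How the function treats entries where both the profile and the count are zero is not stated here, so this sketch leaves them as they fall out of the arithmetic.

```r
# Made-up counts and a hypothetical ambient profile of the same dimensions.
y <- cbind(a = c(2, 4, 10, 50), b = c(1, 2, 0, 7))
profile <- cbind(a = c(1.5, 4.5, 3, 3), b = c(0.75, 2.25, 1.5, 1.5))

# Divide the estimated ambient counts by the observed counts, capping at 1.
# A zero count with a positive profile gives Inf, which is capped to 1;
# 0/0 would give NaN, which this sketch does not handle specially.
proportion <- pmin(profile / y, 1)
proportion
```

Values of 1 mark entries where the estimated ambient profile is at least as large as the observed count.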
Aaron Lun
Young MD and Behjati S (2018). SoupX removes ambient RNA contamination from droplet based single-cell RNA sequencing data. bioRxiv.
ambientProfileEmpty or ambientProfileBimodal, to obtain a profile estimate to use in ambient.
ambientContribMaximum or ambientContribSparse, for other methods of estimating contribution when negative control features are not available.
# ambientContribNegative is provided by the DropletUtils package.
library(DropletUtils)

# Making up some data.
ambient <- c(runif(900, 0, 0.1), runif(100))
y <- rpois(1000, ambient * 50)
y <- y + c(integer(100), rpois(900, 5)) # actual biology, but first 100 genes silent.
# Using the first 100 genes as negative controls:
scaling <- ambientContribNegative(y, ambient, features=1:100)
scaling
# Estimating the negative control contribution to 'y' by 'ambient'.
contribution <- ambientContribNegative(y, ambient, features=1:100, mode="profile")
DataFrame(ambient=drop(contribution), total=y)