Estimate the contribution of the ambient solution to a particular expression profile, based on the abundance of control features that should not be expressed in the latter.
A numeric count matrix where each row represents a gene and each column represents an expression profile. The profile usually contains aggregated counts for multiple droplets in a sample, e.g., for a cluster of cells. This can also be a vector, in which case it is converted into a one-column matrix.
A numeric vector of length equal to
A logical, integer or character vector specifying the control features in
Alternatively, a list of vectors specifying mutually exclusive sets of features.
String indicating the output to return - the scaling factor, the ambient profile or the proportion of each gene's counts in
Control features should be those that cannot be expressed and are thus fully attributable to ambient contamination.
This is most commonly determined a priori from the biological context and experimental system.
For example, if spike-ins were introduced into the solution prior to cell capture,
these would serve as a gold standard for ambient contamination in
For single-nuclei sequencing, mitochondrial transcripts can serve a similar role
under the assumption that all high-quality libraries are stripped nuclei.
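The reasoning above can be sketched in base R. This is a minimal illustration, not the package's implementation: `estimate_scale` is a hypothetical helper, and the ratio estimator (control counts in the profile divided by the control abundance in the ambient vector) is an assumption about how such a scaling factor could be derived.

```r
# Hypothetical helper (not part of any package): if the control features are
# purely ambient, their total count in each profile divided by their total
# abundance in 'ambient' gives one scaling factor per column.
estimate_scale <- function(y, ambient, controls) {
    y <- as.matrix(y)  # a plain vector becomes a one-column matrix
    colSums(y[controls, , drop = FALSE]) / sum(ambient[controls])
}

# Simulated data: ambient contamination at a true scale of 50,
# with the first 100 genes having no endogenous expression.
set.seed(100)
ambient <- c(runif(900, 0, 0.1), runif(100))
y <- rpois(1000, ambient * 50)
y <- y + c(integer(100), rpois(900, 5))

scale_hat <- estimate_scale(y, ambient, controls = 1:100)
scale_hat  # should land near the simulated scale of 50
```

Because only the control rows enter the ratio, endogenous expression in the other genes does not bias the estimate.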
If features is a list, it is expected to contain multiple sets of mutually exclusive features.
These features need not be controls but each cell should only express features in one set (or no sets).
Expression of more than one set in the same profile can thus be attributed to ambient contamination.
For this mode, an archetypal pairing is that of hemoglobins with immunoglobulins (Young and Behjati, 2018),
which should not be co-expressed in any (known) cell type.
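One way to turn the mutually exclusive sets into an estimate is sketched below in base R. The helper `estimate_scale_exclusive` is hypothetical, and taking the minimum ratio across sets is an assumption made for illustration: the set that a cell genuinely expresses gives an inflated ratio, so the smallest ratio is the one most plausibly driven by contamination alone. The counts and ambient proportions are made up.

```r
# Hypothetical helper: compute a count/abundance ratio for each feature set,
# then keep the smallest ratio per column (illustrative assumption).
estimate_scale_exclusive <- function(y, ambient, sets) {
    y <- as.matrix(y)
    per_set <- vapply(sets, function(s) {
        unname(colSums(y[s, , drop = FALSE]) / sum(ambient[s]))
    }, numeric(ncol(y)))
    # vapply returns a vector for one column, a matrix otherwise;
    # normalise to a (columns x sets) matrix.
    per_set <- matrix(per_set, nrow = ncol(y))
    apply(per_set, 1, min)
}

# Made-up ambient proportions and two aggregated profiles.
ambient <- c(HBB = 0.2, HBA1 = 0.2, IGKC = 0.1, ACTB = 0.5)
y <- cbind(
    erythroid = c(HBB = 500, HBA1 = 400, IGKC = 2, ACTB = 50),
    Bcell     = c(HBB = 4,   HBA1 = 4,   IGKC = 300, ACTB = 60)
)
sets <- list(hemoglobin = c("HBB", "HBA1"), immunoglobulin = "IGKC")

scale_excl <- estimate_scale_exclusive(y, ambient, sets)
scale_excl
```

In this toy input the erythroid profile is scaled by its immunoglobulin counts and the B cell profile by its hemoglobin counts, since each of those sets is the one the profile should not express.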
a numeric vector is returned quantifying the estimated “contribution” of the ambient solution to each column of
Scaling columns of
ambient by this vector yields the estimated ambient profile for each column of
which can also be obtained by setting
If mode="proportion", a numeric matrix is returned containing the estimated proportion of counts in
y that are attributable to ambient contamination.
This is computed by dividing the estimated ambient profile by
y and capping all values at 1.
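The relationship between the scaling factor, the ambient profile and the proportion can be written out in a few lines of base R. This is a sketch under assumptions: `ambient_proportion` is a hypothetical helper, the inputs are made up, and the handling of zero counts (left as NaN when both the profile and the count are zero) is a choice made here, not a documented behaviour.

```r
# Hypothetical helper: scale 'ambient' by the per-column factor to get the
# ambient profile, divide by the observed counts, and cap at 1.
ambient_proportion <- function(y, ambient, scaling) {
    y <- as.matrix(y)
    profile <- outer(ambient, scaling)  # genes x columns ambient profile
    pmin(profile / y, 1)                # 0/0 stays NaN in this sketch
}

# Made-up example: three genes, one profile, scaling factor of 100.
ambient <- c(0.1, 0.2, 0.7)
y <- c(5, 20, 700)
prop <- ambient_proportion(y, ambient, scaling = 100)
prop
```

The first gene has an estimated ambient count (10) above its observed count (5), so its proportion is capped at 1; the third gene has 70 of 700 counts attributed to the ambient solution.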
Young MD and Behjati S (2018). SoupX removes ambient RNA contamination from droplet based single-cell RNA sequencing data. bioRxiv.
maximumAmbience, when control features are not available.
# Making up some data.
ambient <- c(runif(900, 0, 0.1), runif(100))
y <- rpois(1000, ambient * 50)
y <- y + c(integer(100), rpois(900, 5)) # actual biology, but first 100 genes silent.

# Using the first 100 genes as a control:
scaling <- controlAmbience(y, ambient, features=1:100)
scaling

# Estimating the control contribution to 'y' by 'ambient'.
contribution <- controlAmbience(y, ambient, features=1:100, mode="profile")
DataFrame(ambient=drop(contribution), total=y)