ambientContribMaximum  R Documentation 
Compute the maximum contribution of the ambient solution to an expression profile for a group of droplets, by scaling the ambient profile and testing for significant deviations in the count profile.
maximumAmbience(...)
ambientContribMaximum(y, ...)
## S4 method for signature 'ANY'
ambientContribMaximum(
y,
ambient,
threshold = 0.1,
dispersion = 0,
num.points = 100,
num.iter = 5,
mode = c("scale", "profile", "proportion"),
BPPARAM = SerialParam()
)
## S4 method for signature 'SummarizedExperiment'
ambientContribMaximum(y, ..., assay.type = "counts")
... 
For the generic, further arguments to pass to individual methods. For the SummarizedExperiment method, further arguments to pass to the ANY method. For maximumAmbience, further arguments to pass to ambientContribMaximum.
y 
A numeric matrix-like object containing counts, where each row represents a gene and each column represents a cluster of cells (see Caveats). Alternatively, a SummarizedExperiment object containing such a matrix.

ambient 
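A numeric vector of length equal to the number of rows of y, containing the ambient profile, i.e., the relative abundance of each gene in the ambient solution. Alternatively, a matrix with one ambient profile in each column, corresponding to the columns of y.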
threshold 
Numeric scalar specifying the p-value threshold to use, see Details. 
dispersion 
Numeric scalar specifying the dispersion to use in the negative binomial model. Defaults to zero, i.e., a Poisson model. 
num.points 
Integer scalar specifying the number of points to use for the grid search. 
num.iter 
Integer scalar specifying the number of iterations to use for the grid search. 
mode 
String indicating the output to return, see Value. 
BPPARAM 
A BiocParallelParam object specifying how parallelization should be performed. 
assay.type 
Integer or string specifying the assay containing the count matrix. 
On occasion, it is useful to estimate the maximum possible contribution of the ambient solution to a count profile. This represents the most pessimistic explanation of a particular expression pattern and can be used to identify and discard suspect genes or clusters prior to downstream analyses.
This function implements the following algorithm:
1. We compute the mean ambient contribution for each gene by scaling ambient by some factor. ambient itself is usually derived by summing counts across barcodes with low total counts; see the output of emptyDrops for an example.
2. We compute a p-value for each gene based on the probability of observing a count equal to or below that in y, using the lower tail of a negative binomial (or Poisson) distribution with mean set to the ambient contribution. The per-gene null hypothesis is that the expected count in y is equal to the sum of the scaled ambient proportion and some (non-negative) contribution from actual intracellular transcripts.
3. We combine p-values across all genes using Simes' method. This represents the evidence against the joint null hypothesis (that all of the per-gene nulls are true).
4. We find the largest scaling factor that fails to reject this joint null at the specified threshold.
If sum(ambient) is equal to unity, this scaling factor can be interpreted as the maximum number of transcript molecules contributed to y by the ambient solution.
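The per-gene lower-tail p-values and their Simes combination described above can be sketched in base R for a single candidate scaling factor. This is a hypothetical illustration, not DropletUtils code: the helper name simes_pvalue() is invented here, and it assumes the common negative binomial parametrization with size = 1/dispersion (variance mu + dispersion * mu^2), falling back to a Poisson model when dispersion is zero. Simes' combined p-value is taken as the minimum over i of n * p_(i) / i, capped at 1.

```r
# Hypothetical helper (not part of DropletUtils): Simes-combined p-value for
# one candidate scaling factor, assuming size = 1/dispersion for the NB model.
simes_pvalue <- function(y, ambient, scale, dispersion = 0) {
    # Mean ambient contribution for each gene.
    mu <- ambient * scale
    # Lower-tail probability of a count <= y under the ambient-only mean.
    p <- if (dispersion == 0) {
        ppois(y, lambda = mu)
    } else {
        pnbinom(y, mu = mu, size = 1 / dispersion)
    }
    # Simes' method: min over i of n * p_(i) / i, capped at 1.
    n <- length(p)
    min(n * sort(p) / seq_len(n), 1)
}

# Larger scaling factors give smaller combined p-values.
set.seed(1)
y <- rpois(1000, 5)
ambient <- rep(1 / 1000, 1000)           # sums to 1, so 'scale' is in molecules
simes_pvalue(y, ambient, scale = 1000)   # ambient mean of 1 per gene
simes_pvalue(y, ambient, scale = 50000)  # ambient mean of 50 per gene
```

Because each per-gene lower-tail probability shrinks as the ambient mean grows, the combined p-value decreases with the scaling factor, which is what makes a search for the "largest non-rejected scale" well defined.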
The process of going from a scaling factor to a combined p-value has no clean analytical solution, so we use an iterative grid search to identify the largest possible scaling factor at a decent resolution.
num.points and num.iter control the resolution of the grid search and generally do not need to be changed.
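An iterative grid search of this kind can be sketched as evaluating num.points candidate scaling factors, keeping the last one that fails to reject the joint null, and zooming into the interval between it and the next grid point for num.iter rounds. This is a hypothetical sketch (max_scale_grid() is not a DropletUtils function). It assumes that the combined p-value decreases monotonically with the scaling factor and that the caller supplies an initial interval [lower, upper] whose lower end does not reject the null; how the package chooses its own starting bounds is not described here.

```r
# Hypothetical sketch (not DropletUtils code): find the largest scaling factor
# whose combined p-value is still >= threshold, i.e., fails to reject the null.
# 'pfun' maps a scaling factor to a combined p-value and is assumed to be
# monotonically decreasing; 'lower' is assumed not to reject the null.
max_scale_grid <- function(pfun, lower, upper, threshold = 0.1,
                           num.points = 100, num.iter = 5) {
    for (iter in seq_len(num.iter)) {
        grid <- seq(lower, upper, length.out = num.points)
        pvals <- vapply(grid, pfun, numeric(1))
        ok <- which(pvals >= threshold)   # grid points that fail to reject
        if (length(ok) == 0L) {
            return(lower)                 # nothing on the grid passes
        }
        last <- max(ok)
        if (last == num.points) {
            return(upper)                 # whole interval passes; cannot refine
        }
        # Zoom into the interval that brackets the rejection boundary.
        lower <- grid[last]
        upper <- grid[last + 1L]
    }
    lower
}

# Toy check with a known answer: exp(-s) >= 0.1 holds for s <= log(10).
max_scale_grid(function(s) exp(-s), lower = 0, upper = 10)
```

Each round shrinks the bracketing interval by a factor of num.points - 1, so the default settings already give a very fine resolution, consistent with the remark that these arguments rarely need changing.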
maximumAmbience is soft-deprecated; use ambientContribMaximum instead.
If mode="scale", a numeric vector is returned quantifying the maximum “contribution” of the ambient solution to each column of y. Scaling ambient by each entry yields the maximum ambient profile for the corresponding column of y.
If mode="profile", a numeric matrix is returned containing the maximum ambient profile for each column of y. This is computed by scaling as described above; if ambient is a matrix, each column is scaled by the corresponding entry of the scaling vector.
If mode="proportion", a numeric matrix is returned containing the maximum proportion of counts in y that are attributable to ambient contamination. This is computed by simply dividing the output of mode="profile" by y and capping all values at 1.
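The relationship between the three modes is simple arithmetic, sketched below. The helper scale_to_outputs() is hypothetical and not part of DropletUtils; it assumes y is a gene-by-column count matrix and ambient is either a single profile or a matrix with one profile per column of y. Zero counts in y are not special-cased here: they produce Inf (capped to 1) or NaN when the profile entry is also zero, and how the package itself handles this is not stated above.

```r
# Hypothetical helper (not DropletUtils code): derive the "profile" and
# "proportion" outputs from a vector of per-column scaling factors.
scale_to_outputs <- function(y, ambient, scaling) {
    y <- as.matrix(y)
    ambient <- as.matrix(ambient)
    # A single ambient profile is reused for every column of y.
    if (ncol(ambient) == 1L) {
        ambient <- ambient[, rep(1L, ncol(y)), drop = FALSE]
    }
    # mode="profile": scale each column by its own scaling factor.
    profile <- sweep(ambient, 2, scaling, "*")
    # mode="proportion": divide by the observed counts, capping at 1.
    proportion <- pmin(profile / y, 1)
    list(profile = profile, proportion = proportion)
}

y <- matrix(c(10, 20, 5, 5), nrow = 2)   # 2 genes x 2 clusters
out <- scale_to_outputs(y, ambient = c(0.5, 0.5), scaling = c(10, 4))
out$profile      # columns (5, 5) and (2, 2)
out$proportion   # columns (0.5, 0.25) and (0.4, 0.4)
```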
The above algorithm is rather ad hoc and offers little in the way of theoretical guarantees.
The p-value is used as a score rather than providing any meaningful error control.
Empirically, decreasing threshold will return a higher scaling factor by making the estimation more robust to dropouts in y, at the cost of increasing the risk of overestimation of the ambient contribution.
Our abuse of the p-value machinery means that the reported scaling often exceeds the actual contribution, especially at low counts where the reduced power fails to penalize overly large scaling factors.
Hence, the function works best when y contains aggregated counts for one or more groups of droplets with the same expected expression profile, e.g., clusters of related cells.
Higher counts provide more power to detect deviations, hopefully leading to a more accurate estimate of the scaling factor.
(On a practical note, this function is rather slow, so it is more feasible to run it on cluster-level profiles than on individual cells.)
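Cluster-level profiles can be obtained by summing the per-cell counts within each cluster. Below is a minimal base-R sketch, assuming a dense gene-by-cell matrix and a vector of cluster labels; aggregate_by_cluster() is a hypothetical name. For sparse matrices, a dedicated function such as scuttle's sumCountsAcrossCells() may be more appropriate.

```r
# Hypothetical helper: sum a gene-by-cell count matrix into a
# gene-by-cluster matrix suitable for use as 'y'.
aggregate_by_cluster <- function(counts, clusters) {
    # rowsum() sums rows sharing a group label, so transpose to put cells in
    # rows, then transpose back to genes x clusters.
    t(rowsum(t(counts), group = clusters))
}

counts <- matrix(1:8, nrow = 2)   # 2 genes x 4 cells
clusters <- c("a", "b", "a", "b")
aggregate_by_cluster(counts, clusters)
# cluster "a" sums cells 1 and 3; cluster "b" sums cells 2 and 4
```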
Note that this function returns the maximum possible contribution of the ambient solution to y, not the actual contribution.
In the most extreme case, if the ambient profile is similar to the expectation of y (e.g., due to sequencing a relatively homogeneous cell population), the maximum possible contribution of the ambient solution would be 100% of y, and subtraction would yield an empty count vector!
Aaron Lun
ambientProfileEmpty and ambientProfileBimodal, to estimate the ambient profile.
ambientContribSparse and ambientContribNegative, for other methods to estimate the ambient contribution.
emptyDrops, which uses the ambient profile to call cells.
library(DropletUtils)
set.seed(100)  # for reproducibility of the simulated data
# Making up some data for, e.g., a single cluster.
ambient <- c(runif(900, 0, 0.1), runif(100))
y <- rpois(1000, ambient * 100)
y[1:100] <- y[1:100] + rpois(100, 20) # actual biology.

# Estimating the maximum possible scaling factor:
scaling <- ambientContribMaximum(y, ambient)
scaling

# Estimating the maximum contribution to 'y' by 'ambient'.
contribution <- ambientContribMaximum(y, ambient, mode="profile")
DataFrame(ambient=drop(contribution), total=y)