ambientContribSparse: Ambient contribution by assuming sparsity

ambientContribSparse    R Documentation

Ambient contribution by assuming sparsity

Description

Estimate the contribution of the ambient solution to each droplet by assuming that no more than a certain percentage of features are actually present/expressed in the cell.

Usage

ambientContribSparse(y, ...)

## S4 method for signature 'ANY'
ambientContribSparse(
  y,
  ambient,
  prop = 0.5,
  mode = c("scale", "profile", "proportion"),
  BPPARAM = SerialParam()
)

## S4 method for signature 'SummarizedExperiment'
ambientContribSparse(y, ..., assay.type = "counts")

Arguments

y

A numeric matrix-like object containing counts, where each row represents a feature (usually a conjugated tag) and each column represents a cell or group of cells.

Alternatively, a SummarizedExperiment object containing such a matrix.

y can also be a numeric vector of counts; this is coerced into a one-column matrix.

...

For the generic, further arguments to pass to individual methods.

For the SummarizedExperiment method, further arguments to pass to the ANY method.

ambient

A numeric vector of length equal to nrow(y), containing the proportions of transcripts for each feature in the ambient solution.

prop

Numeric scalar specifying the maximum proportion of features that are expected to be present for any cell.

mode

String indicating the output to return, see Value.

BPPARAM

A BiocParallelParam object specifying how parallelization should be performed.

assay.type

Integer or string specifying the assay containing the count matrix.

Details

The assumption here is that of sparsity, i.e., no more than prop * nrow(y) features should be actually present in each cell with a non-zero number of molecules. This is reasonable for most tag-based applications where we would expect only 1-2 tags (for cell hashing) or a minority of tags (for general CITE-seq) to be present per cell. Thus, counts for all other features must be driven by ambient contamination, allowing us to estimate a scaling factor for each cell based on the ratio to the ambient profile.
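The reasoning above can be illustrated with a minimal base R sketch. This is not the DropletUtils implementation: sketch_sparse_scale is a hypothetical helper, and taking the median of the lowest (1 - prop) fraction of count-to-ambient ratios is an assumed, simplified way of using the features that are presumed to be ambient-only.

```r
# Hypothetical helper, NOT part of DropletUtils: estimates a per-column
# scaling factor from the features with the lowest count-to-ambient ratios.
sketch_sparse_scale <- function(y, ambient, prop = 0.5) {
    y <- as.matrix(y)  # a plain vector becomes a one-column matrix
    stopifnot(length(ambient) == nrow(y), all(ambient > 0))

    # Under sparsity, at least this many features are purely ambient.
    n.ambient <- max(1L, floor((1 - prop) * nrow(y)))

    apply(y, 2, function(counts) {
        # Truly present features have inflated ratios, so they sort last.
        ratios <- sort(counts / ambient)
        # Assumption: summarize the presumed ambient-only features by their median.
        median(ratios[seq_len(n.ambient)])
    })
}

# Noise-free toy data: three columns with known ambient scaling factors.
amb <- 1:10
truth <- c(2, 5, 10)
y <- sapply(truth, function(s) amb * s)

# One feature per column is truly present, on top of the ambient counts.
y[1, ] <- y[1, ] + 1000

est <- sketch_sparse_scale(y, ambient = amb)
print(est)
```

In this toy case the single inflated feature falls outside the lowest half of the ratios, so it has no influence on the estimate for each column.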

For gene expression, the sparsity assumption is less justifiable. Each cell could feasibly express a majority of the transcriptome (once we ignore constitutively silent features in the annotation, like pseudogenes). The sparsity of gene expression data also yields less precise scale estimates, reducing their utility in downstream applications. See ambientContribNegative or ambientContribMaximum instead, which operate from different assumptions.

Value

If mode="scale", a numeric vector is returned quantifying the estimated “contribution” of the ambient solution to each column of y. Scaling ambient by each entry yields the estimated ambient profile for the corresponding column of y.

If mode="profile", a numeric matrix is returned containing the estimated ambient profile for each column of y. This is computed by scaling as described above; if ambient is a matrix, each column is scaled by the corresponding entry of the scaling vector.

If mode="proportion", a numeric matrix is returned containing the proportion of counts in y that are attributable to ambient contamination. This is computed by simply dividing the output of mode="profile" by y and capping all values at 1.
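The relationship between the three outputs can be sketched in base R as follows. The scaling values below are made up for illustration, standing in for a mode="scale" result, and the treatment of zero counts is simply what this sketch does rather than documented package behavior.

```r
# Toy ambient profile (proportions) for four features.
ambient <- c(0.1, 0.2, 0.3, 0.4)

# Toy counts for two columns of 'y'.
y <- cbind(c(2, 4, 60, 8), c(50, 10, 15, 0))

# Assumed scaling factors, as if returned by mode="scale".
scaling <- c(20, 50)

# mode="profile": column j is 'ambient' multiplied by scaling[j].
profile <- outer(ambient, scaling)

# mode="proportion": ambient profile divided by the counts, capped at 1.
# In this sketch, a zero count with a positive ambient estimate gives Inf,
# which the cap turns into 1.
proportion <- pmin(profile / y, 1)

print(profile)
print(proportion)
```

Entries of proportion close to 1 mark counts that are explained entirely by the ambient estimate, while small values mark features with counts well above the ambient expectation.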

Author(s)

Aaron Lun

See Also

ambientProfileBimodal, to estimate the ambient profile for use in ambient.

cleanTagCounts, where this function is used to estimate ambient scaling factors.

Examples

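# Ambient profile for 10 tags, with relative abundance increasing from 1 to 10.
amb <- 1:10
# Simulate 1000 droplets (columns) whose counts are purely ambient.
y <- matrix(rpois(10000, lambda=amb), nrow=10)
# Overwrite random entries with large counts to mimic tags that are truly present.
y[sample(length(y), 1000, replace=TRUE)] <- 1000

# Estimate the ambient scaling factor for each droplet and plot its distribution.
scaling <- ambientContribSparse(y, ambient=amb)
hist(scaling)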


MarioniLab/DropletUtils documentation built on Oct. 12, 2024, 5:40 p.m.