ambientProfileBimodal: Ambient profile from bimodality

ambientProfileBimodal    R Documentation

Ambient profile from bimodality

Description

Estimate the concentration of each feature in the ambient solution from a filtered count matrix containing only counts for cells, by assuming that each feature has a bimodal abundance distribution with ambient and high-expressing components.

Usage

inferAmbience(...)

ambientProfileBimodal(x, ...)

## S4 method for signature 'ANY'
ambientProfileBimodal(x, min.prop = 0.05)

## S4 method for signature 'SummarizedExperiment'
ambientProfileBimodal(x, ..., assay.type = "counts")

Arguments

...

For the generic, further arguments to pass to individual methods.

For the SummarizedExperiment method, further arguments to pass to the ANY method.

For inferAmbience, arguments to pass to ambientProfileBimodal.

x

A numeric matrix-like object containing counts for each feature (row) and cell (column). Alternatively, a SummarizedExperiment object containing such a matrix.

min.prop

Numeric scalar in (0, 1) specifying the expected minimum proportion of barcodes contributed by each sample.

assay.type

Integer scalar or string specifying the assay containing the count matrix.

Details

In some cases, we want to know the ambient profile but only have the count matrix for the cell-containing libraries. An ambient profile is useful in functions such as hashedDrops, or as a reference profile in medianSizeFactors. However, as we only have the cell-containing libraries, we cannot use ambientProfileEmpty.

This function estimates the ambient profile by assuming that each feature only labels a minority of the cells. Under this assumption, the distribution of log-counts for each feature is bimodal, where the lower mode represents ambient contamination. This is generally reasonable for tag-based applications like cell hashing or CITE-seq, where the data are usually binary, i.e., the marker is either present or absent. We fit a two-component mixture model, identify all barcodes assigned to the lower component, and use the mean of their counts as an estimate of the ambient contribution.
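A minimal base-R sketch of this per-feature procedure follows, for illustration only. It assumes that two-centre k-means on the log1p-transformed counts is an acceptable stand-in for the two-component mixture model; the helpers estimate_ambient_one() and estimate_ambient() are hypothetical and are not the DropletUtils implementation, so ambientProfileBimodal() should still be used in practice.

```r
# Hypothetical helper: sketch of the bimodal ambient estimate for one feature.
# Two-centre k-means stands in for the two-component mixture model (an
# assumption, not the DropletUtils implementation).
estimate_ambient_one <- function(counts, min.prop = 0.05) {
    log.counts <- log1p(counts)

    # Start the lower and upper centres at the min.prop and 1-min.prop quantiles.
    centres <- quantile(log.counts, c(min.prop, 1 - min.prop), names = FALSE)
    if (centres[1] == centres[2]) {
        # No separation between the starting points: treat all barcodes as ambient.
        return(mean(counts))
    }

    fit <- kmeans(log.counts, centers = matrix(centres, ncol = 1))

    # The cluster with the smaller centre is taken as the ambient (lower) component.
    lower <- which.min(fit$centers[, 1])

    # The mean raw count of the lower component is the ambient estimate.
    mean(counts[fit$cluster == lower])
}

# Hypothetical helper: apply the per-feature estimate to each row of a
# feature-by-barcode count matrix.
estimate_ambient <- function(x, min.prop = 0.05) {
    apply(x, 1, estimate_ambient_one, min.prop = min.prop)
}
```

On the simulated matrix from the Examples section, estimate_ambient(x) should likewise return values near 1, 2 and 3.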

The initialization of the mixture model is controlled by min.prop, which starts the means of the lower and upper components at the min.prop and 1-min.prop quantiles, respectively. This means that each sample is expected to contribute between min.prop and 1-min.prop of the barcodes. Larger values improve convergence but require stronger assumptions about the relative proportions of multiplexed samples.
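The effect of min.prop on the starting centres can be illustrated with a small base-R sketch. The helper starting_centres() is hypothetical and simply computes the quantile-based starting points on the log1p scale, assuming log1p is the transformation applied; it shows that when the high-expressing sample holds fewer than min.prop of the barcodes, both starting centres fall in the ambient mode.

```r
# Hypothetical helper: quantile-based starting centres on the log1p scale.
starting_centres <- function(counts, min.prop = 0.05) {
    quantile(log1p(counts), c(min.prop, 1 - min.prop), names = FALSE)
}

rare <- rep(c(100, 2), c(20, 980))     # 2% of barcodes in the upper mode
common <- rep(c(100, 2), c(100, 900))  # 10% of barcodes in the upper mode

starting_centres(rare)                 # both centres sit at log1p(2): no separation
starting_centres(common)               # centres at log1p(2) and log1p(100)
starting_centres(rare, min.prop = 0.01) # smaller min.prop separates them again
```

Lowering min.prop accommodates smaller samples, at the cost of starting the upper centre closer to the tail of the distribution.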

inferAmbience is soft-deprecated; use ambientProfileBimodal instead.

Value

A numeric vector of length equal to nrow(x), containing the estimated ambient contribution for each feature, i.e., the mean count among barcodes assigned to the lower component.

Author(s)

Aaron Lun

See Also

hashedDrops, where this function is used in the absence of an ambient profile.

ambientProfileEmpty, which should be used when the raw matrix (prior to filtering for cells) is available.

ambientContribSparse and related functions, to estimate the contribution of ambient contamination in each library.

Examples

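library(DropletUtils)
set.seed(1000)

# Three features, each with a high-expressing subset of barcodes
# and an ambient background of roughly 1, 2 and 3 counts respectively.
x <- rbind(
    rpois(1000, rep(c(100, 1), c(100, 900))),
    rpois(1000, rep(c(2, 100, 2), c(100, 100, 800))),
    rpois(1000, rep(c(3, 100, 3), c(200, 700, 100)))
)

# Should be close to 1, 2, 3
ambientProfileBimodal(x)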


MarioniLab/DropletUtils documentation built on Oct. 12, 2024, 5:40 p.m.