adjustCounts: Remove background contamination from count matrix

adjustCounts    R Documentation

Description

After the level of background contamination has been estimated or specified for a channel, calculate the resulting corrected count matrix with background contamination removed.

Usage

adjustCounts(
  sc,
  clusters = NULL,
  method = c("subtraction", "soupOnly", "multinomial"),
  roundToInt = FALSE,
  verbose = 1,
  tol = 0.001,
  pCut = 0.01,
  ...
)

Arguments

sc

A SoupChannel object.

clusters

A vector of cluster IDs, named by cellIDs. If NULL, clusters are auto-loaded from sc. If FALSE, no clusters are used. See details.

method

Method to use for correction. One of 'subtraction' (the default), 'soupOnly', or 'multinomial'. See details.

roundToInt

Should the resulting matrix be rounded to integers?

verbose

Integer giving level of verbosity. 0 = silence, 1 = Basic information, 2 = Very chatty, 3 = Debug.

tol

Allowed deviation from expected number of soup counts. Don't change this.

pCut

The p-value cut-off used when method='soupOnly'.

...

Passed to expandClusters.

Details

This essentially subtracts off the mean expected background counts for each gene, then redistributes any "unused" counts. A count is unused if its subtraction has no effect, for example when subtracting a count from a gene that has zero counts to begin with.
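The subtract-and-redistribute idea can be sketched in base R for a single cell. This is an illustration only, not the SoupX implementation: the function name subtractWithRedistribution, the maxIter argument, and the toy numbers are all hypothetical. It assumes a per-gene soup profile (soupFrac) and a contamination fraction (rho) are already known.

```r
# Toy sketch of "subtract, then redistribute unused counts" for one cell.
# Hypothetical helper; not the code used inside adjustCounts.
subtractWithRedistribution <- function(counts, soupFrac, rho,
                                       tol = 1e-3, maxIter = 100) {
  toRemove <- rho * sum(counts)   # total soup counts expected in this cell
  adjusted <- counts
  for (i in seq_len(maxIter)) {
    if (toRemove < tol) break
    # Genes with nothing left cannot absorb a subtraction, so their share
    # is redistributed among the genes that still have counts.
    active <- adjusted > 0 & soupFrac > 0
    if (!any(active)) break
    want <- numeric(length(counts))
    want[active] <- toRemove * soupFrac[active] / sum(soupFrac[active])
    removed  <- pmin(adjusted, want)   # never remove more than is present
    adjusted <- adjusted - removed
    toRemove <- toRemove - sum(removed)
  }
  adjusted
}

counts   <- c(geneA = 10,  geneB = 0,   geneC = 2,   geneD = 8)
soupFrac <- c(geneA = 0.1, geneB = 0.4, geneC = 0.4, geneD = 0.1)
adjusted <- subtractWithRedistribution(counts, soupFrac, rho = 0.2)
print(adjusted)
```

In this toy cell, geneB has no counts to give up and geneC runs out before its share is fully removed, so the remainder is taken from geneA and geneD on the next pass. The total removed still matches rho times the cell's total counts.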

As expression data is highly sparse at the single cell level, it is highly recommended that clustering information be provided to allow the subtraction method to share information between cells. Without grouping cells into clusters, it is difficult (and usually impossible) to tell the difference between a count of 1 due to background contamination and a count of 1 due to endogenous expression. This ambiguity is removed at the cluster level where counts can be aggregated across cells. This information can then be propagated back to the individual cell level to provide a more accurate removal of contaminating counts.
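A small numerical illustration of why aggregation helps, using a Poisson model with made-up numbers (the expected soup contribution of 0.3 counts per cell and the 100-cell cluster are assumptions chosen for the example, not values from SoupX):

```r
# Hypothetical setting: the soup is expected to contribute 0.3 counts of a
# gene to each cell, and every cell in a 100-cell cluster shows exactly 1 count.
expectedSoupPerCell <- 0.3
nCells <- 100

# Single cell: probability of seeing at least 1 count from soup alone.
pCell <- ppois(0, lambda = expectedSoupPerCell, lower.tail = FALSE)

# Whole cluster: probability of seeing at least 100 counts from soup alone,
# when only 30 are expected.
pCluster <- ppois(nCells - 1, lambda = nCells * expectedSoupPerCell,
                  lower.tail = FALSE)

print(c(singleCell = pCell, cluster = pCluster))
```

A single count in one cell is entirely compatible with contamination, whereas the same counts pooled over the cluster are vanishingly unlikely to be soup alone.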

To provide clustering information, either set the clustering on the SoupChannel object with setClusters or explicitly pass the clusters parameter.

If roundToInt=TRUE, this function will round the result to integers. That is, it will take the floor of the corrected value and then round back up with probability equal to the fractional part of the number.
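The rounding rule described above is a form of stochastic rounding. A minimal base R sketch follows; the helper name stochasticRound is hypothetical and this is not the code used inside adjustCounts.

```r
# Stochastic rounding: floor the value, then add 1 with probability equal
# to the fractional part. Hypothetical helper for illustration only.
stochasticRound <- function(x) {
  base <- floor(x)
  base + (runif(length(x)) < (x - base))
}

set.seed(1)
x <- c(0, 1.25, 2.5, 3.9)
rounded <- stochasticRound(x)
print(rounded)

# Averaged over many draws, the rounded value is close to the original.
meanOfMany <- mean(replicate(20000, stochasticRound(1.25)))
print(meanOfMany)
```

Rounding this way keeps the expected value of each entry unchanged, which plain floor or round would not do for small fractional counts.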

The method parameter controls how the removal of counts is performed. This should almost always be left at the default ('subtraction'), which iteratively subtracts counts from all genes as described above. The 'soupOnly' method uses a p-value based estimation procedure to identify those genes that can be confidently identified as having endogenous expression and removes everything else (described in greater detail below). Because this method removes either all or none of the expression for a gene in a cell, the correction procedure is much faster. Finally, the 'multinomial' method explicitly maximises the multinomial likelihood for each cell. This method gives essentially identical results to 'subtraction' and is considerably slower.

In greater detail, the 'soupOnly' method sorts genes within each cell by their p-value under the null of the expected soup fraction, using a Poisson model, so that genes that definitely do have an endogenous contribution are at the end of the list with p=0. Genes for which there is poor evidence of endogenous cell expression are removed until approximately nUMIs*rho molecules have been removed. To prevent the removal of genes beyond nUMIs*rho in each cell, a separate p-value is calculated for the total number of counts removed exceeding nUMIs*rho, again using a Poisson model. The two p-values are combined using Fisher's method, and the cut-off is applied to the resulting combined p-value, which is calculated using a chi-squared distribution with 4 degrees of freedom.
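The final combination step can be illustrated in base R. This sketch covers only Fisher's method for two p-values; the helper name fisherCombine and the input p-values are made up for the example, and the gene sorting and removal logic of 'soupOnly' is not reproduced here.

```r
# Fisher's method for two p-values: -2 * (log(p1) + log(p2)) follows a
# chi-squared distribution with 2 * 2 = 4 degrees of freedom under the null.
# Hypothetical helper for illustration only.
fisherCombine <- function(p1, p2) {
  stat <- -2 * (log(p1) + log(p2))
  pchisq(stat, df = 4, lower.tail = FALSE)
}

pGene  <- 0.04   # made-up per-gene p-value
pTotal <- 0.30   # made-up p-value for the total counts removed
pCombined <- fisherCombine(pGene, pTotal)

# Compare against the cut-off (pCut defaults to 0.01 in adjustCounts).
pCut <- 0.01
belowCut <- pCombined < pCut
print(c(pCombined = pCombined, belowCut = belowCut))
```

For exactly two p-values the chi-squared tail has the closed form p1*p2*(1 - log(p1*p2)), which gives a quick way to check the result by hand.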

Value

A modified version of the table of counts, with background contamination removed.

Examples

out = adjustCounts(scToy)
#Return integer counts only
out = adjustCounts(scToy,roundToInt=TRUE)

SoupX documentation built on Nov. 1, 2022, 5:05 p.m.