downsample_txis: blockwise downsampling: try to preserve balanced clusters...


View source: R/downsample_txis.R

Description

Originally a helper function for label_cells, this is useful in its own right, so it is now exported. Note that maxcells is a MAXIMUM: if a cluster contains only 30 cells from a given sample and maxcells == 50, then only those 30 cells are returned for that combination of cluster and sample.
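A minimal Python sketch of the capping behaviour, assuming cells are grouped into blocks by (cluster, sample) and sampled without replacement within each block. The function name downsample_blocks and its arguments are hypothetical; this is not the package's R implementation, and it leaves out the mincells and mixture-model filtering described under Details.

```python
import random
from collections import defaultdict


def downsample_blocks(clusters, samples, maxcells, seed=0):
    """Hypothetical sketch: return sorted indices of cells kept after
    capping each (cluster, sample) block at maxcells cells."""
    rng = random.Random(seed)
    blocks = defaultdict(list)
    # Group cell indices by their (cluster, sample) combination.
    for i, (cl, sm) in enumerate(zip(clusters, samples)):
        blocks[(cl, sm)].append(i)
    kept = []
    for idx in blocks.values():
        # maxcells is a ceiling: blocks smaller than it are kept whole.
        k = min(len(idx), maxcells)
        kept.extend(rng.sample(idx, k))
    return sorted(kept)


clusters = ["A"] * 30 + ["B"] * 80
samples = ["s1"] * 110
kept = downsample_blocks(clusters, samples, maxcells=50)
print(len(kept))  # 80: all 30 cells from A/s1 plus 50 of the 80 from B/s1
```

Sampling within each block, rather than across the whole dataset, is what keeps a large cluster from crowding out a small one in the downsampled output.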

Usage

downsample_txis(
  txis,
  maxcells = 20,
  mincells = 10,
  ret = c("sce", "colnames"),
  ...
)

Arguments

txis

a SingleCellExperiment with cluster labels set, i.e. !is.null(colLabels(txis))

maxcells

maximum number of cells per cluster per sample (see Details); default 20

mincells

minimum number of cells per cluster per sample (see Details); default 10

ret

what to return: the SingleCellExperiment ("sce", the default) or its selected column names ("colnames")

...

additional arguments, intended to accommodate bootstrapping (not yet implemented)

Details

Especially with the default Louvain clustering approach, some samples will have no cells in a given cluster, and vice versa. To avoid a proliferation of artifacts, when sample == TRUE a mixture model is fit to the number of cells in each cluster, and samples with few or no cells in that cluster are excluded from block sampling. Do not use this on SmartSeq-type data.

Note that attr(downsample_txis(txis, ret="colnames"), "scheme") is a list with elements 'mincells', 'maxcells', and 'eligible'. 'mincells' and 'maxcells' are integers, while 'eligible' is an integer matrix of cell counts after filtering (i.e., after applying mincells and the per-cluster mixture fits).
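The shape of the 'scheme' attribute can be illustrated with a hypothetical Python analogue. It assumes 'eligible' is laid out as clusters (rows) by samples (columns) and applies only the mincells filter; the per-cluster mixture fit is left out. The function sampling_scheme and the extra 'rows'/'cols' keys are illustrative only, not part of the package.

```python
from collections import Counter


def sampling_scheme(clusters, samples, mincells, maxcells):
    """Hypothetical analogue of the 'scheme' attribute: per-cluster,
    per-sample cell counts, with blocks below mincells set to 0.
    (The mixture-model filtering is not included here.)"""
    counts = Counter(zip(clusters, samples))
    cl_levels = sorted(set(clusters))
    sm_levels = sorted(set(samples))
    # Missing (cluster, sample) combinations count as 0 cells.
    eligible = [
        [counts[(c, s)] if counts[(c, s)] >= mincells else 0 for s in sm_levels]
        for c in cl_levels
    ]
    return {
        "mincells": int(mincells),
        "maxcells": int(maxcells),
        "eligible": eligible,
        "rows": cl_levels,  # illustrative: cluster labels
        "cols": sm_levels,  # illustrative: sample labels
    }


clusters = ["A"] * 25 + ["A"] * 5 + ["B"] * 12
samples = ["s1"] * 25 + ["s2"] * 5 + ["s2"] * 12
scheme = sampling_scheme(clusters, samples, mincells=10, maxcells=20)
print(scheme["eligible"])  # [[25, 0], [0, 12]]
```

Here cluster A in sample s2 has only 5 cells, fewer than mincells = 10, so that block drops to 0 and would contribute nothing to block sampling.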

The mixture fits assume that a two-component mixture model, fit either to log(1 + cells) or directly to the number of cells per cluster, will separate out "noise" elements. This assumption may not hold; users should check whether it does for their data.
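As an illustration of this kind of fit, the Python sketch below fits a two-component one-dimensional Gaussian mixture by EM to the per-sample cell counts of a single cluster, either on log(1 + cells) or on the raw counts, and flags elements assigned to the lower-mean component as noise. The Gaussian/EM choice, the 0.5 responsibility cutoff, and the function name two_component_noise_flags are assumptions; the package may use a different mixture model or decision rule.

```python
import math


def two_component_noise_flags(counts, use_log=True, n_iter=200):
    """Hypothetical sketch: fit a 2-component 1-D Gaussian mixture by EM and
    flag elements whose responsibility for the lower-mean component > 0.5."""
    x = [math.log1p(c) if use_log else float(c) for c in counts]
    n = len(x)
    mean_all = sum(x) / n
    # Initialise means at the extremes, shared variance, equal weights.
    mu = [min(x), max(x)]
    var0 = max(sum((v - mean_all) ** 2 for v in x) / n, 1e-6)
    var = [var0, var0]
    w = [0.5, 0.5]
    resp = []
    for _ in range(n_iter):
        # E-step: posterior responsibility of each component for each element.
        resp = []
        for v in x:
            p = [
                w[k] / math.sqrt(2 * math.pi * var[k])
                * math.exp(-((v - mu[k]) ** 2) / (2 * var[k]))
                for k in range(2)
            ]
            tot = sum(p) or 1e-300
            resp.append([pk / tot for pk in p])
        # M-step: update weights, means and variances (variance floored).
        for k in range(2):
            nk = sum(r[k] for r in resp) or 1e-12
            w[k] = nk / n
            mu[k] = sum(r[k] * v for r, v in zip(resp, x)) / nk
            var[k] = max(sum(r[k] * (v - mu[k]) ** 2 for r, v in zip(resp, x)) / nk, 1e-6)
    low = 0 if mu[0] <= mu[1] else 1
    return [r[low] > 0.5 for r in resp]


# Cells per sample in one cluster: five near-empty samples, five well populated.
flags = two_component_noise_flags([0, 1, 2, 0, 3, 150, 200, 180, 220, 170])
print(flags)  # [True, True, True, True, True, False, False, False, False, False]
```

Because the model always has two components, a fit like this can split a cluster's samples even when none of them is genuinely noise, which is why the fits are worth inspecting.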

Value

colnames(txis) satisfying the sampling scheme (see Details) when ret = "colnames"; otherwise (ret = "sce", the default) the corresponding subset of txis.

See Also

find_eligible_cells

label_cells


trichelab/velocessor documentation built on Jan. 5, 2022, 6:27 p.m.