downsample: Downsample datasets
In rliger: Linked Inference of Genomic Experimental Relationships

downsample

R Documentation

Description

This function downsamples datasets to a size suitable for plotting or for expensive in-memory calculations.

Users can balance the sample size across categories of interest with balance. Multiple variables can be specified in balance, in which case at most maxCells cells are sampled from each combination of categories across those variables. For example, with two datasets and three clusters labeled across them, at most 2 × 3 × maxCells cells are selected. Note that "dataset" is automatically added as one variable when balancing the downsampling. To balance the downsampling solely on dataset origin, however, users have to set balance = "dataset" explicitly.
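The balancing rule described above can be illustrated with a minimal base R sketch. It does not use rliger and is not the package's implementation; the `meta` data frame, its column names, and the group sizes are hypothetical stand-ins for per-cell metadata.

```r
# Hypothetical per-cell metadata: 2 datasets, 3 clusters
set.seed(1)
meta <- data.frame(
  dataset = rep(c("ctrl", "stim"), times = c(300, 200)),
  cluster = sample(c("c1", "c2", "c3"), 500, replace = TRUE)
)
maxCells <- 50

# One group per observed combination of the balancing variables
groups <- interaction(meta$dataset, meta$cluster, drop = TRUE)

# Within each group, keep at most maxCells row indices
idx <- unlist(lapply(split(seq_len(nrow(meta)), groups), function(i) {
  if (length(i) <= maxCells) i else i[sample.int(length(i), maxCells)]
}), use.names = FALSE)
idx <- sort(idx)

# The total is bounded by (number of combinations) * maxCells
length(idx) <= 2 * 3 * maxCells
table(groups[idx])
```

Groups that contain fewer than maxCells cells are kept whole in this sketch, which is why the total is an upper bound rather than an exact count.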

Usage

downsample(
  object,
  balance = NULL,
  maxCells = 1000,
  useDatasets = NULL,
  seed = 1,
  returnIndex = FALSE,
  ...
)

Arguments

`object`	liger object
`balance`	Character vector of categorical variable names in `cellMeta` slot, to subsample `maxCells` cells from each combination of all specified variables. Default `NULL` samples `maxCells` cells from the whole object.
`maxCells`	Max number of cells to sample from the grouping based on `balance`.
`useDatasets`	Index selection of datasets to include. Default `NULL` uses all datasets.
`seed`	Random seed for reproducibility. Default `1`.
`returnIndex`	Logical, whether to only return the numeric index that can subset the original object instead of a subset object. Default `FALSE`.
`...`	Arguments passed to `subsetLiger`. `cellIdx` is set by the internal implementation and should not be supplied.

Value

By default, a subset of the liger object. Alternatively, when returnIndex = TRUE, a numeric index vector that can be used to subset the original object.
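The difference between the two return modes can be sketched in base R, independent of rliger. The cell names and the `embedding` matrix below are hypothetical; the point is only that a numeric index leaves the original data intact and can be reused on any per-cell structure.

```r
set.seed(1)
cells <- paste0("cell_", 1:20)  # hypothetical cell identifiers

# Analogous to returnIndex = TRUE: only positions are returned
idx <- sort(sample.int(length(cells), 5))

# Analogous to the default: a materialized subset
subsetCells <- cells[idx]

# The same index can subset other per-cell data, e.g. a 2-D embedding
embedding <- matrix(rnorm(40), nrow = 20,
                    dimnames = list(cells, c("dim1", "dim2")))
subEmbedding <- embedding[idx, , drop = FALSE]
identical(rownames(subEmbedding), subsetCells)
```

Returning an index is the lighter option when the sampled cells only need to be passed to a plotting function, as in the `cellIdx` usage in the Examples section.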

Examples

# Subsetting an object
pbmc <- downsample(pbmc)
# Creating a subsetting index
sampleIdx <- downsample(pbmcPlot, balance = "leiden_cluster",
                        maxCells = 10, returnIndex = TRUE)
plotClusterDimRed(pbmcPlot, cellIdx = sampleIdx)

rliger documentation built on June 8, 2025, 1:56 p.m.