cms: cms
In almutlue/CellMixS: Evaluate Cellspecific Mixing

View source: R/cms.R

Description

Calculates cell-specific mixing scores based on Euclidean distances within a subspace of integrated data.

Usage

cms(
  sce,
  k,
  group,
  dim_red = "PCA",
  assay_name = "logcounts",
  res_name = NULL,
  k_min = NA,
  smooth = TRUE,
  n_dim = 20,
  cell_min = 10,
  batch_min = NULL,
  unbalanced = FALSE,
  BPPARAM = SerialParam()
)

Arguments

`sce`	A `SingleCellExperiment` object with the combined data.
`k`	Numeric. Number of k-nearest neighbours (knn) to use.
`group`	Character. Name of the group/batch variable. Must be one of `names(colData(sce))`.
`dim_red`	Character. Name of embeddings to use as subspace for distance distributions. Default is "PCA".
`assay_name`	Character. Name of the assay to use for PCA. Only relevant if no existing 'dim_red' is provided. Must be one of `names(assays(sce))`. Default is "logcounts".
`res_name`	Character. Suffix appended to the result score's name (e.g. the method used to combine batches).
`k_min`	Numeric. Minimum number of knn to include. Default is NA (see Details).
`smooth`	Logical. Indicates whether cms results should be smoothed within each neighbourhood using the weighted mean.
`n_dim`	Numeric. Number of dimensions to include to define the subspace.
`cell_min`	Numeric. Minimum number of cells from each group to be included in the Anderson-Darling (AD) test.
`batch_min`	Numeric. Minimum number of cells per batch to include in the AD test. If set, neighbours are included until batch_min cells from each batch are present.
`unbalanced`	Logical. If TRUE, neighbourhoods with only one batch present are set to NA, so that they are not included in any summaries or smoothing.
`BPPARAM`	A BiocParallelParam object specifying whether cms scores shall be calculated in parallel.
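
To make the batch_min rule concrete, the following is a minimal base R sketch of growing a neighbourhood along a distance-ordered neighbour list until every batch contributes batch_min cells. The helper `batch_min_cutoff` is hypothetical and not part of CellMixS, and falling back to the full neighbour list when a batch never reaches batch_min is an assumption made for this illustration only.

```r
# Hypothetical helper (not a CellMixS function): return how many of the
# distance-ordered neighbours are needed so that every batch is represented
# by at least `batch_min` cells.
batch_min_cutoff <- function(batch_ordered, batch_min) {
  batch_ordered <- as.character(batch_ordered)
  batches <- unique(batch_ordered)
  # Position at which each batch reaches its batch_min-th cell (NA if never).
  reach <- vapply(batches, function(b) {
    which(batch_ordered == b)[batch_min]
  }, integer(1))
  # Assumption: if some batch never reaches batch_min, keep all neighbours.
  if (anyNA(reach)) length(batch_ordered) else max(reach)
}

# Batch labels of one cell's neighbours, sorted by increasing distance.
nbr_batches <- c("A", "A", "B", "A", "B", "A", "B", "B")
cutoff <- batch_min_cutoff(nbr_batches, batch_min = 2)
cutoff
```

In this toy input, batch "A" reaches two cells at position 2 and batch "B" at position 5, so the neighbourhood would span the first five neighbours.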

Details

The cms function tests the hypothesis that the group-specific distance distributions of a cell's knn cells share the same underlying, unspecified distribution. It performs Anderson-Darling tests as implemented in the kSamples package.

The neighbourhood can be defined in three ways. By default, the function uses all distances and group labels within the knn. Alternatively, a density-based neighbourhood can be defined by specifying k_min: in this case the first local minimum of the overall distance distribution with at least k_min cells is used. This adapts the neighbourhood to the local structure of the dataset, e.g. to prevent cells from a different cluster from being included. Third, the neighbourhood can be defined by batch occurrences: batch_min specifies the minimum number of cells from each batch that should be included to define the neighbourhood.

If 'dim_red' is not defined or is left at its default, cms calculates a PCA using runPCA. Results are appended to colData(sce); their names can be specified using res_name. If multiple cores are available, cms scores can be calculated in parallel (does not work on Windows). Parallelization is specified using BPPARAM.
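
As an illustration of what "group-specific distance distributions" means for a single cell, the sketch below builds them with base R only. The toy embedding, the batch labels, and all variable names are made up for this example; this is not the package's internal code, which may compute neighbours differently.

```r
set.seed(1)
# Toy embedding: 30 cells in 2 dimensions, two hypothetical batches.
emb <- matrix(rnorm(60), ncol = 2)
batch <- rep(c("A", "B"), each = 15)
k <- 10      # number of nearest neighbours
cell <- 1    # index of the cell whose neighbourhood is inspected

# Euclidean distances from the chosen cell to all other cells.
d <- as.matrix(dist(emb))[cell, -cell]

# Keep the k nearest neighbours, ordered by increasing distance.
ord <- order(d)[seq_len(k)]
knn_dist <- d[ord]
knn_batch <- batch[-cell][ord]

# One distance vector per batch: these are the samples a k-sample test
# would compare for this cell.
dist_by_batch <- split(unname(knn_dist), knn_batch)
dist_by_batch

# Assuming the kSamples package is installed and both batches are present,
# the samples could then be compared with, e.g.:
# kSamples::ad.test(dist_by_batch)
```

If the batches are well mixed around the cell, the per-batch distance vectors should look like draws from one distribution; a batch that only appears at larger distances is evidence against mixing.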

Value

A SingleCellExperiment with cms (and cms_smooth) within colData.
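
The relationship between a raw score and its smoothed counterpart can be sketched with base R's weighted.mean. The weighting scheme below (weights decreasing with distance as 1 / (1 + distance)) is purely an assumption for illustration; this page does not state which weights cms uses. The example also shows how a neighbour whose score was set to NA drops out of the average.

```r
# Raw scores of one cell (first entry) and three of its neighbours.
scores <- c(0.9, 0.1, 0.8, 0.7)
# Distances of those cells to the cell being smoothed (0 for the cell itself).
dists <- c(0, 0.5, 1, 2)

# Assumed weighting: closer cells count more. Not taken from CellMixS.
w <- 1 / (1 + dists)

smoothed <- weighted.mean(scores, w)
smoothed

# A neighbour with an NA score is ignored when na.rm = TRUE.
scores_na <- c(0.9, NA, 0.8, 0.7)
smoothed_na <- weighted.mean(scores_na, w, na.rm = TRUE)
smoothed_na
```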

References

Scholz, F. W. and Stephens, M. A. (1987). K-Sample Anderson-Darling Tests. J. Am. Stat. Assoc.

Examples

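library(SingleCellExperiment)
library(CellMixS)
# Load the simulated example data shipped with CellMixS.
sim_list <- readRDS(system.file("extdata/sim50.rds", package = "CellMixS"))
# Take the first element of the list and keep its first 50 cells.
sce <- sim_list[[1]][, seq_len(50)]

# Compute cms with 20 nearest neighbours, using 2 dimensions of the subspace.
sce_cms <- cms(sce, k = 20, group = "batch", n_dim = 2)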

almutlue/CellMixS documentation built on Sept. 8, 2024, 12:45 p.m.