cms: cms


View source: R/cms.R

Description

Calculates cell-specific mixing scores based on Euclidean distances within a subspace of integrated data.

Usage

cms(
  sce,
  k,
  group,
  dim_red = "PCA",
  assay_name = "logcounts",
  res_name = NULL,
  k_min = NA,
  smooth = TRUE,
  n_dim = 20,
  cell_min = 10,
  batch_min = NULL,
  unbalanced = FALSE,
  BPPARAM = SerialParam()
)

Arguments

sce

A SingleCellExperiment object with the combined data.

k

Numeric. Number of k-nearest neighbours (knn) to use.

group

Character. Name of the group/batch variable. Must be one of names(colData(sce)).

dim_red

Character. Name of embeddings to use as subspace for distance distributions. Default is "PCA".

assay_name

Character. Name of the assay to use for PCA. Only relevant if no existing 'dim_red' is provided. Must be one of names(assays(sce)). Default is "logcounts".

res_name

Character. Suffix appended to the name of the result score (e.g. the method used to combine batches).

k_min

Numeric. Minimum number of knn to include. Default is NA (see Details).

smooth

Logical. Indicates whether cms results should be smoothed within each neighbourhood using the weighted mean.

n_dim

Numeric. Number of dimensions to include to define the subspace.

cell_min

Numeric. Minimum number of cells from each group to be included in the AD test.

batch_min

Numeric. Minimum number of cells per batch to include in the AD test. If set, neighbours will be included until batch_min cells from each batch are present.

unbalanced

Logical. If TRUE, neighbourhoods with only one batch present will be set to NA, so that they are not included in any summaries or smoothing.

BPPARAM

A BiocParallelParam object specifying whether cms scores shall be calculated in parallel.

Details

The cms function tests the hypothesis that group-specific distance distributions of knn cells share the same underlying, unspecified distribution. It performs Anderson-Darling tests as implemented in the kSamples package.

By default, the function uses all distances and group labels defined by the knn. Alternatively, a density-based neighbourhood can be defined by specifying k_min. In this case the first local minimum of the overall distance distribution with at least k_min cells is used. This can be used to adapt to the local structure of the dataset, e.g. to prevent cells from a different cluster from being included. As a third option, the neighbourhood can be defined by batch occurrences: batch_min specifies the minimal number of cells from each batch that should be included to define the neighbourhood.

If 'dim_red' is not defined or is the default, cms will calculate a PCA using runPCA. Results will be appended to colData(sce). Names can be specified using res_name. If multiple cores are available, cms scores can be calculated in parallel (this does not work on Windows). Parallelization can be specified using BPPARAM.
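The idea described above can be illustrated for a single cell with a minimal sketch. It is not the package's internal implementation: the toy embedding, the batch labels, and the variable names (emb, batch, keep, and so on) are assumptions made up for illustration, and the density-based cutoff is only one plausible reading of how a "first local minimum with at least k_min cells" could be found. The Anderson-Darling step uses kSamples::ad.test and is skipped if that package is not installed.

```r
# Toy embedding: 60 cells in 2 dimensions with alternating batch labels.
# All values here are invented for illustration only.
set.seed(1)
n_cells <- 60
emb <- matrix(rnorm(n_cells * 2), ncol = 2)
batch <- rep(c("b1", "b2"), times = n_cells / 2)

k <- 20       # size of the neighbourhood
k_min <- 10   # minimum number of cells to keep after the density cutoff
cell <- 1     # index of the cell whose neighbourhood is inspected

# Euclidean distances from the chosen cell to every cell in the embedding.
d <- sqrt(rowSums(sweep(emb, 2, emb[cell, ])^2))
d[cell] <- Inf                     # exclude the cell itself
knn <- order(d)[seq_len(k)]        # indices of the k nearest neighbours

# Density-based neighbourhood (hypothetical reading of k_min):
# find local minima of the estimated distance density and keep the first
# one that still leaves at least k_min cells inside the neighbourhood.
dens <- density(d[knn])
is_min <- which(diff(sign(diff(dens$y))) == 2) + 1
cutoffs <- dens$x[is_min]
n_in <- vapply(cutoffs, function(x) sum(d[knn] <= x), numeric(1))
cut <- cutoffs[n_in >= k_min][1]
keep <- if (is.na(cut)) knn else knn[d[knn] <= cut]

# Group-specific distance distributions within the neighbourhood.
dist_by_batch <- split(d[keep], batch[keep])

# k-sample Anderson-Darling test on the group-specific distances.
# The guard on group sizes is an arbitrary choice for this sketch.
if (requireNamespace("kSamples", quietly = TRUE) &&
    length(dist_by_batch) > 1 && all(lengths(dist_by_batch) >= 4)) {
  res <- kSamples::ad.test(dist_by_batch, method = "asymptotic")
  print(res$ad)   # test statistics and asymptotic p-values
}
```

A small p-value suggests that the batches have different distance distributions around this cell, i.e. that they are poorly mixed there; cms repeats such a test for every cell.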

Value

A SingleCellExperiment with cms (and cms_smooth) within colData.
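As a rough sketch of how the returned object might be inspected, the following compares colData before and after the call. It assumes that CellMixS and its bundled sim50.rds example data are installed; new_cols is a hypothetical helper variable, and the exact column names depend on res_name.

```r
library(SingleCellExperiment)
library(CellMixS)

# Example data shipped with the package (same subset as in the Examples).
sim_list <- readRDS(system.file("extdata/sim50.rds", package = "CellMixS"))
sce <- sim_list[[1]][, c(1:50)]

sce_cms <- cms(sce, k = 20, group = "batch", n_dim = 2)

# Columns that cms added to colData.
new_cols <- setdiff(names(colData(sce_cms)), names(colData(sce)))
print(new_cols)

# First few per-cell scores.
head(colData(sce_cms)[, new_cols, drop = FALSE])
```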

References

Scholz, F. W. and Stephens, M. A. (1987). K-Sample Anderson-Darling Tests. J. Am. Stat. Assoc.

See Also

.cmsCell, .smoothCms.

Examples

library(SingleCellExperiment)
sim_list <- readRDS(system.file("extdata/sim50.rds", package = "CellMixS"))
sce <- sim_list[[1]][, c(1:50)]

sce_cms <- cms(sce, k = 20, group = "batch", n_dim = 2)
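A variant of the call above, sketched under the assumption that BiocParallel is installed, passes a parallel backend through BPPARAM and labels the result with res_name. The worker count of 2 and the label "unaligned" are arbitrary choices for illustration. Because forked workers are not available on Windows, the sketch falls back to SerialParam() there.

```r
library(SingleCellExperiment)
library(CellMixS)
library(BiocParallel)

sim_list <- readRDS(system.file("extdata/sim50.rds", package = "CellMixS"))
sce <- sim_list[[1]][, c(1:50)]

# Choose a backend: forked workers where supported, serial otherwise.
bp <- if (.Platform$OS.type == "windows") {
  SerialParam()
} else {
  MulticoreParam(workers = 2)
}

# res_name is used to name the result columns in colData.
sce_par <- cms(sce, k = 20, group = "batch", n_dim = 2,
               res_name = "unaligned", BPPARAM = bp)

names(colData(sce_par))
```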

CellMixS documentation built on Dec. 19, 2020, 2 a.m.