cms | R Documentation |
Calculates cell-specific mixing scores based on Euclidean distances within a subspace of integrated data.
cms(
sce,
k,
group,
dim_red = "PCA",
assay_name = "logcounts",
res_name = NULL,
k_min = NA,
smooth = TRUE,
n_dim = 20,
cell_min = 10,
batch_min = NULL,
unbalanced = FALSE,
BPPARAM = SerialParam()
)
sce |
A |
k |
Numeric. Number of k-nearest neighbours (knn) to use. |
group |
Character. Name of group/batch variable.
Needs to be one of |
dim_red |
Character. Name of embeddings to use as subspace for distance distributions. Default is "PCA". |
assay_name |
Character. Name of the assay to use for PCA.
Only relevant if no existing 'dim_red' is provided.
Must be one of |
res_name |
Character. Suffix appended to the name of the result score (e.g. the method used to combine batches). |
k_min |
Numeric. Minimum number of knn to include. Default is NA (see Details). |
smooth |
Logical. Indicates whether cms results should be smoothed within each neighbourhood using the weighted mean. |
n_dim |
Numeric. Number of dimensions to include to define the subspace. |
cell_min |
Numeric. Minimum number of cells from each group to be included in the AD test. |
batch_min |
Numeric. Minimum number of cells per batch to include in the AD test. If set, neighbours will be included until batch_min cells from each batch are present. |
unbalanced |
Logical. If TRUE, neighbourhoods with only one batch present will be set to NA, so that they are not included in any summaries or smoothing. |
BPPARAM |
A BiocParallelParam object specifying whether cms scores shall be calculated in parallel. |
The cms function tests the hypothesis that the group-specific distance distributions of knn cells share the same underlying, unspecified distribution. It performs Anderson-Darling tests as implemented in the kSamples package.
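As a rough illustration of the test described above, the following sketch builds group-specific distance distributions for a single cell and passes them to the Anderson-Darling test. The toy data, the variable names and the extraction of the p-value from the third column of the `ad` matrix are assumptions made for this example; it is not the code used inside cms.

```r
set.seed(1)
# Toy data (assumption): 100 cells in a 2-dimensional embedding, two batches.
emb <- matrix(rnorm(200), ncol = 2)
batch <- rep(c("b1", "b2"), each = 50)
k <- 20

# Euclidean distances from the first cell to every other cell.
d <- sqrt(rowSums(sweep(emb[-1, , drop = FALSE], 2, emb[1, ])^2))

# Indices of the k nearest neighbours; batch[-1] aligns the labels with d.
knn <- order(d)[seq_len(k)]
dist_by_batch <- split(d[knn], batch[-1][knn])

# Run the k-sample Anderson-Darling test only if kSamples is installed and
# at least two batches with two or more cells each are present.
if (requireNamespace("kSamples", quietly = TRUE) &&
    length(dist_by_batch) > 1 && all(lengths(dist_by_batch) >= 2)) {
  res <- kSamples::ad.test(dist_by_batch)
  # Assumed layout: row 1 is version 1 of the statistic, column 3 its
  # asymptotic p-value.
  p_value <- res$ad[1, 3]
}
```

A small p-value would indicate that the distance distributions differ between batches in this neighbourhood.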
By default, the function uses all distances and group labels defined in knn. Alternatively, a density-based neighbourhood can be defined by specifying k_min. In this case, the first local minimum of the overall distance distribution with at least k_min cells is used. This can be used to adapt to the local structure of the dataset, e.g. to prevent cells from a different cluster from being included. Third, the neighbourhood can be defined by batch occurrences: batch_min specifies the minimal number of cells from each batch that should be included to define the neighbourhood.
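The two alternative neighbourhood definitions can be sketched in base R as follows. Both helpers, `cut_at_local_min` and `grow_until_batch_min`, are hypothetical names introduced for this illustration, and the use of `density()` to locate local minima is an assumption; the internal implementation in cms may differ.

```r
# Hypothetical helper: keep the cells up to the first local minimum of the
# distance density that still retains at least k_min cells.
cut_at_local_min <- function(d, k_min) {
  dens <- density(d)
  # A local minimum is a grid point where the slope turns from negative
  # to positive.
  idx <- which(diff(sign(diff(dens$y))) == 2) + 1
  for (threshold in dens$x[idx]) {
    if (sum(d <= threshold) >= k_min) {
      return(d <= threshold)
    }
  }
  # No suitable minimum found: keep the whole neighbourhood.
  rep(TRUE, length(d))
}

# Hypothetical helper: given batch labels ordered by increasing distance,
# return the smallest leading set of neighbours in which every batch is
# represented by at least batch_min cells.
grow_until_batch_min <- function(batch_sorted, batch_min) {
  batches <- unique(batch_sorted)
  # Position of the batch_min-th cell of each batch (NA if there are fewer).
  pos <- vapply(batches,
                function(b) which(batch_sorted == b)[batch_min],
                integer(1))
  if (anyNA(pos)) {
    return(seq_along(batch_sorted))
  }
  seq_len(max(pos))
}

# Toy distances (assumption): 30 close cells and 40 cells from a distant cluster.
d <- c(seq(1, 2, length.out = 30), seq(8, 10, length.out = 40))
keep <- cut_at_local_min(d, k_min = 10)

# Toy batch labels (assumption), already sorted by distance to the cell.
nb <- grow_until_batch_min(c("a", "a", "a", "b", "a", "b", "b"), batch_min = 2)
```

In the first case the distant cluster is dropped because the density has a minimum between the two groups of distances; in the second case the neighbourhood grows until the second cell of batch "b" is reached.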
If 'dim_red' is not defined or is the default, cms will calculate a PCA using runPCA. Results will be appended to colData(sce). Names can be specified using res_name.
If multiple cores are available, cms scores can be calculated in parallel (does not work on Windows). Parallelization can be specified using BPPARAM.
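A minimal sketch of how a parallel backend might be chosen, assuming the BiocParallel package is installed. The number of workers and the variable name `bp` are arbitrary choices for this example.

```r
library(BiocParallel)

# Fall back to serial evaluation on Windows, where forked workers are
# not available.
bp <- if (.Platform$OS.type == "windows") {
  SerialParam()
} else {
  MulticoreParam(workers = 2)
}

# The object would then be passed to cms() through its BPPARAM argument, e.g.
# sce_cms <- cms(sce, k = 20, group = "batch", BPPARAM = bp)
```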
A SingleCellExperiment with cms (and cms_smooth) within colData.
Scholz, F. W. and Stephens, M. A. (1987). K-Sample Anderson-Darling Tests. J. Am. Stat. Assoc.
.cmsCell, .smoothCms.
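library(SingleCellExperiment)
library(CellMixS)  # provides cms()
# Load the simulated example data shipped with CellMixS.
sim_list <- readRDS(system.file("extdata/sim50.rds", package = "CellMixS"))
# Use the first 50 cells of the first element of the list.
sce <- sim_list[[1]][, 1:50]
# Compute cms scores from 20 nearest neighbours in a 2-dimensional subspace.
sce_cms <- cms(sce, k = 20, group = "batch", n_dim = 2)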