View source: R/evalIntegration.R
Function to evaluate single-cell data integration, providing a framework for different metrics. Metrics to evaluate mixing and the preservation of the local/individual structure are provided.
evalIntegration(
  metrics,
  sce,
  group,
  dim_red = "PCA",
  assay_name = "logcounts",
  n_dim = 10,
  res_name = NULL,
  k = NULL,
  k_min = NA,
  smooth = TRUE,
  cell_min = 10,
  batch_min = NULL,
  unbalanced = FALSE,
  weight = TRUE,
  k_pos = 5,
  sce_pre_list = NULL,
  dim_combined = dim_red,
  assay_pre = "logcounts",
  n_combined = 10,
  BPPARAM = SerialParam()
)
metrics | Character vector. Names of the metrics to apply. Must be one or more of 'cms', 'ldfDiff', 'isi', 'mixingMetric', 'localStructure', 'entropy'.
sce |
group | Character. Name of group/batch variable. Needs to be one of
dim_red | Character. Name of embedding to use as subspace for distance distributions. Default is "PCA".
assay_name | Character. Name of the assay to use for PCA. Only relevant if no existing 'dim_red' is provided. Must be one of
n_dim | Numeric. Number of dimensions to include to define the subspace.
res_name | Character vector. Suffix appended to the result score's name (e.g. method used to combine batches). Needs to have the same length as metrics or be NULL.
k | Numeric. Number of k-nearest neighbours (knn) to use.
k_min | Numeric. Minimum number of knn to include (see
smooth | Logical. Indicates whether cms results should be smoothed within each neighbourhood using the weighted mean. Relevant for metric: 'cms'.
cell_min | Numeric. Minimum number of cells from each group to be included in the Anderson-Darling (AD) test. Should be > 4. Relevant for metric: 'cms'.
batch_min | Numeric. Minimum number of cells per batch to include in the AD test. If set, neighbours will be included until batch_min cells from each batch are present. Relevant for metric: 'cms'.
unbalanced | Boolean. If TRUE, neighbourhoods with only one batch present will be set to NA, so that they are not included in any summaries or smoothing. Relevant for metric: 'cms'.
weight | Boolean. If TRUE, batch probabilities to calculate the isi score are weighted by the mean distance of their cells towards the cell of interest. Relevant for metric: 'isi'.
k_pos | Numeric. Position of cell to be used as reference within mixing metric. See
sce_pre_list | A list of
dim_combined | Character. Name of embeddings to use as subspace to calculate LDF after integration. Default is
assay_pre | Character. Name of the assay to use for PCA. Only relevant if no existing 'dim_red' is provided. Must be one of
n_combined | Number of PCs to use in original space. See
BPPARAM | A BiocParallelParam object specifying whether cms scores shall be calculated in parallel. Relevant for metric: 'cms'.
evalIntegration is a wrapper function for different metrics to assess the results of integrating single-cell datasets. In general, there are metrics evaluating the *mixing* of datasets, that is, metrics that show whether a bias between datasets remains after integration. Furthermore, there are metrics that evaluate how well each dataset's internal structure has been retained, that is, metrics that show whether integration has removed (potentially biological) signal or added noise.
A SingleCellExperiment with the chosen metric's score within colData.
Here we provide the following metrics:
cms: Cell-specific Mixing Score. Metric that tests the hypothesis that group-specific distance distributions of knn cells have the same underlying unspecified distribution. The score can be interpreted as the probability of the data under an equally mixed neighbourhood with respect to the batch variable (see cms).
isi: Inverse Simpson Index. Metric that uses the inverse Simpson's index to calculate the diversity within a specified neighbourhood. The Simpson index describes the probability that two entities taken at random from the dataset belong to the same batch, and its inverse represents the effective number of batches in a neighbourhood. The inverse Simpson index has been proposed as a diversity score for batch mixing in single-cell RNA-seq by Korsunsky et al. They provide a distance-based neighbourhood weighting in their Lisi package.
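To make the "effective number of batches" reading concrete, here is a minimal base-R sketch of the unweighted inverse Simpson index for the batch labels of one neighbourhood. The helper name `inverse_simpson` is hypothetical and not part of CellMixS, and the sketch deliberately ignores the distance weighting controlled by `weight`.

```r
# Hypothetical helper (not part of CellMixS): unweighted inverse Simpson index
# of the batch labels observed in a single neighbourhood.
inverse_simpson <- function(batch_labels) {
  # Proportion of neighbours belonging to each batch
  p <- as.numeric(table(batch_labels)) / length(batch_labels)
  # Simpson index = sum(p^2); its inverse is the effective number of batches
  1 / sum(p^2)
}

# Two equally represented batches
balanced <- inverse_simpson(c("1", "1", "2", "2"))
# Only one batch present in the neighbourhood
single <- inverse_simpson(c("1", "1", "1", "1"))
print(c(balanced = balanced, single = single))
```

With equal proportions the index equals the number of batches present, and it drops towards 1 as one batch dominates the neighbourhood.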
mixingMetric: Mixing Metric. Metric using the median position of the kth cell from each batch within its knn as a score. The lower the score, the better mixed the neighbourhood is. We implemented an equivalent version to the one in the Seurat package (see MixingMetric and mixMetric).
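The idea behind the mixing metric can be illustrated for a single cell with a small base-R sketch. This is not the Seurat or CellMixS implementation: the helper `mixing_position` is hypothetical, and it assumes that a batch with fewer than `k_pos` cells in the neighbourhood is assigned the neighbourhood size as its position.

```r
# Hypothetical helper: for one cell, take its neighbours' batch labels ordered
# from nearest to farthest and return the median, over batches, of the rank of
# the k_pos-th neighbour from each batch.
mixing_position <- function(ordered_labels, batches, k_pos = 5) {
  k <- length(ordered_labels)
  pos <- vapply(batches, function(b) {
    idx <- which(ordered_labels == b)
    # Assumption: cap at the neighbourhood size if the batch is too rare
    if (length(idx) >= k_pos) as.numeric(idx[k_pos]) else as.numeric(k)
  }, numeric(1))
  median(pos)
}

# Alternating batches (well mixed) versus one batch clustered up front
mixed   <- mixing_position(c("1", "2", "1", "2", "1", "2"), c("1", "2"), k_pos = 2)
unmixed <- mixing_position(c("1", "1", "1", "1", "2", "2"), c("1", "2"), k_pos = 2)
print(c(mixed = mixed, unmixed = unmixed))
```

The well-mixed neighbourhood yields the lower value, in line with the "lower is better" interpretation of the score.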
entropy: Shannon entropy. Metric calculating the Shannon entropy of the batch/group variable within each cell's k-nearest neighbours. For balanced batches, the closer the entropy is to 1, the more random the batch variable is within the neighbourhood. For unbalanced batches, entropy should only be used as a relative metric in a comparative setting (see entropy).
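As an illustration of why the entropy approaches 1 for balanced, well-mixed batches, the following base-R sketch computes a normalised Shannon entropy for one neighbourhood. The helper `knn_entropy` is hypothetical, and dividing by the logarithm of the number of batches is an assumption made here so that the maximum is 1; it is not taken from the package source.

```r
# Hypothetical helper: Shannon entropy of the batch labels in one neighbourhood,
# normalised by log(n_batches) (assumption) so that balanced mixing gives 1.
knn_entropy <- function(batch_labels, n_batches) {
  p <- as.numeric(table(batch_labels)) / length(batch_labels)
  p <- p[p > 0]  # drop empty levels to avoid 0 * log(0)
  -sum(p * log(p)) / log(n_batches)
}

balanced <- knn_entropy(c("1", "1", "2", "2"), n_batches = 2)
single   <- knn_entropy(c("1", "1", "1", "1"), n_batches = 2)
print(c(balanced = balanced, single = single))
```

A neighbourhood containing only one batch has entropy 0, while equal proportions of all batches give the maximum.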
ldfDiff: Local density factor differences. Metric that determines cell-specific changes in the Local Density Factor before and after data integration. A difference close to 0 indicates no distortion of the previous structure (see ldfDiff).
localStructure: Local structure. Metric that compares the intersection of knn from the same batch before and after integration, returning the average between all groups. The higher the score, the more neighbours were reproduced after integration. Here we implemented an equivalent version to the one in the Seurat package (see LocalStruct and locStructure).
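The neighbour-overlap idea behind the local structure metric can be sketched in base R as follows. The helper `knn_overlap` and the cell names are hypothetical; the sketch only illustrates the intersection and averaging steps for one cell and is not the Seurat or CellMixS implementation.

```r
# Hypothetical helper: fraction of a cell's pre-integration neighbours that are
# still among its neighbours after integration.
knn_overlap <- function(knn_pre, knn_post) {
  length(intersect(knn_pre, knn_post)) / length(knn_pre)
}

# Made-up neighbour sets for one cell, evaluated within each of two batches
overlap_by_batch <- c(
  batch1 = knn_overlap(c("c1", "c2", "c3", "c4"), c("c2", "c3", "c7", "c9")),
  batch2 = knn_overlap(c("d1", "d2", "d3", "d4"), c("d1", "d2", "d3", "d4"))
)
# Average across batches, mirroring "the average between all groups"
local_structure <- mean(overlap_by_batch)
print(local_structure)
```

A value of 1 means every pre-integration neighbour was recovered after integration; lower values indicate that the local neighbourhood was rearranged.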
Korsunsky I, Fan J, Slowikowski K, Zhang F, Wei K, et al. (2018). Fast, sensitive, and accurate integration of single cell data with Harmony. bioRxiv (preprint).
Stuart T, Butler A, Hoffman P, Hafemeister C, Papalexi E, et al. (2019). Comprehensive Integration of Single-Cell Data. Cell.
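library(SingleCellExperiment)
library(CellMixS)  # provides evalIntegration
# Load the simulated example data shipped with CellMixS
sim_list <- readRDS(system.file("extdata/sim50.rds", package = "CellMixS"))
# Use a subset of cells from the first simulated dataset
sce <- sim_list[[1]][, c(1:15, 300:320, 16:30)]
# Split by batch to provide the pre-integration objects for 'ldfDiff'
sce_batch1 <- sce[,colData(sce)$batch == "1"]
sce_batch2 <- sce[,colData(sce)$batch == "2"]
pre <- list("1" = sce_batch1, "2" = sce_batch2)
# Metrics that need only the integrated object
sce <- evalIntegration(metrics = c("cms", "mixingMetric", "isi", "entropy"), sce, "batch", k = 20)
# 'ldfDiff' additionally needs the per-batch objects
sce <- evalIntegration("ldfDiff", sce, "batch", k = 20, sce_pre_list = pre)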