View source: R/evalIntegration.R
Function to evaluate single-cell (sc) data integration, providing a common framework for different metrics. Metrics are provided to evaluate both the mixing of batches and the preservation of the local/individual structure.
evalIntegration(
  metrics,
  sce,
  group,
  dim_red = "PCA",
  assay_name = "logcounts",
  n_dim = 10,
  res_name = NULL,
  k = NULL,
  k_min = NA,
  smooth = TRUE,
  cell_min = 10,
  batch_min = NULL,
  unbalanced = FALSE,
  weight = TRUE,
  k_pos = 5,
  sce_pre_list = NULL,
  dim_combined = dim_red,
  assay_pre = "logcounts",
  n_combined = 10,
  BPPARAM = SerialParam()
)

metrics 
Character vector. Names of the metrics to apply. Must be one or more of 'cms', 'ldfDiff', 'isi', 'mixingMetric', 'localStructure', 'entropy'. 
sce 
A SingleCellExperiment object with the integrated data.
group 
Character. Name of the group/batch variable.
Must be the name of a column in colData(sce).
dim_red 
Character. Name of embedding to use as subspace for distance distributions. Default is "PCA". 
assay_name 
Character. Name of the assay to use for PCA.
Only relevant if no existing 'dim_red' is provided.
Must be the name of an assay present in sce.
n_dim 
Numeric. Number of dimensions to include to define the subspace. 
res_name 
Character vector. Suffix appended to the result score's name (e.g. the method used to combine batches). Needs to have the same length as metrics, or be NULL. 
k 
Numeric. Number of k-nearest neighbours (knn) to use. 
k_min 
Numeric. Minimum number of knn to include (see cms). Relevant for metric: 'cms'.
smooth 
Logical. Indicates whether cms results should be smoothed within each neighbourhood using the weighted mean. Relevant for metric: 'cms'. 
cell_min 
Numeric. Minimum number of cells from each group to be included in the Anderson-Darling (AD) test. Should be > 4. Relevant for metric: 'cms'. 
batch_min 
Numeric. Minimum number of cells per batch to include in the AD test. If set, neighbours will be included until batch_min cells from each batch are present. Relevant for metric: 'cms'. 
unbalanced 
Boolean. If TRUE, neighbourhoods with only one batch present will be set to NA. This way they are not included in any summaries or smoothing. Relevant for metric: 'cms'. 
weight 
Boolean. If TRUE, batch probabilities used to calculate the isi score are weighted by the mean distance of their cells to the cell of interest. Relevant for metric: 'isi'. 
k_pos 
Numeric. Position of the cell to be used as reference within the mixing
metric (see mixMetric). Relevant for metric: 'mixingMetric'.
sce_pre_list 
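A list of SingleCellExperiment objects containing the individual datasets before integration (in the Examples, one object per batch, named by batch level). Relevant for metric: 'ldfDiff'.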
dim_combined 
Character. Name of the embedding to use as subspace to
calculate the LDF after integration. Default is dim_red.
assay_pre 
Character. Name of the assay to use for PCA.
Only relevant if no existing 'dim_red' is provided.
Must be the name of an existing assay.
n_combined 
Numeric. Number of PCs to use in the original space.
BPPARAM 
A BiocParallelParam object specifying whether cms scores shall be calculated in parallel. Relevant for metric: 'cms'. 
evalIntegration is a wrapper function for different metrics to understand the results of integrated single-cell datasets. In general, there are metrics evaluating the *mixing* of datasets, that is, metrics that show whether there is still a bias towards individual datasets after integration. Furthermore, there are metrics evaluating how well the dataset-internal structure has been retained, that is, metrics that show whether (potentially biological) signal has been removed or noise added by integration.
A SingleCellExperiment with the chosen metrics' scores within colData.
Here we provide the following metrics:
'cms': Cell-specific Mixing Score. Metric that tests the hypothesis
that group-specific distance distributions of knn cells have the same
underlying unspecified distribution. The score can be interpreted as the
data's probability within an equally mixed neighbourhood according to the
batch variable (see cms).
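To make the cms idea concrete, here is a minimal base-R sketch for a single cell, assuming exactly two batches and Euclidean distances in an embedding matrix (cells in rows). It is not the CellMixS implementation: the documented cms relies on an AD test (see cell_min), whereas this sketch swaps in the two-sample Kolmogorov-Smirnov test from stats::ks.test. The function names cms_cell_sketch and cms_sketch are hypothetical.

```r
# Hypothetical sketch of a cell-specific mixing score (not CellMixS code).
# Assumptions: two batches, Euclidean distances, and a two-sample
# Kolmogorov-Smirnov test standing in for the Anderson-Darling test.
cms_cell_sketch <- function(emb, batch, i, k) {
  d <- sqrt(colSums((t(emb) - emb[i, ])^2))  # distances from cell i
  d[i] <- Inf                                # exclude the cell itself
  nn <- order(d)[seq_len(k)]                 # k nearest neighbours
  # batch-specific distance distributions within the neighbourhood
  groups <- split(d[nn], as.character(batch[nn]))
  if (length(groups) < 2) {
    return(NA_real_)                         # only one batch present
  }
  # p-value for "both distance samples share one distribution"
  stats::ks.test(groups[[1]], groups[[2]])$p.value
}

cms_sketch <- function(emb, batch, k) {
  vapply(seq_len(nrow(emb)),
         function(i) cms_cell_sketch(emb, batch, i, k),
         numeric(1))
}

set.seed(1)
emb <- matrix(rnorm(200), ncol = 2)    # 100 cells, 2 dimensions
batch <- rep(c("1", "2"), each = 50)   # labels assigned independently of position
scores <- cms_sketch(emb, batch, k = 20)
summary(scores)
```

Returning NA for single-batch neighbourhoods loosely mirrors what the text describes for unbalanced = TRUE; options such as smooth, k_min and batch_min are not modelled here.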
'isi': Inverse Simpson Index. Metric that uses the inverse Simpson's index to calculate the diversity within a specified neighbourhood. The Simpson index describes the probability that two entities taken at random from the dataset belong to the same type (here: the same batch); its inverse represents the effective number of batches in a neighbourhood. The inverse Simpson index has been proposed as a diversity score for batch mixing in single-cell RNA-seq by Korsunsky et al. They provide a distance-based neighbourhood weighting in their LISI package.
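The definition above reduces to 1 / sum(p_b^2), where p_b is the proportion of batch b among a cell's neighbours. The following base-R sketch computes it per cell, assuming Euclidean knn in an embedding matrix and no distance weighting (the weight option is not modelled). The names inverse_simpson and isi_sketch are hypothetical; this is not the CellMixS code.

```r
# Hypothetical, unweighted sketch of a per-cell inverse Simpson index.
inverse_simpson <- function(labels) {
  p <- table(as.character(labels)) / length(labels)  # batch proportions
  1 / sum(p^2)                                       # effective number of batches
}

isi_sketch <- function(emb, batch, k) {
  d <- as.matrix(dist(emb))   # Euclidean distances between cells
  diag(d) <- Inf              # a cell is not its own neighbour
  vapply(seq_len(nrow(d)), function(i) {
    inverse_simpson(batch[order(d[i, ])[seq_len(k)]])
  }, numeric(1))
}

set.seed(1)
emb <- matrix(rnorm(200), ncol = 2)
batch <- rep(c("1", "2"), each = 50)
isi <- isi_sketch(emb, batch, k = 20)
range(isi)   # values lie between 1 and the number of batches (here 2)
```

A neighbourhood containing a single batch scores 1, and one split evenly between two batches scores 2.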
'mixingMetric': Mixing Metric. Metric using the median position of the
k-th cell from each batch within its knn as a score. The lower the score,
the better mixed the neighbourhood. We implemented an equivalent version to
the one in the Seurat package (see MixingMetric and mixMetric).
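The base-R sketch below spells out that definition, assuming Euclidean distances in an embedding and assuming that a batch with fewer than k_pos cells among the k_max nearest neighbours is assigned position k_max. k_pos corresponds to the documented argument; the names mixing_metric_cell, mixing_metric_sketch and k_max (default 300 here) are hypothetical choices for illustration, not the mixMetric API.

```r
# Hypothetical sketch of the mixing metric (not the CellMixS or Seurat code).
mixing_metric_cell <- function(neighbour_batches, batches, k_pos, k_max) {
  # neighbour_batches: batch labels of the neighbours, nearest first
  positions <- vapply(batches, function(b) {
    pos <- which(neighbour_batches == b)
    # assumption: batches with fewer than k_pos neighbours get position k_max
    if (length(pos) >= k_pos) as.numeric(pos[k_pos]) else as.numeric(k_max)
  }, numeric(1))
  median(positions)
}

mixing_metric_sketch <- function(emb, batch, k_pos = 5, k_max = 300) {
  d <- as.matrix(dist(emb))
  diag(d) <- Inf                        # exclude the cell itself
  k_max <- min(k_max, nrow(d) - 1)      # cannot exceed the available cells
  batch_levels <- unique(as.character(batch))
  vapply(seq_len(nrow(d)), function(i) {
    nb <- as.character(batch[order(d[i, ])[seq_len(k_max)]])
    mixing_metric_cell(nb, batch_levels, k_pos, k_max)
  }, numeric(1))
}

# One cell: batch "a" reaches its 2nd cell at position 2 and batch "b"
# at position 5, so the median position is 3.5.
mixing_metric_cell(c("a", "a", "b", "a", "b"), c("a", "b"), k_pos = 2, k_max = 5)
```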
'entropy': Shannon entropy. Metric calculating the Shannon entropy of
the batch/group variable within each cell's k-nearest neighbours.
For balanced batches, the entropy is closer to 1 the higher the variable's
randomness. For unbalanced batches, entropy should only be used as a
relative metric in a comparative setting (see entropy).
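As an illustration, the per-cell entropy can be sketched in base R, assuming Euclidean knn in an embedding matrix and a base-2 logarithm, so that two perfectly mixed batches give an entropy of 1. The log base used by CellMixS is not stated here, and shannon_entropy and entropy_sketch are hypothetical names.

```r
# Hypothetical sketch of a per-cell Shannon entropy of batch labels
# (not the CellMixS code). Assumption: log base 2.
shannon_entropy <- function(labels) {
  p <- as.numeric(table(as.character(labels))) / length(labels)
  -sum(p * log2(p))   # only observed batches appear, so p > 0
}

entropy_sketch <- function(emb, batch, k) {
  d <- as.matrix(dist(emb))
  diag(d) <- Inf      # exclude the cell itself
  vapply(seq_len(nrow(d)), function(i) {
    shannon_entropy(batch[order(d[i, ])[seq_len(k)]])
  }, numeric(1))
}

shannon_entropy(c("1", "2", "1", "2"))   # balanced: 1
shannon_entropy(c("1", "1", "1", "1"))   # a single batch: 0
```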
'ldfDiff': Local density factor differences. Metric that determines
cell-specific changes in the Local Density Factor (LDF) before and after data
integration. A metric/difference close to 0 indicates no distortion of
the previous structure (see ldfDiff).
'localStructure': Local structure. Metric that compares the
intersection of knn from the same batch before and after integration,
returning the average across all groups. The higher the score, the more
neighbours were preserved after integration. Here we implemented an
equivalent version to the one in the Seurat package
(see LocalStruct and locStructure).
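A minimal base-R sketch of this idea, assuming two embedding matrices with the same cells in the same row order (one before and one after integration) and Euclidean knn computed separately within each batch. The per-cell overlap fraction, averaged over cells and then over batches, is an assumption rather than the exact LocalStruct or locStructure computation, and knn_sets and local_structure_sketch are hypothetical names.

```r
# Hypothetical sketch of a local structure score (not CellMixS/Seurat code).
knn_sets <- function(emb, k) {
  d <- as.matrix(dist(emb))
  diag(d) <- Inf                       # exclude the cell itself
  lapply(seq_len(nrow(d)), function(i) order(d[i, ])[seq_len(k)])
}

local_structure_sketch <- function(emb_pre, emb_post, batch, k) {
  per_batch <- vapply(unique(as.character(batch)), function(b) {
    idx <- which(batch == b)           # neighbours are searched within one batch
    pre  <- knn_sets(emb_pre[idx, , drop = FALSE], k)
    post <- knn_sets(emb_post[idx, , drop = FALSE], k)
    # fraction of each cell's pre-integration neighbours that are kept
    mean(mapply(function(a, b2) length(intersect(a, b2)) / k, pre, post))
  }, numeric(1))
  mean(per_batch)                      # average across batches
}

set.seed(1)
emb_pre <- matrix(rnorm(200), ncol = 2)
batch <- rep(c("1", "2"), each = 50)
emb_post <- emb_pre + matrix(rnorm(200, sd = 0.1), ncol = 2)  # mild distortion
local_structure_sketch(emb_pre, emb_pre, batch, k = 10)   # unchanged: 1
local_structure_sketch(emb_pre, emb_post, batch, k = 10)
```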
Korsunsky I, Fan J, Slowikowski K, Zhang F, Wei K, et al. (2018). Fast, sensitive, and accurate integration of single cell data with Harmony. bioRxiv (preprint).
Stuart T, Butler A, Hoffman P, Hafemeister C, Papalexi E, et al. (2019). Comprehensive Integration of Single-Cell Data. Cell.
library(SingleCellExperiment)
library(CellMixS)
# Example data shipped with CellMixS
sim_list <- readRDS(system.file("extdata/sim50.rds", package = "CellMixS"))
sce <- sim_list[[1]][, c(1:15, 300:320, 16:30)]
# Split the cells by batch to serve as the datasets before integration
sce_batch1 <- sce[, colData(sce)$batch == "1"]
sce_batch2 <- sce[, colData(sce)$batch == "2"]
pre <- list("1" = sce_batch1, "2" = sce_batch2)
# Mixing metrics on the combined data
sce <- evalIntegration(metrics = c("cms", "mixingMetric", "isi", "entropy"), sce, "batch", k = 20)
# ldfDiff additionally needs the datasets before integration
sce <- evalIntegration("ldfDiff", sce, "batch", k = 20, sce_pre_list = pre)