View source: R/nestedClusters.R
nestedClusters (R Documentation)
Map an alternative clustering to a reference clustering, where the alternative clustering is expected to be nested within the reference.
nestedClusters(ref, alt)
ref: A character vector or factor containing one set of groupings, considered to be the reference.
alt: A character vector or factor containing another set of groupings, to be compared to ref.
This function identifies mappings between two clusterings of the same set of cells, where alt is potentially nested within ref (e.g., because alt was computed at a higher resolution).
To do so, we take each alt cluster and compute the proportion of its cells that are derived from each ref cluster.
The corresponding ref cluster is identified as the one with the highest proportion, as reported by the which column of the alt.mapping DataFrame.
The quality of the mapping is given by the max column of alt.mapping, i.e., that highest proportion.
A low value indicates that the alt cluster has no clear counterpart in ref because its cells are spread across several ref clusters, representing a loss of the heterogeneity captured by ref.
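The mapping described above can be sketched in base R with a contingency table. This is a minimal illustration, not bluster's own code: the helper name map_alt_to_ref() is hypothetical, and breaking ties in favour of the first ref cluster is an assumption of this sketch.

```r
# Hypothetical helper illustrating the alt -> ref mapping; not bluster's implementation.
map_alt_to_ref <- function(ref, alt) {
  # Contingency table: rows are alt clusters, columns are ref clusters.
  tab <- table(alt = as.character(alt), ref = as.character(ref))
  # Divide each row by its total so that each alt cluster's proportions sum to 1.
  props <- unclass(tab) / rowSums(tab)
  # Column index of the highest proportion per alt cluster (ties go to the first column).
  best <- max.col(props, ties.method = "first")
  list(
    proportions = props,
    which = colnames(props)[best],
    max = props[cbind(seq_len(nrow(props)), best)]
  )
}

ref <- c("A", "A", "A", "B", "B", "B")
alt <- c("x", "x", "y", "y", "z", "z")  # "y" mixes one cell from A and one from B
res <- map_alt_to_ref(ref, alt)
res$which  # "A" "A" "B"
res$max    # 1.0 0.5 1.0
```

The low max of 0.5 for alt cluster "y" flags it as lacking a clear counterpart in ref.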
Note that this is not a symmetrical inference; multiple alt clusters can map to the same ref cluster without manifesting as a low max.
This implicitly assumes that an increase in resolution in alt is not problematic.
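The asymmetry can be seen with a small base-R example, assuming toy labels where ref cluster A is split into two alt clusters. It uses prop.table() directly rather than nestedClusters() itself.

```r
ref <- c("A", "A", "A", "A", "B", "B")
alt <- c("a1", "a1", "a2", "a2", "b1", "b1")  # A is split into a1 and a2

# Proportion of each alt cluster's cells in each ref cluster (rows sum to 1).
alt_to_ref <- prop.table(table(alt, ref), margin = 1)
apply(alt_to_ref, 1, max)  # a1 = 1, a2 = 1, b1 = 1: splitting A is not penalized

# Swapping the roles is not equivalent: A's cells are now divided between a1 and a2.
ref_to_alt <- prop.table(table(ref, alt), margin = 1)
apply(ref_to_alt, 1, max)  # A = 0.5, B = 1
```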
The ref.score value for each ref cluster is formally defined as the probability of randomly picking a cell that belongs to that ref cluster,
conditional on the event that the picked cell belongs to the same alt cluster as a randomly chosen cell from that ref cluster.
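Written out, assuming cells are picked uniformly at random, the definition becomes the expression below. The notation is introduced here for illustration: for a ref cluster X and an alt cluster A, n_{AX} is the number of cells in both, and n_A and n_X are the cluster sizes.

```latex
\begin{aligned}
\text{ref.score}(X)
  &= \sum_{A} \Pr(\text{first cell} \in A \mid \text{first cell} \in X)\,
              \Pr(\text{second cell} \in X \mid \text{second cell} \in A) \\
  &= \sum_{A} \frac{n_{AX}}{n_X} \cdot \frac{n_{AX}}{n_A}
\end{aligned}
```

Since the weights n_{AX}/n_X sum to 1 over A and each factor n_{AX}/n_A is at most 1, the score equals 1 exactly when every alt cluster that contains cells from X lies entirely within X.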
This probability is equal to unity when the ref cluster is an exact superset of all alt clusters that contain its cells, corresponding to perfect 1:many nesting.
In contrast, if those alt clusters contain a mix of cells from different ref clusters, this probability will be lower and can be used as a diagnostic for imperfect nesting.
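As a sketch of how this score can be computed from the contingency table, here is a hypothetical base-R helper, ref_score(). It follows the probabilistic definition above and is not taken from bluster's source.

```r
# Hypothetical helper: per-ref-cluster nesting score from a contingency table.
ref_score <- function(ref, alt) {
  tab <- table(alt, ref)                      # rows = alt clusters, columns = ref clusters
  p_ref_given_alt <- prop.table(tab, margin = 1)  # share of each alt cluster in each ref cluster
  p_alt_given_ref <- prop.table(tab, margin = 2)  # share of each ref cluster in each alt cluster
  # For each ref cluster, sum the product over all alt clusters.
  colSums(p_ref_given_alt * p_alt_given_ref)
}

ref <- c("A", "A", "A", "A", "B", "B")
# Perfect 1:many nesting: every alt cluster lies inside a single ref cluster.
perfect <- ref_score(ref, c("a1", "a1", "a2", "a2", "b1", "b1"))
perfect  # A = 1, B = 1
# Imperfect nesting: alt cluster "m" mixes one cell from A with both cells from B.
mixed <- ref_score(ref, c("a1", "a1", "a1", "m", "m", "m"))
mixed    # A = 5/6, B = 2/3
```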
A list containing:
proportions, a matrix where each row corresponds to one of the alt clusters and each column corresponds to one of the ref clusters.
Each entry is the proportion of cells in the corresponding alt cluster (row) that are assigned to the corresponding ref cluster (column).
(That is, the proportions across all ref clusters should sum to unity for each alt cluster.)
alt.mapping, a DataFrame with one row per cluster in alt.
This contains the columns max, a numeric vector specifying the maximum proportion for that alt cluster;
and which, a character vector specifying the ref cluster in which the maximum value occurs.
ref.score, a numeric vector of length equal to the number of ref clusters.
This represents the degree of nesting of alt clusters within each ref cluster (see Details).
linkClusters, to do this in a symmetric manner (i.e., without nesting).
pairwiseRand, for another way of comparing two clusterings.
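library(bluster)
set.seed(100)
m <- matrix(runif(10000), ncol=10)
# Two k-means clusterings of the same 1000 points at different resolutions.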
clust1 <- kmeans(m,10)$cluster
clust2 <- kmeans(m,20)$cluster
nestedClusters(clust1, clust2)
# The ref.score is 1 in cases of perfect nesting.
nestedClusters(clust1, clust1)$ref.score
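# Appending a random letter to each label splits every clust1 cluster into sub-clusters nested within it.
nest.clust <- paste0(clust1, sample(letters, length(clust1), replace=TRUE))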
nestedClusters(clust1, nest.clust)$ref.score
# In contrast, it is much lower when nesting is bad, e.g., after randomly shuffling the labels.
nestedClusters(clust1, sample(clust1))$ref.score