View source: R/nestedClusters.R
nestedClusters | R Documentation |
Map an alternative clustering to a reference clustering, where the former is expected to be nested within the latter.
nestedClusters(ref, alt)
ref |
A character vector or factor containing one set of groupings, considered to be the reference. |
alt |
A character vector or factor containing another set of groupings, to be compared to ref. |
This function identifies mappings between two clusterings on the same set of cells where alt is potentially nested within ref (e.g., because it is computed at higher resolution).
To do so, we take each alt cluster and compute the proportion of its cells that are derived from each ref cluster.
The corresponding ref cluster is identified as that with the highest proportion, as reported by the which field in the alt.mapping DataFrame.
The quality of the mapping is determined by max in the output alt.mapping DataFrame.
A low value indicates that the alt cluster does not have a clear counterpart in ref, i.e., it mixes cells from several ref clusters, representing a loss of heterogeneity.
Note that this is not a symmetrical inference; multiple alt clusters can map to the same ref cluster without manifesting as a low max.
This implicitly assumes that an increase in resolution in alt is not problematic.
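The mapping described above can be sketched in base R. This is a minimal illustration of the idea, not the package's actual implementation; the toy labels and the object names counts, proportions and mapping are assumptions made for the example.

```r
# Toy labels for eight cells (hypothetical data, for illustration only).
ref <- c("A", "A", "A", "A", "B", "B", "B", "B")
alt <- c("a1", "a1", "a2", "a2", "b1", "b1", "b2", "a2")

# Cross-tabulate with alt clusters as rows and ref clusters as columns.
counts <- table(alt, ref)

# Divide each row by its total so that every alt cluster sums to 1.
proportions <- prop.table(counts, margin = 1)

# For each alt cluster, record the largest proportion and the ref cluster
# in which it occurs.
mapping <- data.frame(
    max = apply(proportions, 1, max),
    which = colnames(proportions)[apply(proportions, 1, which.max)],
    row.names = rownames(proportions)
)
mapping
```

In this toy case, a2 contains cells from both A and B, so its max falls below 1, whereas a1, b1 and b2 each sit entirely inside a single ref cluster.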
The ref.score value for each ref cluster is formally defined as the probability of randomly picking a cell that belongs to that ref cluster, conditional on the event that the chosen cell belongs to the same alt cluster as a randomly chosen cell from that ref cluster.
This probability is equal to unity when the ref cluster is an exact superset of all alt clusters that contain its cells, corresponding to perfect 1:many nesting.
In contrast, if the alt clusters contain a mix of cells from different ref clusters, this probability will be low and can be used as a diagnostic for imperfect nesting.
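One reading of this definition can be written out in base R: draw a cell from the ref cluster, note its alt cluster, then draw a second cell from that alt cluster and ask whether it falls back in the same ref cluster. The helper name refScoreSketch is hypothetical, and the code is a sketch of the stated definition rather than a copy of the package's implementation.

```r
# Hypothetical helper: computes the score from the definition given above.
refScoreSketch <- function(ref, alt) {
    # Counts with alt clusters as rows and ref clusters as columns.
    counts <- unclass(table(alt, ref))

    # P(alt cluster | ref cluster): each column sums to 1.
    alt.given.ref <- sweep(counts, 2, colSums(counts), "/")

    # P(ref cluster | alt cluster): each row sums to 1.
    ref.given.alt <- sweep(counts, 1, rowSums(counts), "/")

    # Sum over alt clusters of P(alt | ref) * P(ref | alt), per ref cluster.
    colSums(alt.given.ref * ref.given.alt)
}

# Toy labels (hypothetical data): alt cluster a2 mixes cells from A and B.
ref <- c("A", "A", "A", "A", "B", "B", "B", "B")
alt <- c("a1", "a1", "a2", "a2", "b1", "b1", "b2", "a2")
imperfect <- refScoreSketch(ref, alt)

# Perfect nesting: every alt cluster lies inside exactly one ref cluster.
nested <- refScoreSketch(ref, paste0(ref, rep(1:2, 4)))
```

Under these assumptions, nested is 1 for both ref clusters, while imperfect falls below 1 because a2 straddles A and B.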
A list containing:
proportions, a matrix where each row corresponds to one of the alt clusters and each column corresponds to one of the ref clusters.
Each matrix entry represents the proportion of cells in a given alt cluster that are assigned to a given ref cluster.
(That is, the proportions across all ref clusters should sum to unity for each alt cluster.)
alt.mapping, a DataFrame with one row per cluster in alt.
This contains the columns max, a numeric vector specifying the maximum proportion (the largest value in the corresponding row of proportions) for that alt cluster; and which, a character vector specifying the ref cluster in which the maximum value occurs.
ref.score, a numeric vector of length equal to the number of ref clusters.
This represents the degree of nesting of alt clusters within each ref cluster, see Details.
linkClusters, to do this in a symmetric manner (i.e., without nesting).
pairwiseRand, for another way of comparing two sets of clusterings.
library(bluster)

# Cluster random data at two resolutions; results vary between runs.
m <- matrix(runif(10000), ncol=10)
clust1 <- kmeans(m,10)$cluster
clust2 <- kmeans(m,20)$cluster
nestedClusters(clust1, clust2)

# The ref.score is 1 in cases of perfect nesting.
nestedClusters(clust1, clust1)$ref.score
nest.clust <- paste0(clust1, sample(letters, length(clust1), replace=TRUE))
nestedClusters(clust1, nest.clust)$ref.score

# In contrast, it is much lower when nesting is bad.
nestedClusters(clust1, sample(clust1))$ref.score