nestedClusters: Map nested clusterings
In LTLA/bluster: Clustering Algorithms for Bioconductor

Description

Map an alternative clustering to a reference clustering, where the former is expected to be nested within the latter.

Usage

nestedClusters(ref, alt)

Arguments

`ref`	A character vector or factor containing one set of groupings, considered to be the reference.
`alt`	A character vector or factor containing another set of groupings, to be compared to `ref`.

Details

This function identifies mappings between two clusterings of the same set of cells, where alt is potentially nested within ref (e.g., because it was computed at a higher resolution). To do so, we take each alt cluster and compute the proportion of its cells that are derived from each ref cluster. The corresponding ref cluster is identified as the one with the highest proportion, as reported by the which field in the alt.mapping DataFrame.
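The mapping step described above can be sketched in base R. This is a minimal illustration under stated assumptions, not the package's own implementation: the toy labels are invented, a plain data.frame stands in for the DataFrame, and ties are broken by taking the first ref cluster, which may differ from what nestedClusters does.

```r
# Toy labels, invented for illustration: six cells, two ref clusters,
# three alt clusters.
ref <- c("A", "A", "A", "B", "B", "B")
alt <- c("a1", "a1", "a2", "a2", "b1", "b1")

# Cross-tabulate: rows are alt clusters, columns are ref clusters.
tab <- table(alt, ref)

# Divide each row by its total so that every alt cluster sums to 1.
proportions <- unclass(tab / rowSums(tab))

# For each alt cluster, find the ref cluster with the highest proportion.
# Assumption: ties go to the first ref cluster.
best <- max.col(proportions, ties.method = "first")
mapping <- data.frame(
    max = proportions[cbind(seq_along(best), best)],
    which = colnames(proportions)[best],
    row.names = rownames(proportions)
)
print(mapping)
```

Here a1 and b1 sit entirely inside one ref cluster, while a2 is split evenly between A and B and so has a maximum proportion of only 0.5.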

The quality of the mapping is given by the max field in the alt.mapping DataFrame. A low value indicates that the alt cluster does not have a clear counterpart in ref, representing a loss of heterogeneity. Note that this is not a symmetrical inference: multiple alt clusters can map to the same ref cluster without manifesting as a low max. This implicitly assumes that an increase in resolution in alt is not problematic.
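The asymmetry can be seen on a toy example. The sketch below uses a hypothetical helper, max_prop, which is not part of bluster and only recomputes the largest ref proportion for each alt cluster from a cross-tabulation.

```r
# Hypothetical helper (not part of bluster): the largest proportion of
# each alt cluster that falls into a single ref cluster.
max_prop <- function(ref, alt) {
    tab <- table(alt, ref)
    apply(tab / rowSums(tab), 1, max)
}

# Invented labels: eight cells in two ref clusters.
ref <- rep(c("A", "B"), each = 4)

# alt splits each ref cluster in two (higher resolution).
finer <- c("a1", "a1", "a2", "a2", "b1", "b1", "b2", "b2")
max_prop(ref, finer)

# alt merges both ref clusters into one (lower resolution).
coarser <- rep("all", 8)
max_prop(ref, coarser)
```

Splitting ref clusters leaves every maximum at 1, even though two alt clusters map to each ref cluster. Merging them drops the maximum to 0.5, which is the loss of heterogeneity that a low max is meant to flag.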

The ref.score value for each ref cluster is formally defined as the probability of randomly picking a cell that belongs to that ref cluster, conditional on the event that the chosen cell belongs to the same alt cluster as a randomly chosen cell from that ref cluster. This probability is equal to unity when the ref cluster is an exact superset of all alt clusters that contain its cells, corresponding to perfect 1:many nesting. In contrast, if the alt clusters contain a mix of cells from different ref clusters, this probability will be low and can be used as a diagnostic for imperfect nesting.
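One way to read this definition is as a two-step draw: pick a cell from the ref cluster, note its alt cluster, then pick a second cell from that alt cluster and ask whether it lies in the original ref cluster. The base R sketch below computes that probability directly from the cross-tabulation. The helper name ref_score is hypothetical, and the code is an interpretation of the definition above rather than a copy of the package source.

```r
# Hypothetical helper (not bluster's own code): ref.score computed
# directly from the probabilistic definition.
ref_score <- function(ref, alt) {
    tab <- unclass(table(alt, ref))   # rows: alt, columns: ref
    # P(ref cluster | alt cluster): each row sums to 1.
    ref_given_alt <- tab / rowSums(tab)
    # P(alt cluster | ref cluster): each column sums to 1.
    alt_given_ref <- sweep(tab, 2, colSums(tab), "/")
    # Sum over alt clusters of
    # P(alt | ref) * P(second cell is in ref | alt).
    colSums(alt_given_ref * ref_given_alt)
}

# Invented labels: eight cells in two ref clusters.
ref <- rep(c("A", "B"), each = 4)

# Perfect 1:many nesting: each alt cluster sits inside one ref cluster.
nested <- c("a1", "a1", "a2", "a2", "b1", "b1", "b2", "b2")
ref_score(ref, nested)

# Poor nesting: each alt cluster is half A and half B.
mixed <- rep(c("x", "y"), times = 4)
ref_score(ref, mixed)
```

Under this reading, the nested labels give a score of 1 for both ref clusters, while the mixed labels give 0.5 for each.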

Value

A list containing:

proportions, a matrix where each row corresponds to one of the alt clusters and each column corresponds to one of the ref clusters. Each matrix entry represents the proportion of cells in the row's alt cluster that are assigned to the column's ref cluster. (That is, the proportions across all ref clusters should sum to unity for each alt cluster.)
alt.mapping, a DataFrame with one row per cluster in alt. This contains the columns max, a numeric vector specifying the maximum proportion for that alt cluster; and which, a character vector specifying the ref cluster in which the maximum occurs.
ref.score, a numeric vector of length equal to the number of ref clusters. This represents the degree of nesting of alt clusters within each ref cluster; see Details.

Examples

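library(bluster)

# Simulate 1000 cells in 10 dimensions and cluster at two resolutions.
m <- matrix(runif(10000), ncol=10)
clust1 <- kmeans(m,10)$cluster
clust2 <- kmeans(m,20)$cluster

# Map the higher-resolution clustering (alt) onto the reference.
nestedClusters(clust1, clust2)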

# The ref.score is 1 in cases of perfect nesting.
nestedClusters(clust1, clust1)$ref.score

nest.clust <- paste0(clust1, sample(letters, length(clust1), replace=TRUE))
nestedClusters(clust1, nest.clust)$ref.score

# In contrast, it is much lower when nesting is bad.
nestedClusters(clust1, sample(clust1))$ref.score

LTLA/bluster documentation built on Sept. 8, 2024, 4:37 a.m.