linkClusters: Create a graph between different clusterings

linkClusters    R Documentation

Create a graph between different clusterings

Description

Create a graph that links together clusters from different clusterings, e.g., generated using different parameter settings or algorithms. This is useful for identifying corresponding clusters between clusterings and for creating meta-clusters from multiple clusterings.

Usage

linkClusters(clusters, prefix = TRUE, denominator = c("union", "min", "max"))

linkClustersMatrix(x, y, denominator = c("union", "min", "max"))

Arguments

clusters

A list of factors or vectors where each entry corresponds to a clustering. All vectors should be of the same length. The list itself should usually be named with a suitable label for each clustering.

prefix

Logical scalar indicating whether each cluster level should be prefixed with the name of its clustering. If clusters is not named, numeric prefixes are used instead.

denominator

String specifying how the strength of the correspondence between clusters should be computed. This should be one of "union", "min" or "max"; see Details.

x, y

Factors or vectors, each specifying a clustering of the same set of cells.

Details

Links are only formed between clusters from different clusterings, e.g., between cluster X in clustering 1 and cluster Y in clustering 2. The edge weight of each link is set to the strength of the correspondence between the two clusters; this is based on the number of cells that are assigned to both X and Y in their respective clusterings. A larger number of shared cells suggests that X and Y are corresponding clusters.

Of course, the number of shared cells also depends on the total number of cells in each cluster. To account for this, we normalize the strength by a function of the total number of cells in the two clusters. This function is chosen by denominator, which controls how the strength is adjusted for dissimilar cluster sizes.

  • For "min", the number of shared cells is divided by the smaller of the two cluster sizes.

  • For "max", the number of shared cells is divided by the larger of the two cluster sizes.

  • For "union", the number of shared cells is divided by the size of the union of cells in the two clusters. The result is equivalent to the Jaccard index.
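
Writing X and Y for the sets of cells in the two clusters, the three options can be summarized as follows. The symbol w and its subscripts are notation introduced here for illustration only.

```latex
\begin{aligned}
w_{\min}(X, Y) &= \frac{|X \cap Y|}{\min(|X|, |Y|)}, \\
w_{\max}(X, Y) &= \frac{|X \cap Y|}{\max(|X|, |Y|)}, \\
w_{\text{union}}(X, Y) &= \frac{|X \cap Y|}{|X \cup Y|}
    = \frac{|X \cap Y|}{|X| + |Y| - |X \cap Y|}.
\end{aligned}
```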

In situations where X splits into multiple smaller clusters Y1, Y2, etc. in another clustering, denominator="min" will report strong links between X and its constituent subclusters, while "max" and "union" will report weak links. Conversely, denominator="max" and "union" can only form strong links when there is a 1:1 mapping between clusters in different clusterings. This usually yields simpler correspondences between clusterings at the cost of orphaning some of the smaller subclusters. denominator="union" is the most stringent as it will penalize the presence of non-shared cells in both clusters, whereas "max" only does so for the larger cluster.
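
The effect of splitting can be checked by hand on a small case. The following is a minimal sketch in base R that recomputes the three strengths directly from the definitions above; it is not the package's implementation, and the toy labels and variable names are made up for illustration. Cluster A in x splits evenly into a and b in y, while B maps 1:1 onto c.

```r
# Two toy clusterings of the same 10 cells (hypothetical data).
x <- c("A", "A", "A", "A", "A", "A", "B", "B", "B", "B")
y <- c("a", "a", "a", "b", "b", "b", "c", "c", "c", "c")

# Number of cells carrying each pair of labels, as a plain matrix.
shared <- unclass(table(x, y))

# Total number of cells in each cluster of x (rows) and y (columns).
nx <- rowSums(shared)
ny <- colSums(shared)

# Normalize by the smaller size, the larger size, or the size of the union.
by_min   <- shared / outer(nx, ny, pmin)
by_max   <- shared / outer(nx, ny, pmax)
by_union <- shared / (outer(nx, ny, "+") - shared)

by_min
by_max
by_union
```

Here the link between A and a has strength 1 under "min" but only 0.5 under "max" and "union", whereas the 1:1 link between B and c has strength 1 under all three.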

The general idea is to use the graph returned by this function in visualization routines or for community-based clustering, to identify “clusters of clusters” that are informative about the relationships between clusterings.

Value

For linkClusters, a graph object where each node is a cluster level in one of the clusterings in clusters. Edges are weighted by the strength of the correspondence between two clusters in different clusterings.

For linkClustersMatrix, a matrix is returned where each row and column corresponds to a cluster in x and y, respectively. Entries represent the strength of the correspondence between the associated clusters; this is equivalent to a submatrix of the adjacency matrix from the graph returned by linkClusters.
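
The relationship between the pairwise matrices and the full adjacency matrix can be illustrated with a conceptual sketch in base R. This is not the package's code: jaccard_links is a hypothetical helper that mimics the "union" option, the toy clusterings are invented, and the "." separator between the clustering name and the cluster level is an assumption made for this sketch.

```r
# Jaccard-style link strengths between two label vectors
# (illustrative helper, not the package's linkClustersMatrix).
jaccard_links <- function(x, y) {
    shared <- unclass(table(x, y))
    shared / (outer(rowSums(shared), colSums(shared), "+") - shared)
}

# Three toy clusterings of the same 6 cells (hypothetical data).
clusterings <- list(
    first  = c(1, 1, 1, 2, 2, 2),
    second = c("a", "a", "b", "b", "c", "c"),
    third  = c("p", "p", "p", "q", "q", "q")
)

# One node per cluster level, prefixed by the name of its clustering.
nodes <- unlist(lapply(names(clusterings), function(n) {
    paste0(n, ".", sort(unique(clusterings[[n]])))
}))
adj <- matrix(0, length(nodes), length(nodes), dimnames = list(nodes, nodes))

# Fill in one block per pair of clusterings; clusters from the same
# clustering are never linked, so the diagonal blocks stay at zero.
pairs <- combn(names(clusterings), 2)
for (k in seq_len(ncol(pairs))) {
    a <- pairs[1, k]
    b <- pairs[2, k]
    block <- jaccard_links(clusterings[[a]], clusterings[[b]])
    rows <- paste0(a, ".", rownames(block))
    cols <- paste0(b, ".", colnames(block))
    adj[rows, cols] <- block
    adj[cols, rows] <- t(block)
}

adj
```

Each off-diagonal block of adj is the pairwise matrix for one pair of clusterings, which is the sense in which the matrix output corresponds to a submatrix of the graph's adjacency matrix.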

Author(s)

Aaron Lun

See Also

The clustree package, which provides another method for visualizing relationships between clusterings.

compareClusterings, which computes similarities between the clusterings themselves.

Examples

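library(bluster)

# Cluster the iris measurements with three different algorithms.
# HclustParam(cut.dynamic=TRUE) requires the dynamicTreeCut package.
clusters <- list(
    nngraph = clusterRows(iris[,1:4], NNGraphParam()),
    hclust = clusterRows(iris[,1:4], HclustParam(cut.dynamic=TRUE)),
    kmeans = clusterRows(iris[,1:4], KmeansParam(5))
)

# Link clusters across the clusterings and plot the resulting graph.
g <- linkClusters(clusters)
plot(g)

# Identify meta-clusters by community detection on the graph.
igraph::cluster_walktrap(g)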

# Results as a matrix, for two clusterings:
linkClustersMatrix(clusters[[1]], clusters[[2]], denominator="union")

LTLA/bluster documentation built on Sept. 8, 2024, 4:37 a.m.