intersectClusters: Intersect pre-defined clusters

intersectClusters    R Documentation

Description

Intersect pre-defined clusters from multiple modalities, pruning out combinations that are poorly separated based on the within-cluster sum of squares (WCSS).

Usage

intersectClusters(clusters, coords, scale = 1, BPPARAM = SerialParam())

Arguments

clusters

A list of factors or vectors of the same length. Each element corresponds to one modality and contains the cluster assignments for the same set of cells.

coords

A list of matrices of the same length as clusters. Each element should have number of rows equal to the number of cells (e.g., a matrix of PC coordinates); we generally expect this to have been used to generate the corresponding entry of clusters.

scale

Numeric scalar specifying the scaling factor to apply to the limit on the WCSS for each modality.

BPPARAM

A BiocParallelParam object specifying how parallelization should be performed.

Details

We intersect clusters by considering two cells to be in the same “output” cluster only if they are clustered together in every modality. In other words, all cells with a particular combination of identities in clusters are assigned to a separate output cluster.
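
The naive intersection described above can be sketched in base R. This is only an illustration of the idea, not the function's internal code; the toy assignments rna and adt are hypothetical.

```r
# Hypothetical cluster assignments for the same six cells in two modalities.
rna <- c("A", "A", "A", "B", "B", "B")
adt <- c("x", "x", "y", "y", "y", "y")

# Each observed combination of identities becomes its own output cluster.
# drop=TRUE discards combinations that contain no cells.
combos <- interaction(rna, adt, drop = TRUE)
naive <- as.integer(combos)

# Number of cells per combination; small counts flag likely noise.
table(combos)
```

Here the third cell forms a combination of its own, which is the kind of tiny cluster that the merging step is meant to remove.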

The simplest implementation of the above idea suffers from noise in the cluster definitions, which introduces combinations with very few cells. We eliminate these by greedily merging pairs of combinations, starting with the pairs that minimize the gain in the WCSS. In this process, we only consider pairs of combinations that share at least one cluster across all modalities (to avoid merges across unrelated clusters).
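
A single greedy merge step can be sketched as follows for one modality. This is a simplified illustration under stated assumptions, not the package's implementation: the helpers wcss and merge_gain are hypothetical names, the restriction to pairs sharing a cluster is omitted, and how gains are combined across modalities is not shown. The gain uses the standard identity that merging clusters a and b increases the WCSS by n_a * n_b / (n_a + n_b) times the squared distance between their centroids.

```r
# WCSS: sum of squared distances from each cell to its cluster centroid.
wcss <- function(coords, clusters) {
    by_cluster <- split(seq_len(nrow(coords)), clusters)
    sum(vapply(by_cluster, function(idx) {
        sub <- coords[idx, , drop = FALSE]
        sum(sweep(sub, 2, colMeans(sub))^2)
    }, numeric(1)))
}

# Gain in WCSS from merging clusters 'a' and 'b'.
merge_gain <- function(coords, clusters, a, b) {
    in_a <- clusters == a
    in_b <- clusters == b
    na <- sum(in_a)
    nb <- sum(in_b)
    delta <- colMeans(coords[in_a, , drop = FALSE]) -
        colMeans(coords[in_b, , drop = FALSE])
    na * nb / (na + nb) * sum(delta^2)
}

set.seed(1)
coords <- matrix(rnorm(200), ncol = 2)
coords[1:40, 1] <- coords[1:40, 1] + 10  # one well-separated group
combos <- rep(1:3, c(40, 5, 55))         # combination 2 is tiny

# Evaluate every candidate pair and pick the cheapest merge.
pairs <- t(combn(sort(unique(combos)), 2))
gains <- apply(pairs, 1, function(p) merge_gain(coords, combos, p[1], p[2]))
best <- pairs[which.min(gains), ]

# Apply the merge by relabelling the second member of the pair.
merged <- combos
merged[merged == best[2]] <- best[1]
```

Repeating this step until a stopping rule is hit gives the greedy procedure; the tiny combination is absorbed by the neighbour it resembles rather than by the well-separated group.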

A natural stopping point for this merging process is when the WCSS of the output clustering exceeds the WCSS of the original clustering for any modality. This aims to preserve the original clustering in each modality by preventing overly aggressive merges that would greatly increase the WCSS, while reducing the complexity of the output clustering by ensuring that the variance explained is comparable.

Users can increase the aggressiveness of the merging procedure by increasing scale above its default of 1, e.g., to 1.05 or 1.1. This scales up the limit on the WCSS, allowing more merges to be performed before termination.
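
The stopping rule and the effect of scale can be illustrated with a small deterministic sketch. The helpers wcss and exceeds_limit are hypothetical names introduced here, and the check is an assumption about how the limit is applied (output WCSS compared against scale times the original WCSS in each modality), not a copy of the package's code.

```r
# WCSS: sum of squared distances from each cell to its cluster centroid.
wcss <- function(coords, clusters) {
    by_cluster <- split(seq_len(nrow(coords)), clusters)
    sum(vapply(by_cluster, function(idx) {
        sub <- coords[idx, , drop = FALSE]
        sum(sweep(sub, 2, colMeans(sub))^2)
    }, numeric(1)))
}

# TRUE if the output clustering exceeds the scaled limit in any modality.
exceeds_limit <- function(coords, clusters, output, scale = 1) {
    original <- mapply(wcss, coords, clusters)
    current <- vapply(coords, wcss, numeric(1), clusters = output)
    any(current > scale * original)
}

# Two one-dimensional modalities for the same nine cells.
# Cell 5 is clustered differently in each modality.
coords <- list(
    matrix(c(0, 0, 0, 0, 5.1, 10, 10, 10, 10), ncol = 1),
    matrix(c(0, 1, 0, 1, 0, 7, 8, 7, 8), ncol = 1)
)
clusters <- list(
    c(1, 1, 1, 1, 2, 2, 2, 2, 2),
    c(1, 1, 1, 1, 1, 2, 2, 2, 2)
)

# Candidate output after merging the singleton combination for cell 5
# into the combination that shares its cluster in the second modality.
output <- c(1, 1, 1, 1, 1, 2, 2, 2, 2)

exceeds_limit(coords, clusters, output, scale = 1)
exceeds_limit(coords, clusters, output, scale = 1.1)
```

In this toy case the merge raises the WCSS of the first modality by roughly 8%, so it is over the limit at scale = 1 but within it at scale = 1.1.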

Value

An integer vector of length equal to the number of cells, containing the assignments to the output clusters.

Author(s)

Aaron Lun

Examples

# Mocking up data for one modality, with a shifted subpopulation of cells.
mat1 <- matrix(rnorm(10000), ncol=20)
chosen <- 1:250
mat1[chosen,1] <- mat1[chosen,1] + 10
clusters1 <- kmeans(mat1, 5)$cluster
table(clusters1, chosen=mat1[,1] > 5)

# Pretending we have some other data for the same cells, e.g., ADT.
mat2 <- matrix(rnorm(10000), ncol=20)
chosen <- c(1:125, 251:375)
mat2[chosen,2] <- mat2[chosen,2] + 10
clusters2 <- kmeans(mat2, 5)$cluster
table(clusters2, mat2[,2] > 5)

# Intersecting the clusters:
clusters3 <- intersectClusters(list(clusters1, clusters2), list(mat1, mat2))
table(clusters3, mat1[,1] > 5)
table(clusters3, mat2[,2] > 5)


LTLA/mumosa documentation built on March 10, 2024, 1:20 a.m.