bdm.optk.s2nr: Find optimal number of clusters based on...

View source: R/bdm_merge.R

bdm.optk.s2nr    R Documentation

Find optimal number of clusters based on signal-to-noise-ratio.

Description

Performs a recursive merging of clusters based on minimum loss of signal-to-noise ratio (S2NR). The S2NR is the ratio of explained to unexplained variance, measured in the high-dimensional space for the given low-dimensional clustering. Merging is applied recursively until only 2 clusters remain, and the S2NR is measured at each step.
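To make the S2NR measure concrete, here is a minimal sketch in base R. It assumes that "explained/unexplained variance" means the between-cluster sum of squares divided by the within-cluster sum of squares. The helper name s2nr() is hypothetical and is not part of the bigMap API, and the package's internal computation may differ.

```r
# Hypothetical helper (not part of bigMap): S2NR taken here as the
# between-cluster sum of squares over the within-cluster sum of squares.
s2nr <- function(X, labels) {
  X <- as.matrix(X)
  # total sum of squares around the grand mean
  total <- sum(sweep(X, 2, colMeans(X))^2)
  # within-cluster sum of squares (the unexplained part)
  within <- 0
  for (k in unique(labels)) {
    Xk <- X[labels == k, , drop = FALSE]
    within <- within + sum(sweep(Xk, 2, colMeans(Xk))^2)
  }
  # explained / unexplained
  (total - within) / within
}

# Two well separated groups of 50 points each in 2 dimensions
set.seed(1)
X <- rbind(matrix(rnorm(100, mean = 0), ncol = 2),
           matrix(rnorm(100, mean = 5), ncol = 2))
good <- rep(1:2, each = 50)   # labels that match the groups
bad  <- rep(1:2, times = 50)  # labels unrelated to the groups

s2nr(X, good)  # large: the labels explain most of the variance
s2nr(X, bad)   # close to zero: the labels explain almost nothing
```

Under this assumption a clustering that follows the structure of the data scores high, and merging two clusters can only keep the score equal or lower it, which is why the function tracks the loss at each merge.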

Usage

bdm.optk.s2nr(data, bdm, info = T, plot.optk = T, ret.optk = F, layer = 1)

Arguments

data

Input data (a matrix, a big.matrix or a .csv file name).

bdm

A clustered bdm instance (i.e. all up-stream steps performed: bdm.ptsne(), bdm.pakde() and bdm.wtt()).

info

Logical value. If TRUE, all merging steps are shown (default value is info = TRUE).

plot.optk

Logical value. If TRUE, this function plots the heuristic measure versus the number of clusters (default value is plot.optk = TRUE).

ret.optk

Logical value. For large datasets this computation can take a while, so it may be worth saving the result. If TRUE, the function returns a copy of the bdm instance with the values of S2NR attached as bdm$optk (default value is ret.optk = FALSE).

layer

The bdm$ptsne layer to be used (default value is layer = 1).

Details

The underlying idea is that neighbouring clusters in the embedding correspond to close clusters in the high-dimensional space, i.e. this merging heuristic is based on the spatial distribution of clusters. For each cluster (the child cluster) we choose, as its father cluster, the neighbouring cluster with the steepest gradient along their common border. Thus, we get a set of child/father pairs of clusters to be potentially merged. Given this set of candidates, the merging is performed recursively, choosing at each step the child/father pair that results in a minimum loss of S2NR.

Typically some clusters dominate over all of their neighbouring clusters. These clusters have no father. Thus, once all possible mergings have been performed, we reach a blocked state where only the dominant clusters remain. This situation identifies a hierarchy level in the clustering. When it is reached, the algorithm starts a new merging round, identifying the child/father relations at that level of the hierarchy. The process stops when only two clusters remain.

Usually, the clustering hierarchy is clearly depicted by singular points in the S2NR function. This is a hint that the low-dimensional clustering configuration is an image of a hierarchical configuration in the high-dimensional space. See bdm.optk.plot().
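The greedy step described above (at each step, apply the merge with minimum loss of S2NR) can be illustrated with a simplified sketch in base R. Two simplifications are assumed: every pair of clusters is treated as a merge candidate, whereas the actual function restricts candidates to the child/father pairs found along cluster borders in the embedding, and there are no separate hierarchy rounds. The names s2nr() and merge_trace() are hypothetical and not part of bigMap.

```r
# Hypothetical helper: between-cluster over within-cluster sum of squares
# (an assumed reading of "explained/unexplained variance").
s2nr <- function(X, labels) {
  X <- as.matrix(X)
  total <- sum(sweep(X, 2, colMeans(X))^2)
  within <- 0
  for (k in unique(labels)) {
    Xk <- X[labels == k, , drop = FALSE]
    within <- within + sum(sweep(Xk, 2, colMeans(Xk))^2)
  }
  (total - within) / within
}

# Hypothetical simplification: all cluster pairs are merge candidates.
# Returns the S2NR recorded for each number of clusters k.
merge_trace <- function(X, labels) {
  labels <- as.integer(factor(labels))
  trace <- data.frame(k = length(unique(labels)), s2nr = s2nr(X, labels))
  while (length(unique(labels)) > 2) {
    pairs <- utils::combn(unique(labels), 2)
    best <- -Inf
    best_labels <- labels
    for (j in seq_len(ncol(pairs))) {
      # tentatively merge the second cluster of the pair into the first
      trial <- labels
      trial[trial == pairs[2, j]] <- pairs[1, j]
      score <- s2nr(X, trial)
      # the highest S2NR after merging is the minimum loss
      if (score > best) {
        best <- score
        best_labels <- trial
      }
    }
    labels <- best_labels
    trace <- rbind(trace, data.frame(k = length(unique(labels)), s2nr = best))
  }
  trace
}

# Four clusters arranged as two tight pairs that lie far apart
set.seed(42)
centers <- rbind(c(0, 0), c(1, 0), c(10, 10), c(11, 10))
labels <- rep(1:4, each = 25)
X <- centers[labels, ] + matrix(rnorm(200, sd = 0.3), ncol = 2)

trace <- merge_trace(X, labels)
print(trace)  # one row per k = 4, 3, 2
```

Plotting trace$s2nr against trace$k gives a curve of the kind the Details section refers to: the S2NR never increases as clusters are merged, and a sharp drop suggests that a merge has crossed a level of the hierarchy.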

Value

None if ret.optk = FALSE. Else, a copy of the input bdm instance with new element bdm$optk (a matrix).

Examples


# --- load mapped dataset
bdm.example()
# --- compute the S2NR at each merging step and plot it
# --- (ret.optk = FALSE, so the computation is not attached to the bdm instance)
bdm.optk.s2nr(ex$map, data = ex$data, plot.optk = TRUE, ret.optk = FALSE)

jgarriga65/bigMap documentation built on June 10, 2024, 7:05 a.m.