bdm.optk.s2nr | R Documentation
Performs a recursive merging of clusters based on the minimum loss of signal-to-noise ratio (S2NR). The S2NR is the ratio of explained to unexplained variance, measured in the high-dimensional space for the given low-dimensional clustering. Merging is applied recursively until only 2 clusters remain, and the S2NR is measured at each step.
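As a rough illustration of the quantity being tracked, the sketch below computes an S2NR-like score in base R. It assumes that "explained/unexplained variance ratio" means the between-cluster sum of squares divided by the within-cluster sum of squares, with no degrees-of-freedom correction. The function name `s2nr()` is hypothetical and not part of bigMap, whose exact formula may differ.

```r
# Hypothetical helper (assumption): S2NR taken as the ratio of between-cluster
# (explained) to within-cluster (unexplained) sum of squares, computed in the
# high-dimensional input space X for a given vector of cluster labels.
s2nr <- function(X, labels) {
  X <- as.matrix(X)
  # Total sum of squares around the grand mean
  total_ss <- sum(sweep(X, 2, colMeans(X))^2)
  # Within-cluster sum of squares, accumulated cluster by cluster
  within_ss <- 0
  for (k in unique(labels)) {
    Xk <- X[labels == k, , drop = FALSE]
    within_ss <- within_ss + sum(sweep(Xk, 2, colMeans(Xk))^2)
  }
  between_ss <- total_ss - within_ss
  between_ss / within_ss
}
```

For example, `s2nr(iris[, 1:4], iris$Species)` scores the species partition of the built-in iris data. Merging two clusters can only increase the within-cluster term, so under this definition every merge loses some S2NR, which is why the merging order is driven by the minimum loss.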
bdm.optk.s2nr(data, bdm, info = T, plot.optk = T, ret.optk = F, layer = 1)
| Argument | Description |
|---|---|
| data | Input data (a matrix, a big.matrix or a .csv file name). |
| bdm | A clustered bdm instance (i.e. with all up-stream steps performed: …). |
| info | Logical value. If TRUE, all merging steps are shown (default value is TRUE). |
| plot.optk | Logical value. If TRUE, the function plots the heuristic measure (S2NR) versus the number of clusters (default value is TRUE). |
| ret.optk | Logical value. For large datasets this computation can take a while, so it may be worth saving the result. If TRUE, the function returns a copy of the bdm instance with the S2NR values attached as bdm$optk (default value is FALSE). |
| layer | The bdm$ptsne layer to be used (default value is 1). |
The underlying idea is that neighbouring clusters in the embedding correspond to close clusters in the high-dimensional space, i.e. this merging heuristic is based on the spatial distribution of clusters. For each cluster (the child cluster), we choose the neighbouring cluster with the steepest gradient along their common border (the father cluster). This yields a set of child/father pairs that are candidates for merging. Given this set of candidates, merging is performed recursively, choosing at each step the child/father pair whose merge results in the minimum loss of S2NR.
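The selection rule (try every child/father candidate, keep the merge that loses the least S2NR) can be sketched in base R as follows. This is an illustration under assumptions, not bigMap's implementation: the candidate pairs are passed in as a two-column matrix rather than derived from the gradient along cluster borders in the embedding, and both `s2nr()` (taken as the between/within sum-of-squares ratio) and `merge_step()` are hypothetical helpers.

```r
# Hypothetical helper (assumption): S2NR as the ratio of between-cluster
# (explained) to within-cluster (unexplained) sum of squares in X.
s2nr <- function(X, labels) {
  X <- as.matrix(X)
  within <- sum(sapply(unique(labels), function(k) {
    Xk <- X[labels == k, , drop = FALSE]
    sum(sweep(Xk, 2, colMeans(Xk))^2)
  }))
  total <- sum(sweep(X, 2, colMeans(X))^2)
  (total - within) / within
}

# Hypothetical helper: one merging step. 'pairs' is a two-column matrix of
# candidate (child, father) labels. Each candidate is evaluated by relabelling
# the child as its father; the merge with the smallest S2NR loss is kept.
merge_step <- function(X, labels, pairs) {
  current <- s2nr(X, labels)
  losses <- apply(pairs, 1, function(p) {
    merged <- labels
    merged[merged == p[1]] <- p[2]
    current - s2nr(X, merged)
  })
  best <- which.min(losses)
  labels[labels == pairs[best, 1]] <- pairs[best, 2]
  list(labels = labels, merged = pairs[best, ], loss = losses[best])
}
```

With one-dimensional data `c(0, 1, 5, 6, 20, 21)` split into three pairs, and candidates (1, 2) and (3, 2), the sketch merges cluster 1 into cluster 2, because joining the two nearby groups inflates the within-cluster term far less than joining the distant one.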
Typically, some clusters dominate over all of their neighbouring clusters. These clusters have no father. Thus, once all possible merges have been performed, we reach a blocked state where only the dominant clusters remain. This situation identifies a hierarchy level in the clustering. When it is reached, the algorithm starts a new merging round, identifying the child/father relations at that level of the hierarchy. The process stops when only two clusters remain.
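The round structure can be sketched by wrapping the minimum-loss selection in two loops: an inner loop that merges until the current round is blocked, and an outer loop that recomputes child/father relations among the surviving clusters. The key assumption is `find_pairs()`, a hypothetical stand-in for the father relation: since the embedding gradient is not available here, a cluster's father is taken to be its nearest centroid in the input space, provided that neighbour ranks higher by (size, label). `s2nr()` and `merge_rounds()` are likewise hypothetical names.

```r
# Hypothetical helper (assumption): between/within sum-of-squares ratio.
s2nr <- function(X, labels) {
  within <- sum(sapply(unique(labels), function(k) {
    Xk <- X[labels == k, , drop = FALSE]
    sum(sweep(Xk, 2, colMeans(Xk))^2)
  }))
  total <- sum(sweep(X, 2, colMeans(X))^2)
  (total - within) / within
}

# Hypothetical stand-in for the child/father relation (NOT the embedding
# gradient): a cluster's father is its nearest centroid, kept only if that
# neighbour ranks higher by (size, label); otherwise the cluster is dominant.
find_pairs <- function(X, labels) {
  ids <- sort(unique(labels))
  cents <- do.call(rbind, lapply(ids, function(k)
    colMeans(X[labels == k, , drop = FALSE])))
  sizes <- as.vector(table(factor(labels, levels = ids)))
  pos <- integer(length(ids))
  pos[order(sizes, ids)] <- seq_along(ids)   # strict ranking by (size, label)
  d <- as.matrix(dist(cents))
  diag(d) <- Inf
  pairs <- matrix(numeric(0), ncol = 2)
  for (i in seq_along(ids)) {
    j <- which.min(d[i, ])
    if (pos[j] > pos[i]) pairs <- rbind(pairs, c(ids[i], ids[j]))
  }
  pairs
}

# Merge the candidate with minimum S2NR loss until the round is blocked
# (no pairs left), then start a new round; stop when 2 clusters remain.
# Returns the number of clusters and the S2NR after each step.
merge_rounds <- function(X, labels) {
  X <- as.matrix(X)
  trace <- data.frame(k = length(unique(labels)), s2nr = s2nr(X, labels))
  while (length(unique(labels)) > 2) {
    pairs <- find_pairs(X, labels)                  # new round
    while (nrow(pairs) > 0 && length(unique(labels)) > 2) {
      scores <- apply(pairs, 1, function(p) {
        m <- labels
        m[m == p[1]] <- p[2]
        s2nr(X, m)
      })
      best <- which.max(scores)                     # = minimum S2NR loss
      child <- pairs[best, 1]
      father <- pairs[best, 2]
      labels[labels == child] <- father
      pairs <- pairs[pairs[, 1] != child, , drop = FALSE]
      pairs[pairs[, 2] == child, 2] <- father       # re-point to merged cluster
      trace <- rbind(trace, data.frame(k = length(unique(labels)),
                                       s2nr = max(scores)))
    }
  }
  trace
}
```

Because the (size, label) ranking is a strict order, the lowest-ranked cluster always has a father, so every round performs at least one merge and the loop terminates. The returned data frame plays the role of the S2NR-versus-number-of-clusters curve.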
Usually, the clustering hierarchy is clearly depicted by singular points in the S2NR function. This hints that the low-dimensional clustering configuration is an image of a hierarchical configuration in the high-dimensional space. See bdm.optk.plot().
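One simple way to look for such singular points numerically is to scan the S2NR curve for abrupt changes in slope. The sketch below is a hypothetical heuristic (largest absolute discrete second difference), not the criterion used by bdm.optk.plot(); `singular_k()` and its `n` argument are illustrative names. Its inputs are the number of clusters and the S2NR value recorded at each merging step.

```r
# Hypothetical helper: flag candidate "singular points" of the S2NR curve as
# the interior k values with the largest absolute discrete second difference.
singular_k <- function(k, s2nr, n = 1) {
  o <- order(k)                        # work in order of increasing k
  k <- k[o]
  s <- s2nr[o]
  d2 <- abs(diff(s, differences = 2))  # d2[i] is centred on k[i + 1]
  top <- order(d2, decreasing = TRUE)[seq_len(min(n, length(d2)))]
  k[top + 1]
}
```

For a curve that rises slowly up to 5 clusters and steeply afterwards, such as `singular_k(2:8, c(1, 2, 3, 4, 10, 16, 22))`, the kink at k = 5 is the one flagged.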
None if ret.optk = FALSE. Otherwise, a copy of the input bdm instance with the new element bdm$optk (a matrix).
# --- load mapped dataset
bdm.example()
# --- compute the S2NR merging sequence and attach it to the mapping (ret.optk = TRUE)
ex$map <- bdm.optk.s2nr(data = ex$data, bdm = ex$map, plot.optk = TRUE, ret.optk = TRUE)