Bk_permutations: Bk permutation - Calculating Fowlkes-Mallows Index for two...

View source: R/bk_method.R

Bk_permutationsR Documentation

Bk permutation - Calculating Fowlkes-Mallows Index for two dendrogram

Description

Bk is the calculation of Fowlkes-Mallows index for a series of k cuts for two dendrograms.

Bk permutation calculates the Bk under the null hypothesis of no similarirty between the two trees by randomally shuffling the labels of the two trees and calculating their Bk.

Usage

Bk_permutations(
  tree1,
  tree2,
  k,
  R = 1000,
  warn = dendextend_options("warn"),
  ...
)

Arguments

tree1

a dendrogram/hclust/phylo object.

tree2

a dendrogram/hclust/phylo object.

k

an integer scalar or vector with the desired number of cluster groups. If missing - the Bk will be calculated for a default k range of 2:(nleaves-1). No point in checking k=1/k=n, since both will give Bk=1.

R

integer (Default is 1000). The number of Bk permutation to perform for each k.

warn

logical (default from dendextend_options("warn") is FALSE). Set if warning are to be issued, it is safer to keep this at TRUE, but for keeping the noise down, the default is FALSE. If set to TRUE, extra checks are made to varify that the two clusters have the same size and the same labels.

...

Ignored (passed to FM_index_R).

Details

From Wikipedia:

Fowlkes-Mallows index (see references) is an external evaluation method that is used to determine the similarity between two clusterings (clusters obtained after a clustering algorithm). This measure of similarity could be either between two hierarchical clusterings or a clustering and a benchmark classification. A higher the value for the Fowlkes-Mallows index indicates a greater similarity between the clusters and the benchmark classifications.

Value

A list (of the length of k's), where each element of the list has R (number of permutations) calculations of Fowlkes-Mallows index between two dendrogram after having their labels shuffled.

The names of the lists' items is the k for which it was calculated.

References

Fowlkes, E. B.; Mallows, C. L. (1 September 1983). "A Method for Comparing Two Hierarchical Clusterings". Journal of the American Statistical Association 78 (383): 553.

https://en.wikipedia.org/wiki/Fowlkes-Mallows_index

See Also

FM_index, Bk

Examples


## Not run: 

set.seed(23235)
ss <- TRUE # sample(1:150, 10 )
hc1 <- hclust(dist(iris[ss, -5]), "com")
hc2 <- hclust(dist(iris[ss, -5]), "single")
# tree1 <- as.treerogram(hc1)
# tree2 <- as.treerogram(hc2)
#    cutree(tree1)

some_Bk <- Bk(hc1, hc2, k = 20)
some_Bk_permu <- Bk_permutations(hc1, hc2, k = 20)

# we can see that the Bk is much higher than the permutation Bks:
plot(
  x = rep(1, 1000), y = some_Bk_permu[[1]],
  main = "Bk distribution under H0",
  ylim = c(0, 1)
)
points(1, y = some_Bk, pch = 19, col = 2)

## End(Not run)

dendextend documentation built on Oct. 13, 2024, 1:06 a.m.