https://cran.r-project.org/web/packages/erhcv/index.html
Assesses the statistical significance of clusters in a given dataset through bootstrapping and hypothesis testing of the matrix of empirical Spearman's rho, based on the technique of S. Gaiser et al. (2010).
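To fix ideas, here is a minimal sketch of the underlying idea (purely illustrative, not the package's internal code): the empirical Spearman's rho matrix of a sample is bootstrapped by resampling rows with replacement and recomputing the correlation matrix.
## Illustrative sketch (not erhcv internals): bootstrap the empirical
## Spearman's rho matrix of a sample.
X <- matrix(rnorm(500), ncol = 5)              # placeholder data
rho.hat <- cor(X, method = "spearman")         # empirical Spearman's rho
idx <- sample(nrow(X), replace = TRUE)         # resample rows with replacement
rho.boot <- cor(X[idx, ], method = "spearman") # one bootstrap replicate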
In this package, a tree structure consists of nested lists, which makes for a very natural representation. To explain the concept, consider the following custom data.tree object:
1  (O)
2   ¦--(O,1)
3   ¦   ¦--(O,1,1)
4   ¦   ¦   ¦--7
5   ¦   ¦   °--8
6   ¦   ¦--6
7   ¦   ¦--5
8   ¦   °--(O,1,4)
9   ¦       ¦--9
10  ¦       °--10
11  ¦--2
12  ¦--1
13  °--(O,4)
14      ¦--3
15      °--4
The associated tree structure is then constructed as follows.
treeStructure <- list(list(list(7, 8), 6, 5, list(9, 10)), 2, 1, list(3, 4))
Note that one can transform this tree into a data.tree object using tree2plot.
tree2plot(treeStructure, plot = FALSE) # data.tree
As a general guideline, a tree structure needs to be a list. One then distinguishes nodes from leaves by object type: nodes are lists, and leaves are integers. The construction is analogous to the structure construction of the package nCopula.
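For instance, one can traverse such a structure by recursing on lists and stopping at non-list elements. The helper below is hypothetical (not part of erhcv) and only illustrates the convention.
## Hypothetical helper (not part of erhcv): count the leaves of a
## tree structure by recursing on nodes (lists) and stopping at
## leaves (integers).
countLeaves <- function(tree) {
  if (!is.list(tree)) return(1L)             # leaf: an integer
  sum(vapply(tree, countLeaves, integer(1))) # node: a list of subtrees
}
countLeaves(treeStructure) # 10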
Function    | Description
------------|------------
VerifyTree  | Main function of the package, used to clean a raw hclust clustering
hclust2tree | Transforms an hclust object into a tree structure (under the convention of this package)
tree2plot   | Illustrates a tree (or data.tree) structure
For a proper demonstration of the package, we use the package nCopula to sample hierarchical data.
install.packages("erhcv"); install.packages("nCopula")
library(nCopula)
## Build structure
structure <- GEO(0.5, 1:2, list(GAMMA(1/2, 3:4, NULL),
                                GEO(0.3, 5:6, list(GAMMA(1/3, 7:8, NULL),
                                                   GAMMA(1/3, 9:10, NULL)))))
## Sample from the structure
set.seed(123)
U.. <- rCompCop(1000, structure)
## Compute Spearman correlation matrix
Spearman <- cor(U.., method = "spearman")
## Cluster Spearman matrix
distance <- dist(Spearman, method = "maximum")
clustering <- hclust(distance, method = "average")
## Transform clustering into nested lists
tree <- erhcv::hclust2tree(clustering)
erhcv::tree2plot(tree, structure = TRUE) # data.tree object
1  (O)
2   ¦--(O,1)
3   ¦   ¦--(O,1,1)
4   ¦   ¦   ¦--7
5   ¦   ¦   °--8
6   ¦   °--(O,1,2)
7   ¦       ¦--6
8   ¦       °--(O,1,2,2)
9   ¦           ¦--5
10  ¦           °--(O,1,2,2,2)
11  ¦               ¦--9
12  ¦               °--10
13  °--(O,2)
14      ¦--2
15      °--(O,2,2)
16          ¦--1
17          °--(O,2,2,2)
18              ¦--3
19              °--4
We quickly see that the obtained structure is far from the original one. We therefore use tools from erhcv to eliminate unnecessary nodes, based on a (subjective) simplification level alpha.
Here, we use VerifyTree to prune nodes from the clustering obtained earlier.
## Clean the tree
alpha <- 1 # simplification level; alpha = 1 is a severe simplification
cleanedTree <- erhcv::VerifyTree(U.., alpha = alpha,
                                 distance.method = "maximum",
                                 hclust.method = "average")$Tree
## Visualize output
erhcv::tree2plot(cleanedTree, structure = TRUE)
1  (O)
2   ¦--(O,1)
3   ¦   ¦--(O,1,1)
4   ¦   ¦   ¦--7
5   ¦   ¦   °--8
6   ¦   ¦--6
7   ¦   ¦--5
8   ¦   °--(O,1,4)
9   ¦       ¦--9
10  ¦       °--10
11  ¦--2
12  ¦--1
13  °--(O,4)
14      ¦--3
15      °--4
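As a quick sanity check (a sketch; leaf storage modes may differ between the hand-written structure and the one recovered by hclust2tree), one can compare the two data.tree renderings side by side:
## Sketch of a sanity check: the cleaned tree should render the same
## as the nested-list structure written by hand at the beginning.
print(erhcv::tree2plot(cleanedTree, plot = FALSE))
print(erhcv::tree2plot(treeStructure, plot = FALSE))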