clusterTree: Density clustering: the cluster tree

View source: R/clusterTree.R

clusterTree    R Documentation

Density clustering: the cluster tree

Description

Given a point cloud, or a matrix of distances, the function clusterTree computes a density estimator and returns the corresponding cluster tree of superlevel sets (lambda tree and kappa tree; see references).

Usage

clusterTree(
    X, k, h = NULL, density = "knn", dist = "euclidean", d = NULL,
    Nlambda = 100, printProgress = FALSE)

Arguments

X

If dist="euclidean", then X is an n by d matrix of coordinates, where n is the number of points stored in X and d is the dimension of the space. If dist="arbitrary", then X is an n by n matrix of distances.

k

an integer value specifying the parameter of the underlying k-nearest neighbor similarity graph, used to determine connected components. If density="knn", then k is also used to compute the k-nearest neighbor density estimator.

h

real value: if density = "kde", then h is used to compute the kernel density estimator with bandwidth h. The default value is NULL.

density

string: if "knn" then the k-nearest neighbor density estimator is used to compute the cluster tree; if "kde" then the kernel density estimator is used to compute the cluster tree. The default value is "knn".

dist

string: can be "euclidean", when X is a point cloud, or "arbitrary", when X is a matrix of distances. The default value is "euclidean".

d

integer: if dist="arbitrary", then d is the dimension of the underlying space (see the sketch after this argument list). The default value is NULL.

Nlambda

integer: size of the grid of values of the density estimator, used to compute the cluster tree. A higher Nlambda (i.e. a finer grid) yields a more accurate cluster tree. The default value is 100.

printProgress

logical: if TRUE, a progress bar is printed. The default value is FALSE.
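When only pairwise distances are available, X can be passed as a distance matrix together with dist = "arbitrary" and the dimension d. A minimal sketch (the toy point cloud, k = 10, and d = 2 are purely illustrative; the TDA package is assumed to be attached, as in the Examples below):

set.seed(1)
Y <- cbind(rnorm(60), rnorm(60))     # 60 points in R^2, used only to build distances
D <- as.matrix(dist(Y))              # 60 by 60 matrix of pairwise distances
TreeD <- clusterTree(D, k = 10, dist = "arbitrary", d = 2)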

Details

The function clusterTree is an implementation of Algorithm 1 in the first reference.
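For intuition only, the superlevel-set construction behind the algorithm can be sketched in plain R: estimate the density at each point, then, for each level lambda on a grid, keep the points whose estimated density is at least lambda and record the connected components of the k-nearest-neighbor graph restricted to those points. The sketch below is a rough illustration of this idea, not the package's implementation; the toy data, k = 8, and the unnormalized kNN density estimate are assumptions made for the example.

set.seed(2)
Y <- cbind(c(rnorm(40, 0), rnorm(40, 5)), c(rnorm(40, 0), rnorm(40, 5)))
n <- nrow(Y)
k <- 8
D <- as.matrix(dist(Y))
rk <- apply(D, 1, function(row) sort(row)[k + 1])   # distance to the k-th neighbor (excluding self)
dens <- 1 / rk                                      # kNN density estimate, up to constants
nbrs <- t(apply(D, 1, function(row) order(row)[2:(k + 1)]))   # k nearest neighbors of each point
A <- matrix(FALSE, n, n)
for (i in 1:n) A[i, nbrs[i, ]] <- TRUE
A <- A | t(A)                                       # symmetric kNN adjacency

componentsAtLevel <- function(lambda) {
  keep <- which(dens >= lambda)                     # superlevel set at level lambda
  if (length(keep) == 0) return(0)
  sub <- A[keep, keep, drop = FALSE]
  seen <- rep(FALSE, length(keep))
  ncomp <- 0
  for (s in seq_along(keep)) {                      # breadth-first search over components
    if (seen[s]) next
    ncomp <- ncomp + 1
    queue <- s
    while (length(queue) > 0) {
      v <- queue[1]; queue <- queue[-1]
      if (seen[v]) next
      seen[v] <- TRUE
      queue <- c(queue, which(sub[v, ] & !seen))
    }
  }
  ncomp
}

lambdaGrid <- seq(min(dens), max(dens), length.out = 20)
sapply(lambdaGrid, componentsAtLevel)               # number of branches alive at each level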

Value

The function clusterTree returns an object of class clusterTree, a list with the following components

density

Vector of length n: the values of the density estimator evaluated at each of the points stored in X

DataPoints

A list whose elements are the points of X corresponding to each branch, in the same order of id

n

The number of points stored in the input matrix X

id

Vector: the IDs associated to the branches of the cluster tree

children

A list whose elements are the IDs of the children of each branch, in the same order of id

parent

Vector: the IDs of the parents of each branch, in the same order of id

silo

A list whose elements are the horizontal coordinates of the silo of each branch, in the same order of id

Xbase

Vector: the horizontal coordinates of the branches of the cluster tree, in the same order of id

lambdaBottom

Vector: the vertical bottom coordinates of the branches of the lambda tree, in the same order of id

lambdaTop

Vector: the vertical top coordinates of the branches of the lambda tree, in the same order of id

rBottom

(only if density="knn") Vector: the vertical bottom coordinates of the branches of the r tree, in the same order of id

rTop

(only if density="knn") Vector: the vertical top coordinates of the branches of the r tree, in the same order of id

alphaBottom

Vector: the vertical bottom coordinates of the branches of the alpha tree, in the same order of id

alphaTop

Vector: the vertical top coordinates of the branches of the alpha tree, in the same order of id

Kbottom

Vector: the vertical bottom coordinates of the branches of the kappa tree, in the same order of id

Ktop

Vector: the vertical top coordinates of the branches of the kappa tree, in the same order of id
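For example, the branch hierarchy and the coordinates of the lambda tree can be read directly off the returned list (using the Tree object constructed in the Examples below; the exact values depend on the simulated data):

Tree[["id"]]                                        # IDs of the branches
Tree[["parent"]]                                    # parent of each branch
Tree[["children"]]                                  # children of each branch
cbind(Tree[["lambdaBottom"]], Tree[["lambdaTop"]])  # vertical extent of each branch in the lambda tree
sapply(Tree[["DataPoints"]], length)                # number of points in each branch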

Author(s)

Fabrizio Lecci

References

Kent BP, Rinaldo A, Verstynen T (2013). "DeBaCl: A Python Package for Interactive DEnsity-BAsed CLustering." arXiv:1307.8136

Lecci F, Rinaldo A, Wasserman L (2014). "Metric Embeddings for Cluster Trees."

See Also

plot.clusterTree

Examples

## Generate data: 3 clusters
n <- 1200    # used to set the cluster sizes
Neach <- floor(n / 4)    # points per cluster (3 clusters of Neach points each)
X1 <- cbind(rnorm(Neach, 1, .8), rnorm(Neach, 5, 0.8))
X2 <- cbind(rnorm(Neach, 3.5, .8), rnorm(Neach, 5, 0.8))
X3 <- cbind(rnorm(Neach, 6, 1), rnorm(Neach, 1, 1))
X <- rbind(X1, X2, X3)

k <- 100     #parameter of knn

## Density clustering using knn and kde
Tree <- clusterTree(X, k, density = "knn")
TreeKDE <- clusterTree(X, k, h = 0.3, density = "kde")

par(mfrow = c(2, 3))
plot(X, pch = 19, cex = 0.6)
# plot lambda trees
plot(Tree, type = "lambda", main = "lambda Tree (knn)")
plot(TreeKDE, type = "lambda", main = "lambda Tree (kde)")
# plot clusters
plot(X, pch = 19, cex = 0.6, main = "cluster labels")
for (i in Tree[["id"]]){
  points(matrix(X[Tree[["DataPoints"]][[i]], ], ncol = 2), col = i, pch = 19,
         cex = 0.6)
}
# plot kappa trees
plot(Tree, type = "kappa", main = "kappa Tree (knn)")
plot(TreeKDE, type = "kappa", main = "kappa Tree (kde)")
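## The resolution of the tree and a progress bar are controlled by Nlambda and
## printProgress; the values below are purely illustrative.
TreeCoarse <- clusterTree(X, k, density = "knn", Nlambda = 50, printProgress = TRUE)
plot(TreeCoarse, type = "lambda", main = "lambda Tree (knn, Nlambda = 50)")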
