| dbcv | R Documentation |
Calculate the Density-Based Clustering Validation Index (DBCV) for a clustering.
dbcv(x, cl, d, metric = "euclidean", sample = NULL)
x |
a data matrix or a dist object. |
cl |
a clustering (e.g., an integer vector) |
d |
dimensionality of the original data if a dist object is provided. |
metric |
distance metric used. The available metrics are the methods
implemented by |
sample |
sample size used for large datasets. |
DBCV (Moulavi et al., 2014) computes a score based on the density sparseness of each cluster and the density separation of each pair of clusters.
The density sparseness of a cluster (DSC) is defined as the maximum edge weight of a minimum spanning tree for the internal points of the cluster using the mutual reachability distance based on the all-points-core-distance. Internal points are connected to more than one other point in the cluster. Since clusters of a size less than 3 cannot have internal points, they are ignored (considered noise) in this implementation.
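The distances named above can be sketched in base R. This is a minimal illustration of the all-points-core-distance and the mutual reachability distance as defined in Moulavi et al. (2014), not the code used by `dbcv()`. The helper names `all_points_core_dist` and `mutual_reachability` are hypothetical, and the sketch assumes Euclidean distance and no duplicate points within the cluster.

```r
# Hypothetical helpers (not exported by any package): distances for the
# points of ONE cluster, following the definitions in Moulavi et al. (2014).
all_points_core_dist <- function(pts) {
  D <- as.matrix(dist(pts))   # pairwise Euclidean distances
  n <- nrow(D)                # number of points in the cluster
  d <- ncol(pts)              # dimensionality of the data
  diag(D) <- NA               # exclude each point's distance to itself
  # ( sum_j (1 / dist_ij)^d / (n - 1) )^(-1/d)
  (rowSums((1 / D)^d, na.rm = TRUE) / (n - 1))^(-1 / d)
}

mutual_reachability <- function(pts) {
  D <- as.matrix(dist(pts))
  core <- all_points_core_dist(pts)
  # max(core_i, core_j, dist_ij) for every pair of points
  M <- pmax(outer(core, core, pmax), D)
  diag(M) <- 0
  M
}

set.seed(1)
pts <- matrix(rnorm(20), ncol = 2)   # 10 random 2-d points as a toy cluster
mrd <- mutual_reachability(pts)

# With only two points, the core distance equals their distance.
pts2 <- rbind(c(0, 0), c(3, 4))
core2 <- all_points_core_dist(pts2)
```

A minimum spanning tree built on `mrd` would then yield the DSC as its largest edge weight among internal points. That step is left out here to keep the sketch short.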
The density separation of a pair of clusters (DSPC) is defined as the minimum reachability distance between the internal nodes of the spanning trees of the two clusters.
The validity index for a cluster is calculated using these measures and aggregated to a validity index for the whole clustering using a weighted average.
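For reference, the two formulas behind this aggregation, as given in Moulavi et al. (2014), can be written as follows. The symbols are those of the paper, not names from this implementation: C_i is a cluster, l is the number of clusters, and |O| is the total number of points, including noise.

```latex
V_C(C_i) = \frac{\min_{j \neq i} \mathrm{DSPC}(C_i, C_j) - \mathrm{DSC}(C_i)}
                {\max\left( \min_{j \neq i} \mathrm{DSPC}(C_i, C_j),\ \mathrm{DSC}(C_i) \right)}

\mathrm{DBCV}(C) = \sum_{i=1}^{l} \frac{|C_i|}{|O|} \, V_C(C_i)
```

Because the numerator is a difference of two non-negative quantities and the denominator is the larger of the two, each V_C(C_i) lies in [-1, 1], and so does the weighted sum.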
The index is in the range [-1,1]. If the cluster density compactness is better
than the density separation, a positive value is returned. The actual value depends
on the separability of the data. In general, greater values
of the measure indicate a better density-based clustering solution.
Noise points enter the calculation only through the weighted average; therefore, a clustering with more noise points will get a lower index.
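The effect of noise on the weighted average can be seen with a small arithmetic sketch. The numbers and the helper `weighted_score` are made up for illustration and do not come from `dbcv()` or any dataset. The sketch assumes the weights are cluster size divided by the total number of points, as in Moulavi et al. (2014).

```r
# Illustrative numbers only (not from any real dataset).
v_c   <- c(0.8, 0.6)   # hypothetical per-cluster validity indices
sizes <- c(40, 40)     # hypothetical cluster sizes

# Hypothetical helper: weighted average with weights = cluster size / total points.
weighted_score <- function(v_c, sizes, n_total) sum(sizes / n_total * v_c)

score_no_noise   <- weighted_score(v_c, sizes, n_total = 80)    # no noise points
score_with_noise <- weighted_score(v_c, sizes, n_total = 100)   # 20 noise points
```

Here the same two clusters score 0.7 without noise and 0.56 once 20 noise points are counted in the total. The index drops because the cluster validity values are positive; noise shrinks every weight and thus pulls the score toward zero.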
Performance note: This implementation calculates a distance matrix and thus can only be used for small or sampled datasets.
A list with the DBCV score for the clustering,
the density sparseness of cluster (dsc) values,
the density separation of pairs of clusters (dspc) distances,
and the validity indices of clusters (c_c).
Matt Piekenbrock and Michael Hahsler
Davoud Moulavi and Pablo A. Jaskowiak and Ricardo J. G. B. Campello and Arthur Zimek and Jörg Sander (2014). Density-Based Clustering Validation. In Proceedings of the 2014 SIAM International Conference on Data Mining, pages 839-847. doi:10.1137/1.9781611973440.96
Pablo A. Jaskowiak (2022). MATLAB implementation of DBCV. https://github.com/pajaskowiak/dbcv
# Load a test dataset
data(Dataset_1)
x <- Dataset_1[, c("x", "y")]
class <- Dataset_1$class
clplot(x, class)
# We use MinPts 3 and use the knee at eps = .1 for dbscan
kNNdistplot(x, minPts = 3)
cl <- dbscan(x, eps = .1, minPts = 3)
clplot(x, cl)
dbcv(x, cl)
# compare to the DBCV index on the original class labels and
# with a random partitioning
dbcv(x, class)
dbcv(x, sample(1:4, replace = TRUE, size = nrow(x)))
# find the best eps using dbcv
eps_grid <- seq(.05, .2, by = .01)
cls <- lapply(eps_grid, FUN = function(e) dbscan(x, eps = e, minPts = 3))
dbcvs <- sapply(cls, FUN = function(cl) dbcv(x, cl)$score)
plot(eps_grid, dbcvs, type = "l")
eps_opt <- eps_grid[which.max(dbcvs)]
eps_opt
cl <- dbscan(x, eps = eps_opt, minPts = 3)
clplot(x, cl)