B_DB.IDX: BCVI-Davies–Bouldin (DB) and DB* (DBs) indexes
In BayesCVI: Bayesian Cluster Validity Index

B_DB.IDX

R Documentation

Description

Computes the Bayesian cluster validity index (BCVI) for two to kmax groups, using DB and/or DBs as the underlying cluster validity index (CVI) with user-selected Dirichlet prior parameters. Full details of BCVI are given in Preedasawakul and Wiroonsri (2025).

Usage

B_DB.IDX(x, kmax, method = "kmeans", indexlist = "all", p = 2, q = 2,
        nstart = 100, alpha = "default", mult.alpha = 1/2)

Arguments

`x`	a numeric data frame or matrix where each column is a variable to be used for cluster analysis and each row is a data point.
`kmax`	the maximum number of clusters to be considered.
`method`	a character string indicating which clustering method to use (`"kmeans"`, `"hclust_complete"`, `"hclust_average"`, `"hclust_single"`). The default is `"kmeans"`.
`indexlist`	a character string indicating which cluster validity indexes to compute (`"all"`, `"DB"`, `"DBs"`). More than one index can be selected.
`p`	the power of the Minkowski distance between centroids of clusters. The default is `2`.
`q`	the power of dispersion measure of a cluster. The default is `2`.
`nstart`	the maximum number of initial random sets for kmeans when `method = "kmeans"`. The default is `100`.
`alpha`	Dirichlet prior parameters `\alpha_2,\ldots,\alpha_K` (with K = `kmax`), where `\alpha_k` is the parameter corresponding to "the probability of having k groups" (choosing each `\alpha_k` between 0 and 30 and using the other parameter `mult.alpha` as its multiplier is recommended). The default is `"default"`.
`mult.alpha`	the power `s` in `n^s`, the factor by which the Dirichlet prior parameters `alpha` are multiplied (selecting `mult.alpha` in `[0,1)` is recommended). The default is `1/2`.
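
As a rough illustration of how `alpha` and `mult.alpha` interact, the base R sketch below scales a user-chosen prior vector by `n^s`. It assumes, following the argument descriptions above, that the effective Dirichlet parameters are `alpha * n^mult.alpha`. The names `n`, `alpha_user`, `mult_alpha`, and `alpha_eff` are hypothetical and not part of the package.

```r
# Hypothetical sample size and user-chosen prior weights for k = 2, ..., 10
n <- 400
alpha_user <- c(5, 5, 5, 20, 20, 20, 0.5, 0.5, 0.5)
mult_alpha <- 1/2

# Assumed scaling: each alpha_k is multiplied by n^s
alpha_eff <- alpha_user * n^mult_alpha
names(alpha_eff) <- 2:10
alpha_eff
```

Under this assumption, a larger `mult.alpha` gives the prior more weight relative to the data, since the data contribute `n * r_k` to each posterior parameter.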

Details

BCVI-DB is defined as follows.

Let

r_k({\bf x}) = \dfrac{\max_j CVI(j) - CVI(k)}{\sum_{i=2}^K (\max_j CVI(j) - CVI(i))},

where CVI denotes the DB or DBs index.
Assume that

f({\bf x}|{\bf p}) = C({\bf p}) \prod_{k=2}^K p_k^{nr_k({\bf x})}

represents the conditional probability density function of the dataset given {\bf p}, where C({\bf p}) is the normalizing constant. Assume further that {\bf p} follows a Dirichlet prior distribution with parameters {\bm \alpha} = (\alpha_2,\ldots,\alpha_K). The posterior distribution of {\bf p} is then again a Dirichlet distribution, with parameters (\alpha_2+nr_2({\bf x}),\ldots,\alpha_K+nr_K({\bf x})).

The BCVI is then defined as

BCVI(k) = E[p_k|{\bf x}] = \frac{\alpha_k + nr_k({\bf x})}{\alpha_0+n}

where \alpha_0 = \sum_{k=2}^K \alpha_k.

The variance of p_k can be computed as

Var(p_k|{\bf x}) = \dfrac{(\alpha_k + nr_k({\bf x}))(\alpha_0 + n - \alpha_k - nr_k({\bf x}))}{(\alpha_0 + n)^2(\alpha_0 + n + 1)}.
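
The formulas above can be traced numerically with a few lines of base R. This is only a sketch under stated assumptions, not the package's implementation: the index values in `cvi`, the sample size `n`, and the prior vector `alpha` are made up, and `alpha` is taken to be already multiplied by `n^s`.

```r
# Hypothetical DB values for k = 2, ..., 6 (a smaller DB indicates a better partition)
cvi <- c(0.9, 0.5, 0.7, 0.8, 1.1)
k <- 2:6
n <- 100                       # hypothetical number of data points
alpha <- rep(10, length(k))    # hypothetical prior parameters, assumed already scaled

# r_k: rescaled index values, non-negative and summing to one
d <- max(cvi) - cvi
r <- d / sum(d)

# Posterior Dirichlet parameters, posterior mean (BCVI) and posterior variance
post <- alpha + n * r
alpha0 <- sum(alpha)
bcvi <- post / (alpha0 + n)
v <- post * (alpha0 + n - post) / ((alpha0 + n)^2 * (alpha0 + n + 1))

data.frame(k = k, BCVI = bcvi, VAR = v)
```

Because the `r_k` sum to one, the BCVI values also sum to one, so they can be read as posterior expected probabilities for each candidate number of groups.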

Value

`BCVI`	the data frame where the first and the second columns are the number of groups `k` and BCVI`(k)`, respectively, for `k` from `2` to `kmax`.
`VAR`	the data frame where the first and the second columns are the number of groups `k` and the variance of `p_k`, respectively, for `k` from `2` to `kmax`.
`CVI`	the data frame where the first and the second columns are the number of groups `k` and the original DB`(k)` or DBs`(k)`, respectively, for `k` from `2` to `kmax`.
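
A minimal sketch of how such a table might be read: assuming a `BCVI` data frame laid out as described above (number of groups in the first column, BCVI(k) in the second), the candidate numbers of groups can be ranked by their BCVI values. The data frame `bcvi_tab` and its column names are made up for illustration and are not output from the package.

```r
# Made-up table with the layout described above
bcvi_tab <- data.frame(k = 2:6, BCVI = c(0.16, 0.33, 0.24, 0.20, 0.07))

# Rank candidate numbers of groups by BCVI, largest first
ranked <- bcvi_tab[order(bcvi_tab[[2]], decreasing = TRUE), ]

# The top row holds the number of groups with the largest posterior mean
best_k <- ranked[[1]][1]
best_k
```

Indexing columns by position rather than by name keeps the sketch independent of the actual column names, which are not stated above.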

Author(s)

Nathakhun Wiroonsri and Onthada Preedasawakul

References

D. L. Davies, D. W. Bouldin, "A cluster separation measure," IEEE Trans Pattern Anal Machine Intell, 1, 224-227 (1979).

M. Kim, R. S. Ramakrishna, "New indices for cluster validity assessment," Pattern Recognition Letters, 26, 2353-2363 (2005).

O. Preedasawakul, N. Wiroonsri, "A Bayesian cluster validity index," Computational Statistics & Data Analysis, 202, 108053 (2025). doi:10.1016/j.csda.2024.108053

Examples

library(BayesCVI)

# The data included in this package.
data = B2_data[,1:2]

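# User-specified Dirichlet prior parameters, one value per k = 2,...,10.
# Not used in the call below, which keeps alpha = "default";
# pass alpha = aalpha to use these values instead.
aalpha = c(5,5,5,20,20,20,0.5,0.5,0.5)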

B.DB = B_DB.IDX(x = scale(data), kmax=10, method = "kmeans", indexlist = "all",
              p = 2, q = 2, nstart = 100, alpha = "default", mult.alpha = 1/2)

# plot the BCVI-DB

pplot = plot_BCVI(B.DB$DB)
pplot$plot_index
pplot$plot_BCVI
pplot$error_bar_plot

# plot the BCVI-DBs

pplot = plot_BCVI(B.DB$DBs)
pplot$plot_index
pplot$plot_BCVI
pplot$error_bar_plot

BayesCVI documentation built on Sept. 11, 2024, 8:28 p.m.