B_DB.IDX | R Documentation |
Compute the Bayesian cluster validity index (BCVI) for two to kmax
groups using DB and/or DBs as the underlying cluster validity index (CVI), with user-selected Dirichlet prior parameters. Full details of BCVI are given in Wiroonsri and Preedasawakul (2024).
B_DB.IDX(x, kmax, method = "kmeans", indexlist = "all", p = 2, q = 2,
nstart = 100, alpha = "default", mult.alpha = 1/2)
x |
a numeric data frame or matrix where each column is a variable to be used for cluster analysis and each row is a data point. |
kmax |
a maximum number of clusters to be considered. |
method |
a character string indicating which clustering method to be used ( |
indexlist |
a character string indicating which cluster validity indexes to be computed ( |
p |
the power of the Minkowski distance between centroids of clusters. The default is 2.
q |
the power of the dispersion measure of a cluster. The default is 2.
nstart |
a maximum number of initial random sets for kmeans for |
alpha |
Dirichlet prior parameters |
mult.alpha |
the power |
BCVI-DB is defined as follows.
Let
r_k({\bf x}) = \dfrac{\max_j CVI(j) - CVI(k)}{\sum_{i=2}^K (\max_j CVI(j) - CVI(i))},
where CVI denotes the DB or DBs index.
Assume that
f({\bf x}|{\bf p}) = C({\bf p}) \prod_{k=2}^K p_k^{nr_k({\bf x})}
represents the conditional probability density function of the dataset given {\bf p}, where C({\bf p}) is the normalizing constant. Assume further that {\bf p} follows a Dirichlet prior distribution with parameters {\bm \alpha} = (\alpha_2,\ldots,\alpha_K). The posterior distribution of {\bf p} remains a Dirichlet distribution with parameters (\alpha_2+nr_2({\bf x}),\ldots,\alpha_K+nr_K({\bf x})).
The BCVI is then defined as
BCVI(k) = E[p_k|{\bf x}] = \frac{\alpha_k + nr_k({\bf x})}{\alpha_0+n}
where \alpha_0 = \sum_{k=2}^K \alpha_k.
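As a small worked illustration of the BCVI formula, the numbers below are hypothetical and chosen only to keep the arithmetic easy: K = 4, index values CVI(2) = 0.8, CVI(3) = 0.4, CVI(4) = 1.0, n = 20, and \alpha_k = 10 for every k. None of these values comes from the package or the paper.

```latex
\max_j CVI(j) = 1.0, \qquad \sum_{i=2}^{4} (\max_j CVI(j) - CVI(i)) = 0.2 + 0.6 + 0 = 0.8

r_2({\bf x}) = \dfrac{0.2}{0.8} = 0.25, \qquad
r_3({\bf x}) = \dfrac{0.6}{0.8} = 0.75, \qquad
r_4({\bf x}) = \dfrac{0}{0.8} = 0

\alpha_0 = 10 + 10 + 10 = 30

BCVI(2) = \dfrac{10 + 20 \cdot 0.25}{30 + 20} = 0.30, \qquad
BCVI(3) = \dfrac{10 + 20 \cdot 0.75}{30 + 20} = 0.50, \qquad
BCVI(4) = \dfrac{10 + 20 \cdot 0}{30 + 20} = 0.20
```

The three values sum to one, and the largest falls at k = 3, the k with the smallest underlying index value.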
The variance of p_k can be computed as
Var(p_k|{\bf x}) = \dfrac{(\alpha_k + nr_k({\bf x}))(\alpha_0 + n -\alpha_k-nr_k({\bf x}))}{(\alpha_0 + n)^2(\alpha_0 + n +1)}.
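The formulas in this section can be sketched in a few lines of base R. This is a minimal illustration under stated assumptions, not the package's implementation: the index values, the sample size n, and the prior parameters are all hypothetical, and the prior parameters are used as given, without any rescaling.

```r
# Hypothetical inputs (not produced by the package): index values for
# k = 2, ..., 6, where a smaller value indicates a better partition.
k     <- 2:6
cvi   <- c(0.90, 0.55, 0.70, 0.85, 0.95)
n     <- 100                 # assumed number of data points
alpha <- rep(10, length(k))  # assumed Dirichlet prior parameters

# r_k(x): distance of each index value from the largest one, rescaled to sum to 1
r <- (max(cvi) - cvi) / sum(max(cvi) - cvi)

# Posterior Dirichlet parameters alpha_k + n * r_k, and their total alpha_0 + n
post  <- alpha + n * r
total <- sum(alpha) + n

# BCVI(k) is the posterior mean of p_k; post_var is the posterior variance
bcvi     <- post / total
post_var <- post * (total - post) / (total^2 * (total + 1))

print(data.frame(k = k, CVI = cvi, BCVI = bcvi, VAR = post_var))
```

With these inputs the BCVI values sum to one and peak at k = 3, where the hypothetical index is smallest.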
BCVI |
the data frame where the first and the second columns are the number of groups |
VAR |
the data frame where the first and the second columns are the number of groups |
CVI |
the data frame where the first and the second columns are the number of groups |
Nathakhun Wiroonsri and Onthada Preedasawakul
D. L. Davies, D. W. Bouldin, "A cluster separation measure," IEEE Trans Pattern Anal Machine Intell, 1, 224-227 (1979).
M. Kim, R. S. Ramakrishna, "New indices for cluster validity assessment," Pattern Recognition Letters, 26, 2353-2363 (2005).
O. Preedasawakul, N. Wiroonsri, "A Bayesian cluster validity index," Computational Statistics & Data Analysis, 202, 108053 (2025). doi:10.1016/j.csda.2024.108053
B2_data, B_TANG.IDX, B_WP.IDX, B_Wvalid, B_DI.IDX
library(BayesCVI)
# The data included in this package.
data = B2_data[,1:2]
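# alpha: a vector of nine values, matching k = 2, ..., 10
# (it is not used below, where alpha = "default")
aalpha = c(5,5,5,20,20,20,0.5,0.5,0.5)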
B.DB = B_DB.IDX(x = scale(data), kmax=10, method = "kmeans", indexlist = "all",
p = 2, q = 2, nstart = 100, alpha = "default", mult.alpha = 1/2)
# plot the BCVI-DB
pplot = plot_BCVI(B.DB$DB)
pplot$plot_index
pplot$plot_BCVI
pplot$error_bar_plot
# plot the BCVI-DBs
pplot = plot_BCVI(B.DB$DBs)
pplot$plot_index
pplot$plot_BCVI
pplot$error_bar_plot