BayesCVIs | R Documentation |
Compute the Bayesian cluster validity index (BCVI) for two to kmax
groups, using an underlying cluster validity index (CVI) and Dirichlet prior parameters of the user's choice. Full details of BCVI are given in Wiroonsri and Preedasawakul (2024).
BayesCVIs(CVI, n, kmax, opt.pt, alpha = "default", mult.alpha = 1/2)
CVI |
the CVI values for |
n |
the number of data points. |
kmax |
the maximum number of clusters to be considered. |
opt.pt |
a character string indicating whether the maximum or the minimum of |
alpha |
Dirichlet prior parameters |
mult.alpha |
the power |
BCVI is defined as follows.
Let
r_k(\bf x) = \dfrac{\max_j CVI(j)- CVI(k)}{\sum_{i=2}^K (\max_j CVI(j) - CVI(i))}
for a CVI such that the smallest value indicates the optimal number of clusters and
r_k(\bf x) = \dfrac{CVI(k)-\min_j CVI(j)}{\sum_{i=2}^K (CVI(i)-\min_j CVI(j))}
for a CVI such that the largest value indicates the optimal number of clusters.
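The two ratios above can be sketched in a few lines of R. This is only an illustration under stated assumptions: `r_ratio` is a hypothetical helper that is not part of the package, and the CVI values are made up. It assumes the CVI values are not all equal, since the denominator would otherwise be zero.

```r
# Hypothetical helper (not part of BayesCVI): ratios r_k for k = 2, ..., K.
# cvi    : numeric vector of CVI values, one per candidate number of clusters
# opt.pt : "max" if the largest CVI is optimal, "min" if the smallest is
r_ratio <- function(cvi, opt.pt = c("max", "min")) {
  opt.pt <- match.arg(opt.pt)
  # distance of each CVI value from the worst value under the chosen direction
  d <- if (opt.pt == "min") max(cvi) - cvi else cvi - min(cvi)
  # normalise so that the ratios sum to one (NaN if all CVI values are equal)
  d / sum(d)
}

cvi <- c(0.2, 0.5, 0.3)       # made-up CVI values for k = 2, 3, 4
r <- r_ratio(cvi, "max")      # the worst k gets ratio 0, the best the largest
```

By construction the ratios are non-negative and sum to one, so they can be read as the share of evidence that the underlying CVI assigns to each candidate k.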
Assume that
f({\bf x}|{\bf p}) = C({\bf p}) \prod_{k=2}^K p_k^{nr_k({\bf x})}
represents the conditional probability density function of the dataset given {\bf p}, where C({\bf p}) is the normalizing constant. Assume further that {\bf p} follows a Dirichlet prior distribution with parameters {\bm \alpha} = (\alpha_2,\ldots,\alpha_K). The posterior distribution of {\bf p} is then also a Dirichlet distribution, with parameters (\alpha_2+nr_2({\bf x}),\ldots,\alpha_K+nr_K({\bf x})).
The BCVI is then defined as
BCVI(k) = E[p_k|{\bf x}] = \frac{\alpha_k + nr_k({\bf x})}{\alpha_0+n}
where \alpha_0 = \sum_{k=2}^K \alpha_k.
The variance of p_k can be computed as
Var(p_k|{\bf x}) = \dfrac{(\alpha_k + nr_k({\bf x}))(\alpha_0 + n -\alpha_k-nr_k({\bf x}))}{(\alpha_0 + n)^2(\alpha_0 + n +1 )}.
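As a worked illustration of the posterior mean and variance formulas, the following R sketch evaluates them directly. The sample size, ratios, and prior parameters are made-up values chosen for illustration; the code follows the formulas as written here and is not a description of the package internals.

```r
# Made-up inputs for K = 4, i.e. candidate k = 2, 3, 4
n     <- 100                  # number of data points
r     <- c(0, 0.75, 0.25)     # ratios r_k(x), summing to one
alpha <- c(5, 5, 5)           # Dirichlet prior parameters alpha_2, ..., alpha_4

post  <- alpha + n * r        # posterior Dirichlet parameters alpha_k + n r_k(x)
total <- sum(alpha) + n       # alpha_0 + n

bcvi     <- post / total                                   # E[p_k | x]
bcvi_var <- post * (total - post) / (total^2 * (total + 1)) # Var(p_k | x)

best_k <- which.max(bcvi) + 1 # + 1 because the first entry corresponds to k = 2
```

Because the ratios sum to one, the BCVI values also sum to one. A larger prior weight alpha_k pulls BCVI(k) towards that k, while a larger n lets the underlying CVI dominate.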
BCVI |
the data frame where the first and the second columns are the number of groups |
VAR |
the data frame where the first and the second columns are the number of groups |
CVI |
the data frame where the first and the second columns are the number of groups |
opt.pt |
a character string indicating whether the maximum or the minimum of |
Nathakhun Wiroonsri and Onthada Preedasawakul
O. Preedasawakul and N. Wiroonsri, A Bayesian Cluster Validity Index, Computational Statistics & Data Analysis, 202, 108053, 2025. doi:10.1016/j.csda.2024.108053
B2_data, B_TANG.IDX, B_WP.IDX, B_Wvalid, B_DB.IDX
# install a package for computing an underlying CVI
# install.packages("UniversalCVI")
library(UniversalCVI)
library(BayesCVI)
data = R1_data[,-3]
# Compute WP index by WP.IDX using default gamma
FCM.WP = WP.IDX(scale(data), cmax = 10, cmin = 2, corr = 'pearson', method = 'FCM', fzm = 2,
iter = 100, nstart = 20, NCstart = TRUE)
# WP.IDX values
result = FCM.WP$WP$WPI
# Dirichlet prior parameters, one per candidate number of groups k = 2, ..., 10
aalpha = c(20,20,20,5,5,5,0.5,0.5,0.5)
B.WP = BayesCVIs(CVI = result,
n = nrow(data),
kmax = 10,
opt.pt = "max",
alpha = aalpha,
mult.alpha = 1/2)
# plot the BCVI
pplot = plot_BCVI(B.WP)
pplot$plot_index
pplot$plot_BCVI
pplot$error_bar_plot