B_GC.IDX | R Documentation |
Computes the Bayesian cluster validity index (BCVI) for two to kmax groups, using all or part of GC1, GC2, GC3, and GC4 as the underlying cluster validity index (CVI), with user-selected Dirichlet prior parameters. Full details of BCVI can be found in Wiroonsri and Preedasawakul (2024).
B_GC.IDX(x, kmax, indexlist = "all", method = "FCM", fzm = 2, iter = 100,
nstart = 20, alpha = "default", mult.alpha = 1/2)
x |
a numeric data frame or matrix where each column is a variable to be used for cluster analysis and each row is a data point. |
kmax |
a maximum number of clusters to be considered. |
indexlist |
a character string indicating which generalized C index is to be computed (" |
method |
a character string indicating which clustering method is to be used ( |
fzm |
a number greater than 1 giving the degree of fuzzification for |
iter |
a maximum number of iterations for |
nstart |
a maximum number of initial random sets for FCM for |
alpha |
Dirichlet prior parameters |
mult.alpha |
the power |
BCVI-GC is defined as follows.
Let
r_k({\bf x}) = \dfrac{\max_j CVI(j) - CVI(k)}{\sum_{i=2}^K (\max_j CVI(j) - CVI(i))},
where CVI is one of the GC1, GC2, GC3, or GC4 indexes.
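As a rough illustration of this normalization, the sketch below computes r_k from a made-up vector of index values in base R. The numbers and the variable names (`cvi`, `gap`, `r`) are hypothetical and are not taken from the package; the code only mirrors the formula above.

```r
# Hypothetical index values for k = 2, ..., 6 (illustrative numbers only,
# not output of B_GC.IDX).
cvi <- c(0.42, 0.18, 0.25, 0.31, 0.36)
names(cvi) <- 2:6

# r_k = (max_j CVI(j) - CVI(k)) / sum_i (max_j CVI(j) - CVI(i))
gap <- max(cvi) - cvi
r <- gap / sum(gap)

# The r_k are non-negative and sum to one; the k with the smallest
# index value receives the largest r_k.
print(round(r, 3))
```

Because the numerator is the distance from the largest index value, the number of groups with the smallest CVI gets the largest weight, and the weights sum to one by construction.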
Assume that
f({\bf x}|{\bf p}) = C({\bf p}) \prod_{k=2}^K p_k^{nr_k({\bf x})}
represents the conditional probability density function of the dataset given {\bf p}, where C({\bf p}) is the normalizing constant. Assume further that {\bf p} follows a Dirichlet prior distribution with parameters {\bm \alpha} = (\alpha_2,\ldots,\alpha_K). The posterior distribution of {\bf p} then remains a Dirichlet distribution with parameters (\alpha_2+nr_2({\bf x}),\ldots,\alpha_K+nr_K({\bf x})).
The BCVI is then defined as
BCVI(k) = E[p_k|{\bf x}] = \dfrac{\alpha_k + nr_k({\bf x})}{\alpha_0+n},
where \alpha_0 = \sum_{k=2}^K \alpha_k.
The variance of p_k can be computed as
Var(p_k|{\bf x}) = \dfrac{(\alpha_k + nr_k({\bf x}))(\alpha_0 + n - \alpha_k - nr_k({\bf x}))}{(\alpha_0 + n)^2(\alpha_0 + n + 1)}.
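The two formulas above can be checked numerically with a few lines of base R. This is a minimal sketch under stated assumptions: the values of `r`, `alpha`, and `n` are invented for illustration, `n` is assumed to be the number of data points, and none of the variable names come from the package.

```r
# Hypothetical inputs (illustrative only, not produced by B_GC.IDX).
n <- 100                          # assumed: number of data points
r <- c(0, 0.4, 0.3, 0.2, 0.1)     # r_k(x) for k = 2, ..., 6; sums to 1
names(r) <- 2:6
alpha <- rep(5, 5)                # Dirichlet prior parameters alpha_2, ..., alpha_6

# Posterior Dirichlet parameters: alpha_k + n * r_k(x)
post <- alpha + n * r

# alpha_0 + n, which equals sum(post) because the r_k sum to one
total <- sum(alpha) + n

# Posterior mean (the BCVI) and posterior variance of each p_k
bcvi <- post / total
vars <- post * (total - post) / (total^2 * (total + 1))

print(round(bcvi, 4))
print(round(vars, 6))
```

In this toy setting the BCVI values sum to one, so they can be read as posterior weights over the candidate numbers of groups, with the variance indicating how concentrated each weight is.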
BCVI |
the data frame where the first and the second columns are the number of groups |
VAR |
the data frame where the first and the second columns are the number of groups |
CVI |
the data frame where the first and the second columns are the number of groups |
Nathakhun Wiroonsri and Onthada Preedasawakul
J. C. Bezdek, M. Moshtaghi, T. Runkler, and C. Leckie, “The generalized c index for internal fuzzy cluster validity,” IEEE Transactions on Fuzzy Systems, vol. 24, no. 6, pp. 1500–1512, 2016. https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=7429723&isnumber=7797168
O. Preedasawakul and N. Wiroonsri, “A Bayesian cluster validity index,” Computational Statistics & Data Analysis, vol. 202, 108053, 2025. doi:10.1016/j.csda.2024.108053
B7_data, B_TANG.IDX, B_XB.IDX, B_Wvalid, B_DB.IDX
library(BayesCVI)
# The data included in this package.
data = B7_data[,1:2]
# alpha: one Dirichlet prior parameter for each k = 2, ..., kmax (9 values for kmax = 10)
aalpha = c(5,5,5,20,20,20,0.5,0.5,0.5)
B.GC = B_GC.IDX(x = scale(data), kmax = 10, indexlist = "GC1",
method = "FCM", fzm = 2, iter = 100,
nstart = 20, alpha = aalpha, mult.alpha = 1/2)
# plot the BCVI-GC1
pplot = plot_BCVI(B.GC$GC1)
pplot$plot_index
pplot$plot_BCVI
pplot$error_bar_plot