GIBcont {IBclust} | R Documentation
Description

The GIBcont function implements the Generalised Information Bottleneck (GIB) algorithm for fuzzy clustering of continuous data. The method optimizes an information-theoretic objective to preserve relevant information while forming concise and interpretable cluster representations (Strouse and Schwab, 2019).
Usage

GIBcont(X, ncl, beta, alpha, randinit = NULL, s = -1, scale = TRUE,
        maxiter = 100, nstart = 100, verbose = FALSE)
Arguments

X: A numeric matrix or data frame containing the continuous data to be clustered. All variables should be of type numeric.

ncl: An integer specifying the number of clusters to form.

beta: Regularisation strength; controls the trade-off between compression and the preservation of relevant information.

alpha: Strength of the relative entropy term.

randinit: Optional. A vector specifying initial cluster assignments. If NULL (the default), initial assignments are generated randomly.

s: A numeric value or vector specifying the bandwidth parameter(s) for the continuous variables. The values must be greater than 0. The default, s = -1, selects the bandwidth automatically.

scale: A logical value indicating whether the continuous variables should be scaled to have unit variance before clustering. Defaults to TRUE.

maxiter: The maximum number of iterations allowed for the clustering algorithm. Defaults to 100.

nstart: The number of random initializations to run. The best clustering result (based on the information-theoretic criterion) is returned. Defaults to 100.

verbose: Logical. Defaults to FALSE.
Details

The GIBcont function applies the Generalised Information Bottleneck algorithm to perform fuzzy clustering of datasets comprising only continuous variables. The method leverages an information-theoretic objective to optimize the trade-off between data compression and the preservation of relevant information about the underlying data distribution.

Set \alpha = 1 or \alpha = 0 to recover the Information Bottleneck or its Deterministic variant, respectively. If \alpha = 0, the algorithm ignores the value of the regularisation parameter \beta.
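For reference, the generalised objective from the cited Strouse and Schwab work can be written as below; the exact sign convention used inside the package is an assumption here, not taken from its source code:

```latex
% Generalised IB functional over soft assignments q(t | x);
% \alpha interpolates between the IB and its deterministic variant.
\[
  \mathcal{L}_{\mathrm{GIB}} \;=\; H(T) \;-\; \alpha\, H(T \mid X) \;-\; \beta\, I(T; Y)
\]
% \alpha = 1:  H(T) - H(T|X) = I(X;T), recovering the standard IB
%              functional I(X;T) - \beta I(T;Y).
% \alpha = 0:  H(T) - \beta I(T;Y), the Deterministic IB functional.
```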
The function utilizes the Gaussian kernel (Silverman, 1998) for estimating probability densities of continuous features. The kernel is defined as:

K_c\left(\frac{x - x'}{s}\right) = \frac{1}{\sqrt{2\pi}} \exp\left\{-\frac{\left(x - x'\right)^2}{2s^2}\right\}, \quad s > 0.

The bandwidth parameter s, which controls the smoothness of the density estimate, is determined automatically by the algorithm if not provided by the user.
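The kernel above coincides with the standard normal density evaluated at the scaled difference (x - x')/s, so it can be sketched in a few lines of R. The helper names gauss_kernel and kde_at below are illustrative only and are not exported by the package; the package's automatic bandwidth rule is likewise not reproduced here.

```r
# Gaussian kernel K_c((x - xp)/s) as defined above; equals dnorm((x - xp)/s)
gauss_kernel <- function(x, xp, s) {
  stopifnot(s > 0)
  exp(-(x - xp)^2 / (2 * s^2)) / sqrt(2 * pi)
}

# Kernel density estimate at point x from a sample xs with bandwidth s
kde_at <- function(x, xs, s) mean(gauss_kernel(x, xs, s)) / s

# Sanity check against R's built-in normal density
all.equal(gauss_kernel(1.3, 0.5, 2), dnorm((1.3 - 0.5) / 2))  # TRUE
```

Larger s gives a smoother (flatter) density estimate; smaller s tracks the sample more closely, which is the trade-off the automatic selection balances.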
Value

A list containing the following elements:

Cluster: A cluster membership matrix.

Entropy: A numeric value representing the entropy of the cluster assignment, H(T).

RelEntropy: A numeric value representing the relative entropy of the cluster assignment, given the observation weights, H(T | X).

MutualInfo: A numeric value representing the mutual information, I(Y; T).

beta: The numeric value of the regularisation strength beta used.

alpha: The numeric value of the strength of the relative entropy term used.

s: A numeric vector of bandwidth parameters used for the continuous variables.

ht: A numeric vector tracking the entropy values of the cluster assignments across iterations.

hy_t: A numeric vector tracking the relative entropy values between the cluster assignments and the observation weights across iterations.

iyt: A numeric vector tracking the mutual information values between the original labels and the cluster assignments across iterations.

losses: A numeric vector tracking the loss values across iterations.
Author(s)

Efthymios Costa, Ioanna Papatsouma, Angelos Markos
References

Strouse, D. and Schwab, D.J. (2017). The deterministic information bottleneck. Neural Computation.

Strouse, D. and Schwab, D.J. (2019). The information bottleneck and geometric clustering. Neural Computation.

Silverman, B.W. (1998). Density Estimation for Statistics and Data Analysis. Chapman & Hall/CRC.
See Also

GIBmix, GIBcat
Examples

# Generate simulated continuous data
set.seed(123)
X <- matrix(rnorm(200), ncol = 5)  # 40 observations, 5 features

# Run GIBcont with automatic bandwidth selection and multiple initializations
result <- GIBcont(X = X, ncl = 2, beta = 50, alpha = 0.75, s = -1, nstart = 20)

# Print clustering results
print(result$Cluster)     # Cluster membership matrix
print(result$Entropy)     # Entropy of final clustering
print(result$RelEntropy)  # Relative entropy of final clustering
print(result$MutualInfo)  # Mutual information between Y and T
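Since Cluster is a fuzzy membership matrix (one row per observation), a hard partition can be obtained by taking the modal cluster of each row. The stand-in matrix below only illustrates the pattern; in practice the matrix would come from result$Cluster:

```r
# Stand-in fuzzy membership matrix: 3 observations, 2 clusters, rows sum to 1
memb <- rbind(c(0.90, 0.10),
              c(0.20, 0.80),
              c(0.55, 0.45))

# Hard assignment: index of the modal cluster in each row
hard <- apply(memb, 1, which.max)
hard  # 1 2 1
```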