IBcont | R Documentation |
The IBcont function implements the Information Bottleneck (IB) algorithm for fuzzy clustering of continuous data. This method optimizes an information-theoretic objective to preserve relevant information while forming concise and interpretable cluster representations \insertCite{strouse_ib_2019}{IBclust}.
IBcont(X, ncl, beta, randinit = NULL, s = -1, scale = TRUE,
maxiter = 100, nstart = 100, verbose = FALSE)
X |
A numeric matrix or data frame containing the continuous data to be clustered. All variables should be of type numeric. |
ncl |
An integer specifying the number of clusters to form. |
beta |
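A numeric value giving the regularisation strength, which sets the trade-off between data compression and the preservation of relevant information. |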
randinit |
Optional. A vector specifying initial cluster assignments. If NULL (the default), cluster assignments are initialized randomly. |
s |
A numeric value or vector specifying the bandwidth parameter(s) for continuous variables. The values must be greater than 0. The default, -1, requests automatic bandwidth selection. |
scale |
A logical value indicating whether the continuous variables should be scaled to have unit variance before clustering. Defaults to TRUE. |
maxiter |
The maximum number of iterations allowed for the clustering algorithm. Defaults to 100. |
nstart |
The number of random initializations to run. The best clustering result (based on the information-theoretic criterion) is returned. Defaults to 100. |
verbose |
Logical. Controls whether progress messages are printed. Defaults to FALSE. |
The IBcont function applies the Information Bottleneck algorithm to perform fuzzy clustering of datasets comprising only continuous variables. This method uses an information-theoretic objective to optimize the trade-off between data compression and the preservation of relevant information about the underlying data distribution.
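For orientation, the trade-off described above is commonly written as the standard IB Lagrangian. The expression below is the textbook form, with X indexing the observations, Y the relevance variable, T the cluster assignment, and q(t | x) the fuzzy memberships. That IBcont minimises exactly this expression, with this sign convention, is an assumption and is not stated on this page.

```latex
\mathcal{L}\left[q(t \mid x)\right] = I(X;T) - \beta \, I(Y;T), \quad \beta > 0.
```

Under this form, a larger beta places more weight on preserving I(Y;T) relative to compressing I(X;T).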
The function uses the Gaussian kernel \insertCite{silverman_density_1998}{IBclust} for estimating probability densities of continuous features. The kernel is defined as:
K_c\left(\frac{x - x'}{s}\right) = \frac{1}{\sqrt{2\pi}} \exp\left\{-\frac{\left(x - x'\right)^2}{2s^2}\right\}, \quad s > 0.
The bandwidth parameter s, which controls the smoothness of the density estimate, is automatically determined by the algorithm if not provided by the user.
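The kernel formula can be checked numerically in base R. The sketch below is illustrative only: gaussian_kernel and kde_at are hypothetical helper names, not functions exported by the package, and the division by s in the density estimate is the usual kernel density normalisation, assumed here and not taken from the IBcont source.

```r
# Gaussian kernel from the formula above, evaluated for a difference x - x'.
gaussian_kernel <- function(x, x_prime, s) {
  stopifnot(all(s > 0))
  exp(-(x - x_prime)^2 / (2 * s^2)) / sqrt(2 * pi)
}

# Illustrative kernel density estimate at the points x0 (hypothetical helper).
kde_at <- function(x0, data, s) {
  sapply(x0, function(p) mean(gaussian_kernel(p, data, s)) / s)
}

set.seed(1)
x <- rnorm(50)
s <- 0.5

# The kernel equals the standard normal density evaluated at (x - x') / s.
k_manual <- gaussian_kernel(0.3, x, s)
k_dnorm <- dnorm((0.3 - x) / s)

# Density estimate at three points; a smaller s gives a rougher estimate.
dens_manual <- kde_at(c(-1, 0, 1), x, s)
```

Increasing s smooths the estimate, which is why the bandwidth matters for how sharply nearby observations are distinguished.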
A list containing the following elements:
Cluster |
A cluster membership matrix. |
InfoXT |
A numeric value representing the mutual information, I(X;T), between the original observation weights (X) and the cluster assignments (T). |
InfoYT |
A numeric value representing the mutual information, I(Y;T), between the original labels (Y) and the cluster assignments (T). |
beta |
A numeric value of the regularisation strength beta used. |
s |
A numeric vector of bandwidth parameters used for the continuous variables. |
ixt |
A numeric vector tracking the mutual information values between original observation weights and cluster assignments across iterations. |
iyt |
A numeric vector tracking the mutual information values between original labels and cluster assignments across iterations. |
losses |
A numeric vector tracking the final loss values across iterations. |
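To make the mutual information quantities above concrete, the following base R sketch computes I(X;T) from a joint probability table. It is a generic illustration: mutual_information is a hypothetical helper, the membership values are made up, and the use of natural logarithms (nats) is an assumption, since this page does not state which units the package reports.

```r
# Mutual information (in nats) of a joint probability table p_xy,
# where rows index one variable and columns the other.
mutual_information <- function(p_xy) {
  stopifnot(all(p_xy >= 0), isTRUE(all.equal(sum(p_xy), 1)))
  p_x <- rowSums(p_xy)
  p_y <- colSums(p_xy)
  indep <- outer(p_x, p_y)   # joint distribution under independence
  nz <- p_xy > 0             # treat 0 * log(0) as 0
  sum(p_xy[nz] * log(p_xy[nz] / indep[nz]))
}

# Toy example: uniform weights over 4 observations and fuzzy memberships
# q(t | x) for 2 clusters (rows = observations, columns = clusters).
p_x <- rep(1 / 4, 4)
q_t_given_x <- rbind(c(0.9, 0.1), c(0.8, 0.2), c(0.2, 0.8), c(0.1, 0.9))
p_xt <- p_x * q_t_given_x    # joint p(x, t) = p(x) q(t | x), row-wise

i_xt <- mutual_information(p_xt)

# Memberships that ignore x carry no information about it.
i_indep <- mutual_information(outer(p_x, c(0.5, 0.5)))
```

Sharper memberships push I(X;T) toward its upper bound of log(2) for two equally sized clusters, while uninformative memberships give a value of zero.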
Efthymios Costa, Ioanna Papatsouma, Angelos Markos
\insertRef{strouse_ib_2019}{IBclust}
\insertRef{silverman_density_1998}{IBclust}
IBmix, IBcat
# Generate simulated continuous data
set.seed(123)
X <- matrix(rnorm(200), ncol = 5) # 40 observations, 5 features
# Run IBcont with automatic bandwidth selection and multiple initializations
result <- IBcont(X = X, ncl = 3, beta = 50, s = -1, nstart = 20)
# Print clustering results
print(result$Cluster) # Cluster membership matrix
print(result$InfoXT) # Mutual information between X and T
print(result$InfoYT) # Mutual information between Y and T
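Because the clustering is fuzzy, it is often convenient to derive hard labels from the membership matrix. The sketch below uses a made-up membership matrix and assumes clusters in rows and observations in columns; the orientation of result$Cluster is not stated on this page, so check dim(result$Cluster) and transpose if needed before applying the same steps.

```r
# Toy fuzzy membership matrix: 3 clusters (rows) by 5 observations (columns).
# Each column sums to 1. Values are made up for illustration.
memb <- cbind(c(0.7, 0.2, 0.1), c(0.1, 0.8, 0.1), c(0.3, 0.3, 0.4),
              c(0.05, 0.05, 0.9), c(0.6, 0.3, 0.1))
stopifnot(isTRUE(all.equal(colSums(memb), rep(1, ncol(memb)))))

# Hard label = cluster with the largest membership for each observation.
hard_labels <- apply(memb, 2, which.max)

# Largest membership per observation, used to flag ambiguous assignments.
max_memb <- apply(memb, 2, max)
ambiguous <- which(max_memb < 0.5)
```

The 0.5 threshold is arbitrary and only serves to show how weakly assigned observations can be singled out.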