AIBmix | R Documentation |
The AIBmix
function implements the Agglomerative Information Bottleneck (AIB) algorithm
for hierarchical clustering of datasets containing mixed-type variables, including categorical (nominal and ordinal)
and continuous variables. At each step, the method merges the pair of clusters whose merger retains the most information,
using bandwidth parameters to handle the different categorical data types (nominal and ordinal) (Slonim and Tishby, 1999).
AIBmix(X, catcols, contcols, lambda = -1, s = -1, scale = TRUE)
X |
A data frame containing the categorical data to be clustered. All variables should be categorical,
either |
catcols |
A vector indicating the indices of the categorical variables in X. |
contcols |
A vector indicating the indices of the continuous variables in X. |
lambda |
A numeric value or vector specifying the bandwidth parameter for categorical variables. The default value is -1. |
s |
A numeric value or vector specifying the bandwidth parameter(s) for continuous variables. The values must be greater than 0. |
scale |
A logical value indicating whether the continuous variables should be scaled to have unit variance before clustering. Defaults to TRUE. |
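To make the effect of the scale argument concrete, the following is a minimal base R sketch of rescaling continuous columns to unit variance. The data frame and its column names are hypothetical, and the sketch only divides by the standard deviation; whether AIBmix also centres the variables is not stated on this page, so treat this as an illustration rather than the package's internal code.

```r
# Hypothetical data; the column names are illustrative only.
set.seed(1)
X <- data.frame(
  grp       = factor(sample(c("a", "b"), 50, replace = TRUE)),
  height_cm = rnorm(50, mean = 170, sd = 10),
  income    = rnorm(50, mean = 30000, sd = 8000)
)
contcols <- 2:3

# Divide each continuous column by its standard deviation (no centring;
# this is an assumption, not necessarily what AIBmix does internally).
X_scaled <- X
X_scaled[contcols] <- lapply(X[contcols], function(v) v / sd(v))

# Both variances equal 1 up to floating-point error.
sapply(X_scaled[contcols], var)
```

Without rescaling, a variable measured in large units (here, income) would dominate any bandwidth shared across the continuous variables.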
The AIBmix
function produces a hierarchical agglomerative clustering of the data while retaining maximal information about the original variable
distributions. The Agglomerative Information Bottleneck algorithm uses an information-theoretic criterion: at each step it performs the merge
that retains the most information about the original distribution. Bandwidth parameters for categorical
(nominal, ordinal) and continuous variables are determined adaptively if not provided. This process identifies stable and interpretable cluster assignments by maximising mutual information while
controlling complexity. The method is well suited to datasets with mixed-type variables because it integrates
information from all variable types.
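The merge criterion described above can be sketched in base R. This is a minimal illustration of the merge cost from Slonim and Tishby's formulation (the merged cluster weight times a weighted Jensen-Shannon divergence between the clusters' conditional distributions), not the code used inside AIBmix. The names kl_div, merge_cost, p_z and p_y_given_z are hypothetical, and the toy probabilities are made up for the example.

```r
# Kullback-Leibler divergence between two discrete distributions (natural log).
kl_div <- function(p, q) {
  keep <- p > 0
  sum(p[keep] * log(p[keep] / q[keep]))
}

# Information lost by merging clusters i and j:
# (p_i + p_j) * weighted Jensen-Shannon divergence of p(y | z_i) and p(y | z_j).
merge_cost <- function(p_i, p_j, py_i, py_j) {
  w_i  <- p_i / (p_i + p_j)
  w_j  <- p_j / (p_i + p_j)
  py_m <- w_i * py_i + w_j * py_j          # conditional of the merged cluster
  (p_i + p_j) * (w_i * kl_div(py_i, py_m) + w_j * kl_div(py_j, py_m))
}

# Toy input: three clusters with weights p_z and conditionals p(y | z) in rows.
p_z <- c(0.5, 0.3, 0.2)
p_y_given_z <- rbind(c(0.7, 0.2, 0.1),
                     c(0.6, 0.3, 0.1),
                     c(0.1, 0.1, 0.8))

# Evaluate every candidate pair and pick the cheapest merge.
pairs <- t(combn(3, 2))
costs <- apply(pairs, 1, function(ij) {
  merge_cost(p_z[ij[1]], p_z[ij[2]],
             p_y_given_z[ij[1], ], p_y_given_z[ij[2], ])
})
best <- pairs[which.min(costs), ]
```

On this toy input the cheapest merge joins the first two clusters, whose conditional distributions are the most alike. Repeating the step until one cluster remains yields the full hierarchy.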
The following kernel functions are used to estimate densities for the clustering procedure:
Continuous variables: Gaussian kernel
K_c\left(\frac{x-x'}{s}\right) = \frac{1}{\sqrt{2\pi}} \exp\left\{ - \frac{\left(x-x'\right)^2}{2s^2} \right\}, \quad s > 0.
Nominal categorical variables: Aitchison & Aitken kernel
K_u\left(x = x' ; \lambda\right) = \begin{cases}
1-\lambda & \text{if } x = x' \\
\frac{\lambda}{\ell-1} & \text{otherwise}
\end{cases}, \quad 0 \leq \lambda \leq \frac{\ell-1}{\ell}.
Ordinal categorical variables: Li & Racine kernel
K_o\left(x = x' ; \nu\right) = \begin{cases}
1 & \text{if } x = x' \\
\nu^{|x - x'|} & \text{otherwise}
\end{cases}, \quad 0 \leq \nu \leq 1.
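The three kernels can be written directly in base R. The sketch below follows the formulas as printed above; the function names k_gauss, k_nominal and k_ordinal are hypothetical, the argument n_levels is assumed to play the role of the number of categories (the symbol ell in the nominal kernel), and ordinal values are assumed to be coded as integer ranks. It is not taken from the package source.

```r
# Gaussian kernel for continuous values, bandwidth s > 0.
k_gauss <- function(x, x_prime, s) {
  stopifnot(s > 0)
  exp(-(x - x_prime)^2 / (2 * s^2)) / sqrt(2 * pi)
}

# Aitchison & Aitken kernel for a nominal variable with n_levels categories.
k_nominal <- function(x, x_prime, lambda, n_levels) {
  stopifnot(lambda >= 0, lambda <= (n_levels - 1) / n_levels)
  ifelse(x == x_prime, 1 - lambda, lambda / (n_levels - 1))
}

# Li & Racine kernel for an ordinal variable coded as integer ranks.
k_ordinal <- function(x, x_prime, nu) {
  stopifnot(nu >= 0, nu <= 1)
  ifelse(x == x_prime, 1, nu^abs(x - x_prime))
}

k_gauss(0, 0, s = 1)                             # 1 / sqrt(2 * pi)
k_nominal("a", "b", lambda = 0.3, n_levels = 3)  # 0.3 / 2 = 0.15
k_ordinal(1, 3, nu = 0.5)                        # 0.5^2 = 0.25
```

With lambda = 0 the nominal kernel reduces to an exact-match indicator, and at the upper bound (ell - 1) / ell it assigns equal weight to every category, which is why the bandwidth is restricted to that interval.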
A list containing the following elements:
merges |
A data frame with 2 columns and |
merge_costs |
A numeric vector tracking the cost incurred by each merge |
partitions |
A list containing |
I_Z_Y |
A numeric vector including the mutual information |
I_X_Y |
A numeric value of the mutual information |
info_ret |
A numeric vector of length |
dendrogram |
A dendrogram visualising the cluster hierarchy. The height is determined by the cost of cluster merges. |
Efthymios Costa, Ioanna Papatsouma, Angelos Markos
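Slonim, N. and Tishby, N. (1999). Agglomerative information bottleneck. Advances in Neural Information Processing Systems, 12.
Aitchison, J. and Aitken, C. G. G. (1976). Multivariate binary discrimination by the kernel method. Biometrika, 63(3).
Li, Q. and Racine, J. (2003). Nonparametric estimation of distributions with categorical and continuous data. Journal of Multivariate Analysis, 86(2).
Silverman, B. W. (1998). Density Estimation for Statistics and Data Analysis. Routledge.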
AIBcat, AIBcont
library(IBclust)

# Example dataset with categorical, ordinal, and continuous variables
set.seed(123)
data <- data.frame(
  cat_var = factor(sample(letters[1:3], 100, replace = TRUE)),  # Nominal categorical variable
  ord_var = factor(sample(c("low", "medium", "high"), 100, replace = TRUE),
                   levels = c("low", "medium", "high"),
                   ordered = TRUE),  # Ordinal variable
  cont_var1 = rnorm(100),  # Continuous variable 1
  cont_var2 = runif(100)   # Continuous variable 2
)
# Perform Mixed-Type Hierarchical Clustering with Agglomerative IB
result <- AIBmix(X = data, catcols = 1:2, contcols = 3:4, lambda = -1, s = -1, scale = TRUE)
# Plot the dendrogram of the cluster hierarchy
plot(result$dendrogram, xlab = "", sub = "")