ClusterShannonInfo: Shannon Information

ClusterShannonInfo    R Documentation

Shannon Information

Description

Computes the Shannon information [Shannon, 1948] for each column of ClsMatrix, where each column holds one clustering.

Usage

ClusterShannonInfo(ClsMatrix)

Arguments

ClsMatrix

[1:n,1:C] matrix of C clusterings. Each column is defined as:

[1:n] numerical vector defining the classification, that is, the main output of the clustering algorithm for the n cases of data. It has k unique numbers representing the arbitrary labels of the clustering.

Details

For a column ClsMatrix[,c] with k clusters:

Info[1:d] = sum(-p * log(p) / MaxInfo) over all unique cases with probability p in ClsMatrix[,c]

MaxInfo = -(1/k) * log(1/k)

The function measures how balanced and diverse the clusters are in each clustering solution (column):

Balanced clusters (e.g., 3 clusters with equal size) => entropy close to 1.

Skewed clusters (e.g., 95 percent in one cluster) => entropy closer to 0.

It normalizes entropy by the theoretical max entropy for k clusters, giving a percentage of "information efficiency" in the clustering.
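The behaviour described above can be illustrated with a minimal base R sketch. It is not the FCPS implementation: info_per_cluster is a hypothetical helper, and it assumes the per-cluster reading of the formula (one normalized term -p * log(p) / MaxInfo per cluster, as the Value section describes the rows of Info), with p taken as the relative cluster size.

```r
# Hypothetical helper (not part of FCPS): normalized information per cluster
# for a single clustering, assuming p is the relative size of each cluster.
info_per_cluster <- function(cls) {
  p <- as.numeric(table(cls)) / length(cls)  # relative cluster sizes
  k <- length(p)                             # number of clusters
  max_info <- -(1 / k) * log(1 / k)          # normalizer from Details
  -p * log(p) / max_info
}

balanced <- rep(1:3, each = 50)        # three clusters of equal size
skewed   <- c(rep(1, 95), rep(2, 5))   # 95 percent of cases in one cluster

info_balanced <- info_per_cluster(balanced)
info_skewed   <- info_per_cluster(skewed)

print(info_balanced)  # one value per cluster
print(info_skewed)
```

Under these assumptions every cluster of the balanced clustering has p = 1/k, so each normalized term equals 1, while the terms of the skewed clustering are clearly smaller.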

Value

Info

[1:max.nc,1:C] matrix of Shannon information as defined in Details. Each column represents one Cls of ClsMatrix, and each row yields the information of one cluster, up to the ClusterNo k. If k<max.nc (the highest number of clusters), the remaining rows are filled with NaN.

ClusterNo

Number of clusters k found for each Cls, respectively

MaxInfo

max per column of Info. This is distinct from the normalization constant called MaxInfo in Details.

MinInfo

min per column of Info

MedianInfo

median per column of Info

MeanInfo

mean per column of Info
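How the NaN-padded Info matrix and its per-column summaries fit together can be sketched in base R as follows. This is an illustration under stated assumptions, not the package code: info_per_cluster and the toy cls_matrix are hypothetical, the per-cluster reading of the formula in Details is assumed, and skipping the NaN padding in the summaries (na.rm = TRUE) is likewise an assumption.

```r
# Hypothetical helper (not part of FCPS): normalized information per cluster
info_per_cluster <- function(cls) {
  p <- as.numeric(table(cls)) / length(cls)
  k <- length(p)
  -p * log(p) / (-(1 / k) * log(1 / k))
}

# Toy ClsMatrix: 6 cases, 2 clusterings with k = 2 and k = 3 clusters
cls_matrix <- cbind(c(1, 1, 1, 2, 2, 2),
                    c(1, 1, 1, 1, 2, 3))

# Number of clusters per clustering and the highest number of clusters
cluster_no <- apply(cls_matrix, 2, function(cls) length(unique(cls)))
max_nc <- max(cluster_no)

# One column per clustering; rows beyond k stay NaN
info <- matrix(NaN, nrow = max_nc, ncol = ncol(cls_matrix))
for (j in seq_len(ncol(cls_matrix))) {
  info[seq_len(cluster_no[j]), j] <- info_per_cluster(cls_matrix[, j])
}

# Per-column summaries, ignoring the NaN padding (assumption)
max_info    <- apply(info, 2, max, na.rm = TRUE)
min_info    <- apply(info, 2, min, na.rm = TRUE)
median_info <- apply(info, 2, median, na.rm = TRUE)
mean_info   <- apply(info, 2, mean, na.rm = TRUE)
```

Without na.rm = TRUE the summaries of any column with fewer than max_nc clusters would themselves be NaN, which is why the sketch drops the padding first.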

Author(s)

Michael Thrun

References

[Shannon, 1948] Shannon, C. E.: A Mathematical Theory of Communication, Bell System Technical Journal, Vol. 27(3), pp. 379-423, doi:10.1002/j.1538-7305.1948.tb01338.x, 1948.

Examples

# Load FCPS, which provides ClusterShannonInfo
library(FCPS)

# Read the iris dataset from the standard R package datasets
data <- as.matrix(iris[, 1:4])
max.nc <- 7
# Create the clusterings for the data set (here with complete linkage)
# for 2 to 8 clusters, one clustering per column of clsm
hc <- hclust(dist(data), method = "complete")
clsm <- matrix(data = 0, nrow = nrow(data), ncol = max.nc)
for (i in 2:(max.nc + 1)) {
  clsm[, i - 1] <- cutree(hc, k = i)
}

ClusterShannonInfo(clsm)

FCPS documentation built on Nov. 5, 2025, 7:44 p.m.