Matrix Cluster Scatter Measures

Description

These functions compute the two basic cluster scatter matrices: the within-cluster scatter matrix (W) and the between-cluster scatter matrix (B).

Usage

wcls.matrix(data,clust,cluster.center)
bcls.matrix(cluster.center,cluster.size,mean)

Arguments

data

numeric matrix or data.frame where columns correspond to variables and rows to observations

clust

integer vector giving the id of the cluster each object is assigned to. If the vector is not of integer type, it is coerced with a warning.

cluster.center

matrix or data.frame where columns correspond to variables and rows to cluster centers defined by data and clust parameters.

cluster.size

integer vector giving the size of each cluster, computed from the clust vector.

mean

mean of all data objects (the overall center m used in Details).

Details

There are two basic scatter matrices.

1. within-cluster scatter measure defined as:

W = sum(forall k in 1:cluster.num) W(k)

where W(k) = sum(forall x in C(k)) (x - m(k))*(x - m(k))'

x - object belonging to cluster k,
m(k) - center of cluster k.

2. between-cluster scatter measure defined as:

B = sum(forall k in 1:cluster.num) |C(k)|*( m(k) - m )*( m(k) - m )'

|C(k)| - size of cluster k,
m(k) - center of cluster k,
m - center of all data objects.
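
The two definitions above can be sketched directly in base R. This is a minimal illustration under stated assumptions, not the package's own implementation: the names x, cl, W, B and T.total are hypothetical, and the Species column of iris stands in for a cluster assignment. When m(k) is the mean of cluster k, W + B equals the total scatter matrix, which the last line checks.

```r
# Sketch only: W and B computed from the definitions, base R, no clv calls.
x  <- as.matrix(iris[, 1:4])      # observations in rows, variables in columns
cl <- as.integer(iris$Species)    # assumed stand-in for a cluster id vector

m <- colMeans(x)                  # m: center of all data objects
p <- ncol(x)
W <- matrix(0, p, p, dimnames = list(colnames(x), colnames(x)))
B <- W

for (k in sort(unique(cl))) {
  xk <- x[cl == k, , drop = FALSE]          # objects belonging to cluster k
  mk <- colMeans(xk)                        # m(k): center of cluster k
  dk <- sweep(xk, 2, mk)                    # rows hold x - m(k)
  W  <- W + crossprod(dk)                   # adds sum of (x - m(k))(x - m(k))'
  B  <- B + nrow(xk) * tcrossprod(mk - m)   # adds |C(k)| (m(k) - m)(m(k) - m)'
}

# Total scatter about the overall mean; should match W + B
T.total <- crossprod(sweep(x, 2, m))
stopifnot(isTRUE(all.equal(W + B, T.total, check.attributes = FALSE)))
```

Using crossprod on the centered rows avoids an explicit loop over objects: t(dk) %*% dk is the sum of the outer products of the rows of dk.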

Value

wcls.matrix returns the W matrix (within-cluster scatter measure),
bcls.matrix returns the B matrix (between-cluster scatter measure).

Author(s)

Lukasz Nieweglowski

References

T. Hastie, R. Tibshirani, G. Walther, Estimating the number of data clusters via the Gap statistic, http://citeseer.ist.psu.edu/tibshirani00estimating.html

Examples

# load and prepare data
library(clv)
data(iris)
iris.data <- iris[,1:4]

# cluster data
pam.mod <- pam(iris.data,5) # create five clusters
v.pred <- as.integer(pam.mod$clustering) # get cluster ids associated to given data objects

# compute cluster sizes, center of each cluster 
# and mean from data objects
cls.attr <- cls.attrib(iris.data, v.pred)
center <- cls.attr$cluster.center
size <- cls.attr$cluster.size
iris.mean <- cls.attr$mean

# compute matrix scatter measures
W.matrix <- wcls.matrix(iris.data, v.pred, center)
B.matrix <- bcls.matrix(center, size, iris.mean)
T.matrix <- W.matrix + B.matrix

# example of indices based on W, B and T matrices
mx.scatt.crit1 <- sum(diag(W.matrix))
mx.scatt.crit2 <- sum(diag(B.matrix))/sum(diag(W.matrix))
mx.scatt.crit3 <- det(W.matrix)/det(T.matrix)