wcls_bcls_matrices: Matrix Cluster Scatter Measures
In clv: Cluster Validation Techniques

wcls/bcls.matrix

Matrix Cluster Scatter Measures

Description

These functions compute the two basic matrix cluster scatter measures: the within-cluster scatter matrix W and the between-cluster scatter matrix B.

Usage

wcls.matrix(data,clust,cluster.center)
bcls.matrix(cluster.center,cluster.size,mean)

Arguments

`data`	`numeric matrix` or `data.frame` where columns correspond to variables and rows to observations
`clust`	integer `vector` giving the id of the cluster each object is assigned to. If the vector is not of integer type, it is coerced with a warning.
`cluster.center`	`matrix` or `data.frame` where columns correspond to variables and rows to the centers of the clusters defined by the `data` and `clust` arguments.
`cluster.size`	integer `vector` giving the size of each cluster, computed from the `clust` vector.
`mean`	mean of all data objects (one value per variable).
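
The three derived arguments can also be built by hand. The following is a minimal base-R sketch of what they contain, not the clv implementation: `kmeans()` is used only to obtain some cluster labels, and `overall.mean` is a hypothetical name chosen here to avoid masking base `mean()`. Whether clv stores the mean as a plain vector or as a one-row matrix is not stated above, so the vector form is an assumption.

```r
# Base-R sketch of the inputs described above (assumed shapes, not clv code).
data(iris)
x <- as.matrix(iris[, 1:4])

set.seed(1)
clust <- as.integer(kmeans(x, centers = 3)$cluster)  # integer cluster ids

# number of objects in each cluster, ordered by cluster id
cluster.size <- as.vector(table(clust))

# one row per cluster, one column per variable:
# per-cluster column sums divided by the matching cluster size
cluster.center <- rowsum(x, clust) / cluster.size

# mean of all data objects, one value per variable
overall.mean <- colMeans(x)
```

In the package itself these values are normally taken from `cls.attrib`, as in the Examples section.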

Details

There are two basic matrix scatter measures.

1. The within-cluster scatter measure, defined as:

W = sum(forall k in 1:cluster.num) W(k)

where W(k) = sum(forall x) (x - m(k))*(x - m(k))'

x	- an object belonging to cluster k,
m(k)	- center of cluster k.

2. The between-cluster scatter measure, defined as:

B = sum(forall k in 1:cluster.num) |C(k)|*( m(k) - m )*( m(k) - m )'

|C(k)|	- size of cluster k,
m(k)	- center of cluster k,
m	- center of all data objects.
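
The two definitions can be checked by hand. The sketch below is an illustration in base R, not the clv implementation: it builds W and B directly from the formulas, taking m(k) as the cluster mean and m as the overall mean, and then confirms that their sum equals the total scatter matrix. The names `W`, `B` and `T.total` are chosen for this example only, and `kmeans()` merely supplies some cluster labels.

```r
# Direct computation of the W and B scatter matrices from their definitions.
data(iris)
x <- as.matrix(iris[, 1:4])

set.seed(1)
clust <- as.integer(kmeans(x, centers = 3)$cluster)

m <- colMeans(x)          # m: center of all data objects
p <- ncol(x)
W <- matrix(0, p, p)
B <- matrix(0, p, p)

for (k in sort(unique(clust))) {
  xk <- x[clust == k, , drop = FALSE]   # objects in cluster k
  mk <- colMeans(xk)                    # m(k): center of cluster k
  dk <- sweep(xk, 2, mk)                # rows hold (x - m(k))
  W <- W + crossprod(dk)                # adds W(k) = sum (x - m(k))(x - m(k))'
  B <- B + nrow(xk) * tcrossprod(mk - m)  # adds |C(k)| (m(k) - m)(m(k) - m)'
}

# total scatter around the overall mean; should equal W + B
T.total <- crossprod(sweep(x, 2, m))
all.equal(W + B, T.total, check.attributes = FALSE)
```

Both matrices are square, with one row and one column per variable, which is why traces and determinants of W, B and W + B can serve as clustering criteria.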

Value

`wcls.matrix`	returns the W matrix (within-cluster scatter measure),
`bcls.matrix`	returns the B matrix (between-cluster scatter measure).

Author(s)

Lukasz Nieweglowski

References

T. Hastie, R. Tibshirani, G. Walther. Estimating the number of data clusters via the Gap statistic. http://citeseer.ist.psu.edu/tibshirani00estimating.html

Examples

# load and prepare data
library(clv)
data(iris)
iris.data <- iris[,1:4]

# cluster data
pam.mod <- pam(iris.data, 5) # partition the data into five clusters
v.pred <- as.integer(pam.mod$clustering) # cluster id assigned to each data object

# compute the size and center of each cluster
# and the mean of all data objects
cls.attr <- cls.attrib(iris.data, v.pred)
center <- cls.attr$cluster.center
size <- cls.attr$cluster.size
iris.mean <- cls.attr$mean

# compute matrix scatter measures
W.matrix <- wcls.matrix(iris.data, v.pred, center)
B.matrix <- bcls.matrix(center, size, iris.mean)
T.matrix <- W.matrix + B.matrix

# examples of indices based on the W, B and T matrices
mx.scatt.crit1 <- sum(diag(W.matrix))                       # trace(W)
mx.scatt.crit2 <- sum(diag(B.matrix))/sum(diag(W.matrix))   # trace(B) / trace(W)
mx.scatt.crit3 <- det(W.matrix)/det(T.matrix)               # det(W) / det(T)

clv documentation built on Sept. 28, 2023, 9:06 a.m.