Average scattering for clusters - Internal Measure

Description

Computes the average scattering for clusters, an internal cluster validation measure.

Usage

clv.Scatt(data, clust, dist="euclidean")

Arguments

data

numeric matrix or data.frame where columns correspond to variables and rows to observations

clust

integer vector giving the id of the cluster each object is assigned to. If the vector is not of integer type, it is coerced to integer with a warning.

dist

chosen metric: "euclidean" (default value), "manhattan", "correlation"

Details

Let the scatter of a set X, denoted sigma(X), be defined as the vector of variances computed for each dimension. The average scattering for clusters is then defined as:

Scatt = (1/|C|) * sum{forall i in 1:|C|} ||sigma(Ci)||/||sigma(X)||

where:

|C| - number of clusters,
i - cluster id,
Ci - cluster with id 'i',
X - set with all objects,
||x|| - sqrt(x*x').

Standard deviation is defined as:

stdev = (1/|C|) * sqrt( sum{forall i in 1:|C|} ||sigma(Ci)|| )
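
The two formulas above can be sketched in base R as follows. This is only an illustration under stated assumptions, not the package's implementation: the helpers col_var, vec_norm and scatt_sketch are hypothetical names, and the variance is assumed to be the population variance (divided by n), so the numbers may differ slightly from what clv.Scatt returns.

```r
# Population variance (divide by n) of each column.
# Assumption: the package may use a different variance estimator.
col_var <- function(m) {
  m <- as.matrix(m)
  colMeans(sweep(m, 2, colMeans(m))^2)
}

# Euclidean norm, ||x|| = sqrt(x * x')
vec_norm <- function(x) sqrt(sum(x^2))

# Hypothetical helper (not part of clv): Scatt and stdev as defined above.
scatt_sketch <- function(data, clust) {
  data <- as.matrix(data)
  ids <- sort(unique(clust))
  # ||sigma(Ci)|| for every cluster id i
  norms <- vapply(
    ids,
    function(i) vec_norm(col_var(data[clust == i, , drop = FALSE])),
    numeric(1)
  )
  list(
    Scatt = mean(norms) / vec_norm(col_var(data)),  # (1/|C|) * sum ||sigma(Ci)|| / ||sigma(X)||
    stdev = sqrt(sum(norms)) / length(ids)          # (1/|C|) * sqrt(sum ||sigma(Ci)||)
  )
}

# Use the iris species as a stand-in partition into three clusters
res <- scatt_sketch(iris[, 1:4], as.integer(iris$Species))
```

With a single cluster, Ci equals X and Scatt is 1. Values well below 1 indicate clusters that are compact relative to the whole data set.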

Value

A list with three elements is returned:

Scatt - average scattering for clusters value,
stdev - standard deviation value,
cluster.center - numeric matrix where columns correspond to variables and rows to cluster centers.

Author(s)

Lukasz Nieweglowski

References

M. Halkidi, Y. Batistakis, M. Vazirgiannis. On Clustering Validation Techniques. http://citeseer.ist.psu.edu/513619.html

See Also

clv.SD and clv.SDbw

Examples

# load and prepare data
library(clv)
library(cluster) # provides agnes()
data(iris)
iris.data <- iris[,1:4]

# cluster data
agnes.mod <- agnes(iris.data) # create cluster tree 
v.pred <- as.integer(cutree(agnes.mod,5)) # "cut" the tree 

# compute Scatt index
scatt <- clv.Scatt(iris.data, v.pred)