cluster_scatter: Intercluster distances and intracluster diameters - Internal...
In clv: Cluster Validation Techniques

cls.scatt.data

R Documentation

Intercluster distances and intracluster diameters - Internal Measures

Description

Two functions that compute the most popular intercluster distances and intracluster diameters.

Usage

cls.scatt.data(data, clust, dist="euclidean")
cls.scatt.diss.mx(diss.mx, clust)

Arguments

`data`	`numeric matrix` or `data.frame` where columns correspond to variables and rows to observations
`diss.mx`	square, symmetric `numeric matrix` or `data.frame` representing a dissimilarity matrix, in which the distances between objects are stored.
`clust`	integer `vector` giving the cluster id to which each object is assigned. If the vector is not of integer type, it is coerced with a warning.
`dist`	chosen metric: "euclidean" (default value), "manhattan", "correlation" (argument available only in the `cls.scatt.data` function).

Details

The six intercluster distances and three intracluster diameters can be used to calculate validity indices such as the Dunn-like and Davies-Bouldin-like indices. Let d(x,y) be a distance function between two objects from the data set.

Intracluster diameters

The complete diameter represents the distance between the two most remote objects belonging to the same cluster.

diam1(C) = max{ d(x,y): x,y belong to cluster C }

The average diameter distance defines the average distance between all pairs of samples belonging to the same cluster.

diam2(C) = 1/(|C|(|C|-1)) * sum{ forall x,y belonging to cluster C and x != y } d(x,y)

The centroid diameter distance reflects the double average distance between all of the samples and the cluster's center (v(C) - cluster center).

diam3(C) = 1/|C| * sum{ forall x belonging to cluster C} d(x,v(C))
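The three diameters can be checked by hand on a small example. The following is a minimal base R sketch, not the package's implementation: the toy points, the helper name `cluster_diameters` and the use of the arithmetic mean as the cluster center v(C) are assumptions made for illustration, and d is taken to be the Euclidean distance. The centroid value follows the diam3 formula as printed; the prose describes it as a doubled average, so multiply by 2 if that convention is intended.

```r
# Toy data: five 2-D points in two clusters (values chosen for illustration only).
x <- matrix(c( 0,  0,
               3,  0,
               0,  4,
              10, 10,
              10, 12), ncol = 2, byrow = TRUE)
clust <- c(1L, 1L, 1L, 2L, 2L)

# Hypothetical helper (not part of clv): the three diameters of one cluster.
cluster_diameters <- function(pts) {
  n <- nrow(pts)
  d <- as.matrix(dist(pts))                      # pairwise Euclidean distances
  center <- colMeans(pts)                        # v(C), assumed to be the mean
  to_center <- sqrt(rowSums(sweep(pts, 2, center)^2))
  c(complete = max(d),                           # diam1
    average  = sum(d) / (n * (n - 1)),           # diam2; the zero diagonal adds nothing
    centroid = mean(to_center))                  # diam3 as printed (no factor 2)
}

# One row per cluster, one column per diameter.
diam <- t(sapply(split(seq_len(nrow(x)), clust),
                 function(idx) cluster_diameters(x[idx, , drop = FALSE])))
print(diam)
```

Note that a cluster with a single member makes the diam2 denominator zero, so such clusters need separate handling in any hand-written version.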

Intercluster distances

The single linkage distance defines the closest distance between two samples belonging to two different clusters.

dist1(Ci,Cj) = min{ d(x,y): x belongs to Ci and y to Cj cluster }

The complete linkage distance represents the distance between the most remote samples belonging to two different clusters.

dist2(Ci,Cj) = max{ d(x,y): x belongs to Ci and y to Cj cluster }

The average linkage distance defines the average distance between all of the samples belonging to two different clusters.

dist3(Ci,Cj) = 1/(|Ci|*|Cj|) * sum{ forall x belonging to Ci and y to Cj } d(x,y)
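The single, complete and average linkage distances all read from the same block of a dissimilarity matrix: the rows for members of Ci and the columns for members of Cj. A minimal base R sketch under that view, with made-up points and Euclidean distance assumed for d:

```r
# Four points on a line, split into two clusters (illustrative values only).
x <- matrix(c(0, 0,
              1, 0,
              4, 0,
              6, 0), ncol = 2, byrow = TRUE)
clust <- c(1L, 1L, 2L, 2L)
d <- as.matrix(dist(x))

# Distances between members of cluster 1 (rows) and cluster 2 (columns).
block <- d[clust == 1L, clust == 2L, drop = FALSE]

single   <- min(block)    # dist1: closest pair across the two clusters
complete <- max(block)    # dist2: most remote pair across the two clusters
average  <- mean(block)   # dist3: sum over all pairs / (|Ci| * |Cj|)

print(c(single = single, complete = complete, average = average))
```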

The centroid linkage distance reflects the distance between the centres of two clusters (v(i), v(j) - clusters' centers).

dist4(Ci,Cj) = d(v(i), v(j))

The average of centroids linkage represents the average distance between the centre of a cluster and all of the samples belonging to a different cluster.

dist5(Ci,Cj) = 1/(|Ci|+|Cj|) * ( sum{ forall x belonging to Ci } d(x,v(j)) + sum{ forall y belonging to Cj } d(y,v(i)) )
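The two centroid-based distances need the original data rather than a dissimilarity matrix, because the cluster centers must be computed first. A rough base R sketch, assuming Euclidean distance and the arithmetic mean as cluster center; the helper `euclid` and the sample points are hypothetical:

```r
# Two clusters of two points each (illustrative values only).
x <- matrix(c(0, 0,
              2, 0,
              1, 3,
              1, 5), ncol = 2, byrow = TRUE)
clust <- c(1L, 1L, 2L, 2L)

# Hypothetical helper: Euclidean distance from every row of pts to one center.
euclid <- function(pts, center) sqrt(rowSums(sweep(pts, 2, center)^2))

ci <- x[clust == 1L, , drop = FALSE]
cj <- x[clust == 2L, , drop = FALSE]
vi <- colMeans(ci)   # v(i), assumed to be the mean of cluster i
vj <- colMeans(cj)   # v(j), assumed to be the mean of cluster j

# dist4: distance between the two centers.
centroid_link <- sqrt(sum((vi - vj)^2))

# dist5: members of Ci measured to v(j), members of Cj measured to v(i),
# averaged over all |Ci| + |Cj| samples.
ave_to_cent <- (sum(euclid(ci, vj)) + sum(euclid(cj, vi))) / (nrow(ci) + nrow(cj))

print(c(centroid = centroid_link, ave_to_cent = ave_to_cent))
```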

Hausdorff metrics are based on the discovery of a maximal distance from samples of one cluster to the nearest sample of another cluster.

dist6(Ci,Cj) = max{ distH(Ci,Cj), distH(Cj,Ci) }

where: distH(A,B) = max{ min{ d(x,y): y belongs to B}: x belongs to A }
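The directed distance distH is not symmetric, which is why dist6 takes the larger of the two directions. A small base R sketch with made-up one-dimensional points illustrates this; the helper name `directed_hausdorff` is an assumption, not a clv function:

```r
# Four points on a line: cluster 1 = {0, 1}, cluster 2 = {4, 10}.
x <- matrix(c(0, 1, 4, 10), ncol = 1)
clust <- c(1L, 1L, 2L, 2L)
d <- as.matrix(dist(x))

# Rows: members of A, columns: members of B.
block <- d[clust == 1L, clust == 2L, drop = FALSE]

# Hypothetical helper: for each x in A take the nearest y in B,
# then keep the largest of those nearest-neighbour distances.
directed_hausdorff <- function(m) max(apply(m, 1, min))

h_ab <- directed_hausdorff(block)      # distH(C1, C2)
h_ba <- directed_hausdorff(t(block))   # distH(C2, C1)
hausdorff <- max(h_ab, h_ba)           # dist6

print(c(h_ab = h_ab, h_ba = h_ba, hausdorff = hausdorff))
```

Here the point at 10 is far from every member of cluster 1, so distH(C2, C1) dominates even though every member of cluster 1 has a fairly close neighbour in cluster 2.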

Value

cls.scatt.data returns an object of class "list". The intracluster diameters intracls.complete, intracls.average and intracls.centroid are stored in vectors, and the intercluster distances intercls.single, intercls.complete, intercls.average, intercls.centroid, intercls.ave_to_cent and intercls.hausdorff are stored in symmetric matrices. The length of each vector and both dimensions of each matrix are equal to the number of clusters. The result list also contains the cluster.center matrix (rows correspond to cluster centers) and the cluster.size vector (the size of each cluster).

cls.scatt.diss.mx returns an object of class "list". The intracluster diameters intracls.complete and intracls.average are stored in vectors, and the intercluster distances intercls.single, intercls.complete, intercls.average and intercls.hausdorff are stored in symmetric matrices. The length of each vector and both dimensions of each matrix are equal to the number of clusters. The result list also contains the cluster.size vector (the size of each cluster).
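To show how such a result can feed a Dunn-like index (smallest intercluster distance divided by largest intracluster diameter), here is a hedged base R sketch. The list `scatt` is a hand-made stand-in that only borrows the element names intracls.complete and intercls.single described above; its numbers are invented and it is not real output of either function.

```r
# Mock result for three clusters (invented values, not produced by clv).
scatt <- list(
  intracls.complete = c(2, 3, 1.5),
  intercls.single   = matrix(c(0, 4, 6,
                               4, 0, 5,
                               6, 5, 0), nrow = 3, byrow = TRUE)
)

# Use only the upper triangle so that the diagonal (a cluster paired with
# itself) never enters the minimum.
inter <- scatt$intercls.single
min_between <- min(inter[upper.tri(inter)])
max_within  <- max(scatt$intracls.complete)

dunn <- min_between / max_within
print(dunn)
```

Any of the listed intercluster matrices and intracluster vectors could be substituted, which gives one variant of the index per combination.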

Author(s)

Lukasz Nieweglowski

References

J. Handl, J. Knowles and D. B. Kell. Computational cluster validation in post-genomic data analysis. http://bioinformatics.oxfordjournals.org/cgi/reprint/21/15/3201?ijkey=VbTHU29vqzwkGs2&keytype=ref

N. Bolshakova and F. Azuaje. Cluster validation techniques for genome expression data. http://citeseer.ist.psu.edu/552250.html

Examples

# load and prepare data
library(clv)
data(iris)
iris.data <- iris[,1:4]

# cluster data
pam.mod <- pam(iris.data,5) # create five clusters
v.pred <- as.integer(pam.mod$clustering) # get cluster ids associated to given data objects

# compute intercluster distances and intracluster diameters
cls.scatt1 <- cls.scatt.data(iris.data, v.pred)
cls.scatt2 <- cls.scatt.data(iris.data, v.pred, dist="manhattan")
cls.scatt3 <- cls.scatt.data(iris.data, v.pred, dist="correlation")

# the same using dissimilarity matrix
iris.diss.mx <- as.matrix(daisy(iris.data))
cls.scatt4 <- cls.scatt.diss.mx(iris.diss.mx, v.pred)

clv documentation built on Sept. 28, 2023, 9:06 a.m.
