ClusterCor: Cluster correlation for cluster evaluation


Description

Evaluates clustering results by computing the correlation between an incidence matrix and a distance matrix, as suggested by Tan, P.-N., Steinbach, M., Karpatne, A., & Kumar, V. (2005).

Usage

ClusterCor(dist.obj, clusterVector, return_matrices = FALSE)

Arguments

dist.obj

An object of class 'dist' holding the pairwise distances between the observations in the dataset

clusterVector

An integer vector indicating which cluster each observation belongs to

return_matrices

Logical; whether to return the incidence matrix and the distance matrix for the observations. Defaults to FALSE

Details

ClusterCor builds an incidence matrix from the cluster vector: an n x n matrix with entry 1 where two observations are in the same cluster and 0 where they are in different clusters. This matrix and the supplied distance matrix are then vectorized, and the correlation between the two vectors is computed. A strongly negative correlation indicates that observations in the same cluster are close to each other and observations in different clusters are far apart, that is, a good result with respect to minimizing intra-cluster distance and maximizing inter-cluster distance.
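The computation described above can be sketched in a few lines of base R. This is an illustration only, not the package source: the helper name cluster_cor_sketch is hypothetical, and restricting the correlation to the upper triangle (each pair counted once, diagonal excluded) is an assumption, so its value may differ from what ClusterCor returns if the package vectorizes the full matrices.

```r
# Hypothetical helper (not part of ClustTools): sketch of the incidence/distance correlation
cluster_cor_sketch <- function(dist.obj, clusterVector) {
  # Expand the 'dist' object into a full n x n distance matrix
  distMatrix <- as.matrix(dist.obj)
  stopifnot(nrow(distMatrix) == length(clusterVector))

  # n x n incidence matrix: 1 if two observations share a cluster, 0 otherwise
  incidenceMatrix <- outer(clusterVector, clusterVector, "==") * 1

  # Assumption: use each pair of observations once and skip the diagonal
  idx <- upper.tri(distMatrix)

  # Pearson correlation between the two vectorized matrices
  cor(incidenceMatrix[idx], distMatrix[idx])
}

# Usage on the standardized iris measurements
X <- scale(iris[, 1:4])
set.seed(1)  # kmeans starts from random centers
cl <- kmeans(X, 3)$cluster
cluster_cor_sketch(dist(X), cl)
```

If every observation falls in a single cluster, the incidence vector is constant and cor() returns NA with a warning, so the measure is only informative for two or more clusters.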

Value

incidenceMatrix

Matrix of 0s and 1s indicating whether each pair of observations belongs to the same cluster

distMatrix

Matrix of pairwise distances between observations

correlation

The correlation coefficient. The more negative the correlation, the better the result with respect to the clustering objectives

Author(s)

Jacob H. Madsen

References

Tan, P.-N., Steinbach, M., Karpatne, A., & Kumar, V. (2005). Introduction to Data Mining (Second edition). ISBN: 978-03-213-2136-7

Examples

## Select a dataset to standardize and cluster
X <- scale(iris[,1:4])

## Cluster the dataset with a given number of clusters
cluster.obj <- kmeans(X, 3)

## Evaluate the clustering results with 'ClusterCor'
ClusterCor(dist(X), cluster.obj$cluster)

jhmadsen/ClustTools documentation built on May 24, 2019, 9:54 p.m.