SSE: Sum of squared error (SSE) for cluster evaluation


Description

Evaluates clustering results with the sum of squared error (SSE), computed from the squared distances between cluster members and their cluster centroids.

Usage

SSE(dataset, clusterVector)

Arguments

dataset

The dataset for which the sum of squared error and the cluster centroids are computed

clusterVector

A vector of integers indicating the cluster each observation belongs to

Details

SSE computes the sum of squared error for a clustering result, given a cluster vector. The smaller the squared error, the more compact the clusters are, i.e. the smaller the intra-cluster distances. SSE also returns a matrix of cluster centroids.
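The computation described above can be sketched in base R. This is an illustration only, not the ClustTools source: the helper name sse_sketch is hypothetical, and the sketch assumes centroids are per-cluster column means and that the error is the squared Euclidean distance to the centroid.

```r
# Hypothetical helper (not the ClustTools implementation): a base-R sketch
# of a per-cluster sum of squared error, assuming centroids are column means.
sse_sketch <- function(dataset, clusterVector) {
  dataset <- as.matrix(dataset)

  # Column sums per cluster; rows are ordered by sorted cluster label
  sums  <- rowsum(dataset, clusterVector)
  sizes <- as.vector(table(clusterVector))

  # Centroid of each cluster: one row per cluster, one column per dimension
  centroidMatrix <- sums / sizes

  # Subtract from each observation the centroid of its own cluster
  centered <- dataset - centroidMatrix[as.character(clusterVector), , drop = FALSE]

  # Squared distance per observation, summed within each cluster
  clusterWithin <- as.vector(rowsum(rowSums(centered^2), clusterVector))

  list(centroidMatrix = centroidMatrix,
       clusterWithin  = clusterWithin,
       sumWithin      = sum(clusterWithin))
}

# Tiny worked example: two clusters of two points each
pts    <- rbind(c(0, 0), c(2, 0), c(10, 10), c(10, 14))
labels <- c(1, 1, 2, 2)
res    <- sse_sketch(pts, labels)

res$centroidMatrix  # centroids (1, 0) and (10, 12)
res$clusterWithin   # 2 and 8
res$sumWithin       # 10
```

In the worked example each point of the first cluster lies 1 unit from its centroid and each point of the second lies 2 units away, which gives 1 + 1 = 2 and 4 + 4 = 8.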

Value

centroidMatrix

A matrix of cluster centroids with one row per cluster and one column per dimension (n-clusters x n-dimensions)

clusterWithin

A vector of length n-clusters with the sum of squared error of each cluster

sumWithin

Sum of squared error for all clusters, i.e. sum(clusterWithin)

Author(s)

Jacob H. Madsen

References

Tan, P.-N., Steinbach, M., Karpatne, A., & Kumar, V. (2005). Introduction to Data Mining (Second edition). ISBN: 978-03-213-2136-7

Examples

## Load the package that provides 'SSE' and 'SST'
library(ClustTools)

## Select a dataset and standardize it before clustering
X <- scale(iris[, 1:4])

## Build a hierarchical clustering with complete linkage
cluster.obj <- hclust(dist(X), method = 'complete')

## Cut the hierarchical clustering tree into 3 clusters
chosen.clusters <- cutree(cluster.obj, 3)

## Evaluate the clustering results with 'SSE' and 'SST'
clusters.SSE <- SSE(X, chosen.clusters)$sumWithin
clusters.SST <- SST(X)

## Calculate the r-squared for your cluster solution
rsq <- 1 - (clusters.SSE / clusters.SST)

print(rsq)
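
The r-squared ratio from the example can also be cross-checked with base R alone. The sketch below swaps in k-means for the hierarchical clustering, purely because kmeans() reports its own within-cluster and total sums of squares to compare against. It assumes that SST means the total sum of squares about the grand mean, which this page does not define.

```r
# Base-R cross-check of the r-squared ratio, using k-means instead of hclust
X <- scale(iris[, 1:4])

set.seed(1)  # k-means starts from random centers
km <- kmeans(X, centers = 3, nstart = 10)

# Total sum of squares about the grand mean (assumed meaning of SST)
sst <- sum(scale(X, scale = FALSE)^2)

# Within-cluster sum of squares, using each observation's assigned centroid
sse <- sum((X - km$centers[km$cluster, ])^2)

# Share of the total variation explained by the cluster solution
rsq <- 1 - sse / sst

# The hand-computed values agree with what kmeans() reports
all.equal(sse, km$tot.withinss)
all.equal(sst, km$totss)

print(rsq)
```

Under that assumption, rsq equals km$betweenss / km$totss, since the total sum of squares splits into a within-cluster and a between-cluster part.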

jhmadsen/ClustTools documentation built on May 24, 2019, 9:54 p.m.