kMeans: K-Means Clustering


View source: R/kMeans.R

Description

Performs k-means clustering m times on a data matrix. fitted returns a vector with the class labels of the best run.

Usage

kMeans(X, k, m=10, ind, max.iter=50, ...)

## S3 method for class 'kMeans'
fitted(object)

Arguments

X

a numeric matrix of data.

k

the desired number of clusters.

m

the number of times to run the clustering algorithm. The default is 10.

ind

a numeric vector of column indices indicating which variables of X are used in the clustering (see the example after this argument list).

max.iter

the maximum number of iterations for a single run of the clustering algorithm. The default is 50.

...

not used.
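
For illustration, a call that clusters only on the first two columns of a hypothetical numeric matrix X2 and repeats the algorithm 25 times could look like the following (the data object and argument values are made up for this example):

res <- kMeans(X2, k=3, m=25, ind=c(1, 2))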

Details

The data matrix given by X is clustered by the standard k-means method, also known as the Lloyd-Forgy algorithm (Lloyd 1957; Forgy 1965). The method minimizes the within-cluster sum of squares and therefore assigns each observation to the cluster whose centroid is closest in Euclidean distance.

The Random Partition method described by Hamerly and Elkan (2002) is used to compute the initial cluster means: each observation is first assigned to a random cluster, and the group means of that random partition serve as the starting centroids.
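
The following base-R sketch illustrates one such run with Random Partition initialization. It is a simplified illustration only, not the internal code of kMeans; the function name lloydSketch and its structure are invented for this example, and empty clusters are not handled.

lloydSketch <- function(X, k, max.iter = 50) {
  X <- as.matrix(X)
  n <- nrow(X)
  # Random Partition initialization (Hamerly & Elkan 2002): assign every
  # observation to a random cluster and start from the resulting group means.
  labels <- sample(seq_len(k), n, replace = TRUE)
  for (iter in seq_len(max.iter)) {
    # Update step: centroids are the means of the current clusters.
    centroids <- do.call(rbind, lapply(seq_len(k), function(j)
      colMeans(X[labels == j, , drop = FALSE])))
    # Assignment step: squared Euclidean distance of every observation to
    # every centroid; each observation moves to the nearest centroid.
    d2 <- sapply(seq_len(k), function(j) colSums((t(X) - centroids[j, ])^2))
    newLabels <- max.col(-d2)
    if (all(newLabels == labels)) break   # converged
    labels <- newLabels
  }
  list(labels = labels, centroids = centroids,
       withinSS = sum(d2[cbind(seq_len(n), labels)]))
}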

Value

kMeans returns an object of class kMeans, which has print, summary, predict, plot, and fitted methods. It is a list with the following components:

Cbest

the vector of the best group labels.

ObjBest

the value of the objective function for the best solution.

CentroidsBest

the matrix containing the centroids of the best solution.

m

the number of repetitions.

k

the number of groups.

Xname

name of the data set used for the clustering.

Ind

the value of input ind.

Y

the data used for the clustering.

Best

the index of the run that produced the best solution.

Call

a matrix with n rows and m columns giving the cluster assignments for all m runs.

ObjAll

a vector having the objective functions of all runs.

StatusAll

a vector having the status from all runs.
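
As a brief illustration (assuming kMeansResult is the object created in the Examples below), individual components can be accessed directly from the returned list:

kMeansResult$Cbest     # best group labels (also returned by fitted)
kMeansResult$ObjBest   # objective value of the best run
kMeansResult$Best      # which of the m runs was best
kMeansResult$ObjAll    # objective values of all runs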

References

Forgy, E. W. (1965) Cluster analysis of multivariate data: efficiency vs interpretability of classifications. Biometrics 21, 768–769.

Hamerly, G. and Elkan, C. (2002) Alternatives to the k-means algorithm that find better clusterings. Proceedings of the Eleventh International Conference on Information and Knowledge Management (CIKM).

Lloyd, S. P. (1957, 1982) Least squares quantization in PCM. Technical Note, Bell Laboratories. Published in 1982 in IEEE Transactions on Information Theory 28, 128–137.

See Also

kmeans

Examples

# Example using random data from three different populations
# with one variable

set.seed(63555)
exampleData <- matrix(nrow=90, ncol=1)
exampleData[1:30, 1] <- rnorm(30, mean=3, sd=1)
exampleData[31:60, 1] <- rnorm(30, mean=6, sd=1)
exampleData[61:90, 1] <- rnorm(30, mean=9, sd=1)
kMeansResult <- kMeans(exampleData, k=3)

kMeansResult
# K-Means clustering for exampleData 
# Number of runs: 10 
# Status of best run: converged

fitted(kMeansResult)
# [1] 2 1 2 2 2 1 2 1 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 1 1 1 1 1 1 1 1 1 3 1 1 1 1 1
# [47] 1 1 3 1 1 1 1 1 1 1 1 1 3 1 3 3 3 3 3 3 3 3 1 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3

summary(kMeansResult)
#K-Means clustering for exampleData 

#Clusters to be detected: 3 
#Cluster sizes detected: 30 28 32 

#Number of runs: 10 
#Status of best run: converged 

#Criterion value: 3522.284 
#Summary of criterion values:
#Min: 3522.284 
#Q1: 3546.504 
#Mean: 3543.455 
#Q3: 3546.504 
#Max: 3546.504
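
# The returned object also has plot and predict methods (see Value).
# As a further illustrative check (not part of the original example),
# the fitted labels can be compared against the known simulated groups;
# note that the cluster numbering is arbitrary.
table(fitted(kMeansResult), rep(1:3, each=30))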
