kmeanspp: K-Means++ Clustering Algorithm

Description

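The k-means++ algorithm is a careful seeding technique for k-means: it picks initial centers one at a time, each with probability proportional to its squared distance from the nearest center already chosen. It was originally intended to return a set of k points as initial centers, but it can also serve as a rough clustering algorithm by assigning each observation to its nearest selected center.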
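
The seeding-then-assignment idea described above can be sketched in base R as follows. This is only an illustration under stated assumptions, not the code in maotai: the function name kmeanspp_sketch is hypothetical, distances are assumed to be Euclidean, and the fallback for data with fewer than k distinct rows is a choice made for this sketch.

```r
# Illustrative sketch only: 'kmeanspp_sketch' is a hypothetical name and
# is NOT the maotai implementation. Euclidean distance is assumed.
kmeanspp_sketch <- function(data, k = 2) {
  data <- as.matrix(data)
  n <- nrow(data)
  stopifnot(k >= 1, k <= n)

  # squared Euclidean distance from every row of 'data' to one center
  sqdist_to <- function(center) rowSums(sweep(data, 2, center)^2)

  centers <- integer(k)
  centers[1] <- sample.int(n, 1)        # first center: uniform at random
  d2 <- sqdist_to(data[centers[1], ])   # squared distance to nearest chosen center

  for (j in seq_len(k - 1) + 1) {
    if (all(d2 == 0)) {
      # assumption for this sketch: if every row coincides with a chosen
      # center, fall back to a uniform draw
      centers[j] <- sample.int(n, 1)
    } else {
      # draw the next center with probability proportional to d2
      centers[j] <- sample.int(n, 1, prob = d2)
    }
    d2 <- pmin(d2, sqdist_to(data[centers[j], ]))
  }

  # n x k matrix of squared distances from each row to each chosen center
  dmat <- matrix(vapply(centers, function(i) sqdist_to(data[i, ]), numeric(n)),
                 nrow = n)

  # label each observation by the index of its nearest center
  unname(apply(dmat, 1, which.min))
}

# usage on the numeric columns of iris
set.seed(42)
labels <- kmeanspp_sketch(iris[, 1:4], k = 3)
table(labels)
```

Because the centers are drawn at random, the labels change from run to run unless a seed is set. No Lloyd-style refinement step follows the seeding, which is why the result is only a rough clustering.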

Usage

kmeanspp(data, k = 2)

Arguments

data

an (n × p) matrix whose rows are observations.

k

the number of clusters.

Value

a length-n vector of class labels.

References

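Arthur, D. and Vassilvitskii, S. (2007). "k-means++: The advantages of careful seeding." In Proceedings of the Eighteenth Annual ACM-SIAM Symposium on Discrete Algorithms (SODA '07), pp. 1027-1035.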

Examples

## load the package that provides 'kmeanspp' and 'cmds'
library(maotai)

## use simple example of iris dataset
data(iris)
mydata = as.matrix(iris[,1:4])
mycol  = as.factor(iris[,5])

## find the low-dimensional embedding for visualization
my2d = cmds(mydata, ndim=2)$embed

## apply 'kmeanspp' with different numbers of k's.
k2 = kmeanspp(mydata, k=2)
k3 = kmeanspp(mydata, k=3)
k4 = kmeanspp(mydata, k=4)
k5 = kmeanspp(mydata, k=5)
k6 = kmeanspp(mydata, k=6)

## visualize
opar <- par(no.readonly=TRUE)
par(mfrow=c(2,3))
plot(my2d, col=k2, main="k=2", pch=19, cex=0.5)
plot(my2d, col=k3, main="k=3", pch=19, cex=0.5)
plot(my2d, col=k4, main="k=4", pch=19, cex=0.5)
plot(my2d, col=k5, main="k=5", pch=19, cex=0.5)
plot(my2d, col=k6, main="k=6", pch=19, cex=0.5)
plot(my2d, col=mycol, main="true cluster", pch=19, cex=0.5)
par(opar)

maotai documentation built on Oct. 25, 2021, 9:06 a.m.