Trimmed k-means clustering
Description
The trimmed k-means clustering method by Cuesta-Albertos, Gordaliza and Matran (1997). It optimizes the k-means criterion while trimming a given proportion of the points.
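To make the criterion concrete, the following base R sketch evaluates a trimmed k-means objective for a fixed set of centers. Each point is scored by its squared Euclidean distance to the nearest center, the largest scores are discarded, and the remaining ones are averaged. The helper name trimmed_criterion and the rule of keeping ceiling((1 - trim) * n) points are assumptions made for illustration; they are not taken from the package source.

```r
# Hypothetical helper: trimmed k-means criterion for given centers.
trimmed_criterion <- function(data, centers, trim = 0.1) {
  data <- as.matrix(data)
  centers <- as.matrix(centers)
  n <- nrow(data)
  # Squared Euclidean distance from every point to every center (n x k).
  d2 <- matrix(sapply(seq_len(nrow(centers)), function(j) {
    rowSums(sweep(data, 2, centers[j, ])^2)
  }), nrow = n)
  nearest <- apply(d2, 1, min)
  keep <- ceiling((1 - trim) * n)  # assumed rounding rule
  # Average over the `keep` points closest to their nearest center.
  mean(sort(nearest)[seq_len(keep)])
}

# Four well-behaved points and one gross outlier.
x <- rbind(c(0, 0), c(0, 1), c(10, 10), c(10, 11), c(100, 100))
centers <- rbind(c(0, 0.5), c(10, 10.5))
crit_trimmed <- trimmed_criterion(x, centers, trim = 0.2)
crit_untrimmed <- trimmed_criterion(x, centers, trim = 0)
```

With trimming, the outlier no longer contributes, so crit_trimmed is far smaller than crit_untrimmed for the same centers.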
Usage
Arguments
data 
matrix or data.frame with raw data 
k 
integer. Number of clusters. 
trim 
numeric between 0 and 1. Proportion of points to be trimmed. 
scaling 
logical. If 
runs 
integer. Number of algorithm runs from initial means (randomly chosen from the data points). 
points 

countmode 
optional positive integer. Every 
printcrit 
logical. If 
maxit 
integer. Maximum number of iterations within an algorithm run. Each iteration determines all points which are closer to a different cluster center than the one to which they are currently assigned. The algorithm terminates if no more points have to be reassigned, or if maxit is reached.
x 
object of class tkm.
... 
further arguments to be transferred to plot or plotcluster.
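The iteration described under maxit can be sketched in base R as follows. This is a minimal illustration of a single run under stated assumptions, not the package implementation: the function name trimmed_kmeans_run, the code 0 for trimmed points, the default maxit = 100 and the rule of keeping ceiling((1 - trim) * n) points are all hypothetical choices made for this sketch.

```r
# Hypothetical single run of a trimmed k-means style iteration.
trimmed_kmeans_run <- function(data, centers, trim = 0.1, maxit = 100) {
  data <- as.matrix(data)
  centers <- as.matrix(centers)
  n <- nrow(data)
  k <- nrow(centers)
  keep <- ceiling((1 - trim) * n)  # assumed rounding rule
  assignment <- rep(0L, n)         # 0 marks "trimmed" in this sketch only
  iter <- 0L
  for (iter in seq_len(maxit)) {
    # Squared distances of all points to all current centers (n x k).
    d2 <- matrix(sapply(seq_len(k), function(j) {
      rowSums(sweep(data, 2, centers[j, ])^2)
    }), nrow = n)
    nearest <- max.col(-d2, ties.method = "first")
    dmin <- d2[cbind(seq_len(n), nearest)]
    new_assignment <- nearest
    # Trim the points farthest from their nearest center.
    if (keep < n) {
      far <- order(dmin, decreasing = TRUE)[seq_len(n - keep)]
      new_assignment[far] <- 0L
    }
    # Stop when no point has to be reassigned.
    if (identical(new_assignment, assignment)) break
    assignment <- new_assignment
    # Recompute each center from its non-trimmed members.
    for (j in seq_len(k)) {
      members <- assignment == j
      if (any(members)) {
        centers[j, ] <- colMeans(data[members, , drop = FALSE])
      }
    }
  }
  list(assignment = assignment, centers = centers, iterations = iter)
}

# Two tight groups plus two gross outliers.
set.seed(1)
x <- rbind(matrix(rnorm(40, 0, 0.5), ncol = 2),
           matrix(rnorm(40, 10, 0.5), ncol = 2),
           c(50, 50), c(-40, 60))
res <- trimmed_kmeans_run(x, centers = x[c(1, 21), ], trim = 0.05)
```

In this toy data set the two outliers end up trimmed and the run stops as soon as an iteration leaves every assignment unchanged, well before the iteration limit.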
Details
plot.tkm calls plotcluster if the dimensionality of the data p is 1, shows a scatterplot with nontrimmed regions if p=2, and shows discriminant coordinates computed from the clusters (ignoring the trimmed points) if p>2.
Value
An object of class 'tkm', which is a list with components
classification 
integer vector coding cluster membership with trimmed
observations coded as 
means 
numerical matrix giving the mean vectors of the k classes. 
disttom 
vector of squared Euclidean distances of all points to the closest mean. 
ropt 
maximum value of 
k 
see above. 
trim 
see above. 
runs 
see above. 
scaling 
see above. 
Author(s)
Christian Hennig chrish@stats.ucl.ac.uk http://www.homepages.ucl.ac.uk/~ucakche/
References
Cuesta-Albertos, J. A., Gordaliza, A., and Matran, C. (1997) Trimmed k-Means: An Attempt to Robustify Quantizers, Annals of Statistics, 25, 553-576.
See Also
plotcluster
Examples
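library(trimcluster)  # assumption: trimkmeans is provided by the trimcluster package
set.seed(10001)
# Sizes of the three clusters and of the outlier group
n1 <- 60
n2 <- 60
n3 <- 70
n0 <- 10
nn <- n1+n2+n3+n0
pp <- 2
X <- matrix(rep(0,nn*pp),nrow=nn)
ii <- 0
# Cluster 1: centered at (5,-5), unit standard deviation
for (i in 1:n1){
  ii <- ii+1
  X[ii,] <- c(5,-5)+rnorm(2)
}
# Cluster 2: centered at (5,5), standard deviation 0.75
for (i in 1:n2){
  ii <- ii+1
  X[ii,] <- c(5,5)+rnorm(2)*0.75
}
# Cluster 3: centered at (-5,-5), standard deviation 0.75
for (i in 1:n3){
  ii <- ii+1
  X[ii,] <- c(-5,-5)+rnorm(2)*0.75
}
# Widely scattered points that act as outliers
for (i in 1:n0){
  ii <- ii+1
  X[ii,] <- rnorm(2)*8
}
tkm1 <- trimkmeans(X,k=3,trim=0.1,runs=3)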
# runs=3 is used to save computing time.
print(tkm1)
plot(tkm1,X)
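As a possible alternative to the explicit loops, a data set of the same shape (three Gaussian clusters plus ten widely scattered points) can be generated in vectorized form. This is only a sketch: the names sizes, centers, sds and X2 are hypothetical, the cluster centers are assumed illustrative values, and because the random numbers are drawn in a different order the result will not reproduce the looped data point for point, even with the same seed.

```r
set.seed(10001)
sizes <- c(60, 60, 70)                           # cluster sizes
centers <- rbind(c(5, -5), c(5, 5), c(-5, -5))   # assumed, illustrative centers
sds <- c(1, 0.75, 0.75)                          # per-cluster standard deviations

# One block of shifted Gaussian noise per cluster, stacked row-wise.
clusters <- do.call(rbind, lapply(seq_along(sizes), function(j) {
  noise <- matrix(rnorm(2 * sizes[j]), ncol = 2) * sds[j]
  sweep(noise, 2, centers[j, ], "+")
}))

# Ten widely scattered points acting as outliers.
outliers <- matrix(rnorm(2 * 10), ncol = 2) * 8
X2 <- rbind(clusters, outliers)
```

The matrix X2 has 200 rows and 2 columns and could be passed to a clustering call in place of X.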
