# trimkmeans: Trimmed k-means clustering

In trimcluster: Cluster Analysis with Trimming

## Description

The trimmed k-means clustering method of Cuesta-Albertos, Gordaliza and Matran (1997). It optimizes the k-means criterion while trimming a given proportion of the points, so that the points farthest from every cluster mean do not influence the solution.
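To make the trimmed criterion concrete, here is a minimal base R sketch that evaluates it for fixed cluster means. It is an illustration only, not the code used by `trimkmeans`: the helper name `trimmed_criterion` is hypothetical, and the rule for how many points are kept (`ceiling((1 - trim) * n)`) is an assumption about rounding. The criterion is taken to be the mean of the squared distances of the retained points, matching the "mean squares" wording used on this page.

```r
# Illustrative sketch only; not the trimcluster implementation.
# trimmed_criterion() is a hypothetical helper name.
trimmed_criterion <- function(x, centers, trim = 0.1) {
  x <- as.matrix(x)
  centers <- as.matrix(centers)
  # Squared Euclidean distances: one row per point, one column per center
  d2 <- matrix(
    sapply(seq_len(nrow(centers)), function(j) {
      rowSums(sweep(x, 2, centers[j, ])^2)
    }),
    nrow = nrow(x)
  )
  # Distance of each point to its closest center
  nearest <- apply(d2, 1, min)
  # Assumption: keep ceiling((1 - trim) * n) points; the package may round differently
  n_keep <- ceiling((1 - trim) * nrow(x))
  # Average the n_keep smallest distances; the largest ones are trimmed
  mean(sort(nearest)[seq_len(n_keep)])
}

# Four well-clustered points plus one gross outlier
x <- rbind(c(0, 0), c(0, 1), c(10, 10), c(10, 11), c(100, 100))
centers <- rbind(c(0, 0.5), c(10, 10.5))

crit_trimmed   <- trimmed_criterion(x, centers, trim = 0.2)  # outlier is dropped
crit_untrimmed <- trimmed_criterion(x, centers, trim = 0)    # outlier dominates
```

Comparing `crit_trimmed` with `crit_untrimmed` shows how a single distant point inflates the untrimmed k-means criterion while leaving the trimmed one unaffected.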

## Usage

```
trimkmeans(data, k, trim=0.1, scaling=FALSE, runs=100, points=NULL,
           countmode=runs+1, printcrit=FALSE,
           maxit=2*nrow(as.matrix(data)))

## S3 method for class 'tkm'
print(x, ...)

## S3 method for class 'tkm'
plot(x, data, ...)
```

## Arguments

| Argument | Description |
|---|---|
| `data` | matrix or data.frame with raw data |
| `k` | integer. Number of clusters. |
| `trim` | numeric between 0 and 1. Proportion of points to be trimmed. |
| `scaling` | logical. If `TRUE`, the variables are centered at their means and scaled to unit variance before execution. |
| `runs` | integer. Number of algorithm runs from initial means (randomly chosen from the data points). |
| `points` | `NULL` or a matrix with k vectors used as means to initialize the algorithm. If initial mean vectors are specified, `runs` should be 1 (otherwise the same initial means are used for all runs). |
| `countmode` | optional positive integer. Every `countmode` algorithm runs `trimkmeans` shows a message. |
| `printcrit` | logical. If `TRUE`, all criterion values (mean squares) of the algorithm runs are printed. |
| `maxit` | integer. Maximum number of iterations within an algorithm run. Each iteration determines all points which are closer to a different cluster center than the one to which they are currently assigned. The algorithm terminates if no more points have to be reassigned, or if `maxit` is reached. |
| `x` | object of class `tkm`. |
| `...` | further arguments to be transferred to `plot` or `plotcluster`. |
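The iteration described for `maxit` can be sketched in base R as follows. This is a simplified illustration under stated assumptions, not the code inside `trimkmeans`: the function name `trimmed_kmeans_run` and the returned `iterations` component are hypothetical, the number of retained points is assumed to be `ceiling((1 - trim) * n)`, and a single run from user-supplied starting means is shown (comparable in spirit to passing `points` with `runs=1`).

```r
# Hypothetical helper: one run of a Lloyd-type trimmed k-means iteration.
# A sketch of the idea only; not the code used inside trimkmeans().
trimmed_kmeans_run <- function(x, centers, trim = 0.1, maxit = 2 * nrow(x)) {
  x <- as.matrix(x)
  centers <- as.matrix(centers)
  n <- nrow(x)
  k <- nrow(centers)
  n_keep <- ceiling((1 - trim) * n)  # assumed rounding convention
  cl <- rep(0L, n)                   # no point is assigned before the first pass
  for (it in seq_len(maxit)) {
    # Squared distances of all points to all current centers (n x k)
    d2 <- matrix(
      sapply(seq_len(k), function(j) rowSums(sweep(x, 2, centers[j, ])^2)),
      nrow = n
    )
    # Closest center for every point, and the distance to it
    new_cl <- max.col(-d2, ties.method = "first")
    nearest <- d2[cbind(seq_len(n), new_cl)]
    # Trim the points farthest from their closest center (coded as k + 1)
    new_cl[order(nearest)[seq_len(n) > n_keep]] <- k + 1L
    # Stop as soon as no point changes its assignment
    if (identical(new_cl, cl)) break
    cl <- new_cl
    # Recompute each center from its non-trimmed members
    for (j in seq_len(k)) {
      if (any(cl == j)) centers[j, ] <- colMeans(x[cl == j, , drop = FALSE])
    }
  }
  list(classification = cl, means = centers, iterations = it)
}

# Two tight groups of 20 points plus two gross outliers
set.seed(1)
x <- rbind(matrix(rnorm(40, mean = 0, sd = 0.5), ncol = 2),
           matrix(rnorm(40, mean = 10, sd = 0.5), ncol = 2),
           c(50, 50), c(-40, 60))
fit <- trimmed_kmeans_run(x, centers = x[c(1, 21), ], trim = 0.05)
table(fit$classification)
```

Because a single run can end in a poor local optimum when the starting means are badly chosen, repeating it from several random starts and keeping the best criterion value is the role the `runs` argument plays in `trimkmeans`.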

## Details

`plot.tkm` calls `plotcluster` if the dimensionality of the data `p` is 1, shows a scatterplot with non-trimmed regions if `p=2`, and shows discriminant coordinates computed from the clusters (ignoring the trimmed points) if `p>2`.

## Value

An object of class `tkm`, which is a list with the following components:

| Component | Description |
|---|---|
| `classification` | integer vector coding cluster membership with trimmed observations coded as `k+1`. |
| `means` | numerical matrix giving the mean vectors of the k classes. |
| `disttom` | vector of squared Euclidean distances of all points to the closest mean. |
| `ropt` | maximum value of `disttom` so that the corresponding point is not trimmed. |
| `k` | see above. |
| `trim` | see above. |
| `runs` | see above. |
| `scaling` | see above. |
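The relationship between these components can be illustrated with a small base R sketch. The list below is built by hand with the documented component names so that the snippet runs without the package; it is a stand-in, not real `trimkmeans` output, and the data values are made up for illustration.

```r
# Hand-built stand-in using the documented component names.
# This is NOT real trimkmeans() output; the values are invented.
X <- rbind(c(0, 0), c(0, 2), c(10, 10), c(10, 12), c(30, -30))
fit <- list(
  classification = c(1L, 1L, 2L, 2L, 3L),  # 3 = k + 1 marks the trimmed point
  means = rbind(c(0, 1), c(10, 11)),
  k = 2
)

# disttom: squared Euclidean distance of each point to its closest mean
fit$disttom <- apply(X, 1, function(p) min(colSums((t(fit$means) - p)^2)))

# ropt: largest disttom among the points that are not trimmed
fit$ropt <- max(fit$disttom[fit$classification <= fit$k])

# Typical post-processing: drop trimmed observations and count cluster sizes
trimmed <- fit$classification == fit$k + 1
X_kept <- X[!trimmed, , drop = FALSE]
cluster_sizes <- table(fit$classification[!trimmed])
```

With a genuine `tkm` object the same indexing should apply, since trimmed observations are coded as `k+1` in `classification`.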

## Author(s)

Christian Hennig chrish@stats.ucl.ac.uk http://www.homepages.ucl.ac.uk/~ucakche/

## References

Cuesta-Albertos, J. A., Gordaliza, A., and Matran, C. (1997) Trimmed k-Means: An Attempt to Robustify Quantizers, Annals of Statistics, 25, 553-576.

## See Also

`plotcluster`

## Examples

```
library(trimcluster)  # provides trimkmeans() and the print/plot methods

set.seed(10001)
# Three Gaussian clusters plus a small group of widely scattered noise points
n1 <- 60
n2 <- 60
n3 <- 70
n0 <- 10
nn <- n1+n2+n3+n0
pp <- 2
X <- matrix(rep(0,nn*pp),nrow=nn)
ii <- 0
for (i in 1:n1){
  ii <- ii+1
  X[ii,] <- c(5,-5)+rnorm(2)
}
for (i in 1:n2){
  ii <- ii+1
  X[ii,] <- c(5,5)+rnorm(2)*0.75
}
for (i in 1:n3){
  ii <- ii+1
  X[ii,] <- c(-5,-5)+rnorm(2)*0.75
}
for (i in 1:n0){
  ii <- ii+1
  X[ii,] <- rnorm(2)*8
}
tkm1 <- trimkmeans(X,k=3,trim=0.1,runs=3)
# runs=3 is used to save computing time.
print(tkm1)
plot(tkm1,X)
```

### Example output

```
* trimmed k-means *
trim= 0.1 , k= 3
Classification (trimmed points are indicated by  4 ):
[1] 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 4 4 3 3 3 3 3 3 3 4 3 3 3 3 3 3 3 3 3 3
[38] 3 3 3 4 3 3 3 3 3 3 3 3 4 3 3 3 3 3 3 3 3 3 4 2 2 2 2 2 2 2 2 2 2 2 2 2 2
[75] 4 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 4 2 2 2 2 2 2 2 2 2 2
[112] 2 2 2 2 2 2 2 2 2 1 1 1 1 1 1 1 1 1 1 1 4 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
[149] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 4 1 1 1 1 1 1 1 1 1 1 1 1 1 1
[186] 1 1 1 1 1 4 4 4 4 4 4 4 4 4 4
Means:
          [,1]      [,2]
[1,] -4.948673 -4.978344
[2,]  5.125058  5.040808
[3,]  5.139695 -5.083246
Trimmed mean squares:  1.214552
```

trimcluster documentation built on Feb. 9, 2020, 5:06 p.m.