# Trimmed Lloyd k-means for 3D shapes

### Description

The k-means algorithm rests on the fact that the sample mean is the value that minimizes the sum of squared Euclidean distances from the points of a cluster to its centroid. Two fundamental concepts of statistical shape analysis are the Procrustes mean and the Procrustes distance. Therefore, by replacing the sample mean with the Procrustes mean and the Euclidean distance with the Procrustes distance, we can use k-means in the shape analysis context.
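As a rough illustration of the distance involved, the following base R sketch computes the full Procrustes distance between two landmark configurations, one common definition from Dryden and Mardia (1998). The helper names `preshape` and `procrustes_dist` and the simulated landmarks are hypothetical, and this is not claimed to be the computation the package performs internally.

```r
# Map a k x 3 landmark matrix to its pre-shape: remove location and scale.
preshape <- function(X) {
  Xc <- sweep(X, 2, colMeans(X))   # subtract the centroid
  Xc / sqrt(sum(Xc^2))             # rescale to unit Frobenius norm
}

# Full Procrustes distance between two k x 3 configurations,
# optimising over rotations only (reflections are excluded).
procrustes_dist <- function(X, Y) {
  sv <- svd(crossprod(preshape(X), preshape(Y)))
  d <- sv$d
  # If the optimal orthogonal match is a reflection, flip the smallest
  # singular value so that only proper rotations are considered.
  if (det(sv$u %*% t(sv$v)) < 0) d[length(d)] <- -d[length(d)]
  sqrt(max(0, 1 - sum(d)^2))
}

set.seed(1)
X <- matrix(rnorm(15), ncol = 3)   # 5 hypothetical landmarks in 3D
theta <- pi / 6
R <- rbind(c(cos(theta), -sin(theta), 0),
           c(sin(theta),  cos(theta), 0),
           c(0,           0,          1))
Y <- 2.5 * X %*% R + 1             # X rotated, rescaled and translated
W <- matrix(rnorm(15), ncol = 3)   # an unrelated configuration

d_same <- procrustes_dist(X, Y)    # numerically zero: same shape
d_diff <- procrustes_dist(X, W)    # strictly positive: different shape
```

Because location, scale and rotation are removed before comparing, `X` and `Y` count as the same shape even though their raw coordinates differ.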

The k-means method has been proposed by several scientists in different forms. In computer science and pattern recognition the k-means algorithm is often termed the Lloyd algorithm (see Lloyd (1982)).

This function incorporates a modification to `LloydShapes` in order to make the k-means algorithm robust. Robustness is a desirable property in many applications. As is well known, the results of the k-means algorithm can be influenced by outliers, extreme data, or bridging points between clusters. Garcia-Escudero et al. (1999) propose a way of making k-means more robust, which combines the k-means idea with an impartial trimming procedure: a proportion alpha (between 0 and 1) of the observations is trimmed, and the data themselves determine which observations are trimmed. See also `trimmedoid`.
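The trimming idea can be sketched in base R for ordinary Euclidean data. This is a minimal illustration, not the package's implementation: the function name `trimmed_lloyd` is hypothetical, plain column means stand in for the Procrustes mean, and retaining `floor(n * (1 - alpha))` observations per step is an assumed rounding rule.

```r
# Trimmed Lloyd iterations on an n x p matrix X (Euclidean stand-in).
trimmed_lloyd <- function(X, centers, alpha, steps = 10) {
  n <- nrow(X)
  n_keep <- floor(n * (1 - alpha))   # assumed rule for the retained count
  for (s in seq_len(steps)) {
    # Squared distance from every point (rows) to every center (columns).
    d2 <- sapply(seq_len(nrow(centers)), function(j)
      rowSums(sweep(X, 2, centers[j, ])^2))
    label <- max.col(-d2, ties.method = "first")   # index of nearest center
    dmin <- d2[cbind(seq_len(n), label)]           # distance to that center
    keep <- order(dmin)[seq_len(n_keep)]           # trim the farthest points
    # Update each center using only the retained points assigned to it.
    for (j in seq_len(nrow(centers))) {
      idx <- keep[label[keep] == j]
      if (length(idx) > 0) centers[j, ] <- colMeans(X[idx, , drop = FALSE])
    }
  }
  list(label = label, discarded = setdiff(seq_len(n), keep), centers = centers)
}

set.seed(2013)
X <- rbind(matrix(rnorm(20, mean = 0, sd = 0.1), ncol = 2),
           matrix(rnorm(20, mean = 5, sd = 0.1), ncol = 2),
           c(100, 100))              # one gross outlier in row 21
fit <- trimmed_lloyd(X, centers = X[c(1, 11), ], alpha = 0.04)
fit$discarded                        # row index of the trimmed observation
```

Since the outlier never contributes to a center update, the two centers stay close to the bulk of each group instead of being dragged towards (100, 100).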

Note that in the generic name of the k-means algorithm, k refers to the number of clusters to search for. In the R code, k is referred to as `numClust`; see the *Arguments* section.

### Usage

```
trimmedLloydShapes(array3D,n,alpha,numClust,algSteps=10,niter=10,
                   stopCr=0.0001,verbose)
```

### Arguments

| Argument | Description |
|---|---|
| `array3D` | Array with the 3D landmarks of the sample objects. Each row corresponds to an observation, and each column corresponds to a dimension (x, y, z). |
| `n` | Number of individuals. |
| `alpha` | Proportion of the sample to be trimmed. |
| `numClust` | Number of clusters. |
| `algSteps` | Number of steps per initialization. Default value is 10. |
| `niter` | Number of random initializations (iterations). Default value is 10. |
| `stopCr` | Relative stopping criterion. Default value is 0.0001. |
| `verbose` | A logical specifying whether to provide descriptive output about the running process. |
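To make the expected shape of `array3D` concrete, the following base R sketch reshapes a matrix with one individual per row into a landmarks x 3 x individuals array. It assumes the columns are ordered x1, y1, z1, x2, y2, z2, and so on; the helper name `to_array3D` and the toy matrix are hypothetical, and the package's own `array3Dlandm` should be preferred in practice.

```r
# Reshape an n x (3 * k) matrix into a k x 3 x n array, assuming each row
# stores the landmarks of one individual as x1, y1, z1, x2, y2, z2, ...
to_array3D <- function(M) {
  n <- nrow(M)
  k <- ncol(M) / 3
  stopifnot(k == floor(k))             # the column count must be a multiple of 3
  # t(M) puts each individual in a column; filling a 3 x k x n array then
  # places coordinates first, and aperm() swaps them with the landmark index.
  aperm(array(t(M), dim = c(3, k, n)), c(2, 1, 3))
}

# Toy data: 2 individuals, 2 landmarks each.
M <- matrix(1:12, nrow = 2, byrow = TRUE)
A <- to_array3D(M)
dim(A)       # landmarks, dimensions, individuals
A[, , 1]     # landmark matrix of the first individual
```

Under this layout, `A[l, , i]` holds the (x, y, z) coordinates of landmark `l` for individual `i`.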

### Value

A list with the following elements:

*asig*: Optimal clustering.

*cases*: Anthropometric cases (optimal centers).

*vopt*: Optimal objective function.

*trimmWomen*: List with the trimmed individuals of each iteration.

*trimmsIter*: Vector with the number of iterations where the optimum was reached. The last number different from NA refers to the last iteration where the final optimum was reached.

*bestNstep*: Nstep of the iteration where the optimum was reached.

*initials*: Random initial values used in each iteration. These values can be used by `HartiganShapes`.

*discarded*: Discarded (trimmed) observations.

### Note

Adding a trimming procedure to the Lloyd algorithm is straightforward, whereas the Hartigan-Wong algorithm needs more modifications, which makes implementing its trimmed version difficult.

### Author(s)

Amelia Simo

### References

Vinue, G., Simo, A., and Alemany, S., (2014). The k-means algorithm for 3D shapes with an application to apparel design, *Advances in Data Analysis and Classification*, 1–30.

Lloyd, S. P., (1982). Least Squares Quantization in PCM, *IEEE Transactions on Information Theory* **28**, 129–137.

Dryden, I. L., and Mardia, K. V., (1998). *Statistical Shape Analysis*, Wiley, Chichester.

Garcia-Escudero, L. A., Gordaliza, A., and Matran, C., (2003). Trimming tools in exploratory data analysis, *Journal of Computational and Graphical Statistics* **12(2)**, 434–449.

Garcia-Escudero, L. A., and Gordaliza, A., (1999). Robustness properties of k-means and trimmed k-means, *Journal of the American Statistical Association* **94(447)**, 956–969.

### See Also

`LloydShapes`, `trimmedoid`

### Examples

```
# Clustering individuals according to their shape.
library(Anthropometry)  # provides the data set and the functions used below

landmarksNoNa <- na.exclude(landmarksSampleSpaSurv)
dim(landmarksNoNa)
#[1] 574 198
(numLandmarks <- (dim(landmarksNoNa)[2]) / 3)
#[1] 66

# As a toy example, only the first 10 individuals are used.
landmarksNoNa_First10 <- landmarksNoNa[1:10, ]
(numIndiv <- dim(landmarksNoNa_First10)[1])
#[1] 10

array3D <- array3Dlandm(numLandmarks, numIndiv, landmarksNoNa_First10)

numClust <- 2 ; alpha <- 0.01 ; algSteps <- 1 ; niter <- 1 ; stopCr <- 0.0001
set.seed(2013)
res <- trimmedLloydShapes(array3D, numIndiv, alpha, numClust,
                          algSteps, niter, stopCr, FALSE)

# Optimal partition and prototypes:
clust <- res$asig
table(clust)
prototypes <- anthrCases(res)

# Trimmed individuals:
trimmed <- trimmOutl(res)
```