
Performs k-means clustering `m` times on a data matrix. `fitted` returns a vector with the class labels of the best run.


| Argument | Description |
|---|---|
| `X` | a numeric matrix of data. |
| `k` | the desired number of clusters. |
| `m` | the number of times to run the clustering algorithm. The default is 10. |
| `ind` | a numeric vector of columns indicating the variables used in the clustering. |
| `max.iter` | the maximum number of iterations for a single run of the clustering algorithm. The default is 50. |
| `...` | not used. |

The data matrix `X` is clustered by the standard k-means method, also known as the Lloyd-Forgy method (Lloyd, 1957; Forgy, 1965). The method aims to minimize the within-cluster sum of squares, so each observation is assigned to the cluster whose center is nearest in Euclidean distance.
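
The alternating assignment and update steps of the Lloyd-Forgy method can be sketched in base R as follows. This is an illustration only, not the package's implementation: the function name `lloyd_sketch`, its arguments, and the stopping rule (stop once the labels no longer change) are assumptions made for the example.

```r
# Illustrative sketch of Lloyd's algorithm (hypothetical helper, not the
# package's code). `centers` is a k x p matrix of starting centers.
lloyd_sketch <- function(X, centers, max.iter = 50) {
  X <- as.matrix(X)
  k <- nrow(centers)
  labels <- rep(0L, nrow(X))
  for (iter in seq_len(max.iter)) {
    # Assignment step: squared Euclidean distance of every observation
    # to every center, then pick the nearest center per observation.
    d2 <- sapply(seq_len(k), function(j) rowSums(sweep(X, 2, centers[j, ])^2))
    new_labels <- max.col(-d2, ties.method = "first")
    if (all(new_labels == labels)) break  # labels stable: treat as converged
    labels <- new_labels
    # Update step: each center becomes the mean of its assigned observations.
    for (j in seq_len(k)) {
      if (any(labels == j)) {
        centers[j, ] <- colMeans(X[labels == j, , drop = FALSE])
      }
    }
  }
  # Within-cluster sum of squares, the objective being minimized.
  wcss <- sum((X - centers[labels, , drop = FALSE])^2)
  list(labels = labels, centers = centers, objective = wcss)
}

# Usage on two simulated groups, starting from one observation of each.
set.seed(1)
X <- rbind(matrix(rnorm(40, mean = 0), ncol = 2),
           matrix(rnorm(40, mean = 5), ncol = 2))
fit <- lloyd_sketch(X, centers = X[c(1, 21), , drop = FALSE])
table(fit$labels)
```

Each pass can only lower or keep the objective, which is why the procedure settles at a local rather than a global minimum and why several runs are useful.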

The Random Partition method as described by Hamerly and Elkan (2002) is used for computing the initial cluster means.
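
A minimal sketch of Random Partition initialization, assuming the usual formulation: every observation is assigned to one of the `k` clusters uniformly at random, and the initial means are the centroids of those random groups. The helper name `random_partition_centers` is hypothetical, and redrawing when a group comes up empty is an assumption of this sketch, not something the documentation states.

```r
# Illustrative Random Partition initialization (hypothetical helper,
# not the package's code).
random_partition_centers <- function(X, k) {
  X <- as.matrix(X)
  stopifnot(nrow(X) >= k)
  repeat {
    # Assign every observation to one of k groups uniformly at random.
    labels <- sample.int(k, nrow(X), replace = TRUE)
    # Assumption of this sketch: redraw if any group is empty.
    if (length(unique(labels)) == k) break
  }
  # Initial centers are the column means of each random group (k x p matrix).
  do.call(rbind, lapply(seq_len(k), function(j) {
    colMeans(X[labels == j, , drop = FALSE])
  }))
}

set.seed(2)
X0 <- matrix(rnorm(60), ncol = 3)
init <- random_partition_centers(X0, k = 4)
dim(init)
```

Because each random group is a sample of the whole data set, the resulting starting centers tend to lie close to the overall mean of the data.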

`kMeans` returns an object of class `kMeans`, which has print, summary, predict, plot and fitted methods. It is a list with the following components:

| Component | Description |
|---|---|
| `Cbest` | the vector of the best group labels. |
| `ObjBest` | the value of the objective function for the best solution. |
| `CentroidsBest` | the matrix containing the centroids of the best solution. |
| `m` | the number of repetitions. |
| `k` | the number of groups. |
| `Xname` | name of the data set used for the clustering. |
| `Ind` | the value of input |
| `Y` | the data used for the clustering. |
| `Best` | value of which of the runs was the best. |
| `Call` | a matrix with |
| `ObjAll` | a vector having the objective functions of all runs. |
| `StatusAll` | a vector having the status from all runs. |
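
To illustrate how the per-run and best-run components plausibly relate, the sketch below repeats a k-means fit `m` times with base R's `stats::kmeans` and keeps the run with the smallest objective. It is a stand-in, not this function's code: `stats::kmeans` starts from randomly chosen observations rather than a Random Partition, and the names `obj_all`, `best`, `c_best` and `obj_best` are hypothetical analogues of `ObjAll`, `Best`, `Cbest` and `ObjBest`. That the best run is the one minimizing the objective is an assumption based on the descriptions above.

```r
# Hypothetical analogue of the multi-run logic using stats::kmeans.
set.seed(63555)
X1 <- matrix(c(rnorm(30, mean = 3), rnorm(30, mean = 6), rnorm(30, mean = 9)),
             ncol = 1)
m <- 10

# One independent Lloyd-type run per repetition.
runs <- lapply(seq_len(m), function(i) {
  kmeans(X1, centers = 3, iter.max = 50, algorithm = "Lloyd")
})

# Objective (total within-cluster sum of squares) of every run.
obj_all <- vapply(runs, function(r) r$tot.withinss, numeric(1))

# Assumed rule: the best run is the one with the smallest objective.
best     <- which.min(obj_all)
c_best   <- runs[[best]]$cluster
obj_best <- obj_all[best]

summary(obj_all)
```

Comparing `summary(obj_all)` with `obj_best` shows how much the runs differ, which is the kind of information the summary method reports for the criterion values.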

Forgy, E. W. (1965) Cluster analysis of multivariate data: efficiency vs interpretability of classifications. Biometrics **21**, 768–769.

Hamerly, G. and Elkan, C. (2002) Alternatives to the k-means algorithm that find better clusterings. Proceedings of the Eleventh International Conference on Information and Knowledge Management (CIKM).

Lloyd, S. P. (1957, 1982) Least squares quantization in PCM. Technical Note, Bell Laboratories. Published in 1982 in IEEE Transactions on Information Theory **28**, 128–137.

```
# Example using random data from three different populations
# with one variable
set.seed(63555)
exampleData <- matrix(nrow=90, ncol=1)
exampleData[1:30, 1] <- rnorm(30, mean=3, sd=1)
exampleData[31:60, 1] <- rnorm(30, mean=6, sd=1)
exampleData[61:90, 1] <- rnorm(30, mean=9, sd=1)
kMeansResult <- kMeans(exampleData, k=3)
kMeansResult
# K-Means clustering for exampleData
# Number of runs: 10
# Status of best run: converged
fitted(kMeansResult)
# [1] 2 1 2 2 2 1 2 1 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 1 1 1 1 1 1 1 1 1 3 1 1 1 1 1
# [47] 1 1 3 1 1 1 1 1 1 1 1 1 3 1 3 3 3 3 3 3 3 3 1 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3
summary(kMeansResult)
#K-Means clustering for exampleData
#Clusters to be detected: 3
#Cluster sizes detected: 30 28 32
#Number of runs: 10
#Status of best run: converged
#Criterion value: 3522.284
#Summary of criterion values:
#Min: 3522.284
#Q1: 3546.504
#Mean: 3543.455
#Q3: 3546.504
#Max: 3546.504
```
