kmeans: Standard MacQueen K-means Algorithm


View source: R/weightedKmeans.R

Description

This function computes the standard MacQueen version of the k-means algorithm.

Usage

kmeans(dat, k=2, nbRep=100)

Arguments

dat

Numeric matrix of data, or an object that can be coerced to such a matrix (such as a numeric vector or a data frame with all numeric columns).

k

The number of partitions (clusters) to produce. Defaults to 2.

nbRep

The number of random starts. Defaults to 100.

Details

The MacQueen k-means algorithm (MacQueen, 1967) aims to separate n objects into k non-overlapping groups so as to minimize the sum of squared errors (i.e. the sum of squared Euclidean distances between the points and the center of their group). This variant of k-means starts with an initialization step: k data points are chosen as centroids (centers of the partitions), each point is assigned to its nearest centroid according to the Euclidean distance, and the centroids are updated to the mean of the points in their group. The algorithm then repeats an assignment step until convergence: each point is assigned to its nearest centroid according to the Euclidean distance, and the centroids concerned are updated accordingly to the mean of the points in their group. Convergence is reached either when the centroids stop moving or when the maximum number of internal iterations is reached. The quality of the clustering produced by the MacQueen k-means algorithm is evaluated by the Calinski-Harabasz cluster validity index (Caliński and Harabasz, 1974).
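The steps described above can be sketched in base R as follows. This is only an illustrative sketch, not the package's implementation: the function name macqueen_sketch, the max_iter argument, and the guard that keeps a cluster from becoming empty are all assumptions made for this example.

```r
# Illustrative sketch of MacQueen-style k-means (hypothetical helper,
# not the code used by this package).
macqueen_sketch <- function(dat, k = 2, max_iter = 50) {
  x <- as.matrix(dat)
  n <- nrow(x)

  # Initialization: choose k data points as the starting centroids.
  centers <- x[sample.int(n, k), , drop = FALSE]

  # Index of the centroid closest to point p (squared Euclidean distance).
  nearest <- function(p) which.min(rowSums(sweep(centers, 2, p)^2))

  # First pass: assign every point, then set each non-empty centroid
  # to the mean of its group.
  cluster <- apply(x, 1, nearest)
  size <- tabulate(cluster, nbins = k)
  for (j in which(size > 0)) {
    centers[j, ] <- colMeans(x[cluster == j, , drop = FALSE])
  }

  # Iterations: reassign points one at a time and update the two
  # affected centroids right after each move.
  for (iter in seq_len(max_iter)) {
    moved <- FALSE
    for (i in seq_len(n)) {
      from <- cluster[i]
      to <- nearest(x[i, ])
      # Assumption: never remove the last point of a cluster.
      if (to != from && size[from] > 1) {
        # Incremental mean updates for the cluster losing / gaining x[i, ].
        centers[from, ] <- centers[from, ] +
          (centers[from, ] - x[i, ]) / (size[from] - 1)
        centers[to, ] <- centers[to, ] +
          (x[i, ] - centers[to, ]) / (size[to] + 1)
        size[from] <- size[from] - 1
        size[to] <- size[to] + 1
        cluster[i] <- to
        moved <- TRUE
      }
    }
    # Converged: no point changed group, so the centroids stopped moving.
    if (!moved) break
  }

  list(cluster = cluster, centers = centers)
}

set.seed(1)
fit <- macqueen_sketch(iris[, 1:4], k = 3)
table(fit$cluster)
```

Updating the two affected centroids right after each reassignment, rather than once per full pass over the data, is what distinguishes this style of update from the batch (Lloyd/Forgy) variant.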

Value

k

The number of partitions used for the clustering.

bestCH

The best value of the Calinski-Harabasz cluster validity index produced by the k-means algorithm.

clusteringCH

The clustering produced by the k-means algorithm for the best Calinski-Harabasz cluster validity index.

bestSil

The best value of the Silhouette cluster validity index produced by the k-means algorithm.

clusteringSil

The clustering produced by the k-means algorithm for the best Silhouette cluster validity index.

Algorithm

The algorithm used to produce the clustering.
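For readers unfamiliar with the index behind bestCH, the Calinski-Harabasz index is the ratio of the between-group dispersion (divided by k - 1) to the within-group dispersion (divided by n - k); larger values indicate more compact, better separated groups. The sketch below computes it in base R for any given partition. The helper name ch_index is hypothetical and this is not necessarily how the package computes the value.

```r
# Illustrative sketch of the Calinski-Harabasz index (hypothetical helper,
# not the code used by this package).
ch_index <- function(dat, cluster) {
  x <- as.matrix(dat)
  n <- nrow(x)
  groups <- sort(unique(cluster))
  k <- length(groups)
  overall <- colMeans(x)  # centroid of the whole data set

  wss <- 0  # within-group sum of squares
  bss <- 0  # between-group sum of squares
  for (g in groups) {
    xg <- x[cluster == g, , drop = FALSE]
    cg <- colMeans(xg)  # centroid of group g
    wss <- wss + sum(sweep(xg, 2, cg)^2)
    bss <- bss + nrow(xg) * sum((cg - overall)^2)
  }

  (bss / (k - 1)) / (wss / (n - k))
}

# Example: score the partition given by the iris species labels.
ch_index(iris[, 1:4], as.integer(iris$Species))
```

Over several random starts, the partition with the largest index value would be the one retained as the best according to this criterion.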

Author(s)

Alexandre Gondeau

References

Caliński, T., and Harabasz, J. (1974). A dendrite method for cluster analysis. Communications in Statistics-theory and Methods, 3, 1-27.

MacQueen, J. (1967). Some methods for classification and analysis of multivariate observations. In Proceedings of the fifth Berkeley symposium on mathematical statistics and probability, pp. 281-297.

Examples

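# Load the package so that kmeans() refers to this function
# rather than stats::kmeans()
library(RWeightedKmeans)
data("iris")

# Classical k-means algorithm: k = 3 partitions, 100 random starts
cl <- kmeans(as.matrix(iris[,1:4]), 3, 100)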

RWeightedKmeans documentation built on Oct. 26, 2018, 5:04 p.m.