weightedKmeans: Weighted K-means Algorithm

Description

This function computes a weighted version of the k-means algorithm, using feature weighting as proposed in De Amorim and Mirkin (2012), object weighting, or both.

Usage

weightedKmeans(dat, k=2, nbRep=100, ifc=FALSE, ioc=TRUE, fwm="DISP", owm="SIL")

Arguments

dat

Numeric matrix of data, or an object that can be coerced to such a matrix (such as a numeric vector or a data frame with all numeric columns).

k

The number of clusters: the data are partitioned into k groups.

nbRep

The number of random starts.

ifc

Logical. If TRUE, the algorithm computes feature weights internally.

ioc

Logical. If TRUE, the algorithm computes object weights internally.

fwm

The method used to compute the feature weights, selected with one of the following keys: DISP, based on a dispersion measure (default).

owm

The method used to compute the object weights, selected with one of the following keys: SIL, based on the silhouette index (default); SIL_NK, based on the silhouette index, where the sum of the object weights in a cluster is equal to the number of objects in the cluster; MED, based on the median distance between the object and its partition centroid; MED_NK, based on the median distance between the object and its partition centroid, where the sum of the object weights in a cluster is equal to the number of objects in the cluster; MIN_CEN_DIST, based on the minimum Euclidean distance between the object and the nearest centroid (other than its own); MIN_CEN_DIST_NK, based on the minimum Euclidean distance between the object and the nearest centroid (other than its own), where the sum of the object weights in a cluster is equal to the number of objects in the cluster; SUM_DIST_CEN, based on the sum of the Euclidean distances between the object and the other centroids.
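
The _NK variants rescale the object weights so that, within each cluster, they sum to the number of objects in the cluster. The base R sketch below illustrates that rescaling for silhouette-based weights. It is not the package's implementation: the helper names silhouette_width and normalise_per_cluster, and the mapping (s + 1) / 2 from silhouette width to raw weight, are assumptions made purely for illustration, and the clustering comes from stats::kmeans.

```r
# Silhouette width of every object, from pairwise Euclidean distances.
# Assumes at least two clusters.
silhouette_width <- function(x, cluster) {
  d <- as.matrix(dist(x))
  labels <- unique(cluster)
  vapply(seq_len(nrow(d)), function(i) {
    own <- cluster == cluster[i]
    own[i] <- FALSE                      # leave the object itself out
    if (!any(own)) return(0)             # singleton cluster: width 0 by convention
    a <- mean(d[i, own])                 # mean distance to its own cluster
    b <- min(vapply(setdiff(labels, cluster[i]),
                    function(l) mean(d[i, cluster == l]),
                    numeric(1)))         # mean distance to the nearest other cluster
    (b - a) / max(a, b)
  }, numeric(1))
}

# Rescale weights so that they sum to the cluster size within each cluster.
normalise_per_cluster <- function(w, cluster) {
  w * ave(w, cluster, FUN = length) / ave(w, cluster, FUN = sum)
}

set.seed(42)
x <- as.matrix(iris[, 1:4])
cluster <- kmeans(x, centers = 3, nstart = 10)$cluster

s <- silhouette_width(x, cluster)
w_raw <- (s + 1) / 2                     # hypothetical mapping to weights in [0, 1]
w_nk <- normalise_per_cluster(w_raw, cluster)

# Per-cluster weight sums now match the cluster sizes.
print(rbind(sum_of_weights = tapply(w_nk, cluster, sum),
            cluster_size   = tapply(w_nk, cluster, length)))
```

Normalising per cluster keeps the total weight of each cluster equal to its size, so a cluster does not gain or lose overall influence merely because its members have high or low raw weights.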

Details

This function implements a weighted version of the k-means algorithm that uses feature weighting, object weighting, or both. The feature weighting method is described in De Amorim and Mirkin (2012) (see References for further information). The k-means algorithm used with object weighting is inspired by Hartigan's method (Hartigan and Wong, 1979), in which an object is moved from one cluster to another only if the move improves the overall cost function, unlike the MacQueen algorithm, which greedily assigns each point to the nearest centroid according to the Euclidean distance.
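
To make the idea of dispersion-based feature weighting concrete, here is a minimal base R sketch in the spirit of De Amorim and Mirkin (2012), where a feature with a small within-cluster dispersion receives a large weight in that cluster. It is not necessarily what the DISP method of this package computes: the function name feature_weights, the exponent argument beta, and the small constant eps are assumptions for illustration, and the cluster labels come from stats::kmeans.

```r
# Cluster-specific feature weights from within-cluster dispersions.
# beta > 1 is the Minkowski exponent; eps is an assumed small constant that
# keeps a zero dispersion from causing a division by zero.
# Assumes x has at least two columns.
feature_weights <- function(x, cluster, beta = 2, eps = 1e-8) {
  stopifnot(beta > 1)
  x <- as.matrix(x)
  labels <- sort(unique(cluster))
  w <- vapply(labels, function(l) {
    xs <- x[cluster == l, , drop = FALSE]
    centre <- colMeans(xs)               # the mean is the exact centre only for beta = 2
    disp <- unname(colSums(abs(sweep(xs, 2, centre))^beta)) + eps
    # w_v = 1 / sum_u (D_v / D_u)^(1 / (beta - 1))
    vapply(disp, function(dv) 1 / sum((dv / disp)^(1 / (beta - 1))), numeric(1))
  }, numeric(ncol(x)))
  w <- t(w)                              # one row per cluster, one column per feature
  dimnames(w) <- list(cluster = labels, feature = colnames(x))
  w
}

set.seed(42)
x <- as.matrix(iris[, 1:4])
cluster <- kmeans(x, centers = 3, nstart = 10)$cluster

w <- feature_weights(x, cluster)
print(round(w, 3))
```

With this formula the weights of each cluster sum to one, so they can be read as the relative importance of each feature within that cluster.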

The MacQueen k-means algorithm (MacQueen, 1967) aims to separate n objects into k non-overlapping groups so as to minimize the sum of squared errors (i.e. the sum of squared Euclidean distances between the points and the center of their group). This variant of k-means starts with an initialization step that chooses k data points as centroids (centers of partitions), assigns each point to the nearest centroid according to the Euclidean distance, and updates the centroids using the mean of the points in each group. The algorithm then repeats an assignment step until convergence: each point is assigned to the nearest centroid according to the Euclidean distance, and the affected centroid is updated accordingly using the mean of the points in its group. Convergence is reached either when the centroids stop moving or when the maximum number of internal iterations is reached. The quality of the clustering produced by the MacQueen k-means algorithm is evaluated by the Calinski-Harabasz cluster validity index (Caliński and Harabasz, 1974).
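
The quantities described above can be reproduced with base R alone. The sketch below runs the MacQueen variant available in stats::kmeans (not this package's own implementation), recomputes the sum of squared errors by hand, and derives the Calinski-Harabasz index from it. The choice of the iris data, the seed, and the values of nstart and iter.max are assumptions for illustration.

```r
set.seed(42)
x <- as.matrix(iris[, 1:4])
n <- nrow(x)
k <- 3

# MacQueen k-means from the stats package, with 50 random starts.
fit <- kmeans(x, centers = k, nstart = 50, iter.max = 100, algorithm = "MacQueen")

# Sum of squared errors: squared Euclidean distance of each point to its centroid.
sse <- sum((x - fit$centers[fit$cluster, ])^2)

# Calinski-Harabasz index: between-cluster over within-cluster dispersion,
# each divided by its degrees of freedom.
ch <- (fit$betweenss / (k - 1)) / (sse / (n - k))
print(c(SSE = sse, CH = ch))
```

A larger Calinski-Harabasz value indicates clusters that are more compact and better separated, which is why the best value over the random starts is retained.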

Value

k

The number of clusters used for the clustering.

bestCH

The best value of the Calinski-Harabasz cluster validity index produced by the k-means algorithm.

clusteringCH

The clustering produced by the k-means algorithm for the best Calinski-Harabasz cluster validity index.

objectWeightCH

The object weights produced by the k-means algorithm for the best Calinski-Harabasz cluster validity index.

bestSil

The best value of the Silhouette cluster validity index produced by the k-means algorithm.

clusteringSil

The clustering produced by the k-means algorithm for the best Silhouette cluster validity index.

objectWeightSil

The object weights produced by the k-means algorithm for the best Silhouette cluster validity index.

Algorithm

The algorithm used to produce the clustering.

Author(s)

Alexandre Gondeau

References

Caliński, T., and Harabasz, J. (1974). A dendrite method for cluster analysis. Communications in Statistics - Theory and Methods, 3, 1-27.

De Amorim, R. C., and Mirkin, B. (2012). Minkowski metric, feature weighting and anomalous cluster initializing in K-Means clustering. Pattern Recognition, 45, 1061-1075.

Hartigan, J. A., and Wong, M. A. (1979). Algorithm AS 136: A k-means clustering algorithm. Journal of the Royal Statistical Society. Series C (Applied Statistics), 28, 100-108.

Examples

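library(RWeightedKmeans)
data("iris")

# Object weighting k-means algorithm (silhouette-based object weights).
# Arguments are named so that "SIL" is matched to owm rather than to fwm.
cl <- weightedKmeans(as.matrix(iris[,1:4]), k=3, nbRep=50, ifc=FALSE, ioc=TRUE, owm="SIL")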

RWeightedKmeans documentation built on Oct. 26, 2018, 5:04 p.m.