Mahalanobis K-means


Description

K-means variant that uses a class-wise Mahalanobis metric. The implementation broadly follows Lloyd's algorithm, with a class-wise covariance computation step following that of the centres.

Usage

mkmeans(dat, k, maxiter = 100, seeds = NULL)

Arguments

dat

Matrix with n rows and d columns, holding the n d-dimensional data elements to cluster.

k

Number of clusters in the output.

maxiter

Maximum number of iterations.

seeds

Optional indices of initial centres, taken from the input data. If NULL, uniform random sampling is used.
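The default seeding can be pictured as a uniform draw of k row indices from the data. This is a sketch of the assumed behaviour, not the package source; `iris` stands in for an arbitrary numeric input matrix:

```r
# Sketch of the assumed default seeding: draw k distinct row indices
# uniformly at random, and use those rows as initial centres.
dat <- as.matrix(iris[, 1:4])   # example data; any numeric matrix works
k <- 3
seeds <- sample(nrow(dat), k)   # uniform sampling without replacement
centres <- dat[seeds, , drop = FALSE]
```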

Details

Standard k-means is characterized by the use of the identity matrix as the metric. To remain close to this in spirit, each class-wise covariance matrix is normalized after computation so that its trace equals d. This avoids excessively unbalanced classes. It also handles the case where the support of a given cluster is less than 2: the covariance cannot be computed then, and defaults to the identity. In addition, to prevent degeneracies when 2 < cluster size < d, a regularization term proportional to the sample variances of the data features is added to the covariance diagonal.
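The covariance handling described above can be sketched in base R as follows. This is an illustration of the stated rules, not the package source; the regularization constant and the order of the steps are assumptions:

```r
# Sketch: compute a class-wise covariance with the safeguards above.
# - clusters of size < 2: covariance defaults to the identity
# - diagonal regularization proportional to the feature variances (assumed factor 0.01)
# - rescaling so that the trace equals d
normalize_cov <- function(x) {
  d <- ncol(x)
  if (nrow(x) < 2) return(diag(d))                    # covariance undefined
  s <- cov(x)
  s <- s + diag(0.01 * apply(x, 2, var) + 1e-8, d)    # regularize the diagonal
  s * (d / sum(diag(s)))                              # rescale: trace(s) == d
}
```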

The returned value follows the GMM data structure (i.e., the structure returned by varbayes() and newGmm()).
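The shape of that structure can be pictured with a toy 2-component model in d = 2. The field names are taken from the Value section of this page; the exact internal representation in the package may differ:

```r
# Sketch of the assumed GMM structure: labels, weights, and per-component
# mean vectors and covariance matrices stored in parallel lists.
mod <- list(
  labels = c(1, 1, 2),                 # one label per data element, in 1..k
  w      = c(2/3, 1/3),                # cluster weights
  mean   = list(c(0, 0), c(3, 3)),    # k mean vectors
  cov    = list(diag(2), diag(2))     # k covariance matrices
)
```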

Value

labels

Cluster labels taking values in 1..k

w

Numeric vector of cluster weights

mean

List of mean vectors

cov

List of covariance matrices

Author(s)

P. Bruneau

See Also

newGmm, varbayes

Examples

mod <- mkmeans(irisdata, 3)
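For intuition, one Mahalanobis assignment step of the algorithm can be sketched in base R. This is illustrative only (not the package code), and uses the base `iris` dataset with two hand-picked groups in place of fitted clusters:

```r
# Squared Mahalanobis distance of each row of x to a centre under sigma:
# (x - c)' sigma^{-1} (x - c)
mahala <- function(x, centre, sigma) {
  diffs <- sweep(x, 2, centre)                 # x_i - centre, row-wise
  rowSums((diffs %*% solve(sigma)) * diffs)
}

X <- as.matrix(iris[, 1:4])
centres <- list(colMeans(X[1:50, ]), colMeans(X[51:150, ]))
covs    <- list(cov(X[1:50, ]), cov(X[51:150, ]))

# Distance of every point to every centre, then nearest-centre labels
d2 <- sapply(1:2, function(j) mahala(X, centres[[j]], covs[[j]]))
labels <- max.col(-d2)                          # argmin over columns
```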
