Clustering is a fundamental problem in science and engineering. Many classic methods, such as k-means, Gaussian mixture models, and hierarchical clustering, however, employ greedy algorithms that can become trapped in local minima, sometimes drastically suboptimal ones. Recently introduced convex relaxations of k-means and hierarchical clustering shrink cluster centroids toward one another and ensure a unique global minimizer. This package provides two variable splitting methods, the Alternating Direction Method of Multipliers (ADMM) and the Alternating Minimization Algorithm (AMA), for solving this convex formulation of the clustering problem. We seek the centroids u_i that minimize
\frac{1}{2} \sum_i \| x_i - u_i \|_2^2 + \gamma \sum_l w_l \| u_{l_1} - u_{l_2} \|
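Here \gamma is a nonnegative regularization parameter and w_l is a nonnegative weight on the l-th pair of centroids (l_1, l_2). Two penalty norms are currently supported: the 1-norm and the 2-norm.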
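To make the objective concrete, here is a minimal base R sketch that evaluates it for a given set of centroids. The function name convex_clustering_objective, the pairs matrix, and the one-point-per-column layout are assumptions made for illustration; they are not part of the package.

```r
# Evaluate the convex clustering objective for given centroids.
# X, U: p-by-n matrices, one point (or centroid) per column (assumed layout).
# pairs: m-by-2 integer matrix of index pairs (l1, l2); w: length-m weights.
# norm_type: 1 for the 1-norm penalty, 2 for the 2-norm penalty.
convex_clustering_objective <- function(X, U, pairs, w, gamma, norm_type = 2) {
  fit <- 0.5 * sum((X - U)^2)
  # Column l of D holds u_{l1} - u_{l2}
  D <- U[, pairs[, 1], drop = FALSE] - U[, pairs[, 2], drop = FALSE]
  pen <- if (norm_type == 1) colSums(abs(D)) else sqrt(colSums(D^2))
  fit + gamma * sum(w * pen)
}

# Three points in the plane: (0,0), (1,0), (0,2)
X <- matrix(c(0, 0, 1, 0, 0, 2), nrow = 2)
pairs <- t(combn(3, 2))        # all pairs: (1,2), (1,3), (2,3)
w <- rep(1, nrow(pairs))

# Centroids equal to the data: zero fit term, full penalty
obj_at_data <- convex_clustering_objective(X, X, pairs, w, gamma = 1)

# All centroids equal to the grand mean: zero penalty, nonzero fit term
U_mean <- matrix(rowMeans(X), nrow = 2, ncol = 3)
obj_at_mean <- convex_clustering_objective(X, U_mean, pairs, w, gamma = 1)
```

The two evaluations show the trade-off that gamma controls: leaving every centroid at its data point pays only the penalty term, while fusing all centroids pays only the fit term.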
The two main functions are cvxclust_path_admm and cvxclust_path_ama, which compute the cluster paths using the ADMM and AMA methods, respectively. The function cvxclust is a wrapper function that calls either cvxclust_path_admm or cvxclust_path_ama (the default) to perform the computation.
The functions kernel_weights and knn_weights can be used in sequence to compute weights that can improve the quality of the clustering paths.
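As a rough illustration of the idea behind using the two functions in sequence, the base R sketch below computes Gaussian kernel weights and then zeroes out pairs that are not k-nearest neighbors. It assumes a weight of the form exp(-phi * ||x_i - x_j||^2) and one point per column; the function name sparse_gaussian_weights is hypothetical, and this is not the package's own code.

```r
# Gaussian kernel weights for all pairs i < j, kept only when one point is
# among the k nearest neighbors of the other. Illustration only.
sparse_gaussian_weights <- function(X, phi, k) {
  n <- ncol(X)
  D2 <- as.matrix(dist(t(X)))^2        # n-by-n squared Euclidean distances
  pairs <- t(combn(n, 2))              # m-by-2 matrix of pairs (i, j), i < j
  w <- exp(-phi * D2[pairs])           # dense kernel weights, one per pair
  # Column i of is_knn flags the k nearest neighbors of point i (excluding i)
  is_knn <- sapply(seq_len(n), function(i) {
    nn <- order(D2[i, ])
    nn <- nn[nn != i][seq_len(min(k, n - 1))]
    seq_len(n) %in% nn
  })
  # Keep pair (i, j) if i is a neighbor of j or j is a neighbor of i
  keep <- is_knn[pairs] | is_knn[pairs[, 2:1, drop = FALSE]]
  w[!keep] <- 0
  list(pairs = pairs, w = w)
}

# Four points on a line at 0, 1, 3, and 10
X <- matrix(c(0, 1, 3, 10), nrow = 1)
res <- sparse_gaussian_weights(X, phi = 0.5, k = 1)
```

With k = 1 only the pairs (1,2), (2,3), and (3,4) keep a nonzero weight, so distant points no longer pull on each other directly.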
The typical usage consists of three steps:
1. Compute weights w.
2. Generate a geometrically increasing regularization parameter sequence. Unfortunately, a closed-form expression for the minimum amount of penalization needed for complete coalescence is currently unknown.
3. Call cvxclust using the data X, weights w, and regularization parameter sequence gamma.
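The three steps might look like the following sketch. The starting value, ratio, and length of the gamma sequence are arbitrary illustrative choices, and the argument order of the package calls follows my reading of the help pages, so it is worth checking ?kernel_weights, ?knn_weights, and ?cvxclust before relying on it. The package calls run only if cvxclustr is installed.

```r
# Step 2: geometrically increasing sequence gamma_t = gamma_min * ratio^(t - 1).
# gamma_min, ratio, and n_gamma are illustrative values, not package defaults.
gamma_min <- 0.01
ratio <- 1.5
n_gamma <- 20
gamma <- gamma_min * ratio^(0:(n_gamma - 1))

# Steps 1 and 3, assuming the signatures kernel_weights(X, phi),
# knn_weights(w, k, n), and cvxclust(X, w, gamma).
if (requireNamespace("cvxclustr", quietly = TRUE)) {
  set.seed(1)
  X <- matrix(rnorm(2 * 30), nrow = 2)      # 30 points in 2-D, one per column
  n <- ncol(X)
  w <- cvxclustr::kernel_weights(X, 0.5)    # step 1: dense kernel weights
  w <- cvxclustr::knn_weights(w, 5, n)      # step 1: keep 5-nearest-neighbor pairs
  sol <- cvxclustr::cvxclust(X, w, gamma)   # step 3: cluster path (AMA by default)
}
```

Because the amount of penalization needed for complete coalescence is not known in advance, it may be necessary to extend the sequence and rerun if the centroids have not all fused at the largest gamma.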
Cluster assignments can also be retrieved from the solution to the convex clustering problem. Both cvxclust_path_admm and cvxclust_path_ama output an object of class cvxclustobject.
A cluster assignment can be extracted in two steps:
1. Call create_adjacency to construct an adjacency matrix from the centroid differences variable V.
2. Call find_clusters to extract the connected components of the adjacency matrix.
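The two steps above can be sketched in base R as follows. This only illustrates the idea (link two points when their centroid difference is numerically zero, then take connected components); it is not the code behind create_adjacency or find_clusters. The function names, the pairs matrix, and the tolerance tol are assumptions.

```r
# V: p-by-m matrix whose l-th column is the centroid difference for pair l.
# pairs: m-by-2 matrix of point indices (l1, l2); n: number of points.
# Two points are linked when their difference has norm at most tol (assumed).
adjacency_from_differences <- function(V, pairs, n, tol = 1e-8) {
  A <- matrix(0L, n, n)
  fused <- sqrt(colSums(V^2)) <= tol
  A[pairs[fused, , drop = FALSE]] <- 1L
  A[pairs[fused, 2:1, drop = FALSE]] <- 1L   # symmetric entry
  A
}

# Label connected components with a breadth-first search.
connected_components <- function(A) {
  n <- nrow(A)
  label <- integer(n)                        # 0 means not yet visited
  current <- 0L
  for (s in seq_len(n)) {
    if (label[s] != 0L) next
    current <- current + 1L
    label[s] <- current
    queue <- s
    while (length(queue) > 0) {
      v <- queue[1]
      queue <- queue[-1]
      nbrs <- which(A[v, ] != 0 & label == 0L)
      label[nbrs] <- current
      queue <- c(queue, nbrs)
    }
  }
  label
}

# Five points; differences are zero for pairs (1,2), (2,3), and (4,5)
pairs <- t(combn(5, 2))
V <- matrix(1, nrow = 2, ncol = nrow(pairs))
V[, c(1, 5, 10)] <- 0
A <- adjacency_from_differences(V, pairs, n = 5)
labels <- connected_components(A)
```

Taking connected components rather than reading clusters off individual pairs means that points 1 and 3 end up in the same cluster through point 2, even though their own difference is not flagged as zero in this toy input.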
Author(s): Eric C. Chi, Kenneth Lange
References: Eric C. Chi and Kenneth Lange. Splitting Methods for Convex Clustering. Journal of Computational and Graphical Statistics, in press. http://arxiv.org/abs/1304.0499.