README.md

DP-means clustering in R

This package implements the DP-means algorithm introduced by Kulis and Jordan in their article "Revisiting k-means: New Algorithms via Bayesian Nonparametrics". Instead of specifying the number of clusters in advance, as one would with k-means, the user specifies a penalty parameter λ that controls whether and when new clusters are created during the iterations:

Figure: Effect of the choice of λ on clustering

The algorithm starts with a single cluster and then processes the data points one at a time, creating a new cluster whenever λ calls for one. It then updates the cluster centers, and these steps repeat until convergence.
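The loop described above can be sketched in base R. This is a minimal illustration and not the package's actual source. It assumes the rule from Kulis and Jordan's paper: a point opens a new cluster when its squared Euclidean distance to every existing center exceeds λ. The names `dp_means_sketch` and `max_iter` are hypothetical and exist only for this example.

```r
# Sketch of DP-means after Kulis & Jordan; NOT the dpmclust source code.
# dp_means_sketch() and max_iter are hypothetical names for this example.
dp_means_sketch <- function(x, lambda, max_iter = 100) {
  x <- as.matrix(x)
  n <- nrow(x)
  # Start with a single cluster centered at the global mean
  centers <- matrix(colMeans(x), nrow = 1)
  cluster <- rep(1L, n)
  for (iter in seq_len(max_iter)) {
    old_cluster <- cluster
    for (i in seq_len(n)) {
      # Squared Euclidean distance from point i to every current center
      d <- rowSums(sweep(centers, 2, x[i, ])^2)
      if (min(d) > lambda) {
        # Assumed rule: too far from every center, so open a new cluster here
        centers <- rbind(centers, x[i, ])
        cluster[i] <- nrow(centers)
      } else {
        cluster[i] <- which.min(d)
      }
    }
    # Drop empty clusters, relabel as 1..k, and recompute centers as means
    cluster <- match(cluster, sort(unique(cluster)))
    centers <- rowsum(x, cluster) / as.vector(table(cluster))
    # Stop once a full pass leaves the assignments unchanged
    if (identical(cluster, old_cluster)) break
  }
  list(cluster = cluster, centers = centers)
}

# Two well-separated synthetic blobs (made-up data for illustration)
set.seed(1)
x <- rbind(matrix(rnorm(40, mean = 0, sd = 0.1), ncol = 2),
           matrix(rnorm(40, mean = 5, sd = 0.1), ncol = 2))
fit <- dp_means_sketch(x, lambda = 4)
table(fit$cluster)
```

In this sketch a larger λ makes new clusters more expensive, so fewer are created. A very large value leaves all points in the initial cluster.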

Installation

# install.packages("remotes")
remotes::install_github("bearloga/dpmclust")

Usage

dp_means() returns an object with the same class and components as the one returned by kmeans(), which makes it easy to use with other packages that support kmeans objects (e.g., autoplot() in the ggfortify package).

y <- dp_means(x, lambda = 1)
# y$cluster
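To illustrate what sharing a class and components with kmeans() buys you, here is a small sketch of a helper written only against the standard kmeans fields. It is demonstrated with base R's kmeans() so that it runs without this package installed. Given the statement above, the same helper should presumably also accept the result of dp_means(), but that is not checked here. The name `cluster_summary` is hypothetical.

```r
# cluster_summary() is a hypothetical helper for illustration only.
# It relies solely on components documented for stats::kmeans() objects.
cluster_summary <- function(fit) {
  stopifnot(inherits(fit, "kmeans"))
  data.frame(
    cluster  = seq_along(fit$size),  # cluster labels 1..k
    size     = fit$size,             # number of points per cluster
    withinss = fit$withinss          # within-cluster sum of squares
  )
}

set.seed(42)
x <- as.matrix(iris[, 1:4])
km <- kmeans(x, centers = 3)
names(km)            # components a kmeans object exposes
cluster_summary(km)
```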

Future Work

The lambda-means algorithm for choosing an optimal λ still needs to be implemented.



bearloga/dpmclust documentation built on March 7, 2020, 7:11 p.m.