dpmeans: Dirichlet Process K-Means Clustering


View source: R/dpmeans.R

Description

This function performs k-means-style clustering using the Bayesian Dirichlet process algorithm presented by Kulis & Jordan (2011). Rather than fixing the number of clusters in advance, as in standard k-means, the user specifies a concentration parameter τ, which controls the precision of a Dirichlet prior on the number of clusters. Higher values of τ yield fewer clusters; lower values yield more.
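To make the idea concrete, here is a minimal sketch of the generic DP-means loop described by Kulis & Jordan. It is not the source of dpmeans(): the helper name dp_means_sketch and its penalty argument lambda are hypothetical, and whether tau enters the package's computation in exactly this way is an assumption. In this sketch, an observation opens a new cluster whenever its squared Euclidean distance to every existing centroid exceeds lambda, so a larger penalty produces fewer clusters.

```r
# Hypothetical helper, not part of cvreg: a bare-bones DP-means loop.
# `lambda` is the new-cluster penalty (assumed to play the role of tau).
dp_means_sketch <- function(x, lambda, max_iter = 100) {
  x <- as.matrix(x)
  n <- nrow(x)
  # Start with every observation in one cluster at the global mean.
  centers <- matrix(colMeans(x), nrow = 1)
  labels <- rep(1L, n)
  for (iter in seq_len(max_iter)) {
    old_labels <- labels
    for (i in seq_len(n)) {
      # Squared Euclidean distance from observation i to each centroid.
      d2 <- rowSums(sweep(centers, 2, x[i, ])^2)
      if (min(d2) > lambda) {
        # Too far from every centroid: open a new cluster at this point.
        centers <- rbind(centers, x[i, ])
        labels[i] <- nrow(centers)
      } else {
        labels[i] <- which.min(d2)
      }
    }
    # Drop empty clusters, relabel as 1..k, then recompute the centroids.
    labels <- as.integer(factor(labels))
    centers <- do.call(rbind, lapply(
      split(seq_len(n), labels),
      function(idx) colMeans(x[idx, , drop = FALSE])
    ))
    # Stop once a full pass leaves the assignments unchanged.
    if (identical(labels, old_labels)) break
  }
  list(labels = labels, centers = centers)
}

# Two tight, well-separated groups of three points each.
x <- rbind(
  matrix(c(0, 0, 0.1, 0, 0, 0.1), ncol = 2, byrow = TRUE),
  matrix(c(10, 10, 10.1, 10, 10, 10.1), ncol = 2, byrow = TRUE)
)
fit_small <- dp_means_sketch(x, lambda = 1)     # small penalty
fit_large <- dp_means_sketch(x, lambda = 1000)  # large penalty
```

With the small penalty the sketch separates the two groups, while the large penalty keeps all six points in a single cluster, mirroring the relationship between τ and the number of clusters described above.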

Usage

dpmeans(data, tau = 2, prior.labels = NULL, max.iter = 500, tolerance = 1e-06)

Arguments

data

A data frame or matrix of numeric variables.

tau

The concentration parameter. Set to higher values to get fewer clusters. Defaults to 2.

prior.labels

A vector (character or numeric) or factor of prior cluster labels. It can be created manually or taken from the output of another clustering algorithm. If left as NULL, all observations are initialized in a single cluster.

max.iter

Maximum number of iterations. Defaults to 500.

tolerance

Tolerance for convergence. Defaults to 1e-6.
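As an illustration of the prior.labels argument, the sketch below builds starting labels with base R's kmeans() and passes them to dpmeans(). The choice of three centers and the seed are arbitrary, and it is an assumption that dpmeans is exported from cvreg and can be called as cvreg::dpmeans. The structure of the returned object is not described on this page, so the result is stored without being inspected.

```r
# Starting labels from base R's kmeans(); 3 centers is an arbitrary choice.
set.seed(1)
x <- iris[, 1:4]
init <- kmeans(x, centers = 3)$cluster

# Call dpmeans() only if cvreg is installed (assumes dpmeans is exported).
# The other arguments keep the defaults shown under Usage.
if (requireNamespace("cvreg", quietly = TRUE)) {
  fit <- cvreg::dpmeans(x, tau = 2, prior.labels = init)
}
```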

References

Kulis, B.; Jordan, M. (2011). Revisiting k-means: New Algorithms via Bayesian Nonparametrics. Proceedings of the 29th International Conference on Machine Learning.

Examples

dpmeans(iris[,1:4])

abnormally-distributed/cvreg documentation built on May 3, 2020, 3:45 p.m.