dpmeans: DP-means Algorithm for Clustering Euclidean Data
In maotai: Tools for Matrix Algebra, Optimization and Inference

Description

DP-means is a nonparametric clustering method motivated by the Dirichlet process (DP) mixture model. Rather than being fixed in advance, the number of clusters is determined by a parameter λ: the larger the value of λ, the fewer clusters are obtained. Beyond the original paper, we added an option to randomly permute the order in which each observation's membership is updated, a common heuristic in the cluster analysis literature.
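To make the role of λ concrete, the following is a minimal base-R sketch of a DP-means style loop. It is not maotai's implementation: the function name `dpmeans_sketch`, its arguments, and the choice to compare squared Euclidean distances against λ are assumptions made for illustration only.

```r
# Hypothetical helper: a bare-bones DP-means style loop, NOT maotai's code.
# Assumption: a new cluster is opened when the squared Euclidean distance
# from an observation to its nearest center exceeds lambda.
dpmeans_sketch <- function(X, lambda, maxiter = 100, permute = FALSE) {
  n <- nrow(X)
  centers <- matrix(colMeans(X), nrow = 1)  # start with one global cluster
  cluster <- rep(1L, n)
  for (iter in seq_len(maxiter)) {
    old <- cluster
    # optionally visit observations in a random order
    ord <- if (permute) sample.int(n) else seq_len(n)
    for (i in ord) {
      # squared distance from observation i to every current center
      d2 <- rowSums(sweep(centers, 2, X[i, ])^2)
      if (min(d2) > lambda) {
        centers <- rbind(centers, X[i, ])   # open a new cluster at x_i
        cluster[i] <- nrow(centers)
      } else {
        cluster[i] <- which.min(d2)         # join the nearest cluster
      }
    }
    # drop empty clusters by relabelling to 1..k, then recompute the means
    cluster <- as.integer(factor(cluster))
    centers <- do.call(rbind, lapply(seq_len(max(cluster)), function(k) {
      colMeans(X[cluster == k, , drop = FALSE])
    }))
    if (identical(cluster, old)) break      # memberships no longer change
  }
  list(cluster = cluster, centers = centers)
}

# Toy data: two tight pairs of points that are far apart from each other
toy <- rbind(c(0, 0), c(0.1, 0), c(10, 10), c(10.1, 10))
res_small <- dpmeans_sketch(toy, lambda = 1)    # small lambda
res_large <- dpmeans_sketch(toy, lambda = 1e6)  # very large lambda
```

On this toy data the small λ separates the two pairs, while the very large λ keeps everything in a single cluster, which matches the behaviour described above.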

Usage

dpmeans(
  data,
  lambda = 1,
  maxiter = 1234,
  abstol = 1e-06,
  permute.order = FALSE
)

Arguments

data

an (n × p) data matrix in which each row is an observation.

lambda

a threshold that determines when a new cluster is created.

maxiter

maximum number of iterations.

abstol

absolute tolerance used as the stopping criterion.

permute.order

a logical; TRUE to update memberships in a randomly permuted order, FALSE otherwise.

Value

a named list containing

cluster

a length-n vector of cluster labels, one for each observation.

centers

a list containing information for out-of-sample prediction.

References

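Kulis B, Jordan MI (2012). "Revisiting k-means: New Algorithms via Bayesian Nonparametrics." In Proceedings of the 29th International Conference on Machine Learning (ICML 2012).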

Examples

## define data matrix of two clusters
x1 = matrix(rnorm(50*3,mean= 2), ncol=3)
x2 = matrix(rnorm(50*3,mean=-2), ncol=3)
X = rbind(x1,x2)
lab = c(rep(1,50),rep(2,50))

## run dpmeans with several lambda values
solA <- dpmeans(X, lambda= 5)$cluster
solB <- dpmeans(X, lambda=10)$cluster
solC <- dpmeans(X, lambda=20)$cluster

## visualize the results
opar <- par(no.readonly=TRUE)
par(mfrow=c(1,4), pty="s")
plot(X,col=lab, pch=19, cex=.8, main="True", xlab="x", ylab="y")
plot(X,col=solA, pch=19, cex=.8, main="dpmeans lbd=5", xlab="x", ylab="y")
plot(X,col=solB, pch=19, cex=.8, main="dpmeans lbd=10", xlab="x", ylab="y")
plot(X,col=solC, pch=19, cex=.8, main="dpmeans lbd=20", xlab="x", ylab="y")
par(opar)

## let's find variations by permuting the order of updates
## setting used : lambda=20, we will do 8 runs
sol8 <- list()
for (i in 1:8){
  sol8[[i]] = dpmeans(X, lambda=20, permute.order=TRUE)$cluster
}

## let's visualize
vpar <- par(no.readonly=TRUE)
par(mfrow=c(2,4), pty="s")
for (i in 1:8){
  pm = paste("permute no.",i,sep="")
  plot(X,col=sol8[[i]], pch=19, cex=.8, main=pm, xlab="x", ylab="y")
}
par(vpar)
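The permuted runs above are compared visually, and the same grouping can show up under different label numbers from one run to the next. As a rough numerical complement, the sketch below checks whether two label vectors describe the same partition. The helper `same_partition` is hypothetical and not part of maotai.

```r
# Hypothetical helper (not part of maotai): TRUE when two label vectors
# describe the same partition, regardless of how the clusters are numbered.
same_partition <- function(a, b) {
  stopifnot(length(a) == length(b))
  tab <- table(a, b)  # contingency table of the two labelings
  # same partition: every row and every column has exactly one nonzero cell
  all(rowSums(tab > 0) == 1) && all(colSums(tab > 0) == 1)
}

labA <- c(1, 1, 2, 2, 3)
labB <- c(3, 3, 1, 1, 2)  # same groups as labA, different numbering
labC <- c(1, 2, 2, 2, 3)  # a genuinely different grouping

same_AB <- same_partition(labA, labB)
same_AC <- same_partition(labA, labC)
```

With the list of label vectors from the permutation example, something like `sapply(sol8, same_partition, b = sol8[[1]])` would flag which runs reproduce the first run's partition.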

maotai documentation built on Feb. 3, 2022, 5:09 p.m.