epmeans: EP-means Algorithm for Clustering Empirical Distributions

Description Usage Arguments Value References Examples

View source: R/epmeans.R

Description

EP-means is a variant of k-means algorithm adapted to cluster multiple empirical cumulative distribution functions under metric structure induced by Earth Mover's Distance.

Usage

1
epmeans(elist, k = 2)

Arguments

elist

a length N list of either vector or ecdf objects.

k

the number of clusters.

Value

a named list containing

cluster

an integer vector indicating the cluster to which each ecdf is allocated.

centers

a length k list of centroid ecdf objects.

References

\insertRef

henderson_epmeans_2015maotai

Examples

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
## two sets of 1d samples, 10 each and add some noise
#    set 1 : mixture of two gaussians
#    set 2 : single gamma distribution

# generate data
elist = list()
for (i in 1:10){
   elist[[i]] = stats::ecdf(c(rnorm(100, mean=-2), rnorm(50, mean=2)))
}
for (j in 11:20){
   elist[[j]] = stats::ecdf(rgamma(100,1) + rnorm(100, sd=sqrt(0.5)))
}

# run EP-means with k clusters 
# change the value below to see different settings
myk   = 2
epout = epmeans(elist, k=myk)

# visualize
opar = par(no.readonly=TRUE)
par(mfrow=c(1,myk))
for (k in 1:myk){
  idk = which(epout$cluster==k)
  for (i in 1:length(idk)){
    if (i<2){
      pm = paste("class ",k," (size=",length(idk),")",sep="")
      plot(elist[[idk[i]]], verticals=TRUE, lwd=0.25, do.points=FALSE, main=pm)
    } else {
      plot(elist[[idk[i]]], add=TRUE, verticals=TRUE, lwd=0.25, do.points=FALSE)
    }
    plot(epout$centers[[k]], add=TRUE, verticals=TRUE, lwd=2, col="red", do.points=FALSE)
  }
}
par(opar)

maotai documentation built on Oct. 25, 2021, 9:06 a.m.