# epmeans: EP-means Algorithm for Clustering Empirical Distributions

## Description

EP-means is a variant of the k-means algorithm adapted to cluster multiple empirical cumulative distribution functions under the metric structure induced by the Earth Mover's Distance.
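For one-dimensional distributions, the Earth Mover's Distance between two distribution functions F and G equals the integral of |F(x) - G(x)|. The following is a minimal sketch of that computation for two `ecdf` objects in base R. The helper name `emd_ecdf` is hypothetical and is not part of the package; whether `epmeans` computes the distance in exactly this way is an assumption.

```r
# Hypothetical helper (not a package function): 1-D Earth Mover's Distance
# between two ecdf objects, computed as the integral of |F(x) - G(x)|.
emd_ecdf <- function(f, g) {
  # Both step functions are constant between consecutive pooled jump points.
  x <- sort(unique(c(stats::knots(f), stats::knots(g))))
  if (length(x) < 2) return(0)
  left   <- x[-length(x)]   # left endpoint of each interval
  widths <- diff(x)         # length of each interval
  # ecdf objects are right-continuous, so the value on [x_i, x_{i+1}) is F(x_i).
  sum(abs(f(left) - g(left)) * widths)
}

f <- stats::ecdf(c(0, 1))
g <- stats::ecdf(c(1, 2))
d <- emd_ecdf(f, g)   # g is f shifted by one unit
```

Because both empirical distribution functions are step functions, the integral reduces to a finite sum over the pooled jump points, so no numerical integration is needed.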

## Usage

```
epmeans(elist, k = 2)
```

## Arguments

elist

a length N list of either vectors or `ecdf` objects.

k

the number of clusters.

## Value

a named list containing

cluster

an integer vector indicating the cluster to which each `ecdf` is allocated.

centers

a length k list of centroid `ecdf` objects.
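As a rough illustration of how the returned list might be inspected, the sketch below builds a mock object with the documented shape. The values in `epout_mock` are invented for illustration and are not real `epmeans` output.

```r
# Mock result with the documented structure (hypothetical values only).
epout_mock <- list(
  cluster = c(1L, 1L, 2L),
  centers = list(stats::ecdf(c(0, 1)), stats::ecdf(c(5, 6)))
)

# Number of input distributions assigned to each cluster.
sizes <- tabulate(epout_mock$cluster, nbins = length(epout_mock$centers))

# Indices of the inputs belonging to each cluster.
members <- split(seq_along(epout_mock$cluster), epout_mock$cluster)

# Each centroid is itself an ecdf, so it can be evaluated like a function.
center1_at_half <- epout_mock$centers[[1]](0.5)
```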

## References



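Henderson, K., Gallagher, B., and Eliassi-Rad, T. (2015). "EP-MEANS: An Efficient Nonparametric Clustering of Empirical Probability Distributions." In Proceedings of the 30th Annual ACM Symposium on Applied Computing.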

## Examples

```
## two sets of 1d samples, 10 each and add some noise
#    set 1 : mixture of two gaussians
#    set 2 : single gamma distribution

# generate data
elist = list()
for (i in 1:10){
  elist[[i]] = stats::ecdf(c(rnorm(100, mean=-2), rnorm(50, mean=2)))
}
for (j in 11:20){
  elist[[j]] = stats::ecdf(rgamma(100,1) + rnorm(100, sd=sqrt(0.5)))
}

# run EP-means with k clusters
# change the value below to see different settings
myk   = 2
epout = epmeans(elist, k=myk)

# visualize
opar = par(no.readonly=TRUE)
par(mfrow=c(1,myk))
for (k in 1:myk){
  idk = which(epout$cluster==k)
  for (i in 1:length(idk)){
    if (i<2){
      pm = paste("class ",k," (size=",length(idk),")",sep="")
      plot(elist[[idk[i]]], verticals=TRUE, lwd=0.25, do.points=FALSE, main=pm)
    } else {
      plot(elist[[idk[i]]], add=TRUE, verticals=TRUE, lwd=0.25, do.points=FALSE)
    }
    plot(epout$centers[[k]], add=TRUE, verticals=TRUE, lwd=2, col="red", do.points=FALSE)
  }
}
par(opar)
```

