EP-means is a variant of the k-means algorithm adapted to cluster multiple empirical cumulative distribution functions (ECDFs) under the metric structure induced by the Earth Mover's Distance.
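To make the metric concrete: for one-dimensional data, the Earth Mover's Distance between two distributions equals the area between their cumulative distribution functions. The following is a minimal base-R sketch of that quantity for two `ecdf` objects. The function name `emd1d` is hypothetical and is not part of T4ecdf; the package may compute the distance differently.

```r
# Hypothetical helper (not part of T4ecdf): 1-D Earth Mover's Distance
# between two objects returned by stats::ecdf, computed as the area
# between the two step functions.
emd1d <- function(f, g) {
  # all jump locations of either step function, in increasing order
  x <- sort(unique(c(knots(f), knots(g))))
  if (length(x) < 2) return(0)
  # both ECDFs are constant on [x[i], x[i+1]), so evaluating at the
  # left endpoint gives the height difference over the whole interval
  left <- x[-length(x)]
  sum(abs(f(left) - g(left)) * diff(x))
}

# usage: shifting a sample by 1 moves every unit of mass by 1
f <- stats::ecdf(c(0, 1))
g <- stats::ecdf(c(1, 2))
emd1d(f, g)  # 1
emd1d(f, f)  # 0
```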
epmeans(elist, k = 2)
elist | a length N list of either vector or
k | the number of clusters.
a named list containing
cluster | an integer vector indicating the cluster to which each ecdf is allocated.
centers | a length k list of centroid ecdf objects.
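As a rough illustration of how a k-means-style procedure could produce a result of this shape, here is a self-contained base-R sketch. It is not the T4ecdf implementation. The names `emd_ecdf` and `ep_sketch` are hypothetical, the initialization simply takes the first k inputs, and the centroid is assumed to be the ECDF of the pooled samples in a cluster, which may differ from what `epmeans` does.

```r
# Hypothetical helper: area between two ECDF step functions (1-D EMD).
emd_ecdf <- function(f, g) {
  x <- sort(unique(c(knots(f), knots(g))))
  if (length(x) < 2) return(0)
  left <- x[-length(x)]
  sum(abs(f(left) - g(left)) * diff(x))
}

# Hypothetical sketch of a k-means-style loop over a list of numeric samples.
# Assumption: a centroid is the ECDF of all samples pooled within a cluster.
ep_sketch <- function(samples, k, iters = 10) {
  ecdfs   <- lapply(samples, stats::ecdf)
  centers <- ecdfs[seq_len(k)]              # naive init: first k inputs
  cluster <- integer(length(samples))
  for (it in seq_len(iters)) {
    # assignment step: index of the nearest centroid under the EMD
    new_cluster <- vapply(ecdfs, function(f) {
      which.min(vapply(centers, function(ctr) emd_ecdf(f, ctr), numeric(1)))
    }, integer(1))
    if (identical(new_cluster, cluster)) break  # assignments are stable
    cluster <- new_cluster
    # update step: rebuild each non-empty cluster's centroid
    for (j in seq_len(k)) {
      members <- which(cluster == j)
      if (length(members) > 0) {
        centers[[j]] <- stats::ecdf(unlist(samples[members]))
      }
    }
  }
  list(cluster = cluster, centers = centers)
}

# usage: two well-separated groups of small samples
samples <- list(c(0, 1), c(0.1, 1.1), c(0.3, 0.9),
                c(10, 11), c(10.1, 11.2), c(9.9, 10.8))
out <- ep_sketch(samples, k = 2)
out$cluster          # integer labels, one per input
length(out$centers)  # k centroid ecdf objects
```

The returned list mirrors the `cluster` and `centers` elements described above, so it can be inspected in the same way as the output of `epmeans`.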
henderson_ep-means:_2015 (T4ecdf)
## 3 sets of 1d samples, 10 each and add some noise
# set 1 : mixture of two gaussians
# set 2 : single gamma distribution
# set 3 : sum of gaussian and gamma draws
# generate data
myn = 50
elist = list()
for (i in 1:10){
elist[[i]] = stats::ecdf(c(rnorm(myn, mean=-2), rnorm(myn, mean=2)))
}
for (i in 11:20){
elist[[i]] = stats::ecdf(rgamma(2*myn,1))
}
for (i in 21:30){
elist[[i]] = stats::ecdf(rgamma(myn,1) + rnorm(myn, mean=3))
}
# run EP-means with k=2,3,4 clusters
ep2 = epmeans(elist, k=2)
ep3 = epmeans(elist, k=3)
ep4 = epmeans(elist, k=4)
# run EP-means with k=3 clusters
epout = epmeans(elist, k=3)
# 2d embedding using mds
dmat = T4ecdf::pdist(elist, type="wasserstein", as.dist=TRUE)
ebd2 = stats::cmdscale(dmat, 2)
## visualize
# (1) show ECDF for three types of data
opar = par(mfrow=c(3,3))
plot(elist[[10]], cex=0.1, main="2 Gaussians")
plot(elist[[20]], cex=0.1, main="Gamma")
plot(elist[[30]], cex=0.1, main="Gaussian+Gamma")
# (2) per-class ECDFs
for (k in 1:3){ # epout was fitted with k=3 clusters
idk = which(epout$cluster==k)
for (i in seq_along(idk)){ # seq_along is safe if a cluster is empty
if (i<2){
pm = paste("class ",k," (size=",length(idk),")",sep="")
plot(elist[[idk[i]]], verticals=TRUE, lwd=0.25, do.points=FALSE, main=pm)
} else {
plot(elist[[idk[i]]], add=TRUE, verticals=TRUE, lwd=0.25, do.points=FALSE)
}
plot(epout$centers[[k]], add=TRUE, verticals=TRUE, lwd=2, col="red", do.points=FALSE)
}
}
# (3) 2d embedding colored by class labels
plot(ebd2, col=ep2$cluster, main="k=2 means", pch=19)
plot(ebd2, col=ep3$cluster, main="k=3 means", pch=19)
plot(ebd2, col=ep4$cluster, main="k=4 means", pch=19)
par(opar)