riem.nmshift: Nonlinear Mean Shift
In Riemann: Learning with Data on Riemannian Manifolds

Description

Given N observations X_1, X_2, …, X_N \in \mathcal{M}, perform clustering of the data with the nonlinear mean shift algorithm. A Gaussian kernel with bandwidth h is used, namely

G(x_i, x_j) \propto \exp \left( - \frac{\rho^2 (x_i,x_j)}{h^2} \right)

where \rho(x,y) is the geodesic distance between two points x,y\in\mathcal{M}. Numerically, limiting points that should collapse into the same mode do not coincide exactly. For this reason, the maxk parameter sets an upper bound on the number of clusters; the optimal number is then chosen by the k-medoids clustering algorithm in conjunction with the silhouette criterion.
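To make the iteration concrete, the following is a minimal base-R sketch of mode seeking on the unit sphere. It is an illustration under assumptions, not the package's implementation: the helper names `sphere_dist`, `sphere_log`, `sphere_exp`, and `mean_shift_mode` are hypothetical, and the update (a kernel-weighted average of log-mapped data, followed by the exponential map) follows the general nonlinear mean shift scheme with the Gaussian weights defined above.

```r
# Geodesic (great-circle) distance on the unit sphere.
sphere_dist <- function(x, y) {
  acos(max(-1, min(1, sum(x * y))))
}

# Logarithm map at x: tangent vector at x pointing toward y with length rho(x, y).
# Returns the zero vector when y coincides with x or is antipodal to x
# (a convention chosen for this sketch; the log map is undefined at the antipode).
sphere_log <- function(x, y) {
  v  <- y - sum(x * y) * x          # component of y orthogonal to x
  nv <- sqrt(sum(v^2))
  if (nv < 1e-12) return(rep(0, length(x)))
  sphere_dist(x, y) * v / nv
}

# Exponential map at x: follow the geodesic from x in direction v.
sphere_exp <- function(x, v) {
  nv <- sqrt(sum(v^2))
  if (nv < 1e-12) return(x)
  cos(nv) * x + sin(nv) * v / nv
}

# Run the mode-seeking iteration from a starting point y0 (hypothetical helper).
mean_shift_mode <- function(y0, data, h = 1, maxiter = 50, eps = 1e-5) {
  y <- y0
  for (it in seq_len(maxiter)) {
    d     <- vapply(data, function(x) sphere_dist(y, x), numeric(1))
    w     <- exp(-(d^2) / (h^2))                   # Gaussian kernel weights
    logs  <- vapply(data, function(x) sphere_log(y, x), numeric(length(y)))
    shift <- as.vector(logs %*% w) / sum(w)        # weighted mean in the tangent space
    y     <- sphere_exp(y, shift)                  # move along the geodesic
    if (sqrt(sum(shift^2)) < eps) break            # stop once the shift is negligible
  }
  y
}

# Toy data: two tight groups near (1,0,0) and (0,0,1).
set.seed(1)
make_pts <- function(center, n) {
  lapply(seq_len(n), function(i) {
    p <- center + stats::rnorm(3, sd = 0.05)
    p / sqrt(sum(p^2))
  })
}
toy   <- c(make_pts(c(1, 0, 0), 5), make_pts(c(0, 0, 1), 5))
modes <- lapply(toy, mean_shift_mode, data = toy, h = 0.3)
```

Each starting point drifts toward a nearby mode of the kernel density estimate. Points from the same group end up at nearly, but not exactly, the same location, which is why a separate grouping step over the modes is needed.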

Usage

riem.nmshift(riemobj, h = 1, maxk = 5, maxiter = 50, eps = 1e-05)

Arguments

`riemobj`	an S3 `"riemdata"` class object for N manifold-valued data.
`h`	bandwidth parameter. The larger h is, the more blurring is applied.
`maxk`	maximum number of clusters to consider when determining the optimal number of clusters.
`maxiter`	maximum number of iterations to be run.
`eps`	tolerance level for stopping criterion.
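The role of `maxk` can be sketched as follows: given a matrix of pairwise distances between modes, run k-medoids for each candidate k up to `maxk` and keep the k with the largest average silhouette width. This sketch assumes the `cluster` package (`pam` and its `silinfo` component); whether `riem.nmshift` uses that package internally is not stated here, and `select_k` is a hypothetical helper name.

```r
library(cluster)  # recommended package that ships with standard R installations

# Choose the number of clusters in 2..maxk by average silhouette width.
# dmat: symmetric matrix of pairwise distances (here, between modes).
# Note: the silhouette is undefined for k = 1, so a single cluster is never selected.
select_k <- function(dmat, maxk = 5) {
  dobj <- stats::as.dist(dmat)
  n    <- attr(dobj, "Size")
  ks   <- 2:min(maxk, n - 1)                 # pam requires k < n
  widths <- vapply(ks, function(k) {
    cluster::pam(dobj, k = k)$silinfo$avg.width
  }, numeric(1))
  best <- ks[which.max(widths)]
  list(k       = best,
       cluster = cluster::pam(dobj, k = best)$clustering,
       widths  = widths)
}

# Toy "modes": three groups whose members almost, but not exactly, coincide.
modes <- c(0, 0.001, 0.002, 1, 1.001, 1.002, 2, 2.001, 2.002)
dmat  <- as.matrix(stats::dist(modes))
sel   <- select_k(dmat, maxk = 5)
sel$k        # selected number of clusters
sel$cluster  # cluster label for each mode
```

A larger `maxk` widens the search at the cost of more k-medoids runs. If the data contain more groups than `maxk` allows, the selected number of clusters is capped at `maxk`.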

Value

a named list containing

distance: an (N\times N) matrix of distances between the modes corresponding to each data point.
cluster: a length-N vector of class labels.

References

Subbarao R, Meer P (2009). "Nonlinear Mean Shift over Riemannian Manifolds." International Journal of Computer Vision, 84(1), 1-20.

Examples

#-------------------------------------------------------------------
#          Example on Sphere : a dataset with three types
#
# class 1 : 10 perturbed data points near (1,0,0) on S^2 in R^3
# class 2 : 10 perturbed data points near (0,1,0) on S^2 in R^3
# class 3 : 10 perturbed data points near (0,0,1) on S^2 in R^3
#-------------------------------------------------------------------
## LOAD THE PACKAGE
library(Riemann)

## GENERATE DATA
set.seed(496)
ndata  = 10
mydata = list()
for (i in 1:ndata){
  tgt = c(1, stats::rnorm(2, sd=0.1))
  mydata[[i]] = tgt/sqrt(sum(tgt^2))
}
for (i in (ndata+1):(2*ndata)){
  tgt = c(stats::rnorm(1, sd=0.1), 1, stats::rnorm(1, sd=0.1))
  mydata[[i]] = tgt/sqrt(sum(tgt^2))
}
for (i in ((2*ndata)+1):(3*ndata)){
  tgt = c(stats::rnorm(2, sd=0.1), 1)
  mydata[[i]] = tgt/sqrt(sum(tgt^2))
}
myriem = wrap.sphere(mydata)
mylabs = rep(c(1,2,3), each=ndata)

## RUN NONLINEAR MEANSHIFT FOR DIFFERENT 'h' VALUES
run1 = riem.nmshift(myriem, maxk=10, h=0.1)
run2 = riem.nmshift(myriem, maxk=10, h=1)
run3 = riem.nmshift(myriem, maxk=10, h=10)

## MDS FOR VISUALIZATION
mds2d = riem.mds(myriem, ndim=2)$embed

## VISUALIZE
opar <- par(no.readonly=TRUE)
par(mfrow=c(2,3), pty="s")
plot(mds2d, pch=19, main="label : h=0.1", col=run1$cluster)
plot(mds2d, pch=19, main="label : h=1",   col=run2$cluster)
plot(mds2d, pch=19, main="label : h=10",  col=run3$cluster)
image(run1$distance[,30:1], axes=FALSE, main="distance : h=0.1")
image(run2$distance[,30:1], axes=FALSE, main="distance : h=1")
image(run3$distance[,30:1], axes=FALSE, main="distance : h=10")
par(opar)

Riemann documentation built on March 18, 2022, 7:55 p.m.