bmsClustering: Function to perform clustering using the blurring version of the mean shift algorithm


View source: R/blurredmeanshift.R

Description

This function implements the blurring mean shift algorithm, which approximates the standard mean shift algorithm. Because it updates the entire sample at each iteration, rather than tracing one query point at a time, the blurring version of the mean shift algorithm is often faster than the standard version (especially when the standard mean shift algorithm is run on a single core).
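A minimal sketch of a single blurring update in base R may clarify the mechanics. The helper `blur_step` and the Epanechnikov-type weights used here are illustrative assumptions, not the package's actual implementation (see R/blurredmeanshift.R for that):

```r
## One blurring mean shift iteration: EVERY point is replaced at once by a
## kernel-weighted mean of the whole sample (illustrative sketch only).
blur_step <- function(X, h) {
  ## pairwise squared distances between the columns of X, scaled by h
  D2 <- as.matrix(dist(t(X)))^2 / h^2
  ## Epanechnikov-type weights: positive only within the bandwidth
  W <- pmax(1 - D2, 0)
  ## column i of the result is the weighted mean of all sample points
  sweep(X %*% W, 2, colSums(W), "/")
}

set.seed(1)
X <- cbind(matrix(rnorm(20, mean = 0), nrow = 2),
           matrix(rnorm(20, mean = 5), nrow = 2))  # two well-separated groups
X1 <- blur_step(X, h = 1)
```

Iterating `blur_step` until the points stop moving (up to a tolerance) and then merging nearby points is, in essence, what the full algorithm does.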

Usage

bmsClustering(X, h = NULL, kernel = "epanechnikovKernel",
              tol.stop = 1e-06, max.iter = 100, tol.epsilon = 0.001)

Arguments

X

a p × n matrix containing n ≥ 1 p-dimensional numeric vectors stored as columns. Each column of X represents a sample point.

h

a strictly positive bandwidth parameter.

kernel

a kernel function (as a character string). The following kernels are supported:

  • Epanechnikov: K(x) = (3/2)(1 - x^2) I_{[0,1]}(x); kernel="epanechnikovKernel"

  • cubic: K(x) = 4(1 - x)^3 I_{[0,1]}(x); kernel="cubicKernel"

  • Gaussian: K(x) = √(2/π) e^(-x^2/2) I_{[0,∞)}(x); kernel="gaussianKernel"

  • exponential: K(x) = e^(-x) I_{[0,∞)}(x); kernel="exponentialKernel".

The use of the Epanechnikov kernel is recommended when using the blurring version of the mean shift algorithm.
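As an illustration, the Epanechnikov kernel above can be written as a one-line base R function. This is a sketch of the formula only; the package's own `epanechnikovKernel` may differ in its exact interface:

```r
## K(x) = (3/2)(1 - x^2) on [0, 1], zero elsewhere
epanechnikov <- function(x) 1.5 * (1 - x^2) * (x >= 0 & x <= 1)

epanechnikov(0)    # 1.5
epanechnikov(0.5)  # 1.125
epanechnikov(2)    # 0: outside the compact support
```

The compact support is what makes this kernel attractive for the blurring algorithm: points farther than one bandwidth apart receive exactly zero weight.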

tol.stop

a strictly positive tolerance parameter. The mean shift algorithm stops when its update generates a step of length smaller than tol.stop. tol.stop should be considerably smaller than tol.epsilon.

max.iter

a strictly positive integer specifying the maximum number of iterations before the algorithm is forced to stop.

tol.epsilon

a strictly positive tolerance parameter. Once the algorithm stops, points that are less than tol.epsilon apart are grouped into the same cluster.

Details

It is generally recommended to standardize X so that each variable has unit variance prior to running the algorithm on the data.
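Since the sample points are stored as columns, each variable is a row of X. A quick way to rescale every variable to unit variance (a base R sketch on hypothetical toy data):

```r
set.seed(7)
X <- matrix(rnorm(40), nrow = 2)
X[2, ] <- 100 * X[2, ]          # put the two variables on very different scales
X_std <- X / apply(X, 1, sd)    # divide each row (variable) by its own sd
apply(X_std, 1, sd)             # both variables now have unit variance
```

Without this step, the single bandwidth h would effectively ignore variables measured on much smaller scales.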

Roughly speaking, larger values of h produce a coarser clustering (i.e. fewer, larger clusters); for sufficiently large values of h, the algorithm produces a single cluster containing all the data points. Smaller values of h produce a finer clustering (i.e. many small clusters); for sufficiently small values of h, each cluster identified by the algorithm contains exactly one data point.

If h is not specified in the function call, then h is by default set to the 30th percentile of the empirical distribution of distances between the columns of X, i.e. h=quantile( dist( t( X ) ), 0.3 ).
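The default can be reproduced explicitly, which is useful as a starting point when tuning h by hand (a sketch on hypothetical toy data; `h_default` is just a local name):

```r
set.seed(42)
X <- matrix(rnorm(40), nrow = 2)        # 20 two-dimensional points
## 30th percentile of the pairwise distances between the columns of X
h_default <- quantile(dist(t(X)), 0.3)
## calling bmsClustering(X) is then equivalent to
## bmsClustering(X, h = h_default)
```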

In their implementations, gaussianKernel and exponentialKernel are rescaled so that they assign probability of at least 0.99 to the unit interval [0,1]. This ensures that all the kernels are roughly on the same scale.

When using the blurring version of the mean shift algorithm, it is generally recommended to use a compactly supported kernel. In particular, the algorithm is guaranteed to converge in finitely many iterations with the Epanechnikov kernel.

Value

The function invisibly returns a list with the following elements:

components

a matrix containing the modes/cluster representatives by column.

labels

an integer vector of cluster labels.

Author(s)

Mattia Ciollaro and Daren Wang.

References

Carreira-Perpinan, M. A. (2015) A review of mean-shift algorithms for clustering. arXiv http://arxiv.org/abs/1503.00687

See Also

msClustering

Examples

## an example using the iris dataset
## help( iris )

## prepare data matrix
iris.data <- t( iris[,c( "Sepal.Length", "Sepal.Width" )] )

## run blurring mean shift algorithm
clustering <- bmsClustering( iris.data )
print( clustering )

## plot the clusters
## Not run: 
plot( iris.data[1,], iris.data[2,], col=clustering$labels+2, cex=0.8,
      pch=16, xlab="Sepal.Length", ylab="Sepal.Width" )
points( clustering$components[1,], clustering$components[2,],
        col=2+( 1:ncol( clustering$components ) ), cex=1.8, pch=16 )
## End(Not run)

mattiaciollaro/MeanShift documentation built on May 21, 2019, 1:23 p.m.