This function implements the mean shift algorithm. The algorithm locates the modes of a kernel density estimator and associates each data point with exactly one of the modes, thus effectively clustering the data.
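As a rough illustration of the idea (not the package's own code), the base-R sketch below starts a path at every data point, repeatedly moves it to the kernel-weighted mean of the data, and stops once every step is shorter than tol.stop. It then merges converged points that lie within tol.epsilon of each other into one mode. The function name meanShiftSketch, the max.iter safeguard and the plain, unrescaled Gaussian kernel are assumptions made for this example. The package defaults to the Epanechnikov kernel and rescales its Gaussian kernel.

```r
# Minimal mean-shift clustering sketch in base R (illustration only).
# Assumptions: plain Gaussian kernel weights exp(-d^2 / (2 h^2)) and a
# max.iter safeguard; msClustering() itself uses its own kernels and code.
meanShiftSketch <- function(X, h, tol.stop = 1e-6, tol.epsilon = 1e-3,
                            max.iter = 1000) {
  X <- as.matrix(X)   # p x n, one data point per column
  n <- ncol(X)
  Y <- X              # Y[, i]: current position of the path started at X[, i]
  for (iter in seq_len(max.iter)) {
    Y.new <- Y
    for (i in seq_len(n)) {
      d2 <- colSums((X - Y[, i])^2)       # squared distances to all points
      w <- exp(-d2 / (2 * h^2))           # Gaussian kernel weights
      Y.new[, i] <- X %*% w / sum(w)      # move to the weighted mean
    }
    step <- sqrt(colSums((Y.new - Y)^2))  # length of each update
    Y <- Y.new
    if (all(step < tol.stop)) break       # stop rule described for tol.stop
  }
  # Merge converged points closer than tol.epsilon into one mode/cluster.
  modes <- matrix(numeric(0), nrow = nrow(X), ncol = 0)
  labels <- integer(n)
  for (i in seq_len(n)) {
    if (ncol(modes) > 0) {
      near <- which(sqrt(colSums((modes - Y[, i])^2)) < tol.epsilon)
      if (length(near) > 0) {
        labels[i] <- near[1]
        next
      }
    }
    modes <- cbind(modes, Y[, i])
    labels[i] <- ncol(modes)
  }
  list(components = modes, labels = labels)
}

# Four 1-D points forming two tight pairs.
X <- matrix(c(0, 0.1, 5, 5.1), nrow = 1)
meanShiftSketch(X, h = 0.5)$labels             # 1 1 2 2
ncol(meanShiftSketch(X, h = 100)$components)   # 1 (one big cluster)
ncol(meanShiftSketch(X, h = 0.01)$components)  # 4 (one point per cluster)
```

With h = 0.5 the two tight pairs form two clusters. A very large h merges all points into one cluster, and a very small h leaves every point on its own, which matches the behaviour of h described in the details below. The Gaussian kernel was chosen here because its weights never vanish, so the weighted mean is always defined.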
msClustering(X, h = NULL, kernel = "epanechnikovKernel",
             tol.stop = 1e-06, tol.epsilon = 0.001, multi.core = FALSE)
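X
a p × n matrix containing n ≥ 1 p-dimensional numeric vectors stored as columns. Each column of X is one data point.
h
a strictly positive bandwidth parameter.
kernel
a kernel function (as a character string), such as "epanechnikovKernel" (the default), "gaussianKernel" or "exponentialKernel".
tol.stop
a strictly positive tolerance parameter. The algorithm stops when all of the updates generate steps of length smaller than tol.stop.
tol.epsilon
a strictly positive tolerance parameter. Points that are less than tol.epsilon apart once the algorithm stops are assigned to the same cluster.
multi.core
logical. If TRUE, the mean shift computations are run in parallel on multiple cores; the number of cores is controlled by the mc.cores option described below.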
It is generally recommended to standardize X so that each variable (each row of X) has unit variance before running the algorithm on the data.
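One way to do this in base R is sketched below. Because variables are stored as rows of X, the matrix is transposed before and after calling scale(), which works column by column. The helper name standardizeRows is hypothetical. scale() also centres each variable, which should not change the clustering because mean shift depends only on the relative positions of the points.

```r
# Hypothetical helper: rescale every variable (row) of a p x n matrix X
# to unit variance. scale() works on columns, hence the two transposes.
# scale() also centres each row; a common shift of all points leaves the
# mean shift clustering unchanged.
standardizeRows <- function(X) {
  t(scale(t(X)))
}

# Usage with the same two iris variables as in the example below.
X <- t(as.matrix(iris[, c("Sepal.Length", "Sepal.Width")]))
X.std <- standardizeRows(X)
apply(X.std, 1, sd)   # both rows now have standard deviation 1
```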
Roughly speaking, larger values of h produce a coarser clustering (i.e. fewer and larger clusters). For sufficiently large values of h, the algorithm produces a single cluster containing all the data points. Smaller values of h produce a finer clustering (i.e. many small clusters). For sufficiently small values of h, each cluster identified by the algorithm contains exactly one data point.
If h is not specified in the function call, it is set by default to the 30th percentile of the empirical distribution of the distances between the columns of X, i.e. h=quantile( dist( t( X ) ), 0.3 ).
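The default rule can be reproduced with the same base-R calls. The wrapper name defaultBandwidth below is hypothetical. The small 1-D example makes the arithmetic easy to check: the pairwise distances are 1, 2 and 3, and R's default quantile (type 7) interpolates 60% of the way from 1 to 2, giving 1.6.

```r
# Hypothetical helper reproducing the default rule quoted above:
# the 30th percentile of all pairwise Euclidean distances between the
# columns (data points) of the p x n matrix X.
defaultBandwidth <- function(X) {
  unname(quantile(dist(t(X)), 0.3))
}

X <- matrix(c(0, 1, 3), nrow = 1)  # three 1-D points
defaultBandwidth(X)                # distances 1, 2, 3 -> 1.6
```

Note that dist() stores all n(n - 1)/2 pairwise distances, so computing this default needs memory that grows quadratically with the number of data points.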
In this implementation, gaussianKernel and exponentialKernel are rescaled so that they assign probability of at least 0.99 to the unit interval [0, 1]. This ensures that all the kernels are roughly on the same scale.
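The sketch below shows one possible way to achieve such a rescaling. It is an assumption, not the package's actual constants. It treats each kernel as a distribution of the distance |x| and picks the scale so that exactly 0.99 of the mass falls in [0, 1]: a half-normal with scale 1/qnorm(0.995) and an exponential with rate log(100). The names sigma, rate, gaussianProfile and exponentialProfile are hypothetical.

```r
# Assumed rescaling (illustrative only): choose kernel scales so that
# 99% of the mass of |x| lies in the unit interval [0, 1].
sigma <- 1 / qnorm(0.995)   # Gaussian: P(|Z| <= 1) = 0.99 for Z ~ N(0, sigma^2)
rate  <- log(100)           # exponential: P(E <= 1) = 1 - exp(-rate) = 0.99

gaussianProfile    <- function(x) exp(-x^2 / (2 * sigma^2))
exponentialProfile <- function(x) exp(-rate * x)

# Mass assigned to [0, 1] by each rescaled kernel.
coverage <- c(gaussian    = 2 * pnorm(1 / sigma) - 1,
              exponential = pexp(1, rate))
coverage   # both equal 0.99
```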
To specify the number of cores when multi.core=TRUE, set the option mc.cores with options( mc.cores=n.cores ), where n.cores is the number of cores that the mean shift algorithm is allowed to use for parallel computation.
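The sketch below illustrates the option mechanism with mclapply() from the base parallel package, whose mc.cores argument defaults to getOption("mc.cores", 2L). That msClustering() parallelizes through this same function is an assumption; the text above only states that the option must be set. Because mclapply() relies on forking, which Windows does not support, the sketch falls back to one core there.

```r
library(parallel)

# Number of cores to allow; forking is unavailable on Windows, so use
# a single core there.
n.cores <- if (.Platform$OS.type == "windows") 1L else 2L
options(mc.cores = n.cores)

# mclapply() reads getOption("mc.cores") as its default core count.
squares <- unlist(mclapply(1:4, function(i) i^2))
squares   # 1 4 9 16
```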
The function invisibly returns a list with the following components:
components
a matrix containing the modes (cluster representatives), one per column.
labels
an integer vector of cluster labels, one for each column of X.
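Because the result is returned invisibly, calling the function at the console prints nothing; the value has to be assigned or printed explicitly, as print( clustering ) does in the examples. The toy function below, with a hypothetical name and dummy contents, shows this standard R behaviour.

```r
# Hypothetical function that returns a list invisibly, as described above.
returnsInvisibly <- function() {
  invisible(list(components = matrix(0, nrow = 1, ncol = 1), labels = 1L))
}

returnsInvisibly()                        # prints nothing at the console
res <- returnsInvisibly()                 # the value can still be assigned
withVisible(returnsInvisibly())$visible   # FALSE
print(res$labels)                         # explicit printing works
```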
Mattia Ciollaro and Daren Wang
Carreira-Perpiñán, M. A. (2015). A review of mean-shift algorithms for clustering. arXiv preprint, http://arxiv.org/abs/1503.00687
## an example using the iris dataset
## help( iris )
library( MeanShift )
## prepare data matrix (a subset of the iris dataset)
set.seed( 2 )
indices <- sample( 1:nrow( iris ), 80 )
iris.data <- t( iris[indices,c( "Sepal.Length", "Sepal.Width" )] )
## run mean shift algorithm
clustering <- msClustering( iris.data, h=0.8 )
print( clustering )
## plot the clusters
## Not run:
plot( iris.data[1,], iris.data[2,], col=clustering$labels+2, cex=0.8,
      pch=16, xlab="Sepal.Length", ylab="Sepal.Width" )
points( clustering$components[1,], clustering$components[2,],
        col=2+( 1:ncol( clustering$components ) ), cex=1.8, pch=16 )
## End(Not run)
## using multiple cores (2)
## Not run:
options( mc.cores=2 )
clustering.mc <- msClustering( iris.data, multi.core=TRUE )
## End(Not run)
Loading required package: parallel
Loading required package: wavethresh
Loading required package: MASS
WaveThresh: R wavelet software, release 4.6.8, installed
Copyright Guy Nason and others 1993-2016
Note: nlevels has been renamed to nlevelsWT
Running mean-shift algorithm...
|======================================================================| 100%
Mean-shift algorithm ran successfully.
Finding clusters...
The algorithm found 2 clusters.
$components
mode1 mode2
Sepal.Length 5.003879 6.327882
Sepal.Width 3.376633 2.999923
$labels
[1] 1 2 2 1 2 2 1 2 2 2 2 1 2 2 2 2 2 1 2 1 2 1 2 1 1 2 2 2 2 1 1 2 2 2 2 2 2 1
[39] 2 2 2 2 1 1 2 2 2 1 2 2 1 2 2 2 1 2 2 2 2 2 2 2 2 1 2 1 2 1 1 1 2 2 1 1 1 2
[77] 1 2 1 1
Running mean-shift algorithm...
Using 2 cores...
Mean-shift algorithm ran successfully.
Finding clusters...
The algorithm found 3 clusters.