Functions for mean shift, iterative mean shift, mean shift clustering,
and bandwidth selection for mean shift clustering (based on
self-coverage). The main function is `ms` which, for a given
bandwidth, detects the local modes (‘local principal points’) and performs the clustering.

These functions implement the techniques presented in Einbeck (2011).

```
meanshift(X, x, h)
ms.rep(X, x, h, plotms=1, thresh=0.00000001, iter=200)
ms(X, h, subset, thr=0.0001, scaled=TRUE, iter=200, plotms=2,
   or.labels=NULL, ...)
ms.self.coverage(X, taumin=0.02, taumax=0.5, gridsize=25,
   thr=0.0001, scaled=TRUE, cluster=FALSE, plot.type="o",
   or.labels=NULL, print=FALSE, ...)
```

`X` |
data matrix. |

`h` |
bandwidth (by default, 10 percent of the data range). |

`x` |
point from which we wish to shift to the local mean. |

`subset` |
vector specifying a subset of 1:n, where n is the sample size. This allows the iterative mean shift procedure to be run from a subset of points only (if unspecified, 1:n is used, i.e. each data point serves as a starting point). |

`scaled` |
logical (if TRUE, each variable is divided by its range). |

`taumin,taumax,gridsize` |
determine the grid of bandwidths to investigate. |

`thresh, iter` |
mean shift iterations are stopped when the
mean shift length (relative to the length of |

`thr` |
adjacent mean shift clusters are merged if their relative distance falls below this threshold. |

`cluster` |
if |

`plotms, plot.type, or.labels, ...` |
graphical parameters. |

`print` |
if TRUE, coverage values are printed on the screen as soon as
computed. This is quite helpful especially if |

The methods implemented here can be used for density mode estimation, clustering, and the selection of starting points for the LPC algorithm.

Cheng (1995) showed that, if the mean shift is computed iteratively, the resulting sequence of local means converges to a mode of the estimated density function. Assigning each data point to the mode to which it has converged turns this into a clustering technique.
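
The iteration described above can be sketched in base R. This is a minimal illustration, not the LPCM implementation: the helper names `mean_shift_step` and `mean_shift_iterate` are hypothetical, a Gaussian kernel is assumed, the stopping rule uses an absolute (not relative) shift length, no scaling by the data range is applied, and the merging of nearby modes is approximated with single-linkage clustering rather than the package's `thr` rule.

```r
# Sketch only: base-R mean shift with a Gaussian kernel (assumed kernel).
# mean_shift_step() and mean_shift_iterate() are hypothetical helpers,
# not functions exported by LPCM.

# One mean shift step: kernel-weighted local mean of the rows of X around x.
mean_shift_step <- function(X, x, h) {
  d2 <- rowSums(sweep(X, 2, x)^2)   # squared distance from x to each row of X
  w  <- exp(-0.5 * d2 / h^2)        # Gaussian kernel weights
  colSums(X * w) / sum(w)           # weighted mean of the data
}

# Repeat the step until the shift is shorter than thresh (absolute length,
# an assumption made here) or iter iterations have been carried out.
mean_shift_iterate <- function(X, x, h, thresh = 1e-8, iter = 200) {
  for (i in seq_len(iter)) {
    m <- mean_shift_step(X, x, h)
    shift <- sqrt(sum((m - x)^2))
    x <- m
    if (shift < thresh) break
  }
  x
}

# Two well-separated simulated groups of 50 points each.
set.seed(1)
X <- rbind(matrix(rnorm(100, mean = 0, sd = 0.3), ncol = 2),
           matrix(rnorm(100, mean = 3, sd = 0.3), ncol = 2))

# Start one trajectory from every data point and record where it ends.
modes <- t(apply(X, 1, function(x) mean_shift_iterate(X, x, h = 0.5)))

# Group end points that (nearly) coincide; this stands in for mode merging.
labels <- cutree(hclust(dist(modes), method = "single"), h = 0.1)
table(labels)
```

Each data point ends up labelled by the mode its trajectory reached, which is the clustering idea described above; here `h` is on the raw data scale because the sketch does no scaling.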

The concepts of coverage and self-coverage, which were originally introduced in the principal curve context, adapt straightforwardly to this setting.

The goodness-of-fit measure `Rc` can also be applied in this context. For
instance, a value of *R_C=0.8* means that,
after the clustering, the mean absolute residual length has been
reduced by *80%* (compared to the distances to the overall mean).
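
The interpretation above can be checked by hand on a toy example. The sketch below assumes that the measure is one minus the ratio of the mean residual length after clustering to the mean distance from the overall mean; `rc_sketch` is a hypothetical helper, and the package's `Rc` may differ in detail (for example, in which cluster assignment it uses).

```r
# Sketch only: R_C as 1 - (mean residual length to assigned cluster centre)
#                       / (mean distance to the overall mean).
# rc_sketch() is a hypothetical helper, not LPCM's Rc().
rc_sketch <- function(X, centers, labels) {
  X <- as.matrix(X)
  # distance of each point to the centre of its own cluster
  res_fit  <- sqrt(rowSums((X - centers[labels, , drop = FALSE])^2))
  # distance of each point to the overall mean of the data
  res_mean <- sqrt(rowSums(sweep(X, 2, colMeans(X))^2))
  1 - mean(res_fit) / mean(res_mean)
}

# Four points in two tight pairs, with one centre per pair.
X       <- rbind(c(0, 0), c(0, 2), c(10, 0), c(10, 2))
centers <- rbind(c(0, 1), c(10, 1))
labels  <- c(1, 1, 2, 2)

rc <- rc_sketch(X, centers, labels)
rc   # 1 - 1/sqrt(26), roughly 0.80
```

Every point lies at distance 1 from its centre and at distance sqrt(26) from the overall mean (5, 1), so the residual length drops by about 80%.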

The main function `ms` produces an object of class `ms`, with components:

`cluster.center` |
a matrix which gives the coordinates of the estimated density modes (i.e., of the mean-shift based cluster centers). |

`cluster.label` |
assigns each data point to the cluster center to which its mean shift trajectory has converged. |

`closest.label` |
assigns each data point to the closest cluster center in terms of Euclidean distance. |

`data` |
the data frame (scaled if |

`scaled` |
boolean. |

`scaled.by` |
the data were scaled by dividing each variable by the values provided in this vector. |
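
The difference between `cluster.label` and `closest.label` is that the latter ignores the mean shift trajectories and only looks at Euclidean distances to the cluster centers. A minimal base-R sketch of such a nearest-center assignment follows; `closest_label` is a hypothetical helper written for illustration and is not taken from the package source.

```r
# Sketch only: assign each row of X to the nearest centre (Euclidean distance).
# closest_label() is a hypothetical helper, not part of LPCM.
closest_label <- function(X, centers) {
  X <- as.matrix(X)
  # n x K matrix of squared distances from each point to each centre
  d2 <- matrix(
    vapply(seq_len(nrow(centers)),
           function(k) rowSums(sweep(X, 2, centers[k, ])^2),
           numeric(nrow(X))),
    nrow = nrow(X)
  )
  # index of the smallest distance in each row (first centre wins ties)
  max.col(-d2, ties.method = "first")
}

X       <- rbind(c(0, 0), c(0, 2), c(10, 0), c(10, 2))
centers <- rbind(c(0, 1), c(10, 1))
nearest <- closest_label(X, centers)
nearest
```

For well-separated clusters the two labellings usually agree; they can differ for points near the boundary between two modes.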

For the objects returned by all other functions, use `names()` to list their components.

J. Einbeck. See `LPCM-package` for further acknowledgements.

Cheng, Y. (1995). Mean Shift, Mode Seeking, and Clustering. IEEE Transactions on Pattern Analysis and Machine Intelligence, 17, 790-799.

Einbeck, J. (2011). Bandwidth selection for mean-shift based unsupervised learning techniques: a unified approach via self-coverage. Journal of Pattern Recognition Research 6, 175-192.

```
library(LPCM)
data(faithful)
# Mean shift clustering with user-defined bandwidth (5 percent of data range)
fit <- ms(faithful, h=0.05)
# Goodness-of-fit
coverage(fit$data, fit$cluster.center)
Rc(fit)
# Bandwidth selection via self-coverage
## Not run:
foo <- ms.self.coverage(faithful, gridsize=50, taumin=0.1,
                        taumax=0.5, plot.type="o")
h <- select.self.coverage(foo)$select
fit <- ms(faithful, h=h[1])
## End(Not run)
```
