
Functions for mean shift, iterative mean shift, and mean shift clustering.
The main function is `ms`, which, for a given bandwidth, detects the local modes
(‘local principal points’) and performs the clustering.


`X` |
data matrix or vector. |

`h` |
scalar or vector-valued bandwidth (by default, 5 percent of
the data range, or 20 percent of the standard deviation, respectively, in each direction). If set manually and |

`x` |
point from which we wish to shift to the local mean. |

`subset` |
vector specifying a subset of 1:n, where n is the sample size. This allows the iterative mean shift procedure to be run from a subset of points only (if unspecified, 1:n is used, i.e. each data point serves as a starting point). |

`scaled` |
if equal to 1 (default), each variable is divided by its range, and if equal to 2 (or any other positive value other than 1), each variable is divided by its standard deviation. If equal to 0, then no scaling is applied. |

`thresh, iter` |
mean shift iterations are stopped when the
mean shift length (relative to the distance of `x` to the overall mean; see Note section) falls below `thresh`, or after `iter` iterations. |

`thr` |
adjacent mean shift clusters are merged if their relative distance falls below this threshold (see Note section). |

`plot` |
if equal to 0, then no plotted output. For bivariate
data, |

`...` |
further graphical parameters. |

The methods implemented here can be used for density mode estimation, clustering, and the selection of starting points for the LPC algorithm.

Cheng (1995) showed that, if the mean shift is computed iteratively, the resulting sequence of local means converges to a mode of the estimated density function. Assigning each data point to the mode to which it has converged turns this into a clustering technique.
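The iteration described above can be sketched in base R. This is a minimal illustration under stated assumptions, not the code used by `ms`: the Gaussian kernel, the absolute tolerance `tol`, the helper names `mean_shift_step` and `mean_shift_path`, and the grouping of end points via `hclust` are all choices made for this example only.

```r
# Sketch only: a Gaussian-kernel mean shift in base R (not LPCM's implementation).

# One mean shift step: kernel-weighted mean of the rows of X around the point x.
mean_shift_step <- function(X, x, h) {
  h <- rep(h, length.out = ncol(X))            # allow scalar or vector bandwidth
  z <- sweep(sweep(X, 2, x, "-"), 2, h, "/")   # (X_i - x) / h, row by row
  w <- exp(-0.5 * rowSums(z^2))                # Gaussian kernel weights (assumed kernel)
  colSums(X * w) / sum(w)                      # local mean
}

# Iterate the step from a starting point until the shift is negligible.
# 'tol' is a plain absolute tolerance chosen for this sketch.
mean_shift_path <- function(X, x, h, tol = 1e-8, iter = 200) {
  for (k in seq_len(iter)) {
    m <- mean_shift_step(X, x, h)
    moved <- sqrt(sum((m - x)^2))
    x <- m
    if (moved < tol) break
  }
  x
}

# Toy data: two well-separated groups of 50 points each.
set.seed(1)
X <- rbind(matrix(rnorm(100, mean = 0, sd = 0.3), ncol = 2),
           matrix(rnorm(100, mean = 3, sd = 0.3), ncol = 2))

# Run the iteration from every data point; each row of 'ends' is an end point.
ends <- t(apply(X, 1, function(x0) mean_shift_path(X, x0, h = 0.5)))

# Points whose trajectories end at (numerically) the same mode share a label.
labels <- cutree(hclust(dist(ends), method = "single"), h = 0.25)
```

Each row of `ends` approximates a mode of the kernel density estimate, and `labels` plays the role of a cluster assignment.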

The concepts of coverage and self-coverage, which were originally introduced in the principal curve context, adapt straightforwardly to this setting (Einbeck, 2011).

The goodness-of-fit measure `Rc` can also be applied in this context. For
instance, a value of *R_C=0.8* means that,
after the clustering, the mean absolute residual length has been
reduced by *80%* (compared to the distances to the overall mean).
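To make the interpretation of *R_C* concrete, the following base R sketch computes one minus the ratio of the mean distance to the assigned cluster center over the mean distance to the overall mean. The formula is an assumption read off the sentence above, and `rc_sketch` is a hypothetical helper; the `Rc` function in the package may differ in detail.

```r
# Sketch only: a goodness-of-fit ratio in the spirit of R_C (assumed formula).
rc_sketch <- function(X, centers, labels) {
  M <- colMeans(X)                                             # overall mean
  res_cluster <- sqrt(rowSums((X - centers[labels, , drop = FALSE])^2))
  res_overall <- sqrt(rowSums(sweep(X, 2, M, "-")^2))
  1 - mean(res_cluster) / mean(res_overall)
}

# Four points forming two tight pairs, with one center per pair.
X <- rbind(c(0, 0), c(0, 2), c(10, 0), c(10, 2))
centers <- rbind(c(0, 1), c(10, 1))
labels <- c(1, 1, 2, 2)

# Every point lies at distance 1 from its center and sqrt(26) from the overall mean.
rc <- rc_sketch(X, centers, labels)
```

Here `rc` equals `1 - 1/sqrt(26)`, roughly 0.80, so the clustering removes about 80% of the mean residual length.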

The main function `ms` produces an object of class `ms`,
with components:

`cluster.center` |
a matrix which gives the coordinates of the estimated density modes (i.e., of the mean-shift based cluster centers). |

`cluster.label` |
assigns each data point to the cluster center to which its mean shift trajectory has converged. |

`closest.label` |
assigns each data point to the closest cluster center in terms of Euclidean distance. |

`data` |
the data frame (scaled if |

`scaled` |
the user-supplied value, which may be logical or numeric. |

`scaled.by` |
the vector of values by which each variable was divided when scaling the data. |

For all other functions, use `names()`.

All values provided in the output refer to the scaled data, unless `scaled=0` or (equivalently) `scaled=FALSE`.

The default option `scaled=1` or `scaled=TRUE` scales the data by dividing each variable by its range (in contrast to scaling by the standard deviation, as is common e.g. for PCA). All other settings `scaled>0` scale each variable by its standard deviation.

If `scaled=1` or if no scaling is applied, then the default bandwidth setting is 5 percent of the data range in each direction. If the data are scaled through the standard deviation, then the default setting is 20 percent of the standard deviation in each direction.
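The scaling and default bandwidth rules above can be mimicked in a few lines of base R. This is a sketch of the stated rules only; `scale_sketch` and `default_h_sketch` are hypothetical helpers, not functions of the package, and the bandwidth helper is assumed to be applied to the data as actually used (that is, after any scaling).

```r
# Sketch only: the scaling rules described in the text, in base R.
scale_sketch <- function(X, scaled = 1) {
  X <- as.matrix(X)
  if (scaled == 0) {
    by <- rep(1, ncol(X))                            # no scaling (also scaled = FALSE)
  } else if (scaled == 1) {
    by <- apply(X, 2, function(v) diff(range(v)))    # range (also scaled = TRUE)
  } else {
    by <- apply(X, 2, sd)                            # standard deviation
  }
  list(data = sweep(X, 2, by, "/"), scaled.by = by)
}

# Default bandwidth per direction, computed on the data as used.
default_h_sketch <- function(X, scaled = 1) {
  X <- as.matrix(X)
  if (scaled == 0 || scaled == 1) {
    0.05 * apply(X, 2, function(v) diff(range(v)))   # 5 percent of the range
  } else {
    0.2 * apply(X, 2, sd)                            # 20 percent of the standard deviation
  }
}

X <- cbind(c(0, 5, 10), c(1, 2, 3))

s1 <- scale_sketch(X, scaled = 1)          # divide by the ranges 10 and 2
h1 <- default_h_sketch(s1$data, scaled = 1)

s2 <- scale_sketch(X, scaled = 2)          # divide by the standard deviations 5 and 1
h2 <- default_h_sketch(s2$data, scaled = 2)
```

After range scaling every variable has range 1, so `h1` is 0.05 in each direction; after standard-deviation scaling every variable has unit standard deviation, so `h2` is 0.2 in each direction.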

The threshold `thresh` for stopping mean shift iterations works as follows. At each iteration, we compare
the length of the mean shift, that is, the Euclidean distance between the point `x` and its local mean `m`, to the Euclidean distance between the point `x` and the overall data mean `M`. If the ratio of the former to the latter falls below `thresh`, the mean shift procedure is stopped.
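As a small illustration of this stopping rule, the check can be written as a ratio of two Euclidean distances. The helper `should_stop` and the numeric values below are hypothetical and chosen only for the example; the sketch also assumes that `x` does not coincide with `M`, since the ratio would then be undefined.

```r
# Sketch only: relative stopping rule for the mean shift iteration.
should_stop <- function(x, m, M, thresh) {
  shift_len <- sqrt(sum((m - x)^2))   # length of the mean shift, |m - x|
  ref_len   <- sqrt(sum((x - M)^2))   # distance of x to the overall mean, |x - M|
  shift_len / ref_len < thresh
}

M <- c(0, 0)                          # overall data mean (illustrative)
x <- c(2, 0)                          # current point
thr <- 0.01                           # illustrative value, not a documented default
thresh <- thr^2                       # relation between the two thresholds given in this Note

stop_large <- should_stop(x, m = c(2.001, 0), M, thresh)    # ratio 5e-4
stop_small <- should_stop(x, m = c(2.0001, 0), M, thresh)   # ratio 5e-5
```

With `thresh = 1e-4`, the first shift is still too long to stop (`stop_large` is `FALSE`), while the second one triggers the stop (`stop_small` is `TRUE`).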

The threshold `thr` for merging cluster centers works as follows. After identification of a new cluster center, we compute the Euclidean distance of the new center to each existing center, relative to the Euclidean distance of that existing center to the overall mean. If this relative distance falls below `thr`, then the new center is deemed identical to the existing one. By default, the two thresholds are related through `thresh = thr^2`.
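The merging rule can be sketched along the same lines. The helper `merge_center`, the order in which existing centers are checked, and the numeric values are assumptions made for this illustration; they are not taken from the package source.

```r
# Sketch only: merge a newly found center into an existing one if it is
# relatively close, otherwise register it as a new cluster center.
merge_center <- function(centers, new_center, M, thr) {
  for (j in seq_len(nrow(centers))) {
    d_new <- sqrt(sum((new_center - centers[j, ])^2))   # new center to existing center
    d_ref <- sqrt(sum((centers[j, ] - M)^2))            # existing center to overall mean
    if (d_new / d_ref < thr) {
      return(list(centers = centers, label = j))        # deemed identical to center j
    }
  }
  list(centers = rbind(centers, new_center, deparse.level = 0),
       label = nrow(centers) + 1)
}

M <- c(0, 0)                                   # overall mean (illustrative)
thr <- 0.01                                    # illustrative threshold
centers <- matrix(numeric(0), ncol = 2)        # no centers found yet

r1 <- merge_center(centers, c(1, 1), M, thr)         # first center, label 1
r2 <- merge_center(r1$centers, c(1.001, 1), M, thr)  # relative distance about 0.0007
r3 <- merge_center(r2$centers, c(-1, -1), M, thr)    # relative distance 2
```

The second candidate is merged into the first center, whereas the third is far enough away to become a second center.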

J. Einbeck. See `LPCM-package` for further acknowledgements.

Cheng, Y. (1995). Mean Shift, Mode Seeking, and Clustering. IEEE Transactions on Pattern Analysis and Machine Intelligence, 17, 790-799.

Einbeck, J. (2011). Bandwidth selection for mean-shift based unsupervised learning techniques: a unified approach via self-coverage. Journal of Pattern Recognition Research, 6, 175-192.
