Mean-shift mode seeking algorithm

Share:

Description

Given a sample from a d-dimensional distribution, an initialization point and a bandwidth the algorithm finds the nearest mode of the corresponding Gaussian kernel density.

Usage

1
ms(X, init, H, tol = 1e-06, ncores = 1, store = FALSE)

Arguments

X

n by d matrix containing the data.

init

d-dimensional vector containing the initial point for the optimization. By default it is equal to colMeans(X).

H

Positive definite bandwidth matrix representing the covariance of each component of the Gaussian kernel density.

tol

Tolerance used to assess the convergence of the algorithm, which is stopped if the absolute values of increments along all the dimensions are smaller then tol at any iteration. Default value is 1e-6.

ncores

Number of cores used. The parallelization will take place only if OpenMP is supported.

store

If FALSE only the latest iteration is returned, if TRUE the function will return a matrix where the i-th row is the position of the algorithms at the i-th iteration.

Value

A list where estim is a d-dimensional vector containing the last position of the algorithm, while traj is a matrix with d-colums representing the trajectory of the algorithm along each dimension. If store == FALSE the whole trajectory is not stored and traj = NULL.

Author(s)

Matteo Fasiolo <matteo.fasiolo@gmail.com>.

Examples

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
set.seed(434)

# Simulating multivariate normal data
N <- 1000
mu <- c(1, 2)
sigma <- matrix(c(1, 0.5, 0.5, 1), 2, 2)
X <- rmvn(N, mu = mu, sigma = sigma)

# Plotting the true density function
steps <- 100
range1 <- seq(min(X[ , 1]), max(X[ , 1]), length.out = steps)
range2 <- seq(min(X[ , 2]), max(X[ , 2]), length.out = steps)
grid <- expand.grid(range1, range2)
vals <- dmvn(as.matrix(grid), mu, sigma)

contour(z = matrix(vals, steps, steps),  x = range1, y = range2, xlab = "X1", ylab = "X2")
points(X[ , 1], X[ , 2], pch = '.')
 
# Estimating the mode from "nrep" starting points
nrep <- 10
index <- sample(1:N, nrep)
for(ii in 1:nrep) {
  start <- X[index[ii], ]
  out <- ms(X, init = start, H = 0.1 * sigma, store = TRUE)
  lines(out$traj[ , 1], out$traj[ , 2], col = 2, lwd = 2) 
  points(out$final[1], out$final[2], col = 4, pch = 3, lwd = 3) # Estimated mode (blue)
  points(start[1], start[2], col = 2, pch = 3, lwd = 3)         # ii-th starting value 
}