robust_init_density: Robust initialization based on inverse density estimator


View source: R/robust_init_density.R

Description

robust_init_density searches for k initial cluster seeds for k-means-based clustering methods.

Usage

robust_init_density(dist_matrix, data, n_clusters, mp = 10, method = "density")

Arguments

dist_matrix

A distance matrix calculated on data.

data

A data matrix with n rows and p columns.

mp

The number of nearest neighbors used to find dense regions by LOF; the default is 10.

n_clusters

The number of cluster centers to find.

Details

This function does the same as ROBIN but takes the point density into account (instead of the inverse of the average relative local density, known as LOF).

The centers are observations located in the densest regions that are, at the same time, far away from each other. To find the observations in highly dense regions, ROBINPOINTDEN uses point density estimation (instead of the Local Outlier Factor, Breunig et al. (2000)); see the References for more details.

Observation: outliers have a high 'idp' value. In imbalanced cases, and as K increases, all the observations from a group might be above critRobin, so critRobin needs to be increased in order to avoid choosing two initial centers from the same group. Modification: the search starts with the point whose density is maximum.
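
The selection idea described above can be sketched in base R. This is only an illustrative sketch, not the package's implementation: the function name sketch_density_seeds, the inverse-density proxy (mean distance to the mp nearest neighbors) and the quantile-based threshold crit are all assumptions made for the example, and the actual 'idp' and critRobin definitions may differ.

```r
# Hypothetical sketch of density-aware seeding; NOT the package's code.
sketch_density_seeds <- function(dist_matrix, n_clusters, mp = 10, crit = 0.5) {
  D <- as.matrix(dist_matrix)
  stopifnot(nrow(D) > mp)
  # Assumed inverse-density proxy: mean distance to the mp nearest neighbors
  # (position 1 of the sorted row is the point itself, at distance 0).
  idp <- unname(apply(D, 1, function(d) mean(sort(d)[2:(mp + 1)])))
  # Assumed threshold: points at or below this idp quantile count as "dense".
  dense <- which(idp <= quantile(idp, probs = crit))
  stopifnot(length(dense) >= n_clusters)
  # Start with the point whose density is maximum (smallest idp).
  centers <- which.min(idp)
  while (length(centers) < n_clusters) {
    # Distance from every point to its nearest already-chosen center.
    d_near <- apply(D[, centers, drop = FALSE], 1, min)
    # Among the remaining dense points, take the one farthest from the centers.
    cand <- setdiff(dense, centers)
    centers <- c(centers, cand[which.max(d_near[cand])])
  }
  list(centers = centers, idpoints = idp)
}

# Usage on two well-separated groups of 50 points each.
set.seed(1)
X_demo <- rbind(matrix(rnorm(100), ncol = 2),
                matrix(rnorm(100, mean = 10), ncol = 2))
res_demo <- sketch_density_seeds(dist(X_demo), n_clusters = 2, mp = 5)
res_demo$centers
```

Restricting the candidates to dense points is what keeps isolated outliers, which are far from everything, from being picked as seeds.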

Value

centers

A numeric vector with the indices of the k observations selected as initial cluster centers.

idpoints

A real vector containing the inverse density values of each point (observation).
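
Because centers holds row indices rather than coordinates, the indices must be used to subset the data before they can seed a clustering routine. The sketch below shows one way this might look with stats::kmeans; the vector seed_idx is a hand-picked placeholder standing in for a returned centers vector, and the data are synthetic.

```r
# Sketch: turning seed indices into starting centers for stats::kmeans.
set.seed(42)
X_km <- rbind(matrix(rnorm(100), ncol = 2),
              matrix(rnorm(100, mean = 8), ncol = 2))
# Placeholder indices (assumption); in practice these would come from centers.
seed_idx <- c(1, 51)
# Subset rows to obtain a matrix of starting coordinates, one row per cluster.
init_centers <- X_km[seed_idx, , drop = FALSE]
fit <- kmeans(X_km, centers = init_centers)
table(fit$cluster)
```

Passing a matrix to the centers argument makes kmeans start from exactly those points instead of drawing random ones.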

Note

This is a slightly modified version of the ROBIN algorithm implementation by Sarka Brodinova <sarka.brodinova@tuwien.ac.at>.

Author(s)

Juan Domingo Gonzalez <juanrst@hotmail.com>

References

Hasan AM, et al. Robust partitional clustering by outlier and density insensitive seeding. Pattern Recognition Letters, 30(11), 994-1002, 2009.

See Also

lof

Examples

K <- 5
nk <- 100
Z <- rnorm(2 * K * nk)
centers_aux <- -floor(K / 2):floor(K / 2)
# Repeat the K cluster means so that mues has the same length as Z
mues <- rep(5 * centers_aux, 2 * nk)
X <- matrix(Z + mues, ncol = 2)
# Generate synthetic outliers (contamination level 20%)
X[sample(1:(nk * K), (nk * K) * 0.2), ] <- matrix(runif((nk * K) * 0.2 * 2,
                                                        3 * min(X), 3 * max(X)),
                                                  ncol = 2, nrow = (nk * K) * 0.2)
# The argument name n_clusters follows the Usage section
res <- robust_init_density(dist_matrix = dist(X), data = X, n_clusters = K)
# Plot the initial centers found
plot(X)
points(X[res$centers, ], pch = 19, col = 4, cex = 2)

anevolbap/ktaucenterscpp documentation built on March 10, 2021, 10:12 a.m.