clustering: Data Clustering (After Data Shrinking)
In clues: Clustering Method Based on Local

Data clustering (after data shrinking).

    clustering(y, disMethod = "Euclidean")

`y`	data matrix, given as an R matrix object (for dimension > 1) or a vector object (for dimension = 1), with rows as observations and columns as variables.
`disMethod`	specification of the dissimilarity measure. The available measures are "Euclidean" and "1-corr".
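
As a rough illustration of the two measures, the base-R sketch below computes both kinds of dissimilarity between the rows of a small made-up matrix. It assumes that "1-corr" means one minus the Pearson correlation between two observations; the help page does not spell this out, and the package may compute it differently. The matrix `y_demo` and the names `d_euclid` and `d_corr` are hypothetical and are not part of the clues package.

```r
# Made-up data: 3 observations (rows) on 4 variables (columns).
y_demo <- rbind(a = c(1, 2, 3, 4),
                b = c(2, 4, 6, 9),
                c = c(4, 3, 2, 1))

# Euclidean dissimilarity between observations (rows).
d_euclid <- as.matrix(dist(y_demo, method = "euclidean"))

# ASSUMPTION: "1-corr" is taken to be 1 minus the Pearson correlation
# between observations. cor() correlates columns, hence the transpose.
d_corr <- 1 - cor(t(y_demo))

print(round(d_euclid, 3))
print(round(d_corr, 3))
```

Under this assumption, two observations with the same profile shape are close in the "1-corr" sense even when their magnitudes differ, whereas the Euclidean measure treats them as far apart.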

We first store the first observation (data point) in point[1]. We then find the nearest neighbor of point[1], store it in point[2], and store the dissimilarity between point[1] and point[2] in db[1]. Next, we remove point[1], find the nearest neighbor of point[2] among the remaining points, store it in point[3], and store the dissimilarity between point[2] and point[3] in db[2]. We then remove point[2] and find the nearest neighbor of point[3]. We repeat this procedure until we obtain point[n] and db[n-1], where n is the total number of data points.
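
The ordering step described above can be sketched in base R as follows. This is only an illustration of the idea using Euclidean dissimilarity, not the package's actual implementation; the function name `nn_chain` and the toy vector `y_toy` are hypothetical.

```r
# Hypothetical helper (not part of clues): build the nearest-neighbor chain.
nn_chain <- function(y) {
  y <- as.matrix(y)                  # a vector becomes an n x 1 matrix
  n <- nrow(y)
  d <- as.matrix(dist(y))            # pairwise Euclidean dissimilarities
  point <- integer(n)                # indices of the points in chain order
  db <- numeric(n - 1)               # dissimilarities between consecutive points
  remaining <- rep(TRUE, n)
  point[1] <- 1L                     # start from the first observation
  for (i in seq_len(n - 1)) {
    cur <- point[i]
    remaining[cur] <- FALSE          # remove the current point
    cand <- which(remaining)
    nxt <- cand[which.min(d[cur, cand])]  # nearest remaining neighbor
    point[i + 1] <- nxt
    db[i] <- d[cur, nxt]
  }
  list(point = point, db = db)
}

# Toy one-dimensional data with two well-separated groups.
y_toy <- c(0, 0.1, 0.2, 5, 5.1)
res <- nn_chain(y_toy)
print(res$point)
print(res$db)
```

In this toy example the chain stays inside the first group until it is exhausted, so the only large entry of `db` is the jump from the first group to the second.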

Next, we calculate the interquartile range (IQR) of the vector db. We then check which elements of db are larger than avg + 1.5 * IQR, where avg is the average of the vector db. The minimum of these outlier dissimilarities is stored in omin. The estimated number of clusters is g, where g-1 is the number of outlier dissimilarities. The position of an outlier dissimilarity marks the end of one cluster and the start of the next.
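
A minimal sketch of this cutoff rule in base R is shown below. It uses a small made-up `db` vector and R's default `IQR()` definition; the package may compute the quartiles differently, and its behavior when no dissimilarity exceeds the cutoff is not documented here, so the `NA` used for that case is an assumption. All variable names (`db_toy`, `point_toy`, `cutoff`, `out_pos`, `chain_mem`) are hypothetical.

```r
# Made-up chain: 5 points in sorted order and the 4 consecutive dissimilarities.
point_toy <- 1:5
db_toy <- c(0.1, 0.1, 4.8, 0.1)

# Cutoff from the text: average of db plus 1.5 times its IQR.
cutoff <- mean(db_toy) + 1.5 * IQR(db_toy)

# Positions of the outlier dissimilarities; each one ends a cluster.
out_pos <- which(db_toy > cutoff)

# g - 1 outliers give g clusters.
g_toy <- length(out_pos) + 1

# ASSUMPTION: NA when there is no outlier dissimilarity.
omin_toy <- if (length(out_pos) > 0) min(db_toy[out_pos]) else NA

# Cluster labels in chain order: split the chain after each outlier position.
chain_mem <- rep(seq_len(g_toy),
                 times = diff(c(0, out_pos, length(point_toy))))

# Map the labels back to the original observation order.
mem_toy <- integer(length(point_toy))
mem_toy[point_toy] <- chain_mem

print(g_toy)
print(mem_toy)
```

Here only the third dissimilarity exceeds the cutoff, so the chain is cut once and two clusters result.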

To get a reasonable clustering result, data sharpening (shrinking) is recommended before data clustering.

`mem`	vector of the cluster membership of data points. The cluster membership takes values: 1, 2, …, g, where g is the estimated number of clusters.
`size`	vector of the number of data points in each cluster.
`g`	an estimate of the number of clusters.
`db`	vector of dissimilarities between sorted consecutive data points (see Details).
`point`	vector of sorted consecutive data points (see Details).
`omin`	the minimum value of the outlier dissimilarities (see Details).

Wang, S., Qiu, W., and Zamar, R. H. (2007). CLUES: A non-parametric clustering method based on local shrinking. Computational Statistics & Data Analysis, Vol. 52, issue 1, pages 286-298.

    # Maronna data set
    data(Maronna)
    # data matrix
    maronna <- Maronna$maronna

    tt <- shrinking(maronna, K = 50, itmax = 20)
    tt2 <- clustering(tt)

    # Plot of dissimilarities between the sorted consecutive data points
    #     versus the sorted consecutive data points
    # This plot can be used to assess the estimated number of clusters
    db <- tt2$db
    point <- tt2$point
    n <- length(point)
    plot(1:(n - 1), db, type = "l",
        xlab = "sorted consecutive data points", 
        ylab = "dissimilarities between the sorted consecutive data points", 
        xlim = c(0, n), axes = FALSE)
    box()
    axis(side = 2)
    axis(side = 1, at = c(0, 1:(n - 1)), labels = point)