clustering: Data Clustering (After Data Shrinking)



Description

Data clustering (after data shrinking).

Usage

clustering(y, disMethod = "Euclidean")

Arguments

y

data matrix, which is an R matrix object (for dimension > 1) or a vector object (for dimension = 1), with rows as observations and columns as variables.

disMethod

specification of the dissimilarity measure. The available measures are "Euclidean" (the default) and "1-corr".

Details

We first store the first observation (data point) in point[1]. We then find the nearest neighbor of point[1], store it in point[2], and store the dissimilarity between point[1] and point[2] in db[1]. Next, we remove point[1], find the nearest neighbor of point[2] among the remaining points, store it in point[3], and store the dissimilarity between point[2] and point[3] in db[2]. We then remove point[2] and find the nearest neighbor of point[3]. We repeat this procedure until we obtain point[n] and db[n-1], where n is the total number of data points.
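The chaining step can be sketched in a few lines of R. This is only an illustration of the description above, not the package's own code: the helper name nn_chain is hypothetical, only the Euclidean measure is handled, and the sketch assumes that point holds row indices of y rather than the observations themselves.

```r
# Illustrative sketch only; nn_chain is a hypothetical helper, not part of clues.
nn_chain <- function(y) {
  y <- as.matrix(y)              # a vector becomes a one-column matrix
  n <- nrow(y)
  point <- integer(n)            # visiting order (row indices of y, an assumption)
  db <- numeric(n - 1)           # dissimilarities between consecutive points
  remaining <- seq_len(n)
  point[1] <- 1L                 # start from the first observation
  for (i in seq_len(n - 1)) {
    cur <- point[i]
    remaining <- setdiff(remaining, cur)   # remove the current point
    # Euclidean distance from the current point to each remaining point
    diffs <- y[remaining, , drop = FALSE] -
      matrix(y[cur, ], nrow = length(remaining), ncol = ncol(y), byrow = TRUE)
    d <- sqrt(rowSums(diffs^2))
    j <- which.min(d)            # nearest remaining neighbor
    point[i + 1] <- remaining[j]
    db[i] <- d[j]
  }
  list(point = point, db = db)
}

# Two well-separated groups on a line
y <- c(0, 10, 1, 11, 2)
res <- nn_chain(y)
res$point   # 1 3 5 2 4
res$db      # 1 1 8 1
```

In this toy input the single large entry of db (8) marks the jump from the group near 0 to the group near 10.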

Next, we calculate the interquartile range (IQR) of the vector db. We then check which elements of db are larger than avg + 1.5 * IQR, where avg is the average of the vector db. The minimum value of these outlier dissimilarities is stored in omin. The estimated number of clusters is g, where g - 1 is the number of outlier dissimilarities. The position of an outlier dissimilarity indicates the end of one cluster and the start of a new cluster.
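The outlier rule can be sketched as follows. Again this is a hedged illustration rather than the package's implementation: split_chain is a hypothetical helper, point is assumed to hold row indices, and returning NA for omin when no outlier is found is a choice made for this sketch only.

```r
# Illustrative sketch only; split_chain is a hypothetical helper, not part of clues.
split_chain <- function(point, db) {
  cutoff <- mean(db) + 1.5 * IQR(db)   # avg + 1.5 * IQR, as described above
  is_out <- db > cutoff                # outlier dissimilarities
  g <- sum(is_out) + 1L                # g - 1 outliers give g clusters
  # Assumption of this sketch: omin is NA when there are no outliers
  omin <- if (any(is_out)) min(db[is_out]) else NA_real_
  # Label along the chain: increases by one after each outlier gap
  label_sorted <- c(1L, 1L + cumsum(is_out))
  mem <- integer(length(point))
  mem[point] <- label_sorted           # map back to the original order
  list(mem = mem, size = as.vector(table(mem)), g = g, omin = omin)
}

# Chain for the one-dimensional data c(0, 10, 1, 11, 2)
point <- c(1L, 3L, 5L, 2L, 4L)
db <- c(1, 1, 8, 1)
res <- split_chain(point, db)
res$g     # 2
res$mem   # 1 2 1 2 1
```

Here avg is 2.75 and the IQR is 1.75, so the cutoff is 5.375 and only the dissimilarity 8 counts as an outlier.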

To get a reasonable clustering result, data sharpening (shrinking) is recommended before data clustering.

Value

mem

vector of the cluster membership of data points. The cluster membership takes values 1, 2, ..., g, where g is the estimated number of clusters.

size

vector of the number of data points in each cluster.

g

an estimate of the number of clusters.

db

vector of dissimilarities between sorted consecutive data points (see Details).

point

vector of sorted consecutive data points (see Details).

omin

the minimum value of the outlier dissimilarities (see Details).

References

Wang, S., Qiu, W., and Zamar, R. H. (2007). CLUES: A non-parametric clustering method based on local shrinking. Computational Statistics & Data Analysis, Vol. 52, issue 1, pages 286-298.

Examples

    # Load the clues package, which provides the functions and data used below
    library(clues)

    # Maronna data set
    data(Maronna)
    # data matrix
    maronna <- Maronna$maronna

    # Shrink (sharpen) the data first, then cluster the shrunken data
    tt <- shrinking(maronna, K = 50, itmax = 20)
    tt2 <- clustering(tt)

    # Plot of dissimilarities between the sorted consecutive data points
    #     versus the sorted consecutive data points
    # This plot can be used to assess the estimated number of clusters
    db <- tt2$db
    point <- tt2$point
    n <- length(point)
    plot(1:(n - 1), db, type = "l",
        xlab = "sorted consecutive data points", 
        ylab = "dissimilarities between the sorted consecutive data points", 
        xlim = c(0, n), axes = FALSE)
    box()
    axis(side = 2)
    axis(side = 1, at = c(0, 1:(n - 1)), labels = point)

clues documentation built on Dec. 4, 2019, 1:09 a.m.