Data clustering (after data shrinking).
clustering(y, disMethod = "Euclidean")
y |
data matrix, given as an R matrix object (for dimension > 1) or a vector object (for dimension = 1), with rows as observations and columns as variables. |
disMethod |
specification of the dissimilarity measure. The available measures are “Euclidean” and “1-corr”. |
We first store the first observation (data point) in point[1]. We then find the nearest neighbor of point[1], store it in point[2], and store the dissimilarity between point[1] and point[2] in db[1]. We next remove point[1]. We then find the nearest neighbor of point[2], store it in point[3], and store the dissimilarity between point[2] and point[3] in db[2]. We then remove point[2] and find the nearest neighbor of point[3]. We repeat this procedure until we have found point[n] and db[n-1], where n is the total number of data points.
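The ordering step described above can be sketched in base R. This is an illustrative sketch only, not the package's implementation: the helper name nn_chain is hypothetical, only the Euclidean measure is handled (via stats::dist), and ties are broken by taking the first minimum, which is an assumption.

```r
# Hypothetical helper (not part of the package): nearest-neighbor chain ordering.
# Returns the visiting order (point) and the consecutive dissimilarities (db).
nn_chain <- function(y) {
  y <- as.matrix(y)                 # a vector becomes a one-column matrix
  n <- nrow(y)
  d <- as.matrix(dist(y))           # pairwise Euclidean dissimilarities
  point <- integer(n)
  db <- numeric(n - 1)
  point[1] <- 1L                    # start from the first observation
  remaining <- rep(TRUE, n)
  remaining[1] <- FALSE             # "remove" point[1]
  for (i in seq_len(n - 1)) {
    cand <- which(remaining)        # observations not yet visited
    j <- cand[which.min(d[point[i], cand])]  # nearest remaining neighbor
    point[i + 1] <- j
    db[i] <- d[point[i], j]
    remaining[j] <- FALSE           # "remove" the point just visited
  }
  list(point = point, db = db)
}

# Toy one-dimensional data: two well-separated groups
res <- nn_chain(c(0, 10, 1, 11, 2))
res$point  # 1 3 5 2 4
res$db     # 1 1 8 1
```

In the toy data, the single large entry of db marks the jump from the group {0, 1, 2} to the group {10, 11}.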
Next, we calculate the interquartile range (IQR) of the vector db. We then check which elements of db are larger than avg + 1.5 IQR, where avg is the average of the vector db. The minimum value of these outlier dissimilarities is stored in omin.
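The outlier rule above can be illustrated with a few lines of base R. This is a sketch under stated assumptions, not the package's code: the helper name outlier_gaps is hypothetical, IQR() is used with its default quantile definition, and returning NA for omin when no element exceeds the threshold is an assumption, since the text does not say what happens in that case.

```r
# Hypothetical helper (not part of the package): flag outlier dissimilarities.
outlier_gaps <- function(db) {
  avg <- mean(db)                        # average of the dissimilarities
  threshold <- avg + 1.5 * IQR(db)       # avg + 1.5 IQR, as in the text
  idx <- which(db > threshold)           # positions of outlier dissimilarities
  # Assumption: omin is NA when there are no outliers
  omin <- if (length(idx) > 0) min(db[idx]) else NA_real_
  list(threshold = threshold, idx = idx, omin = omin)
}

out <- outlier_gaps(c(1, 1, 8, 1))
out$threshold  # 5.375
out$idx        # 3
out$omin       # 8
```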
An estimate of the number of clusters is g, where g-1 is the number of outlier dissimilarities. The position of an outlier dissimilarity indicates the end of one cluster and the start of the next.
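As a sketch of how outlier positions could be turned into cluster labels: the helper name assign_clusters is hypothetical, and reading an outlier at db[i] as a break between point[i] and point[i+1] is one interpretation of the description above, not a confirmed detail of the package.

```r
# Hypothetical helper (not part of the package): cut the ordered sequence
# at each outlier dissimilarity and label the resulting segments 1, 2, ..., g.
assign_clusters <- function(point, db, outlier_idx) {
  g <- length(outlier_idx) + 1L                 # g - 1 outliers give g clusters
  breaks <- seq_along(db) %in% outlier_idx      # TRUE where a cluster ends
  labels_sorted <- c(1L, 1L + cumsum(breaks))   # labels in visiting order
  mem <- integer(length(point))
  mem[point] <- labels_sorted                   # map back to original row order
  list(mem = mem, size = tabulate(mem, nbins = g), g = g)
}

# Ordering and dissimilarities for the toy data c(0, 10, 1, 11, 2)
cl <- assign_clusters(point = c(1L, 3L, 5L, 2L, 4L),
                      db = c(1, 1, 8, 1),
                      outlier_idx = 3L)
cl$mem   # 1 2 1 2 1
cl$size  # 3 2
cl$g     # 2
```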
To get a reasonable clustering result, data sharpening (shrinking) is recommended before data clustering.
mem |
vector of the cluster membership of data points. The cluster membership takes values: 1, 2, …, g, where g is the estimated number of clusters. |
size |
vector of the number of data points for clusters. |
g |
an estimate of the number of clusters. |
db |
vector of dissimilarities between sorted consecutive data points (see Details). |
point |
vector of sorted consecutive data points (see Details). |
omin |
the minimum value of the outlier dissimilarities (see Details). |
Wang, S., Qiu, W., and Zamar, R. H. (2007). CLUES: A non-parametric clustering method based on local shrinking. Computational Statistics & Data Analysis, Vol. 52, issue 1, pages 286-298.
# Maronna data set
data(Maronna)
# data matrix
maronna <- Maronna$maronna
tt <- shrinking(maronna, K = 50, itmax = 20)
tt2 <- clustering(tt)
# Plot of dissimilarities between the sorted consecutive data points
# versus the sorted consecutive data points.
# This plot can be used to assess the estimated number of clusters.
db <- tt2$db
point <- tt2$point
n <- length(point)
plot(1:(n - 1), db, type = "l",
xlab = "sorted consecutive data points",
ylab = "dissimilarities between the sorted consecutive data points",
xlim = c(0, n), axes = FALSE)
box()
axis(side = 2)
axis(side = 1, at = c(0, 1:(n - 1)), labels = point)