clues: Clustering Method Based on Local Shrinking

Description

Automatically estimate the number of clusters for a given data set and get a partition.

Usage

clues(y, n0 = 5, alpha = 0.05, eps = 1.0e-4, itmax = 20, 
    K2.vec = NULL, strengthMethod = "sil", strengthIni = -3, 
    disMethod ="Euclidean", quiet = TRUE)

Arguments

y

data matrix which is an R matrix object (for dimension > 1) or vector object (for dimension=1) with rows being observations and columns being variables.

n0

a guess for the number of clusters.

alpha

speed factor.

eps

a small positive number. A value is regarded as zero if it is less than eps.

itmax

maximum number of iterations allowed.

K2.vec

range for the number of nearest neighbors for the second pass of the iteration.

strengthMethod

specifies the preferred measure of the strength of the clusters (i.e., compactness of the clusters). The two available methods are “sil” (Silhouette index) and “CH” (CH index).

strengthIni

initial value for the lower bound of the measure of the strength of the clusters. Any negative value will do.

disMethod

specification of the dissimilarity measure. The available measures are “Euclidean” and “1-corr”.

quiet

logical. Indicates whether intermediate results should be suppressed; set to FALSE to have them output.
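
The input layout for y and the two dissimilarity measures named under disMethod can be illustrated in base R. This is a minimal sketch, not the package's internal code: it assumes that “1-corr” means one minus the Pearson correlation between observations (rows), and the toy matrix is made up for illustration.

```r
# Toy data (hypothetical): 4 observations in rows, 5 variables in columns
y <- rbind(c(1, 2, 3, 4, 5),
           c(2, 4, 6, 8, 10),   # perfectly correlated with row 1
           c(5, 4, 3, 2, 1),    # perfectly anti-correlated with row 1
           c(1, 3, 2, 5, 4))
stopifnot(is.matrix(y), is.numeric(y))

# "Euclidean": ordinary Euclidean distance between observations
d.euc <- as.matrix(dist(y, method = "euclidean"))

# "1-corr" (assumed form): one minus the Pearson correlation between rows.
# cor() correlates columns, so the matrix is transposed first.
d.cor <- 1 - cor(t(y))

round(d.euc, 3)
round(d.cor, 3)
```

Under this assumed definition, rows 1 and 2 are far apart in Euclidean terms but have a correlation-based dissimilarity of zero, which is why the choice of disMethod can change the partition.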

Value

K

number of nearest neighbors used to obtain the final clustering.

size

vector of the number of data points in each cluster.

mem

vector of the cluster membership of data points. The cluster membership takes values 1, 2, ..., g, where g is the estimated number of clusters.

g

an estimate of the number of clusters.

CH

CH index value for the final partition if strengthMethod is “CH”.

avg.s

average of the Silhouette index values for the final partition if strengthMethod is “sil”.

s

vector of Silhouette indices for data points if strengthMethod is “sil”.

K.vec

number of nearest neighbors used for each iteration.

g.vec

number of clusters obtained in each iteration.

myupdate

logical. Indicates if the partition obtained in the first pass is the same as that obtained in the second pass.

y.old1

data used for shrinking and clustering.

y.old2

data returned after shrinking and clustering.

y

a copy of the data from the input.

strengthMethod

a copy of the strengthMethod from the input.

disMethod

a copy of the dissimilarity measure from the input.
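
To make the two strength measures concrete, the following base-R sketch computes the CH (Calinski-Harabasz) index and per-point Silhouette indices from their textbook definitions. The helper names ch.index and sil.index are hypothetical and are not part of the package; the package's own computation may differ in detail. Scoring singleton clusters as 0 is an assumed convention.

```r
# CH index: (between-cluster SS / (g - 1)) / (within-cluster SS / (n - g))
ch.index <- function(y, mem) {
  y <- as.matrix(y)
  n <- nrow(y)
  g <- length(unique(mem))
  stopifnot(g > 1, g < n)            # undefined for one cluster or all singletons
  center <- colMeans(y)
  W <- 0
  B <- 0
  for (k in unique(mem)) {
    yk <- y[mem == k, , drop = FALSE]
    ck <- colMeans(yk)
    W <- W + sum(sweep(yk, 2, ck)^2)           # within-cluster sum of squares
    B <- B + nrow(yk) * sum((ck - center)^2)   # between-cluster sum of squares
  }
  (B / (g - 1)) / (W / (n - g))
}

# Silhouette index per point: (b - a) / max(a, b), where a is the mean
# dissimilarity to the point's own cluster and b is the smallest mean
# dissimilarity to any other cluster.
sil.index <- function(d, mem) {
  d <- as.matrix(d)
  labels <- sort(unique(mem))
  stopifnot(length(labels) > 1)
  s <- numeric(length(mem))
  for (i in seq_along(mem)) {
    own <- mem == mem[i]
    if (sum(own) == 1) next                    # singleton: keep s[i] = 0 (assumed)
    a <- sum(d[i, own]) / (sum(own) - 1)       # d[i, i] is 0, so exclude it from the count
    b <- min(sapply(setdiff(labels, mem[i]),
                    function(k) mean(d[i, mem == k])))
    s[i] <- (b - a) / max(a, b)
  }
  s
}

# Toy one-dimensional data with two well-separated clusters (hypothetical)
y   <- c(0, 1, 10, 11)
mem <- c(1, 1, 2, 2)

CH    <- ch.index(y, mem)
s     <- sil.index(dist(y), mem)
avg.s <- mean(s)
```

Larger values of either index indicate more compact, better-separated clusters, which is the sense in which they measure cluster strength.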

Note

Occasionally, the number of clusters estimated by clues will be equal to the number of data points (that is, each data point forms its own cluster). In this case, the estimated number of clusters is set to one, and the CH index or Silhouette index is set to NULL, since neither index is defined when the number of clusters is equal to one.
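
Code that consumes the result may want to guard against this degenerate case before using the strength index. The sketch below uses mock lists in place of real clues output, and report.strength is a hypothetical helper, not a package function.

```r
# Hypothetical helper: describe the strength index of a result list,
# allowing for the one-cluster case in which the index is NULL.
report.strength <- function(res) {
  idx <- if (res$strengthMethod == "sil") res$avg.s else res$CH
  if (res$g == 1 || is.null(idx)) {
    return("single cluster: strength index undefined")
  }
  sprintf("%s = %.3f", res$strengthMethod, idx)
}

# Mock result lists standing in for clues() output (assumed shapes)
degenerate <- list(g = 1, strengthMethod = "sil", avg.s = NULL)
regular    <- list(g = 4, strengthMethod = "sil", avg.s = 0.5736749)

report.strength(degenerate)
report.strength(regular)
```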

References

Wang, S., Qiu, W., and Zamar, R. H. (2007). CLUES: A non-parametric clustering method based on local shrinking. Computational Statistics & Data Analysis, Vol. 52, issue 1, pages 286-298.

Examples

    # Maronna data set
    data(Maronna)
    # data matrix
    maronna <- Maronna$maronna

    # partition by clues
    res <- clues(maronna, quiet = TRUE)

    # get summary statistics
    summary(res)

    # scatter plots and plot of trajectories
    ## Not run: plot(res)

Example output

Number of data points:
[1] 200

Number of variables:
[1] 2

Number of clusters:
[1] 4

Cluster sizes:
[1] 53 47 50 50

Strength method:
[1] "sil"

avg Silhouette:
[1] 0.5736749

dissimilarity measurement:
[1] "Euclidean"


Available components:
 [1] "K"              "size"           "mem"            "g"             
 [5] "avg.s"          "s"              "K.vec"          "g.vec"         
 [9] "myupdate"       "y.old1"         "y.old2"         "y"             
[13] "strengthMethod" "disMethod"     
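
The components mem, size, and g describe the same partition, so the latter two can be recovered from the membership vector. The sketch below assumes that size is ordered by cluster label; the membership vector is a hypothetical one built to match the cluster sizes printed above, not the actual res$mem.

```r
# Hypothetical membership vector consistent with the printed cluster sizes
mem <- rep(1:4, times = c(53, 47, 50, 50))

g    <- length(unique(mem))          # number of clusters
size <- as.vector(table(mem))        # points per cluster, ordered by label
idx  <- split(seq_along(mem), mem)   # row indices of the points in each cluster

g
size
lengths(idx)
```

With a real result, idx[["2"]] would give the rows of the data matrix that belong to cluster 2, which is convenient for per-cluster summaries.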

clues documentation built on Dec. 4, 2019, 1:09 a.m.