INCAnumclu | R Documentation |
INCAnumclu helps to estimate the number of clusters in a dataset. The INCA index is calculated for partitions with different numbers of clusters.
INCAnumclu(d, K, method = "pam", pert, L= NULL, noise=NULL)
d |
a distance matrix or a |
K |
the maximum number of clusters to be considered. For each value of k (k=2,...,K), a partition with k clusters is calculated. |
method |
character string defining the clustering method used to obtain the partitions. The hierarchical agglomerative clustering methods are performed via |
pert |
only useful when parameter |
L |
default value NULL, but when some units are considered by
the user as noise units, |
noise |
when |
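To make the roles of d and K concrete, the following sketch builds one partition for each k = 2, ..., K from a distance matrix using only base R (hclust and cutree). It is an illustration under assumptions, not the package's internal code: the average linkage, the simulated data, and the name partitions are all hypothetical choices made for this example.

```r
# Illustrative sketch only: base R, not the ICGE implementation.
set.seed(1)

# Three groups of 20 points in dimension 5 with well-separated means
x <- rbind(
  matrix(rnorm(20 * 5, mean = 0),  ncol = 5),
  matrix(rnorm(20 * 5, mean = 5),  ncol = 5),
  matrix(rnorm(20 * 5, mean = 10), ncol = 5)
)

d <- dist(x)   # Euclidean distances, an object of class "dist"
K <- 5         # maximum number of clusters to consider

# One partition per k = 2, ..., K (average linkage is an assumption here)
tree <- hclust(d, method = "average")
partitions <- lapply(2:K, function(k) cutree(tree, k = k))
names(partitions) <- paste0("k", 2:K)

# Cluster sizes for each candidate k
lapply(partitions, table)
```

Each element of partitions is an integer vector of cluster labels, one label per unit. An index such as INCA is then evaluated once per partition, which is why the result has one value for each k.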
Returns an object of class incanc, which is a numeric vector containing the INCA index associated with each of the k (k=2,...,K) partitions. When noise is not NULL, the function returns a list with the INCA index for each partition, calculated both without and with the noise units. The associated plot method shows the INCA index, both with and without noise.
Itziar Irigoien itziar.irigoien@ehu.eus; Konputazio Zientziak eta Adimen Artifiziala, Euskal Herriko Unibertsitatea (UPV/EHU), Donostia, Spain.
Conchita Arenas carenas@ub.edu; Departament d'Estadistica, Universitat de Barcelona, Barcelona, Spain.
Irigoien, I. and Arenas, C. (2008). INCA: New statistic for estimating the number of clusters and identifying atypical units. Statistics in Medicine, 27(15), 2948–2973.
Arenas, C. and Cuadras, C.M. (2002). Some recent statistical methods based on distances. Contributions to Science, 2, 183–191.
INCAindex, estW
library(ICGE)  # package providing INCAnumclu

#------- Example 1 --------------------------------------
# Generate 3 clusters, each of them with 20 objects in dimension 5.
mu1 <- sample(1:10, 5, replace=TRUE)
x1 <- matrix(rnorm(20*5, mean = mu1, sd = 1), ncol=5, byrow=TRUE)
mu2 <- sample(1:10, 5, replace=TRUE)
x2 <- matrix(rnorm(20*5, mean = mu2, sd = 1), ncol=5, byrow=TRUE)
mu3 <- sample(1:10, 5, replace=TRUE)
x3 <- matrix(rnorm(20*5, mean = mu3, sd = 1), ncol=5, byrow=TRUE)
x <- rbind(x1, x2, x3)

# Calculate the Euclidean distance between them
d <- dist(x)

# Calculate the INCA index associated with partitions with k=2, ..., k=5 clusters.
INCAnumclu(d, K=5)
out <- INCAnumclu(d, K=5)
plot(out)

#------- Example 1 cont. --------------------------------
# With hypothetical noise elements
noiseunits <- rep(FALSE, 60)
noiseunits[sample(1:60, 20)] <- TRUE
out <- INCAnumclu(d, K=5, L="custom", noise=noiseunits)
plot(out)