index.KL: Calculates Krzanowski-Lai index

Description Usage Arguments Details Value Author(s) References See Also Examples

View source: R/index.KL.r

Description

Calculates Krzanowski-Lai index

Usage

1
index.KL (x,clall,d=NULL,centrotypes="centroids")

Arguments

x

data

clall

Three vectors of integers indicating the cluster to which each object is allocated in partition of n objects into u-1, u, and u+1 clusters

d

optional distance matrix, used for calculations if centrotypes="medoids"

centrotypes

"centroids" or "medoids"

Details

See file ../doc/indexKL_details.pdf for further details

Value

Krzanowski-Lai index

Author(s)

Marek Walesiak marek.walesiak@ue.wroc.pl, Andrzej Dudek andrzej.dudek@ue.wroc.pl

Department of Econometrics and Computer Science, University of Economics, Wroclaw, Poland http://keii.ue.wroc.pl/clusterSim/

References

Krzanowski, W.J., Lai, Y.T. (1988), A criterion for determining the number of groups in a data set using sum of squares clustering, "Biometrics", 44, 23-34.

Milligan, G.W., Cooper, M.C. (1985), An examination of procedures of determining the number of cluster in a data set, "Psychometrika", vol. 50, no. 2, 159-179. Available at: doi: 10.1007/BF02294245.

Tibshirani, R., Walther, G., Hastie, T. (2001), Estimating the number of clusters in a data set via the gap statistic, "Journal of the Royal Statistical Society", ser. B, vol. 63, part 2, 411-423. Available at: doi: 10.1111/1467-9868.00293.

See Also

index.G1, index.G2, index.G3, index.C, index.S, index.H, index.Gap, index.DB

Examples

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
# Example 1
library(clusterSim)
data(data_ratio)
cl1<-pam(data_ratio,4)
cl2<-pam(data_ratio,5)
cl3<-pam(data_ratio,6)
clall<-cbind(cl1$clustering,cl2$clustering,cl3$clustering)
index.KL(data_ratio,clall)

# Example 2
library(clusterSim)
data(data_ratio)
md <- dist(data_ratio, method="manhattan")
# nc - number_of_clusters
min_nc=2
max_nc=15
res <- array(0, c(max_nc-min_nc+1, 2))
res[,1] <- min_nc:max_nc
clusters <- NULL
for (nc in min_nc:max_nc)
{
  if(nc-1==1){
    clustering1<-rep(1,nrow(data_ratio))
  }
  else{
    clustering1 <- pam(md, nc-1, diss=TRUE)$clustering
  }
  clustering2 <- pam(md, nc, diss=TRUE)$clustering
  clustering3 <- pam(md, nc+1, diss=TRUE)$clustering
  clall<- cbind(clustering1, clustering2, clustering3)
  res[nc-min_nc+1,2] <- KL <- index.KL(data_ratio,clall,centrotypes="centroids")
  clusters <- rbind(clusters, clustering2)
} 
print(paste("max KL for",(min_nc:max_nc)[which.max(res[,2])],"clusters=",max(res[,2])))
print("clustering for max KL")
print(clusters[which.max(res[,2]),])
#write.table(res,file="KL_res.csv",sep=";",dec=",",row.names=TRUE,col.names=FALSE)
plot(res,type="p",pch=0,xlab="Number of clusters",ylab="KL",xaxt="n")
axis(1, c(min_nc:max_nc))

# Example 3
library(clusterSim)
data(data_ratio)
md <- dist(data_ratio, method="manhattan")
# nc - number_of_clusters
min_nc=2
max_nc=15
res <- array(0, c(max_nc-min_nc+1, 2))
res[,1] <- min_nc:max_nc
clusters <- NULL
for (nc in min_nc:max_nc)
{
  if(nc-1==1){
    clustering1<-rep(1,nrow(data_ratio))
  }
  else{
    clustering1 <- pam(md, nc-1, diss=TRUE)$clustering
  }
  clustering2 <- pam(md, nc, diss=TRUE)$clustering
  clustering3 <- pam(md, nc+1, diss=TRUE)$clustering
  clall<- cbind(clustering1, clustering2, clustering3)
  res[nc-min_nc+1,2] <- KL <- index.KL(data_ratio,clall,d=md,centrotypes="medoids")
  clusters <- rbind(clusters, clustering2)
} 
print(paste("max KL for",(min_nc:max_nc)[which.max(res[,2])],"clusters=",max(res[,2])))
print("clustering for max KL")
print(clusters[which.max(res[,2]),])
#write.table(res,file="KL_res.csv",sep=";",dec=",",row.names=TRUE,col.names=FALSE)
plot(res,type="p",pch=0,xlab="Number of clusters",ylab="KL",xaxt="n")
axis(1, c(min_nc:max_nc))

Example output

Loading required package: cluster
Loading required package: MASS
Warning messages:
1: In rgl.init(initValue, onlyNULL) : RGL: unable to open X11 display
2: 'rgl_init' failed, running with rgl.useNULL = TRUE 
3: .onUnload failed in unloadNamespace() for 'rgl', details:
  call: fun(...)
  error: object 'rgl_quit' not found 
[1] 3.921261
[1] "max KL for 5 clusters= 56.6172806963999"
[1] "clustering for max KL"
 1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 
 1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  2  2  2  2  2  2  2  2  2  2  2 
27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 
 2  2  2  2  3  3  3  3  3  3  3  3  3  3  1  3  3  3  3  4  4  4  4  4  4  4 
53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 
 4  4  4  4  4  4  4  4  5  5  5  5  5  5  5  5  5  5  5  5  5  5  5 
[1] "max KL for 5 clusters= 29.7969332416695"
[1] "clustering for max KL"
 1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 
 1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  2  2  2  2  2  2  2  2  2  2  2 
27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 
 2  2  2  2  3  3  3  3  3  3  3  3  3  3  1  3  3  3  3  4  4  4  4  4  4  4 
53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 
 4  4  4  4  4  4  4  4  5  5  5  5  5  5  5  5  5  5  5  5  5  5  5 

clusterSim documentation built on Jan. 8, 2021, 2:13 a.m.