index.G2: Calculates G2 internal cluster quality index

Description Usage Arguments Details Value Author(s) References See Also Examples

View source: R/index.G2.r

Description

Calculates G2 internal cluster quality index - Baker & Hubert adaptation of Goodman & Kruskal's Gamma statistic

Usage

1
index.G2(d,cl)

Arguments

d

'dist' object

cl

A vector of integers indicating the cluster to which each object is allocated

Details

See file \$R\_HOME\library\clusterSim\pdf\indexG2\_details.pdf for further details

Value

calculated G2 index

Author(s)

Marek Walesiak marek.walesiak@ue.wroc.pl, Andrzej Dudek andrzej.dudek@ue.wroc.pl

Department of Econometrics and Computer Science, University of Economics, Wroclaw, Poland http://keii.ue.wroc.pl/clusterSim/

References

Everitt, B.S., Landau, E., Leese, M. (2001), Cluster analysis, Arnold, London, p. 104. ISBN 9780340761199.

Gatnar, E., Walesiak, M. (Eds.) (2004), Metody statystycznej analizy wielowymiarowej w badaniach marketingowych [Multivariate statistical analysis methods in marketing research], Wydawnictwo AE, Wroclaw, p. 339. Available at: http://keii.ue.wroc.pl/pracownicy/mw/2004_Gatnar_Walesiak_Metody_SAW_w_badaniach_marketingowych.pdf.

Gordon, A.D. (1999), Classification, Chapman & Hall/CRC, London, p. 62. ISBN 9781584880134.

Hubert, L. (1974), Approximate evaluation technique for the single-link and complete-link hierarchical clustering procedures, "Journal of the American Statistical Association", vol. 69, no. 347, 698-704. Available at: doi: 10.1080/01621459.1974.10480191.

Milligan, G.W., Cooper, M.C. (1985), An examination of procedures of determining the number of cluster in a data set, "Psychometrika", vol. 50, no. 2, 159-179. Available at: doi: 10.1007/BF02294245.

See Also

index.G1, index.G3, index.S, index.H, index.KL, index.Gap, index.C, index.DB

Examples

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
# Example 1
library(clusterSim)
data(data_ratio)
d <- dist.GDM(data_ratio)
c <- pam(d, 5, diss = TRUE)
icq <- index.G2(d,c$clustering)
print(icq)

# Example 2
library(clusterSim)
data(data_ordinal)
d <- dist.GDM(data_ordinal, method="GDM2")
# nc - number_of_clusters
min_nc=2
max_nc=6
res <- array(0,c(max_nc-min_nc+1, 2))
res[,1] <- min_nc:max_nc
clusters <- NULL
for (nc in min_nc:max_nc)
{
cl2 <- pam(d, nc, diss=TRUE)
res[nc-min_nc+1,2] <- G2 <- index.G2(d,cl2$cluster)
clusters <- rbind(clusters,cl2$cluster)
}
print(paste("max G2 for",(min_nc:max_nc)[which.max(res[,2])],"clusters=",max(res[,2])))
print("clustering for max G2")
print(clusters[which.max(res[,2]),])
#write.table(res,file="G2_res.csv",sep=";",dec=",",row.names=TRUE,col.names=FALSE)
plot(res, type="p", pch=0, xlab="Number of clusters", ylab="G2", xaxt="n")
axis(1, c(min_nc:max_nc))

Example output

Loading required package: cluster
Loading required package: MASS
Warning messages:
1: In rgl.init(initValue, onlyNULL) : RGL: unable to open X11 display
2: 'rgl_init' failed, running with rgl.useNULL = TRUE 
3: .onUnload failed in unloadNamespace() for 'rgl', details:
  call: fun(...)
  error: object 'rgl_quit' not found 
[1] 0.8634447
[1] "max G2 for 5 clusters= 0.734682612695492"
[1] "clustering for max G2"
 [1] 1 2 3 2 4 1 2 4 4 2 4 4 1 5 5 3 4 5 3 5 4 2 4 3 3 3

clusterSim documentation built on Jan. 8, 2021, 2:13 a.m.