clustConfigurations: Fit of Cluster Configurations for Networks

View source: R/clustering.R

Description

Evaluates clustering solutions with 1, 2, ..., n clusters (where n is the number of vertices) by comparing each clustered matrix to the observed correlation matrix. Returns a vector of correlations and draws a plot. Designed for networks.

Usage

clustConfigurations(vertices, hclustresult, observedcorrelation)

Arguments

vertices

scalar value indicating the number of vertices

hclustresult

the object returned by hclust() (or a similar object that works with the cutree() function)

observedcorrelation

the observed correlation matrix

Details

This function helps the user discern the number of clusters that best describes the underlying data. It loops through all possible numbers of clusters (1 through n, where n is the number of actors in the network). For each solution corresponding to a given number of clusters, it uses cutree() to assign the vertices (or columns) to their respective clusters.

From this, the function generates a matrix of within- and between-cluster correlations. When there is one cluster for each vertex in the network, the cell values will be identical to the observed correlation matrix. When there is one cluster for the whole network, the values will all be equal to the average correlation across the observed matrix.

From a visual inspection of the correlation matrix, the user can decide on the proper number of clusters in this network.
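The procedure described above can be sketched in base R. This is only an illustration under stated assumptions, not the package's implementation: block_mean_matrix() is a hypothetical helper that replaces every cell with the mean of its cluster-by-cluster block, and the fit is measured with cor() on the off-diagonal cells, which may differ from how clustConfigurations() computes it internally.

```r
# Hypothetical helper (not part of NetCluster): replace each cell of a
# correlation matrix with the mean of the block defined by the clusters
# of its row and column.
block_mean_matrix <- function(cors, membership) {
  out <- cors
  groups <- unique(membership)
  for (g in groups) {
    for (h in groups) {
      rows <- which(membership == g)
      cols <- which(membership == h)
      out[rows, cols] <- mean(cors[rows, cols])
    }
  }
  out
}

# Same toy network as in the Examples section.
socmatrix <- matrix(c(1,1,0,0,0,1,1,0,0,0,0,0,1,0,0,0,0,0,1,1,0,0,0,1,0),
                    nrow = 5, ncol = 5)
cors <- cor(socmatrix)
tree <- hclust(as.dist(1 - cors))
n <- ncol(cors)

# For each number of clusters k, build the by-cluster matrix and correlate
# its off-diagonal cells with the observed ones (an assumed fit measure).
off_diag <- lower.tri(cors)
fit <- sapply(seq_len(n), function(k) {
  clustered <- block_mean_matrix(cors, cutree(tree, k = k))
  # With k = 1 every cell is the same, so the correlation is undefined (NA).
  suppressWarnings(cor(clustered[off_diag], cors[off_diag]))
})
fit
```

With one cluster per vertex the by-cluster matrix equals the observed matrix, so the last element of fit is 1. With a single cluster the matrix is constant and the correlation is undefined.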

Value

clustConfigurations$correlations

a vector of length n giving, for each number of clusters, the correlation between the clustered and the observed correlation matrices

Author(s)

Mike Nowak michael.nowak@gmail.com

Examples

	# Load the package that provides clustConfigurations() and clusterCorr()
	library(NetCluster)
	
	# Generate socmatrix
	socmatrix <- matrix(c(1,1,0,0,0,1,1,0,0,0,0,0,1,0,0,0,0,0,1,1,0,0,0,1,0), nrow = 5, ncol = 5)
	socmatrix
	socmatrix_cors <- cor(socmatrix)
	socmatrix_cors
	
	# To use correlation values in hierarchical clustering, they must 
	# first be coerced into a "dissimilarity structure" using as.dist().
	# We subtract the values from 1 so that they are all greater than 
	# or equal to 0; thus, highly dissimilar (i.e., negatively 
	# correlated) actors have higher values.
	dissimilarity <- 1 - socmatrix_cors
	socmatrix_dist <- as.dist(dissimilarity)
	socmatrix_dist
	
	# hclust() performs a hierarchical agglomerative clustering 
	# operation based on the values in the dissimilarity matrix 
	# yielded by as.dist() above. The standard visualization is a 
	# dendrogram. 
	socmatrix_hclust <- hclust(socmatrix_dist)
	plot(socmatrix_hclust)
	
	# cutree() allows us to use the output of hclust() to set
	# different numbers of clusters and assign vertices to clusters
	# as appropriate. For example:
	cutree(socmatrix_hclust, k=2)
	
	# Now we'll try to figure out the number of clusters that best 
	# describes the underlying data. To do this, we'll loop through
	# all of the possible numbers of clusters (1 through n, where n is
	# the number of actors in the network). For each solution
	# corresponding to a given number of clusters, we'll use cutree()
	# to assign the vertices to their respective clusters 
	# corresponding to that solution.
	#
	# From this, we can generate a matrix of within- and between-
	# cluster correlations. Thus, when there is one cluster for each 
	# vertex in the network, the cell values will be identical to the
	# observed correlation matrix, and when there is one cluster for 
	# the whole network, the values will all be equal to the average
	# correlation across the observed matrix.
	#
	# We can then correlate each by-cluster matrix with the observed
	# correlation matrix to see how well the by-cluster matrix fits
	# the data. We'll store the correlation for each number of
	# clusters in a vector, which we can then plot.
	
	# First, find n:
	num_vertices <- ncol(socmatrix)
	
	# Next, use the clustConfigurations function:
	clustered_observed_cors <- clustConfigurations(num_vertices, socmatrix_hclust, socmatrix_cors)
	
	# Choose the number of clusters at which the plotted line starts to
	# flatten (its slope falls below roughly 45 degrees).
	# Three looks like a good number for this example.
	
	num_clusters <- 3
	
	clusters <- cutree(socmatrix_hclust, k = num_clusters)
	clusters
	
	( cluster_cor_mat <- clusterCorr(socmatrix_cors, clusters) )

Example output

Loading required package: sna
Loading required package: statnet.common

Attaching package: 'statnet.common'

The following object is masked from 'package:base':

    order

Loading required package: network
network: Classes for Relational Data
Version 1.15 created on 2019-04-01.
copyright (c) 2005, Carter T. Butts, University of California-Irvine
                    Mark S. Handcock, University of California -- Los Angeles
                    David R. Hunter, Penn State University
                    Martina Morris, University of Washington
                    Skye Bender-deMoll, University of Washington
 For citation information, type citation("network").
 Type help("network-package") to get started.

sna: Tools for Social Network Analysis
Version 2.4 created on 2016-07-23.
copyright (c) 2005, Carter T. Butts, University of California-Irvine
 For citation information, type citation("sna").
 Type help(package="sna") to get started.

     [,1] [,2] [,3] [,4] [,5]
[1,]    1    1    0    0    0
[2,]    1    1    0    0    0
[3,]    0    0    1    0    0
[4,]    0    0    0    1    1
[5,]    0    0    0    1    0
           [,1]       [,2]       [,3]       [,4]       [,5]
[1,]  1.0000000  1.0000000 -0.4082483 -0.6666667 -0.4082483
[2,]  1.0000000  1.0000000 -0.4082483 -0.6666667 -0.4082483
[3,] -0.4082483 -0.4082483  1.0000000 -0.4082483 -0.2500000
[4,] -0.6666667 -0.6666667 -0.4082483  1.0000000  0.6123724
[5,] -0.4082483 -0.4082483 -0.2500000  0.6123724  1.0000000
          1         2         3         4
2 0.0000000                              
3 1.4082483 1.4082483                    
4 1.6666667 1.6666667 1.4082483          
5 1.4082483 1.4082483 1.2500000 0.3876276
[1] 1 1 1 2 2
Warning message:
In cor(as.vector(d[g1[i], , ]), as.vector(d[g2[j], , ]), use = "complete.obs") :
  the standard deviation is zero
[1] 1 1 2 3 3
           [,1]       [,2]       [,3]       [,4]       [,5]
[1,]  1.0000000  1.0000000 -0.4082483 -0.5374575 -0.5374575
[2,]  1.0000000  1.0000000 -0.4082483 -0.5374575 -0.5374575
[3,] -0.4082483 -0.4082483  1.0000000 -0.3291241 -0.3291241
[4,] -0.5374575 -0.5374575 -0.3291241  0.8061862  0.8061862
[5,] -0.5374575 -0.5374575 -0.3291241  0.8061862  0.8061862

NetCluster documentation built on May 2, 2019, 11:27 a.m.