clusterCorr: Cluster correlation matrix for networks


View source: R/clustering.R

Description

clusterCorr computes a by-cluster correlation matrix from an observed correlation matrix and a vector of cluster memberships.

Usage

clusterCorr(observed_cor_matrix, cluster_vector)

Arguments

observed_cor_matrix

an n x n observed correlation matrix, e.g. the output of cor() applied to a sociomatrix

cluster_vector

a vector of length n giving the cluster membership of each vertex, e.g. the output of cutree()

Value

clusterCorr

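an n x n by-cluster correlation matrix. The cell for vertices i and j is the mean of the observed correlations between every member of i's cluster and every member of j's cluster (self-correlations on the diagonal included), so vertices in the same cluster have identical rows.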
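The averaging described above can be sketched in base R as follows. This is a minimal illustration under the assumption that each cell is a plain mean over the corresponding block of the observed matrix, which agrees with the example output on this page; the function name cluster_corr_sketch is hypothetical and this is not NetCluster's own source for clusterCorr().

```r
# Hypothetical base-R sketch of a by-cluster correlation matrix;
# not the NetCluster implementation of clusterCorr().
cluster_corr_sketch <- function(observed_cor_matrix, cluster_vector) {
  n <- nrow(observed_cor_matrix)
  stopifnot(ncol(observed_cor_matrix) == n, length(cluster_vector) == n)

  ids <- sort(unique(cluster_vector))
  k <- length(ids)

  # Mean observed correlation for each (cluster a, cluster b) block,
  # diagonal (self) correlations included
  block_means <- matrix(NA_real_, nrow = k, ncol = k)
  for (a in seq_len(k)) {
    for (b in seq_len(k)) {
      in_a <- cluster_vector == ids[a]
      in_b <- cluster_vector == ids[b]
      block_means[a, b] <- mean(observed_cor_matrix[in_a, in_b])
    }
  }

  # Expand the k x k block means back to an n x n, vertex-indexed matrix
  idx <- match(cluster_vector, ids)
  block_means[idx, idx, drop = FALSE]
}

# Usage with the sociomatrix from the Examples section
socmatrix <- matrix(c(1,1,0,0,0,1,1,0,0,0,0,0,1,0,0,0,0,0,1,1,0,0,0,1,0),
                    nrow = 5, ncol = 5)
socmatrix_cors <- cor(socmatrix)
cluster_cor_mat <- cluster_corr_sketch(socmatrix_cors, c(1, 1, 2, 3, 3))
print(round(cluster_cor_mat, 7))
```

For the memberships c(1, 1, 2, 3, 3) this sketch yields the same values as the clusterCorr() call in the example output, e.g. -0.5374575 for vertices 1 and 4 and 0.8061862 within the cluster {4, 5}.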

Author(s)

Mike Nowak michael.nowak@gmail.com

Examples

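	# Load NetCluster (this also loads its dependency sna)
	library(NetCluster)
	
	# Generate a small 5 x 5 sociomatrix (matrix() fills column by column)
	socmatrix <- matrix(c(1,1,0,0,0,1,1,0,0,0,0,0,1,0,0,0,0,0,1,1,0,0,0,1,0), nrow = 5, ncol = 5)
	socmatrix
	
	# Correlate the columns of the sociomatrix, i.e. the actors' tie profiles
	socmatrix_cors <- cor(socmatrix)
	socmatrix_cors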
	
	# To use correlation values in hierarchical clustering, they must
	# first be coerced into a "dissimilarity structure" using as.dist().
	# We subtract the correlations from 1 so that every value lies
	# between 0 and 2; thus, highly dissimilar (i.e., negatively
	# correlated) actors have higher values.
	dissimilarity <- 1 - socmatrix_cors
	socmatrix_dist <- as.dist(dissimilarity)
	socmatrix_dist
	
	# hclust() performs a hierarchical agglomerative clustering 
	# operation based on the values in the dissimilarity matrix 
	# yielded by as.dist() above. The standard visualization is a 
	# dendrogram. 
	socmatrix_hclust <- hclust(socmatrix_dist)
	plot(socmatrix_hclust)
	
	# cutree() allows us to use the output of hclust() to set
	# different numbers of clusters and assign vertices to clusters
	# as appropriate. For example:
	cutree(socmatrix_hclust, k=2)
	
	# Now we'll try to figure out the number of clusters that best 
	# describes the underlying data. To do this, we'll loop through
	# all of the possible numbers of clusters (1 through n, where n is
	# the number of actors in the network). For each solution
	# corresponding to a given number of clusters, we'll use cutree()
	# to assign the vertices to their respective clusters 
	# corresponding to that solution.
	#
	# From this, we can generate a matrix of within- and between-
	# cluster correlations. When there is one cluster for each
	# vertex in the network, the cell values are identical to the
	# observed correlation matrix; when there is one cluster for
	# the whole network, every value equals the average
	# correlation across the observed matrix.
	#
	# We can then correlate each by-cluster matrix with the observed
	# correlation matrix to see how well the by-cluster matrix fits
	# the data. We'll store the correlation for each number of
	# clusters in a vector, which we can then plot.
	
	# First, find n:
	num_vertices = ncol(socmatrix)
	
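	# Next, use the clustConfigurations function to do this. It may
	# warn that "the standard deviation is zero": for the one-cluster
	# solution every cell of the by-cluster matrix is the same, so its
	# correlation with the observed matrix is undefined.
	clustered_observed_cors <- clustConfigurations(num_vertices, socmatrix_hclust, socmatrix_cors)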
	
	# Choose the number of clusters at which the line starts to
	# flatten out (its slope drops below 45 degrees). Three looks
	# like a good number for this example.
	
	num_clusters = 3
	
	clusters <- cutree(socmatrix_hclust, k = num_clusters)
	clusters
	
	( cluster_cor_mat <- clusterCorr(socmatrix_cors, clusters) )

Example output

     [,1] [,2] [,3] [,4] [,5]
[1,]    1    1    0    0    0
[2,]    1    1    0    0    0
[3,]    0    0    1    0    0
[4,]    0    0    0    1    1
[5,]    0    0    0    1    0
           [,1]       [,2]       [,3]       [,4]       [,5]
[1,]  1.0000000  1.0000000 -0.4082483 -0.6666667 -0.4082483
[2,]  1.0000000  1.0000000 -0.4082483 -0.6666667 -0.4082483
[3,] -0.4082483 -0.4082483  1.0000000 -0.4082483 -0.2500000
[4,] -0.6666667 -0.6666667 -0.4082483  1.0000000  0.6123724
[5,] -0.4082483 -0.4082483 -0.2500000  0.6123724  1.0000000
          1         2         3         4
2 0.0000000                              
3 1.4082483 1.4082483                    
4 1.6666667 1.6666667 1.4082483          
5 1.4082483 1.4082483 1.2500000 0.3876276
[1] 1 1 1 2 2
Warning message:
In cor(as.vector(d[g1[i], , ]), as.vector(d[g2[j], , ]), use = "complete.obs") :
  the standard deviation is zero
[1] 1 1 2 3 3
           [,1]       [,2]       [,3]       [,4]       [,5]
[1,]  1.0000000  1.0000000 -0.4082483 -0.5374575 -0.5374575
[2,]  1.0000000  1.0000000 -0.4082483 -0.5374575 -0.5374575
[3,] -0.4082483 -0.4082483  1.0000000 -0.3291241 -0.3291241
[4,] -0.5374575 -0.5374575 -0.3291241  0.8061862  0.8061862
[5,] -0.5374575 -0.5374575 -0.3291241  0.8061862  0.8061862
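The model-selection loop described in the example comments (cut the tree at every k from 1 to n, build the by-cluster matrix, and correlate it with the observed matrix) can be sketched in base R as below. This is a hypothetical illustration, not the source of clustConfigurations(): the names by_cluster_matrix and fit_by_k are introduced here, and the sketch assumes fit is the Pearson correlation between the off-diagonal cells of the two matrices; NetCluster's own computation may treat the diagonal differently.

```r
# Hypothetical sketch of scoring every cluster solution k = 1..n;
# not the source of NetCluster's clustConfigurations().
by_cluster_matrix <- function(cor_mat, clusters) {
  ids <- sort(unique(clusters))
  k <- length(ids)
  # Mean observed correlation within/between each pair of clusters
  block_means <- matrix(NA_real_, nrow = k, ncol = k)
  for (a in seq_len(k)) {
    for (b in seq_len(k)) {
      block_means[a, b] <- mean(cor_mat[clusters == ids[a], clusters == ids[b]])
    }
  }
  idx <- match(clusters, ids)
  block_means[idx, idx, drop = FALSE]
}

fit_by_k <- function(cor_mat, tree) {
  n <- nrow(cor_mat)
  off_diag <- upper.tri(cor_mat)   # compare each vertex pair once
  vapply(seq_len(n), function(k) {
    fitted <- by_cluster_matrix(cor_mat, cutree(tree, k = k))
    # For k = 1 the fitted matrix is constant, so cor() would warn
    # "the standard deviation is zero" and return NA
    suppressWarnings(cor(fitted[off_diag], cor_mat[off_diag]))
  }, numeric(1))
}

socmatrix <- matrix(c(1,1,0,0,0,1,1,0,0,0,0,0,1,0,0,0,0,0,1,1,0,0,0,1,0),
                    nrow = 5, ncol = 5)
socmatrix_cors <- cor(socmatrix)
socmatrix_hclust <- hclust(as.dist(1 - socmatrix_cors))

fits <- fit_by_k(socmatrix_cors, socmatrix_hclust)
print(round(fits, 4))
```

The fit is undefined (NA) for a single cluster and reaches 1 when every vertex is its own cluster, since the by-cluster matrix then equals the observed matrix. Plotting fits against k, for example with plot(fits, type = "b"), gives the kind of curve whose flattening point guides the choice of the number of clusters.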

NetCluster documentation built on May 2, 2019, 11:27 a.m.