In lwa19/centrality: Compute network centrality without generating a graph object

knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)

This vignette demonstrates how to use the closeness centrality calculation in the centrality package on various types of graphs. The simulated graph describes the connectivity of 20 nodes, described by an adjacency matrix $A_{n\times n}$ ($n=20$).

We evaluate the influence of each node in the netowrk through computations of its closeness centrality measure, and visualize it by plotting the network overlapped with its closeness centrality score. We repeat the same process on different types matrices to demonstrate its different features and power at describing the network features.

Note that closeness centrality is only applicable to connected graphs, therefore we will not be exploring its application on sparse matrices. Instead, our focus will be on the impact of directedness and edge weights on node centrality.

library(centrality)
library('RColorBrewer')
library(igraph)
library(gplots)
library(knitr)
library(fields)
set.seed(1)

Closeness Centrality:

We will run through a simple example here to demonstrate the closeness centrality calculation:

# simulate matrix
n = 20
set.seed(1)
A.mat = sim_deg_conn(n, 0.9)
B.mat = sim_deg_conn(n, 0.1)

# convert to graph objects
A.graph = graph_from_adjacency_matrix(A.mat, mode = "undirected")
B.graph = graph_from_adjacency_matrix(B.mat, mode = "undirected")

# plot adjacency matrix
par(cex.main=0.8, par(mar = c(1, 2, 2, 6)))
colors = rep(brewer.pal(9, "Blues"), each = 3)[1:17]
plot.A.mat = A.mat + diag(rep(2, n))    # emphasize diagonal in plot
plot.B.mat = B.mat + diag(rep(2, n))
adj.mat = heatmap.2(plot.A.mat, dendrogram='none', Rowv=FALSE, Colv=FALSE, trace='none', 
          key = F, col = as.vector(colors), main = "High Connectedness", 
          colsep = 1:n, rowsep = 1:n, sepcolor = "grey", sepwidth=c(0.005,0.005))
adj.mat = heatmap.2(plot.B.mat, dendrogram='none', Rowv=FALSE, Colv=FALSE, trace='none', 
          key = F, col = as.vector(colors), main = "Low Connectedness", 
          colsep = 1:n, rowsep = 1:n, sepcolor = "grey", sepwidth=c(0.005,0.005))
image.plot(legend.only=T, zlim=range(0,1), col=colors, legend.mar = 3, 
           legend.shrink = 0.5, legend.width = 0.5, legend.cex = 0.3)

par(mfrow=c(1,2), mar = c(2, 2, 2, 6))

# calculate closeness centrality:
c.clo.A = centrality::closeness(A.mat)
c.clo.A = c.clo.A/max(c.clo.A)
c.clo.B = centrality::closeness(B.mat) 
c.clo.B = c.clo.B/max(c.clo.B)

# plot closeness centrality
graph_centrality(A.graph, c.clo.A, main = "highly connected graph", legend = T)
graph_centrality(B.graph, c.clo.B, main = "not highly connected graph", legend = T)

Performance Comparison

Here we compare the performance of package centrality to the centrality calculations in package igraph.

Accuracy:

We will test for the accuracy of the centrality package by visualizing the results using igraph output vs. the results using centrality output. The analysis will be performed on the two matrices simulated in the section immediately above.

# calculate closeness centrality using igraph:
igraph.A = centr_clo(A.graph, normalized = T)
ci.clo.A = igraph.A$res / max(igraph.A$res)
igraph.B = centr_clo(B.graph, normalized = T)
ci.clo.B = igraph.B$res / max(igraph.B$res)

# plot: 'centrality' vs 'igraph' for highly connected network: 
par(mfrow=c(1,2), mar = c(2, 2, 2, 5.1))
graph_centrality(A.graph, ci.clo.A, 
                 main = "Closeness Centrality (igraph)", legend = T)
graph_centrality(A.graph, c.clo.A/max(c.clo.A),
                 main = "Closeness Centrality (centrality)", legend = T)

# plot: 'centrality' vs 'igraph' for low connectivity network: 
par(mfrow=c(1,2), mar = c(2, 2, 2, 5.1))
graph_centrality(B.graph, ci.clo.B, 
                 main = "Closeness Centrality (igraph)", legend = T)
graph_centrality(B.graph, c.clo.B/max(c.clo.B), 
                 main = "Closeness Centrality (centrality)", legend = T)

Speed

We will time the speed of overall speed of computing the closeness centrality scores of igraph and centrality. Since the aim of the package is to avoid the tedious graph conversion, we will be including igraph's graph conversion into its total run time. We will also demonstrate their run time differences on large adjacency matrices ($n = 500$) to see a larger difference.

dense matrix

n = 500
mat.A = sim_deg_conn(n, 0.9)
print("igraph run time: ")
system.time({
  graph.A = graph_from_adjacency_matrix(mat.A, mode = "undirected")
  igraph.clo = centr_clo(graph.A, normalized = T)
  ci.clo = igraph.clo$res / max(igraph.clo$res)
})
print("centrality run time:" )
system.time({
  c.clo = centrality::closeness(mat.A)
  c.clo = c.clo/max(c.clo)
})

sparse matrix:

mat.A = sim_deg_conn(n, 0.1)
print("igraph run time: ")
system.time({
  graph.A = graph_from_adjacency_matrix(mat.A, mode = "undirected")
  igraph.clo = centr_clo(graph.A, normalized = T)
  ci.clo = igraph.clo$res / max(igraph.clo$res)
})
print("centrality run time:" )
system.time({
  c.clo = centrality::closeness(mat.A)
  c.clo = c.clo/max(c.clo)
})

We can see that even with a not-so-large matrix ($n = 500$ as opposed to $n = 1,000$ example we used in the degree centrality computation), we already see a large increase run-time for the centrality package function. This is very likely due to the Dijkstra's algorithm that is an inherent part of the closeness centrality computation, that needs to iterate through every two node pairs to find the geodesics between all nodes in the network. To my knowledge, igraph is implemented in C++ and has been optimized and established across many platforms. So I will admit defeat on this one.