In neurodata/graphstats: Graph Statistics

require(graphstats)
require(igraph)
require(ggplot2)

Background

Consider two networks on overlapping, non-identical vertex sets. Given vertices of interest in the first network, we seek to identify the corresponding vertices, if any exist, in the second network. Our methodology identifies vertices in a local neighborhood of the vertices of interest in the first network that have verifiable corresponding vertices in the second network.

Leveraging these known correspondences, referred to as seeds, we match the induced subgraphs in each network generated by the neighborhoods of these verified seeds, and rank the vertices of the second network in terms of the most likely matches to the original vertices of interest.

Simulation on Isomorphic Graphs

Here, we produce a simple random graph G1 with n = 10 and p = 0.5. We then produce G2 by maintaining the first m = 3 rows (the seeds), and permuting the remaining n - m vertices. We delete the last vertex of G2 just make two graphs different sizes.

# Number of seeds, total nubmer of vertices, and edge probability.
m <- 3
n <- 10
p <- 0.5
set.seed(1234)

# Sample graph 1, and permute to graph 2; delete a vertex to show VN works
# when total number of vertices are different.
g1 = sample_sbm(n, as.matrix(p), n)
permute = c(1:m, sample(n-m)+m)
g2 = permute.vertices(g1, permute)
g2 = delete.vertices(g2, 10)

seeds <- matrix(cbind(1:m, 1:m), nrow = m)
x = 4 # VOI
h = 1
ell = 2
R = 100
gamma = 0.01
vn = vnsgm(x,seeds,g1,g2,h,ell,R,gamma,plotF = TRUE)

Here green represents the set of seeds; red represents VOIs; blue represents unobserved vertices we matched via SGM. Shade of color represents how likely vertex i in graph 1 is a match to vertex j in graph 2. In this case, we do not see any shade change, therefore we only have one prediction of match for each vertex.

Let's compare out prediction of matches to our original permutation.

# If no match is found for a given vertex, the index of the nominated vertex is set to zero
P = vn$P
col = colnames(P)
P$"0" = 0
P = P[c("0",col)]
permute[permute == 10] = 0 # The 10th vertex in graph 2 is deleted
nominated.vertex = as.numeric(colnames(P)[apply(P,1,which.max)])
df = as.data.frame(rbind(permute,nominated.vertex))
row.names(df) = c('True Permutation of G1','VN Predicted Permutation')
df

Simulation on r-Correlated ER Graphs

Here, we sample (G1, G2) from an r-Correlated ER distribution via a random dot-product graph, permute the matrix like before, and observe the results of Vertex Nomination.

# Define latent vectors for 2-SBM RDPG in X.
n <- 10
X1 <- matrix(rep(c(0.85, 0), n/2), nrow = n/2, byrow = TRUE)
X2 <- matrix(rep(c(0.3,0.8), n/2), nrow = n/2, byrow = TRUE)
X <- rbind(X1, X2)
set.seed(6789)

# Pearson correlation coefficient.
r <- 0.5

# Sample r-SBM.
sampled_graphs <- rdpg.sample.correlated(X, r)
g1 = sampled_graphs[[1]]
g2 = sampled_graphs[[2]]
A <- as_adj(g1, sparse = FALSE)
B <- as_adj(g2, sparse = FALSE)
# Display overlap.
gs.plot.plot_matrix(A + B, title="A + B (Overlap)", legend.name="A_ij + B_ij")

These graphs are not exactly equal. Their matrix addition, pictured above, has many 2 and 0 entries, and some 1 entries. Here, we choose the first two vertices in each graph as our seeding.

# Identify seeds.
m <- 2
seeds <- matrix(cbind(1:m, 1:m), nrow = m)
set.seed(364)
# Set up permutation.
permute = c(1:m, sample(n-m)+m)
# Apply permutation and delete 1 vertex from graph 2 two make vertex set size different for two graphs.
g2_p = permute.vertices(g2, permute)
g2_p = delete.vertices(g2_p, 10)
vn = vnsgm(x,seeds,g1,g2_p,h,ell,R,gamma,plotF = TRUE)

Here we display our original permutation for comparison purposes.

nominated.vertex = as.numeric(colnames(vn$P)[apply(vn$P,1,which.max)])
df = as.data.frame(rbind(permute,nominated.vertex))
row.names(df) = c('True Permutation of G1','VN Permutation')
df