In neurodata/graphstats: Graph Statistics

require(graphstats)
require(mclust)
require(ggplot2)

Here, we present the ase function, used for representing graphs in lower-dimensions.

Stochastic Blockmodel via Random Dot-Product Graph

We will apply ASE to a SBM constructed from an RDPG. We start with latent vectors [0.85, 0] and [0.3, 0.8]. These make edge probability within a block approximately 0.75, and between blocks approximately 0.25. The Adjacency Matrix is plotted below.

# Create latent vectors, and edge probability matrix.
block1 <- as.matrix(c(0.85, 0), nrow = 2)
block2 <- as.matrix(c(0.3, 0.8), nrow = 2)
block_probs <- matrix(c(t(block1) %*% block1,
                        t(block1) %*% block2,
                        t(block2) %*% block1,
                        t(block2) %*% block2), ncol = 2)

# Create SBM. Use higher edge probability if nodes are from same block.
set.seed(456)
n <- 60
block.sizes <- c(n/2, n/2)
blocks <- rep(c(1, 2), block.sizes)
g <- igraph::sample_sbm(n, block_probs, block.sizes)
gs.plot.heatmap(g)

Embedding the Graph

Now, we call the function to embed the adjacency matrix in R^2. Ideally, we observe two clear clusters in the plotted data. Returned is an nx2 matrix.

# Embed graph into R^2.
dim <- 2
X <- gs.embed.ase(g, dim)$X
dat <- as.data.frame(X)

# Display.
p <- ggplot(dat) +
  geom_point(aes(x = V1, y = V2), color=blocks) +
  xlab("PC Score 1") +
  ylab("PC Score 2")
print(p)

Estimating the Block Assignments

One application of ase is graph clustering. If we cluster the embedded data as if it were a mixture of Gaussians, the cluster means consistent estimate the latent vectors (down to a rotation).

# Cluster using EM algorithm.
model <- Mclust(X, verbose = FALSE)
predictions <- round(model$z[,2])

# Find cluster means. Rotate the data to be around latent vectors via Procrustes.
means <- model$parameters$mean
latent_vecs <- cbind(block1, block2)
M <- svd(means %*% t(latent_vecs))
R <- M$u %*% t(M$v)
X_R <- X %*% R

# Display.
dat <- as.data.frame(X_R)
vec <- as.data.frame(t(latent_vecs))
p <- ggplot(dat) +
  geom_point(aes(x = V1, y = V2), color=blocks) +
  geom_point(data = vec, mapping = aes(x=V1, y=V2, shape=4), size = 5) + 
  scale_shape_identity() +
  xlab("PC Score 1") +
  ylab("PC Score 2")
print(p)