require(igraph)
require(graphstats)
require(mclust)
require(ggplot2)
require(gridExtra)

We present nonpar, nonparametric two-sample testing on random graphs used to test the hypothesis that two graphs are identically distributed.

Testing/Simulation

Stochastic Blockmodel Example

We illustrate the hypothesis tests through simulated examples. We simulate two graphs from Stochastic Blockmodel. For graph 1, edge probability within a block is approximately 0.23, and between blocks approximately 0.76, with block membership probabilities (0.4,0.6). Let graph 2 has the same block membership probabilities, but edge probability within a block is approximately 0.23 + epsilon, and between blocks approximately 0.76, where epsilon is 0.01.

We then test, for a given epsilon > 0, the hypothesis whether graph 1 and graph 2 are identially distributed. nonpar uses bootstrapping to conduct hypothesis testing.

# Generate two graphs from SBM
set.seed(123)
n <- 100
block.sizes <- c(n * 0.4, n * 0.6)
block_probs <- matrix(c(0.23, 0.76,
                        0.76, 0.23), nrow = 2)
epsilon = 0.01
block_probs.epsilon <- matrix(c(0.23 + epsilon, 0.76, 
                                0.76, 0.23 + epsilon), nrow = 2)
g1 <- igraph::sample_sbm(n, block_probs, block.sizes)
g2 <- igraph::sample_sbm(n, block_probs.epsilon, block.sizes)

Here, let's visualize the two graphs we simulated through their adjacency matrices.

A1 <- matrix(as_adj(g1), nrow = 100)
A2 <- matrix(as_adj(g2), nrow = 100)

p1 = gs.plot.plot_matrix(A1, legend.name = "connection", xlabel = "vertex", 
                    ylabel = "vertex", title = "Two Graphs Simulated from 
                    SBM", ffactor = TRUE)
p2 = gs.plot.plot_matrix(A2, legend.name = "connection", xlabel ="vertex", 
                         ylabel = "vertex", ffactor = TRUE)
grid.arrange(p1, p2, nrow = 1)

Now, we can embed two graphs to R^2 and compare their latent positions:

embed.graph <- function(g, dim) {
  # Call ase to get latent position
  lpv = graphstats::ase(g, dim)
  for (i in 1:dim) {
    if (sign(lpv[1, i]) != 1) {
      lpv[, i] = -lpv[, i]
    }
  }
  return(lpv)
}
Xhat = embed.graph(g1, 2)
Xhat_epsilon = embed.graph(g2, 2)
Xhat_df = as.data.frame(Xhat)
Xhat_epsilon_df = as.data.frame(Xhat_epsilon)
gg <- ggplot(Xhat_df, aes(x=V1, y=V2, color = "Graph 1")) + 
  geom_point(size=1, shape=1)
gg + geom_point(data = Xhat_epsilon_df, aes(x=V1, y=V2,color = "Graph 2"),
                size=1, shape=1) + 
  labs(title="Latent Positions of Two Graphs", x="X", y="Y") + 
  theme(plot.title = element_text(hjust = 0.5))

Now, the significance level is set to default alpha = 0.05 and the rejection regions are specified via B = 200 bootstrap permutation using the estimated latent positions.

np = nonpar(g1, g2, printResult = TRUE)
np$plot

We reject the null hypothesis if test statistic (blue dashed line) is to the right of the critical value (red line). Here we failed to reject our null since our test statistic is inside the acceptance region.



neurodata/graphstats documentation built on May 14, 2019, 5:19 p.m.