In Chabert-Liddell/MLVSBM: A Stochastic Block Model for Multilevel Networks

knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)

library(MLVSBM)

Presentation of generalized multivel networks

This vignette shows how to use the MLVSBM package to deal with generalized multilevel networks. This is done by modeling these networks with an ad-hoc Stochastic Block Model. The mathematical details can be found at http://www.theses.fr/en/2022UPASM005 (Chapter 2.E).

A diagonal acyclic graph of the model for a graph with $L$ levels is provided below.

        A^1             A^{L-1}
         |               |
         v               v  
Z^1 --> Z^2 --> ... --> Z^L
 |       |               |
 v       v               v
X^1 --> X^2 --> ... --> X^L

A temporal network with the same nodes at each time is a particular case of generalized multilevel networks with diagonal affiliation matrices.

Description of the data

Assume that we have a generalized multilevel networks with $4$ levels, each between $80$ and $100$ nodes and $3$ latent blocks. The description of the levels are the following:

An undirected network with an assortative community structure
An undirected network with a disassortive structure
An undirected network with a core-periphery structure
A directed network with an asymmetric structure.

The stability of clusters between levels is quite strong as 80% of the nodes belonging to the same cluster in a level will also be grouped together on the next level.

Simulating the data

We set the simulation parameters:

n <- 100 
L <- 4
alpha <- list(
  diag(.4, 3, 3) + .1,
  -diag(.3, 3, 3) + .4,
  matrix(c(.8, .2, .1,
        .4, .4, .1,
        .2, .1, .1), 3, 3),
  matrix(c(.3, .5, .5,
         .1, .4, .5,
         .1, .3, .1), 3, 3)
)
alpha[[1]][1,1] <- .8
alpha[[1]][3,3] <- .3
gamma <- lapply(seq(3),
                function(m) matrix(c(.8, .1, .1,
                                     .1, .8, .1,
                                     .1, .1, .8), 3, 3, byrow = TRUE))
pi <- list(c(.2, .3, .5), NULL, NULL, NULL)

directed = c(FALSE, FALSE, FALSE, TRUE)

The network can then be simulated.

set.seed(1234)
gmlv <- mlvsbm_simulate_generalized_network(
                                    n = rep(n, 4) - c(20,20,0,0), # Number of nodes
                                    Q = rep(3, 4), # Number of blocks
                                    pi = pi, # Mixture parameters
                                    gamma = gamma, # Affiliation paramters
                                    alpha = alpha, # Connectivity paramters
                                    directed = directed,
                                    distribution = rep("bernoulli", 4))

By doing this we simulated a generalized multilevel network with $4$ adjacency matrices and $3$ affiliation matrices.

str(gmlv$adjacency_matrix)
str(gmlv$affiliation_matrix)

Fitting the data

We create a generalized multilevel network from the simulated data,

my_gmlv <- mlvsbm_create_generalized_network(X = gmlv$adjacency_matrix,
                                             A = gmlv$affiliation_matrix,
                                             directed = directed,
                                             distribution = rep("bernoulli", 4))

then we can fit the model with an initialization of just one block for each level:

fit_from_scratch <-  mlvsbm_estimate_generalized_network(gmlv,
                                    init_clustering = lapply(seq(4),
                                                             function(x)rep(1, n)),
                                    init_method = "merge_split", 
                                    nb_cores = 1L,
                                    fit_options = list(ve = "joint") 
                                    )

Or by first fitting an independent SBM on each level, and precising that we need 3 blocks on each level:

fit <-  mlvsbm_estimate_generalized_network(gmlv,
                                            nb_clusters =  rep(3, 4),
                                            nb_cores = 1L,
                                            fit_options = list(ve = "sequential"))

Results and analysis

We can then plot the fitted structure:

plot(fit_from_scratch)

Or compared for each level, the inferred clustering with the simulated one with ARI (1 for perfect recovery, 0 for clustering by chance):

paste(
  "ARI for level", seq(L), ":", 
  round(sapply(seq(L),
       function(l) ARI(x = gmlv$memberships[[l]],
                       y = fit_from_scratch$Z[[l]])), 2)
)

Or compare the inferred clustering from our two methods of inference with ARI:

paste(
  "ARI for level", seq(L), ":", 
  round(sapply(seq(L),
       function(l) ARI(x = fit$Z[[l]],
                       y = fit_from_scratch$Z[[l]])), 2)
)