Simple and Bipartite Stochastic Block Models

knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)

Preliminaries

This vignette illustrates the use of the estimateSBM function and the methods accompanying the R6 classes SimpleSBMfit and BipartiteSBMfit.

Requirements

The packages required for the analysis are sbm plus some others for data manipulation and representation:

library(sbm)
library(ggplot2)
library(knitr)
theme_set(theme_bw())

Data set: antagonistic tree/fungus interaction network

We consider the fungus-tree interaction network studied by @tree_fungus_network, available with the package sbm:

data("fungusTreeNetwork")
str(fungusTreeNetwork, max.level = 1)

This data set provides information about $154$ fungi sampled on $51$ tree species. It is a list with the following entries:

We first consider the tree-tree interactions resulting into a Simple Network. Then we consider the bipartite network between trees and fungi.

Mathematical Background

See @blockmodels for details.

Analysis of the tree/tree data

Tree-tree binary interaction networks

We first consider the binary network where an edge is drawn between two trees when they do share a least one common fungi:

tree_tree_binary <- 1 * (fungusTreeNetwork$tree_tree != 0)

The simple function plotMyMatrix can be use to represent simple or bipartite SBM:

plotMyMatrix(tree_tree_binary, dimLabels =c('tree'))

We look for some latent structure of the network by adjusting a simple SBM with the function estimateSimpleSBM.

mySimpleSBM <- tree_tree_binary %>% 
  estimateSimpleSBM("bernoulli", dimLabels = 'tree', estimOptions = list(verbosity = 1, plot = TRUE))

Once fitted, the user can manipulate the fitted model by accessing the various fields and methods enjoyed by the class simpleSBMfit. Most important fields and methods are recalled to the user via the show method:

class(mySimpleSBM)
mySimpleSBM

For instance,

mySimpleSBM$nbBlocks
mySimpleSBM$nbNodes
mySimpleSBM$nbCovariates

The plot method is available as a S3 or R6 method. The default represents the network data reordered according to the memberships estimated in the SBM.

plot(mySimpleSBM, type = "data", dimLabels  = c('tree'))

One can also plot the expected network which, in case of the Bernoulli model, corresponds to the probability of connection between any pair of nodes in the network.

plot(mySimpleSBM, type = "expected")
plot(mySimpleSBM, type = "meso")
coef(mySimpleSBM, 'block')
coef(mySimpleSBM, 'connectivity')

About model selection and choice of the number of blocks

During the estimation, a certain range of models are explored corresponding to different number of blocks. By default, the best model in terms of Integrated Classification Likelihood is sent back. In fact, all the model are stored internally. The user can have a quick glance at them via the $storedModels field:

mySimpleSBM$storedModels %>% kable()

We can then see what models are competitive in terms of model selection by checking the ICL:

mySimpleSBM$storedModels %>%  ggplot() + aes(x = nbBlocks, y = ICL)  + geom_line() + geom_point(alpha = 0.5)

The 4-block model could have been a good choice too, in place of the 5-block model. The user can update the current simpleSBMfit thanks to the the setModel method:

mySimpleSBM$setModel(4)
mySimpleSBM$nbBlocks
mySimpleSBM$plot(type = 'expected')
mySimpleSBM$setModel(5)

Analysis of the weighted interaction network

Instead of considering the binary network tree-tree we may consider the weighted network where the link between two trees is the number of fungi they share.

We plot the matrix with function plotMyMatrix:

tree_tree <- fungusTreeNetwork$tree_tree
plotMyMatrix(tree_tree, dimLabels = c('tree'))

Here again, we look for some latent structure of the network by adjusting a simple SBM with the function estimateSimpleSBM, considering a Poisson distribution on the edges.

mySimpleSBMPoisson <- tree_tree %>% 
  estimateSimpleSBM("poisson", dimLabels = 'tree', estimOptions = list(verbosity = 0, plot = FALSE))

Once fitted, the user can manipulate the fitted model by accessing the various fields and methods enjoyed by the class simpleSBMfit. Most important fields and methods are recalled to the user via the show method:

class(mySimpleSBMPoisson)
mySimpleSBMPoisson

For instance,

mySimpleSBMPoisson$nbBlocks
mySimpleSBMPoisson$nbNodes
mySimpleSBMPoisson$nbCovariates

We now plot the matrix reordered according to the memberships estimated in the SBM:

plot(mySimpleSBMPoisson, type = "data", dimLabels =c('tree'))

One can also plot the expected network which, in case of the Poisson model, corresponds to the expectation of connection between any pair of nodes in the network.

plot(mySimpleSBMPoisson, type = "expected", dimLabels = c('tree'))
plot(mySimpleSBMPoisson, type = "meso")

The same manipulations can be made on the models as before.

coef(mySimpleSBMPoisson, 'block')
coef(mySimpleSBMPoisson, 'connectivity')

Introduction of covariates

We have on each pair of trees 3 covariates, namely the genetic distance, the taxonomic distance and the geographic distance. Each covariate has to be introduced as a matrix: $X^k_{ij}$ corresponds to the value of the $k$-th covariate describing the couple $(i,j)$.

mySimpleSBMCov<- 
  tree_tree %>% 
  estimateSimpleSBM(
    model = 'poisson', 
    directed = FALSE, 
    dimLabels = 'tree',
    covariates  = fungusTreeNetwork$covar_tree, 
    estimOptions = list(verbosity = 0, plot = FALSE, nbCores = 2)
  )
mySimpleSBMCov$nbBlocks
mySimpleSBMCov$connectParam
mySimpleSBMCov$blockProp
mySimpleSBMCov$memberships
mySimpleSBMCov$covarParam
coef(mySimpleSBMCov, 'covariates')

S3 methods are also available for fit and prediction (results hidden here)

#fitted(mySimpleSBMCov)
#predict(mySimpleSBMCov)
#predict(mySimpleSBMCov, fungusTreeNetwork$covar_tree)

Analysis of the tree/fungi data

We now analyze the bipartite tree/fungi interactions. The incidence matrix can be plotted with the function \cote{plotMyMatrix}

plotMyMatrix(fungusTreeNetwork$fungus_tree, dimLabels = c(row = 'fungis', col= 'tree'))
myBipartiteSBM <- 
  fungusTreeNetwork$fungus_tree %>% 
  estimateBipartiteSBM(model = 'bernoulli', dimLabels = c('fungis', 'tree'),estimOptions = list(verbosity = 0, plot = FALSE))
myBipartiteSBM$nbNodes
myBipartiteSBM$nbBlocks
myBipartiteSBM$connectParam
coef(myBipartiteSBM, 'block')
coef(myBipartiteSBM, 'connectivity')

We can now plot the reorganized matrix.

plot(myBipartiteSBM, dimLabels = c('fungis', 'tree'))
plot(myBipartiteSBM, type  = 'meso', 
     dimLabels = list(row = 'fungis', col= 'tree'),
     plotOptions = list(edge.width = 1, vertex.size = c(1,2)))

References



Try the sbm package in your browser

Any scripts or data that you put into this service are public.

sbm documentation built on Jan. 9, 2023, 5:12 p.m.