knitr::opts_chunk$set( collapse = TRUE, comment = "#>" )
In this vignette we show how to set up a model to perform inference for the generalised Stochastic Block Model (GSBM) using the reversible jump split-merge sampling method.
To start, load the library
library(SBMSplitMerge)
In this toy example we consider a set of 50 interconnected electrical components. Each component is connected to the other and the connections are all turned on at once. We record the failure time of each connections.
This can be modelled as a network, where the edge-weights are the failure times. A standard likelihood model for failure times is the Weibull distribution (on Wikipedia). The probability density function is: [ \frac{k}{\lambda}\left(\frac{x}{\lambda}\right)^{k-1} e^{-(x/\lambda)^{k}} ]
We posit that the components exhibit some group behaviour via the failure times, such that connections between groups tend to fail more quickly than connections within a group; however we don't know the number of groups. The GSBM is a good model to consider for such data since it can provide a posterior distribution for the number of groups and the Weibull with both parameters unknown has no conjugate prior.
There are three parts to the GSBM: - An edge model - A block model - A parameter model (for the parameters of the edge model)
The edgemod
object in SBMSplitMerge encapsulates the likelihood of the edge weights.
A basic edgemod
is a thin wrapper to a density function. The package insists on the density function taking the form f(x, p)
where x
is an edge-weight and p
is a parameter 3-array.
An optional random method can be added. This has the signature f(p)
where p
is again a vector of parameters.
For the Weibull example, we can rely on \code{R}'s built in Weibull functions
edge_model <- edgemod( function(e, p) dweibull(e, p[1,,], p[2,,], log=TRUE) , function(p) rweibull(1, p[1], p[2]) )
In the GSBM, the prior specification for the number of blocks is explicit. To facilitate other models (such as a Chinese Restaurant Process), the \code{blockmod} object in the package
encapsulates the join distribution for the number of blocks and assignment of nodes to blocks; we refer to these as the 'block structure'.
To provide a \code{blockmod} the following are needed: some input parameters gamma, log density for the block structure, a random method for the block structure, the conditional distribution of a single node assignment given the others and a boolean indicating if the number of blocks (kappa) is fixed.
The package provides three pre-specified \code{blockmod}s:
- multinom
- fixed number of blocks with a multinomial assignment of nodes to blocks
- crp
- the Chinese restaurant process
- dma
- Dirichlet-Multinomial allocation
We will use dma
, the first parameter (gamma) is the concentration parameter for a symmetric Dirichlet distribution and the second (delta) is the rate parameter for a Poisson.
The 'number of blocks'-1 is modelled as a Poisson(delta) and the block assignments as Dirichlet-Multinomial(gamma, K). [cf. the paper]
block_model <- dma(1, 10)
Finally, the prior model on the parameters of the Weibull distribution is specified.
This is the most involved component since it requires some functions to map the inputs to the real line.
The components of a parammod
are:
- \code{logd} - a function(p)
where p
is a params
object gets the probability density of a params
object,
- \code{r} - a function(kappa)
simulates a params
object for kappa
blocks,
- \code{t} - a function mapping the support of the parameter to the real line,
- \code{invt} - a function mapping the real line to the support of the parameter,
- \code{loggradt} - the jacobian of t.
For the Weibull example, $k$ and $\lambda$ must be positive. A reasonable prior for each is is a Gamma distribution, which can be specified like so:
param_model <- parammod( function(params){ dgamma(params$theta0[1], 1, 1, log=TRUE) + dgamma(params$theta0[2], 1, 1, log=TRUE) + sum(dgamma(params$thetak[,1], 1, 1, log=TRUE)) + sum(dgamma(params$thetak[,2], 1, 1, log=TRUE)) }, function(kappa){ params( c(rgamma(1, 1, 1), rgamma(1, 1, 1)) , cbind(rgamma(kappa, 1, 1), rgamma(kappa, 1, 1)) ) }, function(x){ cbind(log(x[1]), log(x[2]))}, function(x){ cbind(exp(x[1]), exp(x[2]))}, function(x){ -log(x[1])-log(x[2])} )
this prior says each $k$ and $\lambda$ are Gamma(1,1) distributed.
We wrap all the model components up into a sbmmod
object:
model <- sbmmod(block_model, param_model, edge_model)
The data should appear in a square matrix and wrapped in an edges
object.
For the purpose of this example we simulate the data with some true values:
set.seed(1) true_blocks <- blocks(rep(c(1, 2, 3), c(10, 20, 20))) true_params <- params(c(1, 0.5), cbind(1, c(3,4,5))) true_sbm <- sbm(true_blocks, true_params) weibull_edges <- redges(true_sbm, model$edge)
The blocks
object describes the block assignments of nodes to blocks:
print(true_blocks) plot(true_blocks) plot(weibull_edges)
We are now in a position to sample from the model:
weibull_posterior <- sampler(weibull_edges, model, 300, "rj", sigma=0.1)
and see some diagnostic plots
weibull_plots <- eval_plots(weibull_posterior) weibull_plots
Any scripts or data that you put into this service are public.
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.