overlapping_sbm: Create an undirected overlapping degree corrected stochastic...
In fastRG: Sample Generalized Random Dot Product Graphs in Linear Time

View source: R/undirected_overlapping_sbm.R

overlapping_sbm

R Documentation

Create an undirected overlapping degree corrected stochastic blockmodel object

Description

To specify a overlapping stochastic blockmodel, you must specify the number of nodes (via n), the mixing matrix (via k or B), and the block probabilities (optional, via pi). We provide defaults for most of these options to enable rapid exploration, or you can invest the effort for more control over the model parameters. We strongly recommend setting the expected_degree or expected_density argument to avoid large memory allocations associated with sampling large, dense graphs.

Usage

overlapping_sbm(
  n,
  k = NULL,
  B = NULL,
  ...,
  pi = rep(1/k, k),
  sort_nodes = TRUE,
  force_pure = TRUE,
  poisson_edges = TRUE,
  allow_self_loops = TRUE
)

Arguments

`n`	The number of nodes in the overlapping SBM.
`k`	(mixing matrix) The number of blocks in the blockmodel. Use when you don't want to specify the mixing-matrix by hand. When `k` is specified, `B` is set to a diagonal dominant matrix with value `0.8` along the diagonal and `0.1 / (k - 1)` on the off-diagonal. `k` defaults to `NULL`. You must specify either `k` or `B`, but not both.
`B`	(mixing matrix) A `k` by `k` matrix of block connection probabilities. The probability that a node in block `i` connects to a node in community `j` is `Poisson(B[i, j])`. Must be an invertible, symmetric square matrix. `matrix` and `Matrix` objects are both acceptable. If `B` is not symmetric, it will be symmetrized via the update `B := B + t(B)`. Defaults to `NULL`. You must specify either `k` or `B`, but not both.
`...`	Arguments passed on to `undirected_factor_model` `expected_degree` If specified, the desired expected degree of the graph. Specifying `expected_degree` simply rescales `S` to achieve this. Defaults to `NULL`. Do not specify both `expected_degree` and `expected_density` at the same time. `expected_density` If specified, the desired expected density of the graph. Specifying `expected_density` simply rescales `S` to achieve this. Defaults to `NULL`. Do not specify both `expected_degree` and `expected_density` at the same time.
`pi`	(block probabilities) Probability of membership in each block. Membership in each block is independent under the overlapping SBM. Defaults to `rep(1 / k, k)`.
`sort_nodes`	Logical indicating whether or not to sort the nodes so that they are grouped by block. Useful for plotting. Defaults to `TRUE`.
`force_pure`	Logical indicating whether or not to force presence of "pure nodes" (nodes that belong only to a single community) for the sake of identifiability. To include pure nodes, block membership sampling first proceeds as per usual. Then, after it is complete, `k` nodes are chosen randomly as pure nodes, one for each block. Defaults to `TRUE`.
`poisson_edges`	Logical indicating whether or not multiple edges are allowed to form between a pair of nodes. Defaults to `TRUE`. When `FALSE`, sampling proceeds as usual, and duplicate edges are removed afterwards. Further, when `FALSE`, we assume that `S` specifies a desired between-factor connection probability, and back-transform this `S` to the appropriate Poisson intensity parameter to approximate Bernoulli factor connection probabilities. See Section 2.3 of Rohe et al. (2017) for some additional details.
`allow_self_loops`	Logical indicating whether or not nodes should be allowed to form edges with themselves. Defaults to `TRUE`. When `FALSE`, sampling proceeds allowing self-loops, and these are then removed after the fact.

Value

An undirected_overlapping_sbm S3 object, a subclass of the undirected_factor_model() with the following additional fields:

pi: Sampling probabilities for each block.
sorted: Logical indicating where nodes are arranged by block (and additionally by degree heterogeneity parameter) within each block.

Generative Model

There are two levels of randomness in a degree-corrected overlapping stochastic blockmodel. First, for each node, we independently determine if that node is a member of each block. This is handled by overlapping_sbm(). Then, given these block memberships, we randomly sample edges between nodes. This second operation is handled by sample_edgelist(), sample_sparse(), sample_igraph() and sample_tidygraph(), depending depending on your desired graph representation.

Identifiability

In order to be identifiable, an overlapping SBM must satisfy two conditions:

B must be invertible, and
the must be at least one "pure node" in each block that belongs to no other blocks.

Block memberships

Note that some nodes may not belong to any blocks.

TODO

Edge formulation

Once we know the block memberships, we need one more ingredient, which is the baseline intensity of connections between nodes in block i and block j. Then each edge A_{i,j} is Poisson distributed with parameter

TODO

References

Kaufmann, Emilie, Thomas Bonald, and Marc Lelarge. "A Spectral Algorithm with Additive Clustering for the Recovery of Overlapping Communities in Networks," Vol. 9925. Lecture Notes in Computer Science. Cham: Springer International Publishing, 2016. https://doi.org/10.1007/978-3-319-46379-7.

Latouche, Pierre, Etienne Birmelé, and Christophe Ambroise. "Overlapping Stochastic Block Models with Application to the French Political Blogosphere." The Annals of Applied Statistics 5, no. 1 (March 2011): 309–36. https://doi.org/10.1214/10-AOAS382.

Zhang, Yuan, Elizaveta Levina, and Ji Zhu. "Detecting Overlapping Communities in Networks Using Spectral Methods." ArXiv:1412.3432, December 10, 2014. http://arxiv.org/abs/1412.3432.

Examples


set.seed(27)

lazy_overlapping_sbm <- overlapping_sbm(n = 1000, k = 5, expected_density = 0.01)
lazy_overlapping_sbm

# sometimes you gotta let the world burn and
# sample a wildly dense graph

dense_lazy_overlapping_sbm <- overlapping_sbm(n = 500, k = 3, expected_density = 0.8)
dense_lazy_overlapping_sbm

k <- 5
n <- 1000
B <- matrix(stats::runif(k * k), nrow = k, ncol = k)

pi <- c(1, 2, 4, 1, 1) / 5

custom_overlapping_sbm <- overlapping_sbm(
  n = 200,
  B = B,
  pi = pi,
  expected_degree = 5
)

custom_overlapping_sbm

edgelist <- sample_edgelist(custom_overlapping_sbm)
edgelist

# efficient eigendecompostion that leverages low-rank structure in
# E(A) so that you don't have to form E(A) to find eigenvectors,
# as E(A) is typically dense. computation is
# handled via RSpectra

population_eigs <- eigs_sym(custom_overlapping_sbm)

fastRG documentation built on Aug. 22, 2023, 1:08 a.m.