In alexpghayes/fastRG: Sample Generalized Random Dot Product Graphs in Linear Time

knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>",
  fig.path = "man/figures/README-",
  out.width = "100%"
)

fastRG

fastRG quickly samples a broad class of network models known as generalized random dot product graphs (GRDPGs). In particular, for matrices $X$, $S$ and $Y$, fastRG samples a matrix $A$ with expectation $X S Y^T$ where the entries are independently Poisson distributed conditional on $X$ and $Y$. This is primarily useful when $A$ is the adjacency matrix of a graph. Crucially, the sampling is $\mathcal O(m)$, where $m$ is the number of the edges in graph, as opposed to the naive sampling approach, which is $\mathcal O(n^2)$, where $n$ is the number of nodes in the network. For additional details, see the paper [1].

fastRG has two primary use cases:

Sampling enormous sparse graphs that cannot feasibly be sampled with existing samplers, and
validating new methods for random dot product graphs (and variants).

fastRG makes the latent parameters of random dot product graphs readily available to users, such that simulation studies for community detection, subspace recovery, etc, are straightforward.

Installation

You can install the released version of fastRG from CRAN with:

install.packages("fastRG")

And the development version from GitHub with:

# install.packages("devtools")
devtools::install_github("RoheLab/fastRG")

Usage

There are two stages to sampling from generalized random dot product graphs. First, we sample the latent factors $X$ and $Y$. Then we sample $A$ conditional on those latent factors. fastRG mimics this two-stage sample structure. For example, to sample from a stochastic blockmodel, we first create the latent factors.

library(fastRG)

set.seed(27)

sbm <- sbm(n = 1000, k = 5, expected_density = 0.01)

You can specify the latent factors and the mixing matrix $B$ yourself, but there are also defaults to enable fast prototyping. Here $B$ was randomly generated with Uniform[0, 1] entries and nodes were assigned randomly to communities with equal probability of falling in all communities. Printing the result object gives us some additional information:

sbm

Now, conditional on this latent representation, we can sample graphs. fastRG supports several different output types, each of which is specified by the suffix to sample_*() functions. For example, we can obtain an edgelist in a tibble with:

sample_edgelist(sbm)

but we can just as easily obtain the graph as a sparse matrix

A <- sample_sparse(sbm)
A[1:10, 1:10]

or an igraph object

sample_igraph(sbm)

Note that every time we call sample_*() we draw a new sample.

A <- sample_sparse(sbm)
B <- sample_sparse(sbm)

all(A == B) # random realizations from the SBM don't match!

Efficient spectral decompositions

If you would like to obtain the singular value decomposition of the population adjacency matrix conditional on latent factors, that is straightforward:

s <- eigs_sym(sbm)
s$values

Note that eigendecompositions and SVDS (for directed graphs) use RSpectra and do not require explicitly forming large dense population adjacency matrices; the population decompositions should be efficient in both time and space for even large graphs.

Key sampling options

There are several essential tools to modify graph sampling that you should know about. First there are options that affect the latent factor sampling:

expected_degree: Set the expected average degree of the graph by scaling sampling probabilities. We strongly, strongly recommend that you always set this option. If you do not, it is easy accidentally sample from large and dense graphs.
expected_density: Set the expected density of the graph by scaling sampling probabilities. You cannot specify both expected_degree and expected_density at the same time.

In the second stage of graph sampling, the options are:

poisson_edges: Either TRUE or FALSE depending on whether you would like a Bernoulli graph or a Poisson multi-graph. Scaling via expected_degree assumes a Poisson multi-graph, with some limited exceptions.
allow_self_edges: Whether nodes should be allowed to connect to themselves. Either TRUE or FALSE.

Related work

igraph allows users to sample SBMs (in $\mathcal O(m + n + k^2)$ time) and random dot product graphs (in $\mathcal O(n^2 k)$ time).

You can find the original research code associated with fastRG here. There is also a Python translation of the original code in Python here. Both of these implementations are bare bones.

References

[1] Rohe, Karl, Jun Tao, Xintian Han, and Norbert Binkiewicz. 2017. "A Note on Quickly Sampling a Sparse Matrix with Low Rank Expectation." Journal of Machine Learning Research; 19(77):1-13, 2018. https://www.jmlr.org/papers/v19/17-128.html

alexpghayes/fastRG documentation built on Aug. 31, 2024, 7:43 a.m.

rdrr.io home R language documentation Run R code online

CRAN packages Bioconductor packages R-Forge packages GitHub packages

Note that we can't provide technical support on individual packages. You should contact the package authors for that.

Tweet to @rdrrHQ

GitHub issue tracker

ian@mutexlabs.com