diffusr implements algorithms for network diffusion, such as Markov random walks with restarts and weighted neighbor classification. Network diffusion has been studied extensively in bioinformatics, e.g. in the field of cancer gene prioritization. Network diffusion algorithms generally spread information in the form of node weights along the edges of a graph to other nodes. These weights can, for example, be interpreted as temperature, an initial amount of water, the activation of neurons in the brain, or the location of a random surfer on the internet. The information (node weights) is iteratively propagated to other nodes until an equilibrium state is reached or a stop criterion is met.
First load the package:
library(diffusr)
A Markov random walk with restarts calculates the stationary distribution of the update
\begin{equation} \mathbf{p}_{t+1} = (1 - r) \mathbf{W} \mathbf{p}_t + r \mathbf{p}_0, \end{equation}
where $r \in (0,1)$ is the restart probability of the Markov chain, $\mathbf{W}$ is a column-normalized stochastic matrix (we do the normalization for you) and $\mathbf{p}_0$ is the starting distribution of the Markov chain. We calculate the iterative updates; it is also possible to do the math analytically using the nullspace of the matrix (covered further below).
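To make the update rule concrete, here is a minimal base-R sketch of the iteration. It is not the diffusr implementation: the function name `rwr_iterate`, the tolerance `tol`, the cap `max_iter` and the choice `r = 0.5` are all assumptions made for illustration.

```r
# Base-R sketch of the random walk with restarts iteration.
# rwr_iterate, tol and max_iter are hypothetical names, not part of diffusr.
rwr_iterate <- function(p0, adjacency, r = 0.5, tol = 1e-10, max_iter = 10000L) {
  # Column-normalize so that every column of W sums to one.
  W <- sweep(adjacency, 2, colSums(adjacency), "/")
  p <- p0
  for (i in seq_len(max_iter)) {
    # One update: p_{t+1} = (1 - r) W p_t + r p_0
    p_next <- as.vector((1 - r) * (W %*% p) + r * p0)
    # Stop once successive iterates differ by less than the tolerance.
    if (sum(abs(p_next - p)) < tol) {
      return(p_next)
    }
    p <- p_next
  }
  p
}

set.seed(1)
n <- 5
adjacency <- matrix(abs(rnorm(n * n)), n, n)
p0 <- c(1, 0, 0, 0, 0)
p_inf <- rwr_iterate(p0, adjacency)
```

Because $\mathbf{W}$ is column-stochastic and $\mathbf{p}_0$ sums to one, every iterate keeps summing to one, so the result can again be read as a probability distribution over the nodes.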
If you want to use Markov random walks just try something like this:
# count of nodes
n <- 5
# starting distribution (has to sum to one)
p0 <- as.vector(rmultinom(1, 1, prob=rep(.2, n)))
# adjacency matrix (either normalized or not)
graph <- matrix(abs(rnorm(n*n)), n, n)
# computation of stationary distribution
pt <- random.walk(p0, graph)
The stationary distribution should have changed quite a bit from the starting distribution:
print(t(p0))
print(t(pt))
You can also use a matrix p0:
p0 <- matrix(c(p0, runif(20)), nrow=n)
pt <- random.walk(p0, graph)
pt
In the latter case, a random walk is performed separately for each column of p0.
In the previous section we computed the stationary distribution iteratively. You can also choose to compute it analytically. In that case we need to invert a matrix built from the transition matrix, which might lead to numerical instability. However, it usually runs faster than the iterative version.
pt <- random.walk(p0, graph, do.analytical=TRUE)
pt
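For intuition, setting $\mathbf{p}_{t+1} = \mathbf{p}_t = \mathbf{p}$ in the update rule gives the linear system $(\mathbf{I} - (1 - r)\mathbf{W})\,\mathbf{p} = r\,\mathbf{p}_0$. The base-R sketch below solves that system directly. It illustrates the idea and is not necessarily what `do.analytical=TRUE` does internally; the function name `rwr_closed_form` and the value `r = 0.5` are assumptions.

```r
# Closed-form sketch: solve (I - (1 - r) W) p = r p0 for p.
# rwr_closed_form is a hypothetical helper, not part of diffusr.
rwr_closed_form <- function(p0, adjacency, r = 0.5) {
  # Column-normalize so that every column of W sums to one.
  W <- sweep(adjacency, 2, colSums(adjacency), "/")
  A <- diag(nrow(W)) - (1 - r) * W
  # solve(A, b) solves the linear system without forming the inverse explicitly.
  solve(A, r * p0)
}

set.seed(1)
n <- 5
adjacency <- matrix(abs(rnorm(n * n)), n, n)
# Two starting distributions, one per column.
p0 <- matrix(c(1, 0, 0, 0, 0, rep(0.2, n)), nrow = n)
p_inf <- rwr_closed_form(p0, adjacency)
```

Using `solve(A, b)` rather than `solve(A) %*% b` avoids the explicit inverse, which is the usual way to reduce the numerical issues mentioned above.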
Diffusion using nearest neighbors is done by traversing a (weighted) graph and collecting all the neighbors of a node until a certain depth in the graph is reached. We find shortest paths using priority queues:
# count of nodes
n <- 10
# indexes (integer) of nodes for which neighbors should be searched
node.idxs <- c(1L, 5L)
# the adjacency matrix (does not need to be symmetric)
graph <- rbind(cbind(0, diag(n-1)), 0)
# compute the neighbors until depth 3
neighs <- nearest.neighbors(node.idxs, graph, 3)
Let's see which nodes we got:
print(neighs)
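As a rough illustration of what a depth-limited neighbor search does, the following base-R sketch runs a plain breadth-first search over the same chain graph. It is a simplification, not the diffusr algorithm: it ignores edge weights, uses no priority queue, and assumes that entry `[i, j]` of the adjacency matrix is an edge from node `i` to node `j`. The name `neighbors_within` is hypothetical.

```r
# Depth-limited breadth-first search (unweighted simplification).
# neighbors_within is a hypothetical helper, not part of diffusr.
neighbors_within <- function(start, adjacency, depth) {
  visited <- start
  frontier <- start
  for (d in seq_len(depth)) {
    if (length(frontier) == 0L) break
    # Nodes reachable by one outgoing edge from any node in the frontier.
    reached <- which(colSums(adjacency[frontier, , drop = FALSE] != 0) > 0)
    # Keep only nodes that have not been seen at a smaller depth.
    frontier <- setdiff(reached, visited)
    visited <- c(visited, frontier)
  }
  # Report the neighbors only, without the start node itself.
  setdiff(visited, start)
}

n <- 10
graph <- rbind(cbind(0, diag(n - 1)), 0)
neighs <- lapply(c(1L, 5L), neighbors_within, adjacency = graph, depth = 3)
```

On this chain graph each node has a single outgoing edge to its successor, so a search of depth 3 collects the three nodes that follow the start node.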
A Laplacian heat diffusion process calculates the heat distribution over a graph at a specific time $t$: \begin{equation} h_i(t) = h_i(0)\exp(-\lambda_i t), \end{equation} where $h_i(0)$ is the $i$-th component of the initial heat distribution $\mathbf{h}_0$, $h_i(t)$ is the corresponding component of the heat distribution $\mathbf{h}_t$ at time $t$, and $\lambda_i$ is the $i$-th eigenvalue of the Laplacian of your graph. The components are taken with respect to the eigenvectors of the Laplacian.
You can use the Laplacian heat diffusion process like this:
# count of nodes
n <- 5
# starting distribution (has to sum to one)
h0 <- as.vector(rmultinom(1, 1, prob=rep(.2, n)))
# adjacency matrix (either normalized or not)
graph <- matrix(abs(rnorm(n*n)), n, n)
# computation of the heat distribution
ht <- heat.diffusion(h0, graph)
Here are the results:
print(t(h0))
print(t(ht))
As before, h0 can also be a matrix:
h0 <- matrix(c(h0, runif(20)), nrow=n)
ht <- heat.diffusion(h0, graph)
ht
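The eigenvalue formula above can be sketched in base R as follows. This is an illustration under explicit assumptions, not the diffusr implementation: the graph is symmetrized, self-loops are dropped, the unnormalized Laplacian $\mathbf{L} = \mathbf{D} - \mathbf{A}$ is used (diffusr may use a different Laplacian), and the name `heat_kernel` and the value `time = 0.5` are hypothetical.

```r
# Heat diffusion via the eigendecomposition of an unnormalized Laplacian.
# heat_kernel is a hypothetical helper, not part of diffusr.
heat_kernel <- function(h0, adjacency, time = 0.5) {
  # Symmetrize so the Laplacian has real eigenvalues and orthonormal eigenvectors.
  A <- (adjacency + t(adjacency)) / 2
  diag(A) <- 0
  L <- diag(rowSums(A)) - A
  eig <- eigen(L, symmetric = TRUE)
  V <- eig$vectors
  # Express h0 in the eigenbasis: coefficients V^T h0.
  coeffs <- crossprod(V, h0)
  # Damp each coefficient by exp(-lambda_i * time) and transform back.
  as.vector(V %*% (exp(-eig$values * time) * coeffs))
}

set.seed(1)
n <- 5
graph <- matrix(abs(rnorm(n * n)), n, n)
h0 <- c(1, 0, 0, 0, 0)
ht <- heat_kernel(h0, graph)
```

With this choice of Laplacian the constant vector has eigenvalue zero, so the total amount of heat `sum(ht)` equals `sum(h0)`; only its distribution over the nodes changes.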