align: Network Alignment

View source: R/align.R

alignR Documentation

Network Alignment

Description

Network alignment by comparing the entropies of diffusion kernels simulated on two networks. align takes two networks stored as matrices and returns a node-level alignment between them.

Usage

align(
  network_1_input,
  network_2_input,
  base = 2,
  max_duration,
  characterization = "entropy",
  normalization = FALSE,
  unit_test = FALSE
)

Arguments

network_1_input

The first network being aligned, which must be in matrix form. If the two networks are of different sizes, it will be easier to interpret the output if this is the smaller one.

network_2_input

The second network, which also must be a matrix.

base

Defaults to 1. The base in the series of time steps to sample the diffusion kernels at. If base = 1 every time step is sampled. If base = 2, only time steps that are powers of 2 are sampled, etc. Larger values place more emphasis on earlier time steps. This can be helpful if the diffusion kernel quickly converges to an equilibrium, and also runs faster.

max_duration

Defaults to twice the diameter of the larger network. Sets the number of time steps to allow the diffusion kernel to spread for, which is the smallest power of base that is at least as large as max_duration.

characterization

Defaults to "entropy". Determines how the diffusion kernels are characterized. Either "entropy" or "gini". "entropy" is a size-normalized version of Shannon's entropy with base e (Euler's number). This is also known as interaction or species evenness in ecology. "gini" is the Gini coefficient.

normalization

Defaults to FALSE. Determines if self-loops should be augmented such that edge weights are proportional to those in network_1_input and network_2_input. FALSE by default because this is inappropriate for unweighted binary/logical networks where edges indicate only the presence of an interaction.

unit_test

Defaults to FALSE. Saves the following intermediate steps to help with general troubleshooting: post-processing matrix representations of both networks, time steps at which the diffusion kernels were sampled, the diffusion kernels at those time steps, the characterizations of the diffusion kernels at those time steps, and the cost matrix fed into the Hungarian algorithm where the ij element is the difference between the characterization-over-time curves for node i in the first network and node j in the second network.

Details

Network alignment pairs nodes between two networks so as to maximize similarities in their edge structures. This allows information from well-studied systems to be used in poorly studied ones, such as to identify unknown protein functions or ecosystems that will respond similarly to a given disturbance. Most network alignment algorithms focus on measures of topological overlap between edges of the two networks. The method implemented here compares nodes using the predictability of dynamics originating from each node in each network. Consider network alignment as trying to compare two hypothetical cities of houses connected by roads. The approach implemented here is to pairwise compare each house with those in the other city by creating a house-specific signature. This is accomplished by quantifying the predictability of the location of a person at various times after they left their house, assuming they were moving randomly. This predictability across all houses captures much of the way each city is organized and functions. align uses this conceptual rationale to align two networks, with nodes as houses, edges as roads, and random diffusion representing people leaving their houses and walking around the city to other houses. The mechanics of this, which are conceptually akin to flow algorithms and Laplacian dynamics, can be analytically expressed as a Markov chain raised to successive powers which are the durations of diffusion.

Note that the novel part of align lies in creating a matrix where the ij entry is a measure of similarity between node i in the first network and node j in the second. The final alignment is found using solve_LSAP in the package clue, which uses the Hungarian algorithm to solve the assignment problem optimally.

Value

score

Mean of all alignment scores between nodes in both original networks network_1_input and network_2_input.

alignment

Data frame of the nodes in both networks, sorted numerically by the first network (why it helps to make the smaller network the first one), and the corresponding alignment score.

score_with_padding

Same as score but includes the padding nodes in the smaller network, which can be thought of as a size gap penalty for aligning differently sized networks. Only included if the input networks are different sizes.

alignment_with_padding

Same as alignment but includes the padding nodes in the smaller network. Only included if the input networks are different sizes.

References

Kuhn, H. W. (1955). The Hungarian method for the assignment problem. Naval Research Logistics (NRL), 2(1-2), 83-97.

Langendorf, R. E., & Goldberg, D. S. (2019). Aligning statistical dynamics captures biological network functioning. arXiv preprint arXiv:1912.12551.

C. Papadimitriou and K. Steiglitz (1982), Combinatorial Optimization: Algorithms and Complexity. Englewood Cliffs: Prentice Hall.

Examples

# The two networks to be aligned
net_one <- matrix(stats::runif(25,0,1), nrow=5, ncol=5)
net_two <- matrix(stats::runif(25,0,1), nrow=5, ncol=5)

align(net_one, net_two)
align(net_one, net_two, base = 1, characterization = "gini", normalization = TRUE)


langendorfr/netcom documentation built on July 23, 2022, 5:19 p.m.