dougmet/unsumnet: Solve Row and Column Sum Network Constraint Problems

There are many networks of interest for which we only have aggregated data instead of the full structure. For example, in financial networks we often know the balance sheets (total assets and liabilities) of the individual institutions without knowing the detailed exposures between them.

In matrix notation, the information we have is the row and column sums of the weighted adjacency matrix, $AW_{ij}$.

$$ R_i = \sum_j AW_{ij} $$

$$ C_i = \sum_j AW_{ji} $$
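As a small illustration of these two sums, here is a sketch in base R. The 3-node matrix and its numbers are made up for the example and are not part of the package data.

```r
# Toy weighted adjacency matrix (made-up numbers): AW[i, j] is the weight
# of the edge from node i to node j
AW <- matrix(c(0, 2, 1,
               3, 0, 0,
               0, 4, 0),
             nrow = 3, byrow = TRUE,
             dimnames = list(c("a", "b", "c"), c("a", "b", "c")))

R <- rowSums(AW)  # R_i: total outgoing weight of node i
C <- colSums(AW)  # C_i: total incoming weight of node i

# Every edge is counted once in the row sums and once in the column sums,
# so the two grand totals must agree
total_matches <- isTRUE(all.equal(sum(R), sum(C)))
```

The last check is a useful sanity test on any aggregated data set: if the totals of the row and column sums differ, no matrix can satisfy both.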

In general there are many possible networks that could satisfy these constraints. The aim of the unsumnet package is to use methods from statistical mechanics to evenly sample from these networks while allowing the user to choose the sparsity (number of edges).
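To see why the constraints do not pin down a single network, the sketch below builds two different matrices with identical row and column sums. The 4-node matrix is invented for illustration; the trick is to move a weight `delta` around a 2x2 block so that every row and column total is unchanged.

```r
# Made-up 4-node network with no self-loops
A1 <- matrix(c(0, 2, 1, 0,
               3, 0, 0, 1,
               0, 4, 0, 2,
               1, 0, 3, 0),
             nrow = 4, byrow = TRUE)

# Shift weight within rows 1-2 and columns 3-4: each row and each column
# gains delta in one cell and loses delta in another
A2 <- A1
delta <- 1
A2[1, 3] <- A2[1, 3] - delta
A2[1, 4] <- A2[1, 4] + delta
A2[2, 3] <- A2[2, 3] + delta
A2[2, 4] <- A2[2, 4] - delta

same_margins <- all(rowSums(A1) == rowSums(A2)) &&
  all(colSums(A1) == colSums(A2))
different_networks <- !identical(A1, A2)
```

Both `A1` and `A2` are valid answers to the same aggregated data, which is why a sampling approach is needed rather than a single solution.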

Aggregated Data

Consider the following fictional banking system. It consists of six banks and they form a closed network of loans, which is represented by the weighted adjacency matrix, neastTrue,

options(digits = 2)

library(unsumnet)
neastTrue

In reality this network is rarely known. It is much more common to have only the aggregated data set: the row sums, outSum = rowSums(neastTrue), for outgoing loans and the column sums, inSum = colSums(neastTrue), for incoming loans.

neast

The primary function of this package is to find networks that fit these summed data. We wish to "unsum" the network.

Creating a network

The unsum function allows us to explore networks with the number of edges, nEdges, as a control parameter. To make one network with 12 edges we simply call

set.seed(11)
fit <- unsum(neast, 12)

This returns an unsumnet object. The standard adjacency matrix is held in fit$A and the weighted adjacency matrix is fit$AW. The returned matrix will not be the same as the true network, but given the information we have it is equally likely because it satisfies the constraints.

fit$AW
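One way to convince ourselves that a reconstructed matrix fits the data is to compare its row and column sums with the targets. The helper `margin_mse` below is hypothetical (it is not part of unsumnet), and the toy matrices stand in for the true and reconstructed networks so that the sketch runs with base R alone.

```r
# Hypothetical helper, not part of unsumnet: mean squared error between the
# row/column sums of a candidate matrix and the target sums
margin_mse <- function(AW, outSum, inSum) {
  row_err <- rowSums(AW) - outSum
  col_err <- colSums(AW) - inSum
  mean(c(row_err, col_err)^2)
}

# Made-up "true" network and the sums we would actually observe
truth <- matrix(c(0, 2, 1, 0,
                  3, 0, 0, 1,
                  0, 4, 0, 2,
                  1, 0, 3, 0),
                nrow = 4, byrow = TRUE)
outSum <- rowSums(truth)
inSum  <- colSums(truth)

# A different network with the same sums, standing in for a reconstruction
candidate <- matrix(c(0, 2, 0, 1,
                      3, 0, 1, 0,
                      0, 4, 0, 2,
                      1, 0, 3, 0),
                    nrow = 4, byrow = TRUE)

err     <- margin_mse(candidate, outSum, inSum)  # 0 when the sums match exactly
n_edges <- sum(candidate > 0)                    # number of non-zero edges
```

With the package loaded, the same check could presumably be applied to `fit$AW`, assuming the aggregated data exposes the sums under the names `outSum` and `inSum` used above.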

The plot_unsum function will plot a matrix, and the S3 plot method will do the same for an unsumnet object.

plot_unsum(neastTrue, main="True network")
plot(fit, main="Reconstructed network")

Maximum Entropy Solution

A commonly used solution for such aggregated data is known as the "maximum entropy" solution. The term is a little confusing from a statistical mechanics perspective but in this case it uses every possible edge and spreads out the weights as evenly as possible. The maximum entropy solution is provided by the max_entropy function. It takes the same inputs as unsum. In our example this looks quite different to the true network.

# Get the maximum entropy network
neastME <- max_entropy(neast)
# Compare to the true network
plot_unsum(neastTrue, main="True network")
plot_unsum(neastME, main="Max Entropy")

If no solution is possible then this function will return FALSE. This can be useful for testing impossible constraints.
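To give a feel for what "spreading the weights as evenly as possible" means, the sketch below uses iterative proportional fitting, a standard technique for this kind of problem. It is not necessarily how max_entropy is implemented; the function name `ipf_no_self_loops`, its arguments, and the toy sums are assumptions for illustration, and the sketch assumes all sums are strictly positive.

```r
# Hypothetical sketch: start with equal weight on every off-diagonal edge and
# alternately rescale rows and columns until the target sums are matched
ipf_no_self_loops <- function(outSum, inSum, n_iter = 1000, tol = 1e-10) {
  n <- length(outSum)
  W <- matrix(1, n, n)
  diag(W) <- 0                                   # no self-loops
  for (k in seq_len(n_iter)) {
    W <- sweep(W, 1, outSum / rowSums(W), "*")   # match the row sums
    W <- sweep(W, 2, inSum / colSums(W), "*")    # match the column sums
    if (max(abs(rowSums(W) - outSum)) < tol) break
  }
  W
}

# Made-up aggregated data for four nodes (the totals agree: 17 and 17)
outSum <- c(3, 4, 6, 4)
inSum  <- c(4, 6, 4, 3)

ME <- ipf_no_self_loops(outSum, inSum)
```

Every off-diagonal entry of `ME` is positive, so the result uses every possible edge, in contrast to the sparse networks produced by unsum.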

Netted matrix

In stability terms it is often the netted positions that are more relevant. These are provided by the netted_matrix function. It keeps only the positive elements of the difference between the matrix and its transpose, $(AW-AW')_+$.

plot_unsum(netted_matrix(neastTrue))
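The formula $(AW-AW')_+$ can be written directly in base R. The function `net_positions` below is a hypothetical stand-in that follows the formula; it is not necessarily how netted_matrix is implemented, and the toy matrix is made up.

```r
# Hypothetical stand-in for the formula (AW - AW')_+ :
# subtract the reverse exposure, then keep only the positive part
net_positions <- function(AW) pmax(AW - t(AW), 0)

# Made-up network: a lends 5 to b while b lends 2 to a, and so on
AW <- matrix(c(0, 5, 1,
               2, 0, 0,
               1, 3, 0),
             nrow = 3, byrow = TRUE,
             dimnames = list(c("a", "b", "c"), c("a", "b", "c")))

N <- net_positions(AW)

# After netting, at most one direction of each pair carries weight
one_direction_only <- all(N * t(N) == 0)
```

Here the loans of 1 between a and c cancel completely, while a is left with a net exposure of 3 to b.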

Extra notes on `unsum`

For large networks you might find the algorithm can take a very long time. The runtime is proportional to the number of edges. For very large networks with many edges it may become unusable, depending on the complexity of the constraints.

Another tricky case is when you ask for very few edges. This makes it harder to find solutions and the algorithm can become trapped in local minima (see below for how to prevent this). For example, the call

unsum(neast, 9, verbose=TRUE) # won't converge

will not converge; the algorithm will run around again and again.

The unsum function has many parameters with default values. These are detailed in the help page ?unsum but the most common ones you might need to vary are below. unsum uses simulated annealing and this can take delicate tweaking of parameters if solutions are difficult to get.

verbose: Set this to TRUE if you want the algorithm to print to screen how it's doing. Useful for big networks where it can take a long time to run.
coolingRate: This is an important parameter; it sets the rate of cooling. Every mctSchedule time steps the temperature is reduced by this factor. If the cooling is too aggressive (a factor far from 1) then the quench will become trapped in a local minimum (an energy plateau). Too close to 1 and the simulation will take a long time.
minError: The mean squared error that the algorithm must reach before returning. This should be left alone unless you are really struggling to get a solution. If it's too big then the distribution risks becoming skewed.
maxEdges: If you set this to TRUE, all possible edges are switched on.
noReturn: If this is TRUE then no return edges are allowed. Essentially it's a netted matrix by construction.
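The trade-off in coolingRate can be made concrete with a little arithmetic. The sketch below assumes a geometric schedule in which the temperature is multiplied by a factor between 0 and 1 at each cooling step; the start and end temperatures and the function `cooling_steps` are hypothetical and are not unsum parameters.

```r
# Hypothetical sketch: number of cooling steps needed to bring the temperature
# from t_start down to t_end when it is multiplied by `rate` at each step
cooling_steps <- function(rate, t_start = 1, t_end = 1e-3) {
  stopifnot(rate > 0, rate < 1)
  ceiling(log(t_end / t_start) / log(rate))
}

rates <- c(0.5, 0.9, 0.99, 0.999)
steps <- sapply(rates, cooling_steps)

# Side-by-side view: factors closer to 1 need far more cooling steps
schedule <- data.frame(rate = rates, steps = steps)
```

Under this assumption each cooling step lasts mctSchedule time steps, so a factor very close to 1 multiplies the runtime accordingly, while a small factor reaches low temperature in a handful of steps and risks freezing the system before it has found a solution.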

Final note

This package is under constant development at its GitHub page. Please report any problems or feature requests on the issue tracker there. Other control parameters, such as controlling for known netted positions, will be coming soon.

dougmet/unsumnet documentation built on May 15, 2019, 10:44 a.m.
