knitr::opts_chunk$set( collapse = TRUE, comment = "#>" )
library(ggplot2) library(igraph) library(patchwork) library(RWRtoolkit) library(RColorBrewer)
With NetStats, you can generate statistics on your networks with a number of different functions, from generating basic network statistics on a paritcular network layer or multiplex to comparing individual networks against other networks or multiplexes. For example, if you have network layer acting as a gold set standard, you can use that network to test against another non-gold set network or multiplex to determine how closely related they are.
This document descibes the usage of RWR_netstats. This tool performs a suite of statistical tests on supplied networks.
Unlike other functions in the RWRtoolkit, the RWR_netscore
function does not necessarily requre an mpo
R object for all of its functionality (however, many of its functions do require an mpo object).
The current setup of RWR netstats takes a 4 posible input values to generate statistics. - A path to an mpo rdata object - A path to an flist - A path to an edgelist
Edge lists are evaluated as: <Source> <Target> <Weight>
. It should be noted that for the sake of computational speed, the flist
method does not create an actual multiplex object, but only mimics the multiplex itself (i.e. it does not create the supra-adjacency matrix, nor its normalized counterpart).
In order to demonstrate RWR_netstats
, we will need to generate some networks. Let's generate 5 networks:
# a quick function for generating erdos.renyi layers create_layer <- function(n_vertices, probaility, weights){ # overwrite letters to contain letter combinations for more options EXTRALETTERS <- c(LETTERS, sapply(LETTERS, function(x) paste0(x, LETTERS))) layer <- igraph::erdos.renyi.game(n_vertices, probaility) layer <- set_vertex_attr( layer, "name", 1:n_vertices, EXTRALETTERS[1:n_vertices] ) layer } set.seed(42) layer1 <- create_layer(30, 1/5) # will have vertices A - AD layer2 <- create_layer(15, 1/4) # will have vertices A - O layer3 <- create_layer(10, 1/3) # will have vertices A - J ref_layer <- create_layer(10, 1/3) # will have vertices A - J interest_layer <- create_layer(20, 1/4) # will have vertices A - T
We now have 5 sample networks all with similar vertices, but randomly created edges. We can create a multiplex out of 3 of them, and use the fourth to act as a reference network, and the 5th to act as a standalone network of interest. Additionally, we want our edges to have weights, so we will add an additional vector to their edgelists:
layer1_edgelist <- igraph::as_edgelist(layer1) layer2_edgelist <- igraph::as_edgelist(layer2) layer3_edgelist <- igraph::as_edgelist(layer3) ref_layer_edgelist <- igraph::as_edgelist(ref_layer) interest_layer_edgelist <- igraph::as_edgelist(interest_layer) # We need our edges to have weights: layer1_edgelist <- cbind(layer1_edgelist, rep(1, igraph::ecount(layer1))) layer2_edgelist <- cbind(layer2_edgelist, rep(1, igraph::ecount(layer2))) layer3_edgelist <- cbind(layer3_edgelist, rep(1, igraph::ecount(layer3))) ref_layer_edgelist <- cbind(ref_layer_edgelist, rep(1, igraph::ecount(ref_layer))) interest_layer_edgelist <- cbind(interest_layer_edgelist, rep(1, igraph::ecount(interest_layer)) ) # write to file: write.table(layer1_edgelist, file="layer1.tsv", quote=F, sep="\t", row.names = F, col.names = F) write.table(layer2_edgelist, file="layer2.tsv", quote=F, sep="\t", row.names = F, col.names = F) write.table(layer3_edgelist, file="layer3.tsv", quote=F, sep="\t", row.names = F, col.names = F) write.table(ref_layer_edgelist, file="ref_layer.tsv", quote=F, sep="\t", row.names = F, col.names = F) write.table(interest_layer_edgelist, file="interest_layer.tsv", quote=F, sep="\t", row.names = F, col.names = F)
With our files written to a edge lists we can now create an flist file to point to the first three:
file_paths <- c("layer1.tsv", "layer2.tsv", "layer3.tsv") layer_names <- c("layer1", "layer2", "layer3") groups <- c(1, 1, 1) flist_df <- data.frame(list( filepaths = file_paths, layernames = layer_names, groups = groups )) flist_filename <- "./multiplex_flist.tsv" write.table(flist_df, file = flist_filename, quote = F, sep = "\t", row.names = F, col.names = F)
Finally, we need to create our multiplex to use many of the functions offered within RWR_netstats:
multiplex_filepath <- "./mutliplex.Rdata" RWRtoolkit::RWR_make_multiplex( flist_filename, output = multiplex_filepath)
Note: It's often the case the the creation of the multiplex values takes quite a long while. In this case, it's often best to use the flist
parameter when calculating the statistics, as the all of the statistics calculated within the RWR_netstats
function are calculated without making use of the supra-adjacency matrices.
When running RWR_netstats
, there are many options we can use. All options (beyond the networks themselves) require only a boolean flag. First, to obtain base statistics for a given network layer or multiplex we simply use the base_statistics
flag. This returns a dataframe for network containing the number of nodes in the network, the edge count and the network's diameter:
reflayer_filename <- "./ref_layer.tsv" netstats <- RWRtoolkit::RWR_netstats( data = multiplex_filepath, network_1 = reflayer_filename, basic_statistics = TRUE, ) print("Base Statistics Net 1:") print(netstats$base_stats_net1) print("Multiplex Base Statistics:") print(netstats$base_stats_mpo)
With RWR_netstats
we can compare networks by calculating Jaccard similarity coefficient or overlap scores with respect to the shared edges between the networks of interest.
When calculating the Jaccard similarity coefficient between two layers (or two networks) A and B, we create sets of their edges:
$$ jaccard(A,B) = \frac{E(A) \cap E(B)}{ E(A) \cup E(B) } $$
The overlap scoring considers the weights of the network edges that exist in both the reference network (A) and the network of interest (B), divided by the total number of edges within the network of interest:
$$ \hspace{2ex} overlap(A,B) = \frac{\sum_{e \in E(C)} W_{e} }{ | E(B) | } \begin{cases} C = A \cap B \end{cases} $$
The scoring_metric
parameter defines whether jaccard
or overlap
are used for scoring, however both methods can be employed by supplying both
to the parameter. For the remainder of this vignette, we'll use both
.
To compare two individual networks, we'll use our ref_layer
and our interest_layer_edgelist
.
refnet_path <- "./ref_layer.tsv" interest_net_path <- "./interest_layer.tsv" netstats <- RWRtoolkit::RWR_netstats( network_1 = refnet_path, network_2 = interest_net_path, net_to_net_similarity = T, scoring_metric = "both" ) netstats
You'll likely wish to compare each of the layers within a multiplex network as well, this can be done using pairwise_between_mpo_layer
flag to obtain the pairwise overlap scores. :
netstats <- RWRtoolkit::RWR_netstats( data = multiplex_filepath, pairwise_between_mpo_layer = T, scoring_metric = "both" ) print("The pairwise Jaccard Similarity Coefficients:") netstats$pairwise_between_mpo_layer_jaccard print("The pairwise overlap scores are:") netstats$pairwise_between_mpo_layer_overlap
Just as you can compare a single layer to a reference network, it's also possible to compare all layers against a reference network by supplying the reference network as network_1
:
netstats <- RWRtoolkit::RWR_netstats( data = multiplex_filepath, network_1 = refnet_path, multiplex_layers_to_refnet = T, scoring_metric = "both" ) print("The Jaccard Similarity Coefficients between the reference network and the multiplex are:") netstats$multiplex_layers_to_refnet_jaccard
print("The pairwise overlap scores between the reference network and the multiplex are:") netstats$multiplex_layers_to_refnet_overlap
With RWR_netstats
it is also possible to calculate a tau
vector for a particular multiplex given a reference network. This method takes the overlap score of each layer with respect to its reference layer, normalizes those scores (such that all scores sum to one), and then multiples said scores by the total number of layers.
netstats <- RWRtoolkit::RWR_netstats( data = multiplex_filepath, network_1 = refnet_path, calculate_tau_for_mpo = T ) print("The calculated tau vector for our multiplex with respect to the reference network is: ") print(netstats$calculated_tau)
Not all networks will have similar edges. By calculating the exclusivity of a multiplex's network layers, you can calculate the proportion of all edges in the multiplex that are found in each layer:
netstats <- RWRtoolkit::RWR_netstats( data = multiplex_filepath, calculate_exclusivity_for_mpo = T ) netstats$exclusivity
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.