In R-KenK/SimuNet: Network Simulation Framework, with a Zest of Empiricism

knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)

Let's first generate a theoretical simulation:

library(SimuNet)
library(magrittr)
set.seed(42)
n <- 5L
samp.effort <- 100L
Adj <- sample(1:samp.effort,n * n) |>
  matrix(nrow = n,dimnames = list(letters[1:n],letters[1:n]))
Adj[lower.tri(Adj,diag = TRUE)] <- 0L
Adj

sL <- simunet(Adj,samp.effort,"upper",10)
sL

Designin an experiment

Experimental designs - expDesign objects in SimuNet - are successions of manipulations that are applied to a scanList, i.e. a 3-dimensional array.

To design an experiment, simply input functions as successive arguments of design_exp():

my_fun1 <- function(scan.list) {
  # do things on scan.list
  scan.list
}

my_fun2 <- function(scan.list) {
  # do other things on scan.list
  scan.list
}

design_exp(my_fun1,remove_mostCentral,my_fun2)

You can store design_exp()'s output and apply it to scanList via perform_exp().

An expDesign can be expanded by inputting it in design_exp() with other functions or expDesign.

eD1 <- design_exp(my_fun1,remove_mostCentral)

design_exp(eD1,my_fun2)    # this is similar to...

# ... this:
design_exp(my_fun1,remove_mostCentral,my_fun2)

Since design_sampling() also returns expDesign objects, this is how one can include a sampling method within an experimental design:

design_exp(my_fun1,design_sampling("group",0.9))

foc.even <- design_sampling("focal","even")
foc.even
design_exp(my_fun2,foc.even,my_fun1)

design_exp() uses purrr's purrr::compose() to encapsulate the sequence of manipulations into a single function stored as FUN.seq in the expDesign object, to be applied to the scanList.

Manipulation "building blocks" are included in SimuNet to be used in experimental designs, but users should feel free to create their own functions to use in a sequence.

WIP: SimuNet is planned to later include a feature to automatically carry the scanList's attribute along the sequence manipulations, whether they include user-defined functions or not. In the meantime, copy_attrs_to(from,to) allows for instance to copy attrs from an original scanList to its modified version.

An example of experimental design

Let's try out a concrete example: say you want to assess how would the adjacency matrix be affected if the overall most central node had not been there (but nothing else would have changed).

remove_mostCentral() does just that, and can be included alone or in combination in an expDesign. Let's look at its code:

getS3method("remove_mostCentral","scanList") # for convenience reasons, SimuNet's building blocks
                                                # are written as S3 methods, but this isn't required

We can see that it:

sums the scanList into a weighted adjacency matrix
calculate node eigen-vector centrality
determine the maximum
remove it from all scans
returns the resulting modified scanList, with the copied attrs

Imagine what manipulation you too can design and use in SimuNet!

Let's include remove_mostCentral() in its own experimental design, and apply it to sL

removeC <- design_exp(remove_mostCentral)
removeC

Cremoved <- sL |> perform_exp(removeC)
Cremoved
Cremoved |> sum_scans()

We can see that node e was removed from all scans.

Let's have a look at the original nodes' centrality:

sL |> 
  sum_scans() |>
  igraph::graph.adjacency(weighted = TRUE) |>
  igraph::eigen_centrality(directed = FALSE) %>%
  .$vector

as expected, e had the highest eigen-vector centrality value.

Let see for instance how this impacted the network's clustering coefficient:

sL |> 
  sum_scans() %>%
  {class(.) <- NULL;.} %>%
  {. + t(.)} |>           # DirectedClustering requires symmetrical matrices for undirected networks
  DirectedClustering::ClustBCG(type = "undirected")

Cremoved |> 
  sum_scans() %>% 
  {class(.) <- NULL;.} %>%
  {. + t(.)} |> 
  DirectedClustering::ClustBCG(type = "undirected")

An more complexe example

What would have happened if, in addition to having the most central individual removed, we also missed 20% of the edges' value when doing group-scans?

Let's design this second experiment:

removeC2 <- design_exp(remove_mostCentral,design_sampling("group",0.8))
removeC2

Cremoved2 <- sL |> perform_exp(removeC2)
Cremoved2
Cremoved2 |> sum_scans()

Generate data from simulations and experiments

Let's push further the previous example: we will replicate the design on a larger network containing more nodes:

set.seed(42)
n <- 10L
samp.effort <- 300L
Adj <- sample(1:samp.effort,n * n) |>
  matrix(nrow = n,dimnames = list(letters[1:n],letters[1:n]))
Adj[lower.tri(Adj,diag = TRUE)] <- 0L
Adj

sL <- simunet(Adj,samp.effort,"upper",100)
sL

Replicating theoretical simulations allows to assess the clustering coefficient distribution:

get_undirectedCC <- function(scanList) {
  CC <- 
    scanList |> 
    sum_scans() %>%
    {class(.) <- NULL;.} %>%  # modify the sumeed scanList into a class-less matrix
    {. + t(.)} |>             # symmetrize the triangular matrix by adding its transposed matrix
    DirectedClustering::ClustBCG(type = "undirected")
  CC$GlobalCC
}

# calculate the clustering coefficient from a single simulation
simunet(Adj,samp.effort,"upper",100) |> get_undirectedCC()

# Replicating the operation
CC.theo <- data.frame(type = "theoretical",
                      CC = replicate(100,
                                     simunet(Adj,samp.effort,"upper",100) |>
                                       get_undirectedCC()
                      )
)
summary(CC.theo)

Let's do the same when the most central node is removed, and 20% of the edges are missed:

# calculate the clustering coefficient from a single simulation
simunet(Adj,samp.effort,"upper",100,removeC2) |> get_undirectedCC()

# Replicating the operation
CC.empi <- data.frame(type = factor(c("theoretical","empirical"),
                                    levels = c("theoretical","empirical")
                      ),
                      rep = rep(1:100,each = 2) |> as.factor(),
                      CC = replicate(100,simplify = FALSE, # TODO: come up with more elegant code
                                     {
                                       sL <- simunet(Adj,samp.effort,"upper",100,removeC2)
                                       CC.theo <- sL$theoretical.scanList |> get_undirectedCC()
                                       CC.empi <- sL |> get_undirectedCC()
                                       c(CC.theo,CC.empi) 
                                     }
                      ) |> do.call(what = c)
)
summary(CC.empi)

library(ggplot2)
CC.empi |>
  ggplot(aes(type,CC,colour = type,fill = type))+
  geom_boxplot(alpha = .5)+
  geom_line(aes(group = rep),alpha = 0.05,colour = "grey50")+
  geom_point(alpha = 0.02)+
  guides(fill = "none",colour = "none")+
  scale_x_discrete(labels = c("Original","Most central node removed\n+\n20% edges missed"))+
  labs(x = "",y = "Mean Clustering Coefficient")+
  theme_minimal(15)+
  theme(plot.background = element_rect(fill = 'white', colour = NA))

As seen above, combining SimuNet's experimental design approach in combination with R's replicate() (or lapply()/sapply() to allow handling of the replication index) allows for varied and customized collection of simulated data!

Helper functions

SimuNet includes some helper functions to convert objects to fit your needs.

These helper functions include:

scanList2matList(): convert a scanList 3D array into an R list of 2D matrices
matList2scanList(): does the opposite as scanList2matList()
In combination with already existing functions - e.g. igraph::graph.adjacency() - you can use flexibly many network manipulations in your experimental design!

Examples

Using matrix lists:

Let's implement manipulations to apply on a list of matrices:

# This function mask a random node from a matrix
mask_randomNode <- function(Adj) {
  to.mask <- sample(1:nrow(Adj),1)
  Adj[c(to.mask),] <- NA
  Adj[,c(to.mask)] <- NA
  Adj
}

# a scanList can be transformed to a list of matrices, and lapply can be used to apply
# mask_randomNode to each. matList2scanList back transforms it:
mask_inEachScan <- function(sL) {
  mL <- scanList2matList(sL)
  mL |> 
    lapply(mask_randomNode) |> 
    copy_attrs_to(from = mL) |>     # keeps track of sL's attributes
    matList2scanList()
}

rand.mask <- design_exp(mask_inEachScan)
rand.mask
sL |> perform_exp(rand.mask)

Using igraph networks

Let's see with another example relying on igraph networks:

# This function mask a random node from a matrix
rewire_each <- function(sL) {
  mode <- sL$mode
  sLapply(   # wrapper for lapply over scanLists' 3rd dimension. See ?sLapply
    sL,
    function(scan) {
      scan |>
        igraph::graph.adjacency(mode = mode,weighted = TRUE) |>     # transform to igraph network
        igraph::rewire(igraph::each_edge(p = .2, loops = FALSE)) |> # rewire networks
        igraph::get.adjacency(type = "upper",sparse = FALSE)        # transform back to matrix
    }
  )
}

rand.rewire <- design_exp(rewire_each)
rand.rewire
rewired <- sL |> perform_exp(rand.rewire)
sL[,,3:4]
rewired[,,3:4]