This vignette simulates site percolation in our risk sharing network. Site percolation refers to the random removal of nodes from the network; in our setting this could be due to migration, death, or agents simply opting out of their obligations (default). We are interested in whether the network remains connected (that is, whether a giant component survives) after a fraction of the nodes is removed at random, and in particular whether there is a critical fraction $\phi_c$ at which connectivity breaks down.
We use the following algorithm to simulate percolation (our graph has 119 vertices):

1. For each $n$ from 0 to 119, where $n$ is the number of nodes to remove:
2. Run `sim_size` simulations; in each one, remove $n$ randomly chosen nodes and record the size of the largest remaining component as a fraction of the 119 households.
3. Average these fractions to obtain $S$, the expected relative size of the giant component at occupation probability $\phi = 1 - n/119$.

We implement this algorithm below. First we load the data and create the graph:
```r
# Load packages
library(igraph)
library(riskSharing)

# Get data
data("nyakatoke")

# Create the underreporting graph from the data frame.
# First we keep only the rows where at least one household said the other is in
# their network.
underreporting.df <- nyakatoke[nyakatoke$willingness_link1 == 1 |
                                 nyakatoke$willingness_link2 == 1, ]
# We then convert this into an undirected graph
g.underreporting <- igraph::graph_from_data_frame(underreporting.df,
                                                  directed = FALSE)
# We then remove any multiple edges
g.underreporting <- igraph::simplify(g.underreporting)
```
We then implement the algorithm described:
```r
S <- numeric(120)       # vector to hold our results
sim_size <- 1000        # number of simulations per value of n
i <- numeric(sim_size)  # vector to hold giant-component sizes
for (n in 0:119) {        # n is the number of nodes to remove
  for (m in 1:sim_size) { # m is the m-th simulation
    to_remove <- sample(1:119, n)  # randomly pick n of the 119 nodes to remove
    # size of the largest (giant) component, as a fraction of all 119 nodes
    # (the 0 guards against the case where every node has been removed)
    i[m] <- max(components(
      delete_vertices(g.underreporting, to_remove))$csize, 0) / 119
  }
  # the average giant-component size after removing n nodes
  S[n + 1] <- mean(i)
}
```
Let's put the results in a table/graph. We will also overlay theoretical results for the size of the giant component for the Poisson random graph and for the power-law graph. These theoretical results are obtained using the generating function framework described by Newman. The theoretical size of the giant component is given by $S = \phi[1 - g_0(u)]$, where $g_0(z) = \sum_{k=0}^\infty p_k z^k$ is the generating function of the degree distribution. We have to calculate the value of $u$, the average probability that a vertex is not connected to the giant component via a particular neighboring vertex. It is given by $u = 1 - \phi + \phi g_1(u)$, where $g_1(z) = \sum_{k=0}^\infty q_k z^k$ and $q_k = \frac{(k+1)\, p_{k+1}}{\langle k \rangle}$, with $\langle k \rangle$ the mean degree.
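To make this concrete, here is a minimal sketch (not part of the package; `g0`, `g1`, and `giant_size` are ad hoc helper names) that builds the two generating functions from the empirical degree distribution of the underreporting graph and solves the self-consistency equation for $u$ by fixed-point iteration:

```r
# Illustrative sketch: solve u = 1 - phi + phi * g1(u) by fixed-point iteration
# for the empirical degree distribution (helper names are ad hoc).
deg <- degree(g.underreporting)
p_k <- tabulate(deg + 1) / length(deg)   # p_k for k = 0, 1, ..., max degree
g0 <- function(z) sum(p_k * z^(seq_along(p_k) - 1))
k_vals <- seq_along(p_k) - 1             # the degrees 0, 1, ..., max degree
q_k <- k_vals[-1] * p_k[-1] / mean(deg)  # excess degrees: q_k = (k+1) p_{k+1} / <k>
g1 <- function(z) sum(q_k * z^(seq_along(q_k) - 1))

giant_size <- function(phi, tol = 1e-10, max_iter = 10000) {
  u <- 0  # start below the fixed point; the iteration increases towards it
  for (iter in 1:max_iter) {
    u_new <- 1 - phi + phi * g1(u)
    if (abs(u_new - u) < tol) break
    u <- u_new
  }
  phi * (1 - g0(u))  # S = phi * [1 - g0(u)]
}

giant_size(0.8)  # theoretical giant-component size at occupation probability 0.8
```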
For the Poisson case, the appropriate generating functions are
$$
g_0(z) = e^{\lambda(z-1)}, \qquad g_1(z) = e^{\lambda(z-1)},
$$
where $\lambda$ is the average degree of the graph (the same as $\langle k \rangle$ above). Substituting these into the equations for $u$ and $S$ and solving with the Lambert $W$ function gives
$$
u = 1 - \phi - \frac{W(-\lambda \phi\, e^{-\lambda \phi})}{\lambda}, \qquad
S = \phi + \frac{W(-\lambda \phi\, e^{-\lambda \phi})}{\lambda}.
$$
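As a quick sanity check (illustrative only, and assuming the gsl package is installed), the closed-form $S$ should satisfy the Poisson self-consistency relation $S = \phi(1 - e^{-\lambda S})$, which follows from $S = \phi[1 - g_0(u)]$ with $u = 1 - S$:

```r
# Illustrative check (assumes the gsl package): the closed-form S should satisfy
# S = phi * (1 - exp(-lambda * S)). The values of lambda and phi are arbitrary.
lambda <- 3
phi <- 0.8
S_closed <- phi + gsl::lambert_W0(-lambda * phi * exp(-lambda * phi)) / lambda
all.equal(S_closed, phi * (1 - exp(-lambda * S_closed)))  # should be TRUE
```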
For the power-law case, we obtain
$$
g_0(z) = \frac{\mathrm{Li}_\alpha(z)}{\zeta(\alpha)}, \qquad
g_1(z) = \frac{\mathrm{Li}_{\alpha - 1}(z)}{z\,\zeta(\alpha - 1)},
$$
where $\mathrm{Li}_\alpha$ is the polylogarithm and $\zeta$ the Riemann zeta function.
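For intuition, these functions can be approximated by truncating the polylogarithm series $\mathrm{Li}_\alpha(z) = \sum_{k \ge 1} z^k / k^\alpha$ (with $\zeta(\alpha) = \mathrm{Li}_\alpha(1)$). The sketch below is illustrative only: `polylog_trunc` is an ad hoc helper and the truncation point is arbitrary.

```r
# Illustrative sketch: evaluate the power-law generating functions by truncating
# the series Li_alpha(z) = sum_{k >= 1} z^k / k^alpha. The truncation at kmax
# terms is only approximate (the series converges slowly for small alpha).
polylog_trunc <- function(alpha, z, kmax = 1e5) {
  k <- 1:kmax
  sum(z^k / k^alpha)
}
alpha <- 2.5  # example exponent, not an estimate from the data
g0_pl <- function(z) polylog_trunc(alpha, z) / polylog_trunc(alpha, 1)
g1_pl <- function(z) polylog_trunc(alpha - 1, z) / (z * polylog_trunc(alpha - 1, 1))
g0_pl(0.9)  # example evaluation
```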
```r
my_df <- data.frame(phi = 1 - (0:119)/119, S = S)
# rbind(my_df, c(phi = 0, S = 1), c(phi = 1, S = 0))
main <- expression(paste("Occupation probability, ", phi,
                         ", vs. size of giant cluster ", S))
plot(my_df[1:117, ], type = "l", main = main, xlab = expression(phi))
# We will also add the theoretical results - first the Poisson case
# library(gsl)  # For Lambert's W function
lambda <- mean(degree(g.underreporting))
plot(function(phi) phi + gsl::lambert_W0(-lambda * phi * exp(-lambda * phi)) / lambda,
     from = 0, to = 1, add = TRUE, col = "red")
# Now the power law theoretical results. We first obtain the MLE estimates
# alpha <- power.law.fit(degree(g.underreporting))$alpha
# k <- degree(g.underreporting)
# Ek <- mean(k)
# Ek2 <- mean(k^2)
# phi_c <- Ek/(Ek2 - Ek)
# plot(function(phi) (2*Ek/0.810891)*(phi - phi_c)/phi_c,
#      from = 0, to = 1, add = TRUE, col = "blue")
legend("topleft", legend = c("Simulation", "Poisson"),
       col = c("black", "red"), lty = c(1, 1))
```
The graph displays the occupation probability $\phi$ (the fraction of vertices remaining in the graph) against the percolation strength $S$ (the average size of the giant component). At the right-hand end of the graph we have the original (connected) network; moving left, more and more vertices are removed. By the time we have removed about half the nodes, the percolation strength is around 48%, and after removing about two thirds of the nodes it is around 29%.
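To read these values from the simulation output rather than off the plot, a quick (illustrative) lookup in `my_df` suffices:

```r
# Illustrative lookup: the simulated giant-component size closest to
# phi = 0.5 (about half the nodes removed) and phi = 1/3 (about two thirds removed).
my_df[which.min(abs(my_df$phi - 0.5)), ]
my_df[which.min(abs(my_df$phi - 1/3)), ]
```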
Note that for Poisson random graphs we would expect a critical value $\phi_c$ above which percolation occurs almost surely, while for a purely power-law degree distribution the network should stay intact even after many vertices are removed. Our graph, by contrast, shows a smooth relationship between average cluster size and occupation probability, with the average cluster size declining slightly faster than the occupation probability.
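For reference, the configuration-model estimate of the critical threshold, $\phi_c = \langle k\rangle/(\langle k^2\rangle - \langle k\rangle)$, can be computed from the empirical degree sequence; this mirrors the commented-out `phi_c` calculation in the plotting code above and is shown here only as an illustration:

```r
# Illustrative: critical occupation probability implied by
# phi_c = <k> / (<k^2> - <k>), using the underreporting degree sequence
# (the same quantity as the commented-out phi_c in the plotting code above).
k <- degree(g.underreporting)
phi_c <- mean(k) / (mean(k^2) - mean(k))
phi_c
```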
The above simulation looked at the underreporting model (i.e. we assume a link exists if either household reported it). We now examine percolation in the overreporting model, where a link exists only when both households report it. Note that in the overreporting model the largest component does not extend over the whole network.
We first (re)create the graph:
```r
# Create the graph (see the basic statistics section for documentation of the
# procedure below)
overreporting.df <- nyakatoke[nyakatoke$willingness_link1 == 1 &
                                nyakatoke$willingness_link2 == 1, ]
g.overreporting <- igraph::graph_from_data_frame(overreporting.df,
                                                 directed = FALSE)
# Households that do not appear in the edge list are added as isolated vertices
missing.vertices <- c(7, 30, 32, 46, 44, 65, 84, 88, 91, 96, 107, 110, 116,
                      117, 118, 119)
g.overreporting <- Reduce(f = function(x, y) {y + igraph::vertex(x)},
                          x = missing.vertices,
                          init = g.overreporting,
                          right = TRUE)
g.overreporting <- igraph::simplify(g.overreporting)
```
We now perform the same simulation as above, but for the overreporting network:
```r
S <- numeric(120)       # vector to hold our results
sim_size <- 1000        # number of simulations per value of n
i <- numeric(sim_size)  # vector to hold giant-component sizes
for (n in 0:119) {        # n is the number of nodes to remove
  for (m in 1:sim_size) { # m is the m-th simulation
    to_remove <- sample(1:119, n)  # randomly pick n of the 119 nodes to remove
    # size of the largest (giant) component, as a fraction of all 119 nodes
    i[m] <- max(components(
      delete_vertices(g.overreporting, to_remove))$csize, 0) / 119
  }
  # the average giant-component size after removing n nodes
  S[n + 1] <- mean(i)
}

my_df <- data.frame(phi = 1 - (0:119)/119, S = S)
# rbind(my_df, c(phi = 0, S = 1), c(phi = 1, S = 0))
main <- expression(paste("Occupation probability, ", phi,
                         ", vs. size of giant cluster ", S))
plot(my_df[1:117, ], type = "l", main = main, sub = "Overreporting Model",
     xlab = expression(phi))
lambda <- mean(degree(g.overreporting))
plot(function(phi) phi + gsl::lambert_W0(-lambda * phi * exp(-lambda * phi)) / lambda,
     from = 0, to = 1, add = TRUE, col = "red")
legend("topleft", legend = c("Simulation", "Poisson"),
       col = c("black", "red"), lty = c(1, 1))
```
As stated earlier, the largest component in the overreporting model does not cover all the nodes in the graph. The giant component also appears to break down much faster than in the underreporting model: when about a quarter of the nodes are removed from the network, the largest component connects only about half of the households.
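We can verify the first point directly (an illustrative check, not part of the original analysis): the largest component of the intact overreporting graph covers only part of the 119 households.

```r
# Illustrative check: relative size of the largest component in the intact
# overreporting network.
max(components(g.overreporting)$csize) / 119
```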