Purpose

This vignette reports basic descriptive statistics for the networks constructed from the Nyakatoke village survey.

Survey and data organization

The survey data come from (Fafchamps and Comola, 2013) [1]. They provide the following description of the data:

The Tanzanian data set comes from a village community named Nyakatoke in the Bukoba Rural District of Tanzania, at the west of Lake Victoria. The data set is a census covering all 119 households in the village and includes information on households' demographics, wealth and assets, income sources and income shocks, transfers and interpersonal relations...

During the first survey round, each adult in the village was asked: ‘Can you give a list of people from inside or outside of Nyakatoke, who you can personally rely on for help and/or that can rely on you for help in cash, kind or labour?’ Aggregated at the level of each household, the responses to this question constitute variables $g_{ij}^{i}$ and $g_{ji}^{j}$. In other words, $g_{ij}^{i} = 1$ if an adult member of household $i$ mentions an adult member of household $j$ in their response to the above question.

In the data, the households are numbered from 1 through 119. The columns hh1 and hh2 hold the IDs of the two households in a pair. willingness_link1 takes the value 1 if at least one adult member of the household identified by hh1 mentioned someone from the household identified by hh2 in their answer to the question "Can you give a list of people from inside or outside of Nyakatoke, who you can personally rely on for help and/or that can rely on you for help in cash, kind or labour?" Similarly, willingness_link2 takes the value 1 if a member of hh2 listed someone from hh1 in response to the same question.
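To make the aggregation from adults to households concrete, here is a minimal sketch on made-up data. The `mentions` table, its column names `from_hh` and `to_hh`, and the four household IDs are assumptions chosen for illustration; this is not the survey's raw format nor how the package builds its columns.

```r
# Hypothetical adult-level responses: each row is one mention made by an adult.
# `from_hh` is the respondent's household, `to_hh` the mentioned person's household.
mentions <- data.frame(
  from_hh = c(1, 1, 17, 76),
  to_hh   = c(17, 76, 1, 113)
)

# All ordered pairs of distinct households in this toy example.
households <- c(1, 17, 76, 113)
pairs <- expand.grid(hh1 = households, hh2 = households)
pairs <- pairs[pairs$hh1 != pairs$hh2, ]

# Keys identifying each ordered (mentioning, mentioned) pair, e.g. "1-17".
mention_keys <- unique(paste(mentions$from_hh, mentions$to_hh, sep = "-"))

# willingness_link1: someone in hh1 mentioned someone in hh2.
pairs$willingness_link1 <-
  as.integer(paste(pairs$hh1, pairs$hh2, sep = "-") %in% mention_keys)
# willingness_link2: someone in hh2 mentioned someone in hh1.
pairs$willingness_link2 <-
  as.integer(paste(pairs$hh2, pairs$hh1, sep = "-") %in% mention_keys)

# Show every pair whose second household is household 1.
print(pairs[pairs$hh2 == 1, ])
```

Several adults in one household may mention the same other household, so the sketch de-duplicates the mentions before building the 0/1 indicators.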

library(riskSharing)
data("nyakatoke")
knitr::kable(head(nyakatoke[,1:4]))

Table: The first few rows and four columns of data from the Nyakatoke village survey

Here we see (from row 1) that household 113 and household 1 have no link between them. From row 2, we see that household 76 and household 1 might have a link, because someone from household 1 mentioned household 76 in their survey answer, but no one from household 76 mentioned anyone from household 1. Finally, from row 3 we see there is a link between household 17 and household 1 (both willingness_link1 and willingness_link2 are 1).

The definitions of these columns will be expanded upon when we use them in later analysis.

Note that in our dataset, each pair of households shows up twice: once with a given household in the hh1 column and once with that same household in the hh2 column. For example, household 1 and household 76 show up as:

knitr::kable(subset(nyakatoke[,1:4], (hh1 == 1 & hh2 == 76) | (hh1 == 76 & hh2 == 1)))

Here we see the combination (household 76, household 1) in row 2 of the data and the combination (household 1, household 76) in row 8937. Looking at the willingness_link1 and willingness_link2 columns, we see that the values in these two rows are flipped versions of each other.
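One way to confirm that every row has such a flipped counterpart is sketched below on a small made-up data frame. Only the four column names are taken from the data; the rows of `toy` are invented for illustration.

```r
# Toy data with the same column names as the first four columns of `nyakatoke`.
toy <- data.frame(
  hh1 = c(76, 17, 1, 1),
  hh2 = c(1, 1, 76, 17),
  willingness_link1 = c(0, 1, 1, 1),
  willingness_link2 = c(1, 1, 0, 1)
)

# Build the mirror image of every row: swap the IDs and the link indicators.
flipped <- data.frame(
  hh1 = toy$hh2,
  hh2 = toy$hh1,
  willingness_link1 = toy$willingness_link2,
  willingness_link2 = toy$willingness_link1
)

# Merging on all four shared columns keeps only rows whose mirror image
# is also present in the data.
matched <- merge(toy, flipped)
all_mirrored <- nrow(matched) == nrow(toy)
print(all_mirrored)
```

The same check could be run on `nyakatoke[, 1:4]` in place of `toy`, assuming each ordered pair of households appears only once.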

The data also list other characteristics of the households. Here we show a few rows and all columns of the dataset.

knitr::kable(head(nyakatoke))

Table: The first few rows and all columns of the data

Several networks from underlying data

Using the data from the survey, we can create several networks. We create the following three networks (more details about what each network represents are provided in our paper).

  1. A directed network. If (an adult member of) household 1 lists (someone from) household 2 as an answer to the relevant survey question, we say there is a directed link FROM household 1 TO household 2. This corresponds to the desire-to-link model in our paper.

  2. An undirected network assuming links are underreported. In this case, if (an adult member of) either household mentions someone from the other household as an answer to the relevant survey question, we say there is an undirected link between the two households. This corresponds to the underreporting model in our paper.

  3. An undirected network assuming links are overreported. In this case, we say there is an undirected link between the two households only if members of both households list someone from the other household as an answer to the relevant survey question. This corresponds to the overreporting model in our paper.
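The three definitions above can be compared on a small report matrix. This is a sketch on made-up reports among three hypothetical households A, B and C, not a restatement of the models in the paper.

```r
# Hypothetical reports: reports[i, j] = 1 means household i mentioned household j.
hh <- c("A", "B", "C")
reports <- matrix(c(0, 1, 1,
                    1, 0, 0,
                    0, 0, 0),
                  nrow = 3, byrow = TRUE,
                  dimnames = list(hh, hh))

# 1. Directed network: keep the reports as they are.
directed <- reports
# 2. Underreporting: a link exists if either household mentioned the other.
underreporting <- pmax(reports, t(reports))
# 3. Overreporting: a link exists only if both households mentioned each other.
overreporting <- reports * t(reports)

print(underreporting)
print(overreporting)
```

Here A and B mention each other, while A mentions C without being mentioned back. The pair A and C is therefore linked in the underreporting network but not in the overreporting one.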

# For the desire-to-link data, we create an "edgelist" that igraph can read.
# This is a list of household pairs in which the first household mentioned
# someone from the second household in its survey response.
edgelist <- as.matrix(nyakatoke[nyakatoke$willingness_link1 == 1, 1:2])
g.directed <- igraph::graph_from_data_frame(edgelist)

# For the underreporting model, we create a graph from the data frame.
# First we keep only the rows where at least one household said the other is in
# its network.
underreporting.df <- 
    nyakatoke[nyakatoke$willingness_link1 == 1 | nyakatoke$willingness_link2 == 1,]
# We then convert this into an undirected graph
g.underreporting <- igraph::graph_from_data_frame(underreporting.df, 
                                                  directed = FALSE)
# We then remove any multiple edges
g.underreporting <- igraph::simplify(g.underreporting)

# For the overreporting model we only keep those rows where both households
# listed the other in their network
overreporting.df <- 
    nyakatoke[nyakatoke$willingness_link1 == 1 & nyakatoke$willingness_link2 == 1,]
# We then create a graph
g.overreporting <- igraph::graph_from_data_frame(overreporting.df, directed = FALSE)
# Households with no reciprocated mention (no other household that they list
# and that lists them back) do not appear in any row of overreporting.df, so
# they are missing from the graph (16 households in this data). Households are
# numbered 1 through 119, so we find the missing IDs by comparing against the
# vertex names of the graph.
missing.vertices <- setdiff(as.character(1:119),
                            igraph::V(g.overreporting)$name)
# We add these households back as isolated vertices
g.overreporting <- igraph::add_vertices(g.overreporting,
                                        nv = length(missing.vertices),
                                        name = missing.vertices)
# We remove multiple edges
g.overreporting <- igraph::simplify(g.overreporting)
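As an alternative way to keep isolated households, the full list of household IDs can be passed through the `vertices` argument of `igraph::graph_from_data_frame()` when the graph is built. The sketch below uses a three-household toy edge list; the names `edges`, `all.households`, `g.partial` and `g.full` are made up for illustration.

```r
library(igraph)

# Toy edge list: households 1 and 2 mention each other, so the pair appears
# twice; household 3 has no reciprocated mention and appears in no row.
edges <- data.frame(hh1 = c(1, 2), hh2 = c(2, 1))

# Without `vertices`, only households that appear in `edges` become vertices.
g.partial <- graph_from_data_frame(edges, directed = FALSE)

# Supplying every household ID keeps household 3 as an isolated vertex.
all.households <- data.frame(name = 1:3)
g.full <- graph_from_data_frame(edges, directed = FALSE,
                                vertices = all.households)
# Collapse the duplicated 1-2 edge into a single undirected edge.
g.full <- simplify(g.full)

print(vcount(g.partial))
print(vcount(g.full))
```

With this approach no list of missing households needs to be maintained by hand, at the cost of having to know the complete set of IDs up front.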

Now that we have created the graphs, we report some basic descriptive statistics for each graph:

  1. The number of vertices
  2. The number of edges
  3. The average degree (or average in-degree and out-degree for a directed graph)
  4. The average path length
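For readers who want to see how such statistics can be obtained directly from igraph, here is a minimal sketch on a toy directed graph. It is not the implementation of `descriptive_stats()`, and the names `toy.edges`, `g.toy` and `toy.stats` are made up for illustration.

```r
library(igraph)

# Toy directed graph on four households; not the Nyakatoke data.
toy.edges <- data.frame(from = c(1, 2, 3, 1), to = c(2, 3, 1, 4))
g.toy <- graph_from_data_frame(toy.edges, directed = TRUE)

toy.stats <- list(
  vertices        = vcount(g.toy),                      # number of vertices
  edges           = ecount(g.toy),                      # number of edges
  avg.in.degree   = mean(degree(g.toy, mode = "in")),   # average in-degree
  avg.out.degree  = mean(degree(g.toy, mode = "out")),  # average out-degree
  avg.path.length = mean_distance(g.toy, directed = TRUE)
)
print(toy.stats)
```

In a directed graph every edge adds one to some vertex's in-degree and one to another's out-degree, so the two averages always coincide. For an undirected graph, `degree()` can be called without the `mode` argument.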

Directed graph

descriptive_stats(g.directed)

Undirected graph

Underreporting model

descriptive_stats(g.underreporting)

Overreporting model

descriptive_stats(g.overreporting)


arnoblalam/riskSharing documentation built on Jan. 12, 2021, 5:16 a.m.