This vigentte shows some basic descriptive statstics about the networks fomred by the Nyakatoke Village survey.
The survey data comes from (Fafchamps and Comola, 2013) [1]. They provide the following description of the data
The Tanzanian data set comes from a village community named Nyakatoke in the Buboka Rural District of Tanzania, at the west of Lake Victoria. The data set is a census covering all 119 households in the village and includes information on households' demographics, wealth and assets, income sources and income shocks, transfers and interpersonal relations...
During the first survey round, each adult in the village was asked: ‘Can you give a list of people from inside or outside of Nyakatoke, who you can personally rely on for help and/or that can rely on you for help in cash, kind or labour?’Aggregated at the level of each household, the responses to this question constitute variables $g_{ij}^{i}$ and $g_{ji}^j$. In other words, $g_{ij}^{i} = 1$ if an adult member of household $i$ mentions an adult member of household $j$ in their response to the above question.
In the data, the households are given a number from 1 through 119. The columns
hh1
and hh2
are the IDs of two households. willingness_link1
takes on a
value of $1$ if at least one adult member from the household identified by hh1
mentioned someone from the household identified by hh2
in their answer to the
question "Can you give a list of people from inside or outside of Nyakatoke,
who you can personally rely on for help and/or that can rely on you for help
in cash, kind or labour"? Similarly, willingness_link2
takes on a value of 1
if a member of hh2
listed someone from hh1
to the same question.
library(riskSharing) data("nyakatoke") knitr::kable(head(nyakatoke[,1:4]))
Table: The first few rows and four columns of data from the Nyakatoke village survey
Here we see (from row 1) that household 113 and household 1 have no link betwwen
them. From row 2, we see Household 76 and Household 1 might have a link because
someone from household 1 mentioned household 76 in the answer to the survey.
However, no one from household 76 mentioned anyone from household 1 in their
answer. Finally, from row 3 we see there is a link between household 17 and
household 1 (both willingness_link1
and willingness_link2
are 1).
The definitions of these columns will be expanded upon when we use them in later analysis.
Note that in our dataset, each pair of household shows up twice, once in the hh1
column and again in the hh2
column. For example, household 1 and household 76
show up as:
knitr::kable(subset(nyakatoke[,1:4], (hh1 == 1 & hh2 == 76) | (hh1 == 76 & hh2 == 1)))
Here we see the combination (household 76, household 1) show up in row 2 of the data
and the combination (household 76, household 1) show up in row 8937 of the data.
Looking at the willingness_link1
and willingness_link2
columns, we see that the
data in these rows are just flipped versions of each other.
The data also lists other characteristics of the housholds. Here we show a few rows and all columns of the dataset.
knitr::kable(head(nyakatoke))
Table: All columns from in the data.
Using the data from the survey, we can create several networks. We create the following three networks (more details about what each network represents is provided in our paper).
A directed network. If (an adult member of) household 1 lists (somenoe from) household 2 as an answer to the relevant survey question, we say there is a directed link FROM household 1 TO household 2. This corrsponds to the desire to link model in our paper.
An undirected network assuming links are underreported. In this case, if an (adult member) from household 1 mentions someone from household 2 as an answer to the relevant survey question, we say there is an undirected link between the two households. This corresponds to the underreporting model in our paper.
An undirected network assuming links are overreported. In this case, if members from both households list someone from the other household as an answer to the relevant survey question, we say there is an undirected link between the two households. This corresponds to the overreporting model in our paper.
# For the desire to link data, we create an "edgelist" that igraph can read. # This is a list of households where the first household in the pair mentioned # someone from household 2 in their survey response. edgelist <- as.matrix(nyakatoke[nyakatoke$willingness_link1 == 1, 1:2]) g.directed <- igraph::graph_from_data_frame(edgelist) # For the underreporting model, we create a graph from the dataframe # First we keep only the rows where at least one houshold said the other is in # their network. underreporting.df <- nyakatoke[nyakatoke$willingness_link1 == 1 | nyakatoke$willingness_link2 == 1,] # We then convert this into an undirected graph g.underreporting <- igraph::graph_from_data_frame(underreporting.df, directed = FALSE) # We then remove any multiple edges g.underreporting <- igraph::simplify(g.underreporting) # For the overreporting model we only keep those rows where both households # listed the other in their network overreporting.df <- nyakatoke[nyakatoke$willingness_link1 == 1 & nyakatoke$willingness_link2 == 1,] # We then create an graph g.overreporting <- igraph::graph_from_data_frame(overreporting.df, directed = FALSE) # From examining the list of vertices, we note that households 7, 30 32, 36, 44, # 65, 84, 88, 91, 96, 107, 110, 116, 117, 118 and 119 don't show up in the # graph. This is because there is no case where these households list another # household in their support network and they do the same. We dd back these # vertices as isolated vertices missing.vertices <- c(7, 30, 32, 46, 44, 65, 84, 88, 91, 96, 107, 110, 116, 117, 118, 119) g.overreporting <- (Reduce(f = function(x, y) {y + igraph::vertex(x)}, x = missing.vertices, init = g.overreporting, right = TRUE)) # We remove multiple edges g.overreporting <- igraph::simplify(g.overreporting)
Now that we have created the graphs, we report some basic descriptive statistics for each graph:
descriptive_stats(g.directed)
descriptive_stats(g.underreporting)
descriptive_stats(g.overreporting)
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.