Problem set: The effect of the TseTse fly on African Development

user.name = '' # set to your user name

library(RTutor)
check.problem.set('TseTseAfrica', ps.dir, ps.file, user.name=user.name, reset=FALSE)

# Run the Addin 'Check Problemset' to save and check your solution

Author: Vanessa Schoeller
Date: 18.05.2017

Exercise Overview

Introduction

Welcome to this problem set! It is the main part of my Bachelor thesis at the University Ulm.

"It has long been an axiom of mine that the little things are infinitely the most important."

(Arthur Conan Doyle 1892)

This quotation is from the British physician and writer Arthur Conan Doyle, best known for creating the stories about Sherlock Holmes. It describes well the question we investigate in this problem set: What effect does the TseTse fly have on African development? In the past, researchers mainly investigated how communicable diseases that are harmful to humans affected economic output. We will adopt a different approach and focus on how animal trypanosomiasis, the veterinary disease transmitted by the TseTse, affected development. Is it possible that a fly no bigger than 1.5 cm affects the development of multiple African countries lying in the tropics? This leading research question belongs to the field of comparative development economics. We will mainly compare the historical evolution of economic organizations in Africa.

To approach this question, we adopt the structure and content of the paper "The effect of the TseTse fly on African development" by Marcella Alsan (2013). You find the paper and the corresponding data here. This problem set replicates the author's work with the help of the statistical software R. You can simply click your way through it, have fun, and along the way learn more about statistical programming with R, how to work with econometric data, and, last but not least, the effect of the fly.

The structure of the problem set:

  1  Loading and analyzing the data  

  2  Introduction of the TseTse suitability index: laboratory experiments and empirical framework   

  3  Visual comparison of the suitability for TseTse with the suitability for rainfed agriculture in Africa

  4  Regression: Correlation between subsistence strategies and the TSI  

      4.1 Linear and Multiple Regression  

      4.2 Clustered robust standard errors  

  5  Regression: Correlation between development variables and the TSI  

  6  Placebo test: Correlation between TSI and development in the tropics outside Africa  

  7  Simulation of Africa without the TseTse and archeological evidence illustrated by the example of Great Zimbabwe

  8  Impact of the TseTse on modern African development   

  9  Robustness tests   

  10 Conclusion and Outlook   

  11 References   

Notes on how to work with the elements of the problem set

The problem set consists of normal text, code blocks, info blocks, and quizzes.

You can solve the exercises in any order. Nevertheless, I recommend completing the exercise sheets sequentially, because concepts introduced in early exercises are assumed to be understood in later ones. The tasks inside a tab must be solved in the given order. That means you cannot solve 1 b) if you have not solved 1 a) beforehand. The problem set contains several info blocks which give additional information about R packages, background information, or comments. You can read them by just clicking on the headline. Also, several small quizzes are included which test whether you understood the relationships. The quizzes are optional, and you can continue the problem set without having solved them. So much for the basics. Everything else you need to know will be explained when we work with it.

Let us start our economic journey!

Exercise 1 -- Loading and analyzing the data

General information about Tsetse

Maybe one of your first questions will be: "What is the TseTse fly?"

Figure 1: Image of the Tsetse,
Source: International Atomic Energy Agency, https://commons.wikimedia.org/w/index.php?curid=42087829

The TseTse fly is endemic to Africa and found in most tropical African countries. Female and male TseTse flies feed on human and animal blood. While doing this, they act as a vector for the parasite Trypanosoma, which causes the disease Trypanosomiasis. If you want to find out more about how the transmission of the parasite works, just open the info block below. One can distinguish between human Trypanosomiasis, also known as sleeping sickness, and animal Trypanosomiasis, also called Nagana or animal sleeping sickness. To keep things simple, we will mainly use the term sleeping sickness in this problem set and refer to the form that infects animals.

info("Transmission of Trypanosomiasis") # Run this line (Ctrl-Enter) to show info

This parasite transmitted by the TseTse fly is harmful to humans and animals, and without treatment the infection is usually fatal. Because of this lethality, we could assume that a disease which quickly kills the infected animal might eradicate itself. But wild game, which is immune, serves as a reservoir. Also, all hoofed animals can be affected, which is the special danger of Trypanosomiasis compared to most other veterinary diseases, which only infect one species (Brown and Gilfoyle 2010).

In the following exercises, we will have a closer look at how the fly influenced ethnic groups living in precolonial Africa. This helps us to get a better understanding of the differences in economic performance in modern Africa. We will focus on animal trypanosomiasis, not the form that infects humans. Is it possible that a little fly affects the agricultural production, political centralization, and population density of multiple countries?

Loading and analyzing the data

To work with the data, we first have to load it. The data we want to use in our problem set is stored in Stata format. Stata is a statistical software package, and Alsan (the author of the paper on which this problem set is based) used it for her analysis. R provides a package called foreign, with which we can read in the data from Alsan's paper. For more information about the package, right click here and open a new tab. If you have not heard about R packages before, open the info section and find out more.

info("Packages in R") # Run this line (Ctrl-Enter) to show info

Below you see the first code chunk. I will give you a short instruction on how the elements of the problem set work, even though it is mostly intuitive. First, you click edit. Then you can select between check, hint, run chunk, data, and solution. The normal way is that you insert your answer in the chunk, click check, and get a message telling you whether your answer is right or wrong. If you are right, you can just continue solving the rest of the problem set. If you typed in a wrong answer, do not worry: you can try an alternative solution as often as you like. Another option is to click hint to get a tip. If you get completely stuck, you can always click solution, get the right code, and click check to continue. To have a look at the dataset, you can open the Data Explorer by clicking on data.

So here is your first task. Click edit and afterwards check.

# loading the package
library(foreign)
# loading the data
pre = read.dta("precolonial.dta")

The loaded data is one of three cross-sectional datasets we will use in this problem set. You will get information about the other datasets every time before we start working with them.
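To see how reading Stata files works without needing precolonial.dta, here is a minimal sketch that writes a small, made-up data frame to a temporary .dta file and reads it back. The data frame toy and its values are purely illustrative assumptions and are not part of Alsan's data.

```r
library(foreign)

# Made-up toy data (NOT the precolonial dataset)
toy <- data.frame(
  id = 1:4,
  TSI = c(-0.5, 0.2, 1.1, -1.3),
  husbandry = c(4L, 2L, 1L, 6L)
)

# Write the toy data to a temporary Stata file ...
path <- tempfile(fileext = ".dta")
write.dta(toy, path)

# ... and read it back in the same way as precolonial.dta
toy_back <- read.dta(path)

dim(toy_back)    # number of rows and columns
names(toy_back)  # column names
```

The result of read.dta() is an ordinary data frame, so all the usual R commands can be applied to it.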

Some general background information about the loaded precolonial data: The dataset precolonial.dta is an extract of a global database called Ethnographic Atlas containing historical characteristics of more than 500 ethnic groups that are still living or have lived in Africa before the European settlement. The data was collected between 1800 and the beginning of the twentieth century. The observed dataset is cross-sectional. If you wish to find out more, please open the following info block.

info("Type of data: cross-sectional, time-series and panel-data") # Run this line (Ctrl-Enter) to show info

Now we want to get an overview of the data. One interesting thing to know about the dataset: How many rows and columns are contained in the dataset? The right command here is dim(). Inside the brackets you define which dataset should be used. Here we want to get more details about the recently loaded precolonial.dta saved in the variable pre. Please insert the right command in the field below and click check.


! addonquizdim

The command returns two numbers. The first is the number of rows, the second describes the number of columns.
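As a small illustration on made-up data (the data frame toy below is an assumption for demonstration only), the following sketch shows how dim() relates to nrow() and ncol(), and how head() shows the first rows of a data frame.

```r
# Made-up toy data with 8 rows and 3 columns (NOT the precolonial dataset)
toy <- data.frame(
  group = c("A", "B", "C", "D", "E", "F", "G", "H"),
  TSI = c(-1.2, -0.4, 0.0, 0.3, 0.8, 1.1, -0.7, 0.1),
  husbandry = c(6, 4, 3, 2, 1, 0, 5, 3)
)

dim(toy)      # rows first, then columns: 8 3
nrow(toy)     # only the number of rows
ncol(toy)     # only the number of columns

head(toy)     # by default the first six rows
head(toy, 3)  # or any other number of rows
```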

Let us get into more detail and print out some of the 522 rows of the dataset. The command head() will print out the first six rows of the dataset. Use the command on the dataset pre.


Every row describes several characteristics of an ethnic group. For example, the first row of the printed-out dataset contains information about the community of the Ababda.

! addonquizAbabda

Some of the other columns might not be as clear from the beginning because they need more interpretation. We will go through them systematically in the following exercises and in doing so learn what exactly the dataset is about. If you want to get an overview of all variables contained in the dataset combined with a short description, you can always open the Data Explorer by clicking on the tab data in the heading of each task.

This exercise refers to pages 1-4 of the paper.

Exercise 2 -- Introduction of the TseTse suitability index: Laboratory experiments and empirical framework

To analyze the effect of the TseTse fly on African development we first have to find out how the TseTse fly was distributed in precolonial Africa.

General information

Marcella Alsan, the author of the paper on which this problem set is based, developed the TSI, which is short for TseTse suitability index. The TSI measures the distribution of the TseTse fly with the help of climate data from precolonial Africa. The index is developed using controlled laboratory experiments. Temperature and humidity are the input variables that define functions for the TseTse birth and death rates. Birth and death rates are then combined into a function that describes the TseTse population depending on temperature and humidity.

info("TseTse physiology") # Run this line (Ctrl-Enter) to show info

In a last step, the TseTse population function is combined with historical climate data, which results in the TSI. The climate data used comes from the National Oceanic and Atmospheric Administration's 20th Century Reanalysis. This reanalysis contains daily temperature and humidity data since 1871. The author combines these daily climate variables to develop the TSI. The big advantage of this index is that it can be considered exogenous. If we used, for example, the cattle distribution instead, exogeneity would not be given. More about exogeneity in the info block.

info("Why we use method of potential to estimate the population of the TseTse fly?") # Run this line (Ctrl-Enter) to show info

Analyzing the TSI distribution

So much for the theory of the TSI. Now we want to analyze the index with statistical methods. Therefore, we load the dataset precolonial. This time it is your turn. In case you are not quite sure, have a look at the previous exercise.

# load the dataset precolonial.dta and assign it to the variable pre. The right command is read.dta("")

Our first dataset precolonial contains a TSI for every African ethnic group. First let us calculate the mean of the variable. The right R command is mean(). To address the variable TSI contained in the dataset pre we write pre$TSI. Please insert the code in the field below.


Second, we want to measure the spread of the TSI distribution. Please calculate the standard deviation with the command sd().


Now we know that the standard deviation of the TSI is about 1 with a mean of 0.

Through this basic calculation we got a first idea of the data. This is important so we can use the right statistical instruments later and interpret the results.

Density plot

To get a more detailed picture of the data we plot it and compare it with the standard normal distribution. Therefore, we use density-plots.

To generate the standard normal distribution, we use the command rnorm(), which draws random numbers from a normal distribution. The first number passed to this command defines how many random numbers are drawn. In order to create the standard normal distribution, plug the corresponding mean and standard deviation into the function rnorm().

To solve the task you first have to remove the # and then replace the ??? with the right code elements.

# Replace the ??? in the code below and uncomment the command.
# With the help of the function rnorm(), generate 100,000 random numbers from a normal distribution and save them in the variable x.

# x <- rnorm(100000, mean = ?, sd = ?)

Now we have a variable called x which approximates a normal distribution.

Your second task is to create a density plot of the TSI. In the first row, the command to plot the normal distribution is already given. You can adapt it to create your own code.

Like in the task before just remove the # and replace the ??? with the right code elements.

# plot(density(x), col = "red", main = "Density plot comparison: Standard normal distribution and TSI") 

# print a green plot of the TSI
# lines(???????(pre$???) , ??? = "green")

Standardization

Why does the distribution look like this?

The TSI is a standardized value, a so-called z-score, of the steady-state population. From every observation of the TSI, the mean of the TSI is subtracted, and the result is divided by the standard deviation. The result is a standardized variable with a mean of zero and a standard deviation of one. The formula looks like this:

$$z_i = \frac{TSI_i - \overline{TSI}} {sd_{TSI}}$$

What are the advantages of a standardized value when it comes to analyzing the data?
One benefit is that the standardization makes it easier for us to interpret the regression, because a change of one unit equals one standard deviation. So we compare a value to the entire population instead of looking at an absolute number, and this comparison often matches our point of interest. Also, we can compare the coefficients of several regressions more easily. An additional advantage of the standardization is that we see at a glance whether a value is above or below average.
The standardization does not influence the statistical significance of the performed analysis. (Wooldridge 2013, p. 187-189, 852; Auer 2015, p. 54-56, 217-218)
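The following sketch illustrates both points on simulated numbers. The variable raw is a hypothetical unstandardized index and y a made-up outcome; neither comes from the paper. We standardize raw by hand, compare the result with R's built-in scale(), and check that the t-statistic of a regression does not change when the regressor is standardized.

```r
set.seed(1)

# Hypothetical raw index values (made up for illustration)
raw <- rnorm(500, mean = 12, sd = 4)

# z-score: subtract the mean, divide by the standard deviation
z <- (raw - mean(raw)) / sd(raw)

mean(z)  # numerically zero
sd(z)    # one

# scale() performs the same computation
all.equal(z, as.numeric(scale(raw)))

# A made-up outcome that depends on the raw index
y <- 3 - 0.5 * raw + rnorm(500)

# The slope changes its unit, but the t-statistic stays the same
t_raw <- coef(summary(lm(y ~ raw)))["raw", "t value"]
t_z   <- coef(summary(lm(y ~ z)))["z", "t value"]
all.equal(t_raw, t_z)
```

The slope of the standardized regression equals the raw slope multiplied by sd(raw), which is exactly the "change per one standard deviation" reading described above.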

Now we know more about the variable TSI, which we will use in most exercises of this problem set.

This exercise refers to pages 8-9 of the paper and appendix C.

Exercise 3 -- Visual comparison of the suitability for TseTse with the suitability for rainfed agriculture in Africa

Distribution of TSI over Africa

After analyzing the TSI we want to see how it was distributed over historical Africa.

! addonquizTseTse distribution

We aim to answer this question with a plot showing Africa together with the TseTse distribution. Therefore, we use the package ggmap.

info("ggmap") # Run this line (Ctrl-Enter) to show info

Unfortunately, the code is very slow. That is why I already ran the code, plotted the map, and saved it in the file called africamap_TSI. If you want to see the code which prints the map, please open the note block below.

! start_note "How to plot a map of Africa with the TSI distribution"

This is the code which creates a map of Africa combined with the TSI. Please do not run the code because it is quite slow.

# loading the data
pre = read.dta("precolonial.dta")

# loading the packages
library(ggmap)
library(ggplot2)

# building the map
pre$latlon = paste0(pre$lat , ":" , pre$lon)
pre$const = 1

loc = c(min(pre$lon) * 1.1 , max(pre$lat) * 0.9 , max(pre$lon) * 0.9 , min(pre$lat) * 1.1)
map <- get_map(location = loc, zoom = 3)
# note: the following call overwrites the map requested with loc above
map <- get_map(location = 'Africa', zoom = 3)

mp <- ggmap(map) + geom_point(aes(x = lon , y = lat , color = TSI) , data = pre , alpha = .5 ,
                              size = 5) + scale_color_gradientn(colors = c("red" , "blue"))

# saving the plot
# saveRDS(mp , file = "africamap_TSI")

! end_note

Now we load the map saved in the file africamap_TSI and print it out.

mp = readRDS("africamap_TSI")
mp

The graphic shows a map of Africa combined with the TSI. On this map, the ethnic groups are placed according to their historical place of residence. For every ethnic group, our dataset contains a value describing the TseTse suitability. The colored circles range from blue to red and show whether the region has a high or low suitability for the fly.

! addonquizAfrica and TSI

Now we know more about the distribution of TSI within Africa. In the next step, we want to compare it with the variable SI.

Distribution of SI over Africa

SI is the abbreviation for the FAO's agricultural suitability index. It measures the suitability of a region for rainfed farming. The index is normalized and ranges from 0 to 1. To construct it, the specific conditions of climate, soil, and terrain which influence the farming output are analyzed. Then the index is developed by comparing this data with the specific circumstances of the regions. A higher value means that the area the group lived in was more suitable for agriculture.

info("FAO") # Run this line (Ctrl-Enter) to show info

! addonquizSI distribution

Let us now test your assumption and print out a map of Africa combined with the SI. Like before, you can have a look at the note block to see exactly how the map was created, or just continue to the task where we load the prepared plot.

! start_note "How to plot a map of Africa with the SI distribution"

Below you find the code which creates a map of Africa combined with the SI. Please do not run the code because it is quite slow.

mp2 <- ggmap(map) + geom_point(aes(x = lon, y = lat , color = SI) , data = pre , alpha = .5 ,
                              size = 5) + scale_color_gradientn(colors = c("red" , "blue"))

# saving the plot
# saveRDS(mp2 , file = "africamap_SI")

! end_note

This time it is your turn to load the plot. The map is saved in a file called africamap_SI. Please save the loaded map in a variable called mp2.
If you experience difficulties, just have a look at the previous tasks and adapt the code.


Now we want to compare the two plots. Please print out the two maps: mp and mp2.


! addonquizSI and TSI

The aim of this task was not to give evidence of a correlation between SI and TSI. We just wanted to get a first graphical impression of whether the TseTse was mainly prevalent in fertile regions. The result is that most regions in the dataset seem to be either suitable for both the TseTse and agriculture, or suitable for neither. But this is only an initial assessment based on our observations. In the next two chapters, we will use regressions to find out more about the correlation between TSI and selected development variables.

This exercise refers to figure 3 of the paper.

Exercise 4.1 -- Regression: Correlation between subsistence strategies and the TSI: Linear and Multiple regression

In the previous exercises, we learned more about the dataset in general and the variable TSI. In this section, we want to find out if there is a correlation between the subsistence pattern of a historical group and the TseTse fly.

Theoretical background

But why do we want to find out more about the subsistence strategy? How does it help us to explain the precolonial development?

The subsistence strategy of a group affects the size and the structure of a social group. The economic outcome of hunting differs from that of agriculture or husbandry, and this affects the number of people that can live together as a group. The strategy used by the group also influences the social structure and migratory patterns. A group that relies on intensive agriculture can cultivate a place for several years, whereas a group that relies on hunting must follow the wildlife. Hence, if the TSI has an impact on the subsistence strategy a group selects, it influences the group's development.

So much for the theoretical background. Now let us load the data and start regressing.

# loading the data: 
pre = read.dta("precolonial.dta")

Linear Regression

Structure of the variable

For every row, and consequently for every group, the dataset contains five values which describe the food production system used. The names of the columns are gathering, hunting, fishing, husbandry, and agriculture. The variables are ordered categories ranging from 0 to 9. A high value codes a high dependence; a low value codes that this strategy was not important for feeding the group members. A value of 0 equals a dependence of 0-5 % and means that the group relied little or not at all on this subsistence strategy. A value of 9 describes a high dependence ranging from 86-100 %. For the values in between, the author does not give a direct conversion. This makes it hard to interpret the regression coefficients.

In the first step, we choose the subsistence strategy husbandry and analyze how it varies with changes in TSI.

! addonquizhusbandry

So, the variable husbandry describes how much a group relied on livestock farming.

Linear Regression

In the following, we want to calculate a so-called OLS regression.

info("OLS regression") # Run this line (Ctrl-Enter) to show info

The regression formula we use in the following:

$$Husbandry_j = \alpha + \beta TSI_j + \epsilon$$

The index $j$ identifies one of the 522 ethnic groups contained in the dataset. Remember, each row of our dataset describes a different ethnic group in Africa. $\epsilon$ is the error term. It contains all unobserved factors, besides the TseTse fly, that affect how much a group relied on husbandry (Wooldridge 2013, p. 21).

The R command we use here is lm(), which stands for "linear model". For more information right click here and open a new tab. We pass to the function the dependent variable husbandry and the independent variable TSI, separated by ~. The argument data specifies the dataset the variables come from. Alternatively, we could also address the variables with pre$... . With the command summary(name of the regression) we print out the regression result.
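To illustrate the syntax independently of the precolonial data, here is a minimal sketch on simulated numbers. The data frame toy and the parameters used to generate it are made up; in particular, the simulated husbandry is continuous, whereas the real variable has categories from 0 to 9.

```r
set.seed(42)
n <- 200

# Simulated toy data (NOT the precolonial dataset)
toy <- data.frame(TSI = rnorm(n))
toy$husbandry <- 2.4 - 0.8 * toy$TSI + rnorm(n)

# Variant 1: formula plus data argument
fit_formula <- lm(husbandry ~ TSI, data = toy)

# Variant 2: addressing the columns directly with $
fit_dollar <- lm(toy$husbandry ~ toy$TSI)

# Both variants give the same intercept and slope
coef(fit_formula)
coef(fit_dollar)

# Full regression table
summary(fit_formula)
```

The variant with the data argument is usually preferable because the coefficient names stay short and functions such as predict() work more smoothly with it.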

To solve the task you first have to remove the # and then replace the ??? with the right code elements.

# computing regression
# linreg_husbandry = ??(???????? ~ ???, data=pre)

# printing out the regression coefficients
# ???????(linreg_husbandry)

Interpretation of the regression output:

The output tells us the linear model looks like this:

$$\widehat{Husbandry_j} = 2.39543 - 0.81172 \cdot TSI_j$$

The important value here is the estimated value of $\beta$, which is roughly -0.81. We can interpret the regression coefficient as follows: An increase of one standard deviation in the TSI (remember: the TSI is standardized, so the standard deviation is one) is associated with a decrease in the dependence on husbandry of nearly one category.

info("Interpretation of regressions with ordinary variables") # Run this line (Ctrl-Enter) to show info

info("Correlation vs. causality") # Run this line (Ctrl-Enter) to show info

After interpreting the coefficients, we now want to have a closer look at the other statistical values returned by the summary command. The asterisks behind the regression result tell us that it is significant at the 1 percent level. The p-value reported for this regression is smaller than $2.2 \cdot 10^{-16}$, which means it is extremely small.
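If you want to access these values directly instead of reading them off the printed table, the coefficient matrix of the summary can be used. The sketch below runs on simulated data (x and y are made up). It also shows where the number 2.2e-16 comes from: it is the machine precision of R's floating-point numbers, and, as far as I am aware, smaller p-values are simply printed as being below that threshold.

```r
set.seed(7)
n <- 300

# Simulated data with a clearly negative relationship (illustration only)
x <- rnorm(n)
y <- 1 - 0.8 * x + rnorm(n)

fit <- lm(y ~ x)

# Matrix with estimate, standard error, t value and p-value per coefficient
coefs <- coef(summary(fit))
coefs

# p-value of the slope
p_x <- coefs["x", "Pr(>|t|)"]

p_x < 0.01                 # significant at the 1 percent level?
.Machine$double.eps        # machine precision, about 2.2e-16
p_x < .Machine$double.eps  # smaller than what R prints explicitly?
```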

info("Significance level and p value") # Run this line (Ctrl-Enter) to show info

Scatterplot

As a next step, we want to plot our regression results with a scatterplot. The command we use is plot(). The first variable we pass to the command will be plotted on the x axis, the second one on the y axis. We analyze the effect of TSI on husbandry, so which variable belongs on which axis? Assign the variables to the right axes in the code chunk below.
We also plot the fitted line suggested by the OLS estimate in red. The right command here is abline(name of the regression, col = " ").

# plotting the data
# plot(pre$???, pre$???, main = "Scatterplot TSI and husbandry")
# abline(linreg_husbandry, col="red")

So how do we interpret the scatter plot?
Each dot stands for one ethnic group. The position of a dot describes the dependence on husbandry and the TseTse suitability. The x axis refers to the TSI, so a dot that is far on the right side describes a group living in an area with high TseTse suitability. The y axis refers to husbandry, so a group which relies to a large extent on husbandry as a subsistence strategy is represented by a dot high up in the plot. Besides the dots, there is also a line which shows the relationship between husbandry and TSI we estimated in the regression before. It is a falling line because we found a negative $\beta$.

So how can we explain the negative correlation between TSI and husbandry?
The TseTse fly transmits the sleeping sickness to the livestock of a group. Hence in areas with a high suitability for the fly there is a higher chance that farm animals get infected with the sleeping sickness and die. Consequently, animal husbandry is not an effective way to feed the group and will not be chosen.

But we should be careful with interpreting the results of the simple linear regression. We do not know if the error term $\epsilon$ contains any relevant variables that influence the outcome and are correlated with the TSI. These are so-called omitted variables, and they would bias our estimate. More details in the info box below. We will take this into consideration in the following exercise and compute a so-called multiple regression.

info("Error term") # Run this line (Ctrl-Enter) to show info

Multiple Regression

To find out more about multiple regressions in general, please open the info block.

info("Multiple Regression") # Run this line (Ctrl-Enter) to show info

What other factors can we think of that might influence the dependency on husbandry? For example, we can think of climate factors like temperature, humidity, or the access to a river.

Control variables

What do we want to measure with our regression?
The effect of TSI on husbandry.

Below we see the relationship between TSI and husbandry together with the control variable prop_tropics which measures the proportion of land area in the tropics for each ethnic group. This graphic visualizes the characteristic of a control variable.

Figure 2: Arrow diagram - Relationship between TSI and husbandry together with a control variable,
Source: own diagram

A short explanation to the figure:

The boxes stand for the variables in the regression, and the arrows represent the effect one variable has on another. Remember, the aim of our regression is to measure the effect TSI has on husbandry. But if we just compute a linear regression between these two, we ignore the effect the tropics - measured by the variable prop_tropics - have on both variables, TSI and husbandry. This is called an omitted variable bias. To avoid the bias, we include prop_tropics as a control variable. By doing so, we disentangle the effects the tropical conditions have on the regression and can separately measure the effect of prop_tropics on husbandry and, even more important, the effect of TSI on husbandry.
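A small simulation can make the omitted variable bias tangible. All numbers below are invented for illustration: tropics plays the role of the control variable, it raises tsi and lowers husbandry, and the true effect of tsi on husbandry is set to -0.5. None of these values are estimates from the precolonial data.

```r
set.seed(123)
n <- 5000

# Hypothetical data generating process (illustration only)
tropics   <- runif(n)                                  # share of land in the tropics
tsi       <- 1.5 * tropics + rnorm(n)                  # tropics raise the suitability
husbandry <- 3 - 0.5 * tsi - 2 * tropics + rnorm(n)    # true effect of tsi: -0.5

# Regression without the control variable: biased
b_short <- coef(lm(husbandry ~ tsi))["tsi"]

# Regression with the control variable: close to the true -0.5
b_long <- coef(lm(husbandry ~ tsi + tropics))["tsi"]

c(without_control = unname(b_short), with_control = unname(b_long))
```

Without the control, the coefficient of tsi also picks up part of the negative effect of the tropics and therefore looks more negative than the true effect.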

info("Omitted variable bias") # Run this line (Ctrl-Enter) to show info

In the following we want to add some meaningful control variables and analyze how they affect the correlation between TSI and husbandry. To make it more interesting, you can have a guess before about how the regression coefficient will change.

! addonquizcorrelations

To test your answer, let us compute the correlations. To do so, just click edit and then check.

cor(pre$TSI, pre$prop_tropics)
cor(pre$husbandry, pre$prop_tropics)

We see that the TSI is positively correlated with the tropics. This result is easy to explain because the sleeping sickness is a tropical disease. The TSI is computed with climate data which model the suitability for the TseTse. The fly prefers high humidity and a constant temperature around 25 °C. These ideal conditions best match the values found in the tropics. So if a large share of a group's land area lies in the tropics, there will be more Tsetse flies and a higher chance of sleeping sickness.

The negative correlation between husbandry and the tropics is not as easy to explain. We do not have enough background information to give a precise explanation of why we observe it, so we can only guess. Maybe the areas in the tropics were not as suitable for husbandry because of the climate conditions. Or the groups living in the tropics relied mainly on other subsistence strategies like hunting because they were more effective. Another reason might be that other tropical animal diseases are prevalent.

! addonquizclimate control

Let us now test your answer and compute the regression with the proportion of land area in the tropics to check how the regression output changes.

reg_husbandry_cc = lm(husbandry ~ TSI + prop_tropics, data = pre)
summary(reg_husbandry_cc)

# Print out the regression coefficients of linear regression calculated in an earlier task to compare.
coef(linreg_husbandry)

If we control for the tropics, the effect of TSI on husbandry gets weaker (do not get confused: the coefficient $\beta$ gets bigger, that is, less negative). The coefficient changes because of the underlying correlations of the tropics with TSI and with husbandry, which we discussed beforehand.
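The size of this change follows an exact rule in OLS: the coefficient from the regression without the control equals the coefficient from the regression with the control, plus the coefficient of the control times the slope from regressing the control on TSI. The sketch below checks this on simulated data. All variable names and numbers are assumptions for illustration and not results from the precolonial dataset.

```r
set.seed(2017)
n <- 1000

# Hypothetical data (illustration only)
tropics   <- runif(n)
tsi       <- 1.5 * tropics + rnorm(n)
husbandry <- 3 - 0.5 * tsi - 2 * tropics + rnorm(n)

# The two correlations that drive the change in the coefficient
cor(tsi, tropics)        # positive
cor(husbandry, tropics)  # negative

long  <- coef(lm(husbandry ~ tsi + tropics))  # with control
short <- coef(lm(husbandry ~ tsi))            # without control
delta <- coef(lm(tropics ~ tsi))["tsi"]       # control regressed on tsi

# short coefficient = long coefficient + (coefficient of control) * delta
rebuilt <- long["tsi"] + long["tropics"] * delta
all.equal(unname(short["tsi"]), unname(rebuilt))
```

Because the control has a negative coefficient and is positively related to tsi, the product is negative. This is why the coefficient without the control is more negative than the one with the control.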

So much for the effect of the tropics. In a next step, we will add the dummy variable river which indicates if there was a river in the area of the ethnic population. Once again you can have a guess before we calculate the change in the regression coefficient after adding the new control variable.
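How the coefficient of a dummy variable can be read is sketched below on simulated data. The variable river here is made up (1 = river present, 0 = no river), as are all numbers. In a regression with the dummy as the only regressor, the intercept equals the mean of the group without a river, and the dummy coefficient equals the difference between the two group means.

```r
set.seed(5)
n <- 400

# Hypothetical dummy and outcome (illustration only)
river     <- rbinom(n, size = 1, prob = 0.4)   # 1 = river present, 0 = no river
husbandry <- 3 - 0.6 * river + rnorm(n)

fit <- lm(husbandry ~ river)
coef(fit)

# Group means computed by hand
mean_no_river <- mean(husbandry[river == 0])
mean_river    <- mean(husbandry[river == 1])

# Intercept = mean without river, slope = difference in means
all.equal(unname(coef(fit)["(Intercept)"]), mean_no_river)
all.equal(unname(coef(fit)["river"]), mean_river - mean_no_river)
```

Once further regressors are added, the dummy coefficient is the difference between the two groups holding the other regressors fixed.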

info("Dummy variable") # Run this line (Ctrl-Enter) to show info

! addonquizriver control

Now it is your turn. Add the control variable river to the multiple regression.

# reg_husbandry_gc = lm(??? ~ ??? + ???_??? + ???, data = pre)
# summary(reg_husbandry_gc)

# to compare with the regression before, where prop_tropics was the only control variable
# coef(reg_husbandry_cc)

Like before when adding prop_tropics we do not know exactly why the correlation changes the way it does when adding the control variable. But let us think of a plausible connection.

The TseTse is dependent on access to water for living and reproduction (Laveissière et al. 2011). Because of that we observe a positive correlation between river and TSI.

Husbandry is also known to be a water-intensive subsistence strategy. Consequently, our first assumption might be that river and husbandry are positively correlated. But in our case the variables river and husbandry are negatively correlated. This example should strengthen our awareness that in some cases relationships are not that easy to guess and that further research is needed to find the reason for the measured relationship. Maybe the groups near a river relied more strongly on fishing and because of that we observe a negative bias and $\beta_1$ gets more negative.

In the following, we want to include additional control variables. Therefore, we first have to discuss which variables are sensible to add.

Channel and proxy variables

In this chapter, we aim to investigate the characteristics of proxy and channel variables and how to include them in a regression.

In our dataset, sleeping sickness is a so-called channel variable. Because there is no data available on the historical prevalence of the sleeping sickness, we cannot include it in our model. To still investigate the effect of the sleeping sickness, the author developed a so-called proxy variable - in our case the TSI - instead. (For more information about the approach used to create the TSI, have a look at exercise 2.) A proxy variable is correlated with the channel variable. Remember, in our case the TSI measures the distribution of the TseTse fly, which acts as the vector for the sleeping sickness. Because of this natural link, both variables are related. (Kennedy 2013, p. 3, 158; Wooldridge 2013, p. 298-299)

info("Differences malaria and sleeping sickness ") # Run this line (Ctrl-Enter) to show info

In the following we will ignore the fact that no variable directly measures the distribution of sleeping sickness, and use this thought experiment to learn more about how to handle channel variables in a regression.

! addonquizincluding sleeping sickness

Let us explain this in more detail with the graphic below which describes the relationship between TSI and the channel variable sleeping sickness together with the control variables.

Figure 3: Arrow diagram: Relationship between TSI and the channel variable sleeping sickness together with the control variables.
Source: own diagram

How to explain the figure above?

The regression measures the effect of TSI on husbandry. The right corner shows the geographic control variables, which influence all other variables. They are included in the regression to isolate the effect TSI has on husbandry. TSI is the proxy variable for the prevalence of sleeping sickness. If we had exact data on the historical distribution of sleeping sickness, which we do not, we could use it directly to predict the development variables. But there is no point in adding a variable measuring historical sleeping sickness as a control next to TSI. This would distort the estimated relationship between TSI and husbandry, because the fly affects husbandry precisely by transmitting sleeping sickness.
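Why controlling for the channel variable distorts the result can be seen in a small simulation. This is only a sketch on invented data: sickness_sim stands in for the unobserved sleeping sickness, and the assumption is that TSI affects husbandry only through it.

```r
set.seed(7)
n <- 5000

tsi_sim <- rnorm(n)
# Channel variable: driven by TSI (assumed strength 0.9)
sickness_sim <- 0.9 * tsi_sim + rnorm(n, sd = 0.5)
# Husbandry depends on TSI only through sickness
husbandry_sim <- 3 - 0.6 * sickness_sim + rnorm(n)

# Regression of interest: total effect of TSI
total <- lm(husbandry_sim ~ tsi_sim)
# Adding the channel as a control blocks the path TSI -> sickness -> husbandry
blocked <- lm(husbandry_sim ~ tsi_sim + sickness_sim)

b_total <- coef(total)[["tsi_sim"]]
b_blocked <- coef(blocked)[["tsi_sim"]]
print(c(total = b_total, with_channel_control = b_blocked))
```

In this setup the first coefficient is close to 0.9 * (-0.6) = -0.54, while the second is close to zero even though the fly matters a lot.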

In contrast, the variable malaria is not correlated with TSI because a different insect, the Anopheles mosquito, is the vector for this disease. Malaria is related to the same geographical control variables as sleeping sickness. The disease has no direct impact on livestock farming because it does not infect cattle. (A detailed discussion of the control variables is given in the chapter below.)

Adding all meaningful control variables

Having discussed which variables to include as controls in the regression, we now turn to the mathematical formula and give a short description of all control variables used.

The mathematical formula of the multiple regression:

$$Husbandry_j = \alpha + \delta TSI_j + X'_j \Omega + \epsilon_j$$

Most terms are the same as in the linear regression we discussed before. The new term in this equation is $X'_j \Omega$. The vector $X'_j$ contains plausibly exogenous control variables and $\Omega$ the corresponding coefficients. The control variables we use here can be clustered in four groups: climate, malaria, waterways and geography.

For a detailed explanation of every control variable, have a look at the info box.

info("Description of the control variables") # Run this line (Ctrl-Enter) to show info

For the next step, I already added all remaining variables for climate, malaria, geography and waterways. Just press check to compute the multiple regression and show the coefficients.

reg_husbandry_c = lm(husbandry ~ TSI + prop_tropics + meantemp + meanrh + itx + malaria + coast + river + lon + abslat + meanalt + SI , data = pre)

summary(reg_husbandry_c)

Now we see the whole picture. The first column lists all explanatory variables. The signs of the coefficients show which variables are positively or negatively correlated with husbandry. The explanatory variable TSI is still significant and shows a negative effect on livestock farming. But be careful: many more factors influence husbandry that we cannot measure or have no data for. Hence the regression result is only an approximation and not an exact value.
In the next exercise 4.2 we get to know methods to further improve our regression.

Visualization of the regression results with effectplot()

At the end of this exercise we visualize the regression results. For this we use the function effectplot() from the package regtools (Kranz 2016). The command helps to visualize and compare the effects that a normalized change in each independent variable has on the dependent variable. It tells us which explanatory variables have a big impact on husbandry and which have only a small one.

Now it is your turn to apply the function effectplot() on the multiple regression. Please, do not forget to remove the # to load the package.

# library(regtools)

Looking at the effectplot, we see the names of the independent variables on the left. The length and color of the bars tell us the size and direction of the effect on husbandry. We can see at a glance whether the correlation is positive (blue) or negative (red). The explanatory variables are sorted in ascending order of effect size. For dummy variables like coast or river the numbers written in the bars describe the effect of a change from 0 to 1. An exception is the dummy prop_tropics, which changes from one to one, so the effect on husbandry is zero and we cannot interpret it. We observe that TSI is in the middle of the effect sizes and has a negative impact. The variables controlling for climate show a large effect, whereas for example the malaria index is negligible.
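The idea behind such a plot can be reproduced by hand in base R. The sketch below uses invented data and assumes that a "normalized change" means moving a regressor from its 10th to its 90th percentile. This is an assumption made for illustration; the defaults of effectplot() may differ.

```r
set.seed(3)
n <- 1000

# Invented regressors with very different scales
dat <- data.frame(x1 = rnorm(n, sd = 2), x2 = rnorm(n, sd = 0.5))
dat$y <- 1 - 0.3 * dat$x1 + 0.4 * dat$x2 + rnorm(n)

fit <- lm(y ~ x1 + x2, data = dat)

# Assumed normalized change: 10th to 90th percentile of each regressor
vars <- c("x1", "x2")
spread <- sapply(dat[vars], function(v) unname(diff(quantile(v, c(0.1, 0.9)))))

# Effect of that change on y = coefficient times size of the change
effect <- coef(fit)[vars] * spread
print(sort(abs(effect)))
```

Although x2 has the larger coefficient, x1 has the larger effect here because it varies much more. This is why comparing raw coefficients across variables with different scales can be misleading.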

This exercise refers to page 6 - 7 and 14 - 15 of the paper.

Exercise 4.2 -- Correlation between subsistence strategies and the TSI: Clustered robust standard errors

In this chapter we want to work further on the multiple regression so that the applied method fits even better to the given data.

What are clustered standard errors? To find out, please open the info block below.

info("Clustered robust standard errors") # Run this line (Ctrl-Enter) to show info

First let us load the data.

pre = read.dta("precolonial.dta")

As a next step, we modify the standard errors of the regression. In the regression above we treated every group as an independent observation. But that is not the whole story. Groups with a similar cultural ancestry resemble each other in their subsistence strategies. For example, nomadic groups are more likely to rely on hunting and husbandry than on agriculture, like the Masai, for whom husbandry = 9 and agriculture = 0. These groups developed technologies and habits over the years that do not change easily. Hence in the following we cluster the robust standard errors at the level of cultural provinces.

info("Commands length() and unique()") # Run this line (Ctrl-Enter) to show info

So how many clusters are calculated? Run the code below and find out.

length(unique(pre$province))

The result is 44 clusters.
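As a small illustration of how the two commands work together, here is the same call on made-up data. The province names below are invented and do not come from precolonial.dta.

```r
# Five groups that belong to three different (invented) provinces
province_toy <- c("Sahel", "Congo", "Sahel", "Nile", "Congo")

# unique() drops the repeated entries ...
unique(province_toy)

# ... and length() counts what is left
n_clusters <- length(unique(province_toy))
print(n_clusters)
```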

In the following tasks, we load a package called lfe. This package allows us to compute regressions with clustered standard errors in a short and elegant way. There are other ways to obtain clustered standard errors, such as calculating a cluster-robust variance-covariance matrix and then performing a t-test of the estimated coefficients, but the R code is a lot longer.

info("lfe package") # Run this line (Ctrl-Enter) to show info

In the following task, we use the function felm() of the above-mentioned package.

The formula passed to the command consists of four parts separated by |. The first part is our regression formula. Part two defines fixed effects. We will not use this now but in a later exercise, so we write 0. Part three, which felm() reserves for instrumental variables, is not relevant for us, so we also write 0. Part four specifies the cluster variable for the standard errors.
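The four-part formula can be tried out on a toy example first. The data frame toy and its columns y, x and g are invented for this sketch. The call only runs if the package lfe is installed.

```r
set.seed(11)

# Invented data: 20 clusters with 10 observations each
toy <- data.frame(g = rep(1:20, each = 10))
toy$x <- rnorm(200)
# The error has a component shared by all observations of a cluster
toy$y <- 1 + 0.5 * toy$x + rnorm(20)[toy$g] + rnorm(200)

# Formula parts, separated by "|":
#   outcome ~ regressors | fixed effects | instruments | cluster variable
if (requireNamespace("lfe", quietly = TRUE)) {
  m <- lfe::felm(y ~ x | 0 | 0 | g, data = toy)
  print(summary(m))
}
```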

Now it is your turn. Complete the regression with husbandry as the dependent variable, TSI as the independent variable together with the control variables, and province as the cluster variable. Remember to remove the # and to replace the ???.

# loading the package
# library(lfe)

# computing the regression
# reg_husbandry_clus = felm(??? ~ ??? + prop_tropics + meantemp + meanrh + itx + malaria + coast + river + lon + abslat + meanalt + SI | ??? | ??? | ??? , data = pre)

# printing out the regression
# summary(reg_husbandry_clus)

We see that the standard errors get larger when we cluster instead of using the usual OLS standard errors. This occurs because the errors are positively correlated within clusters. The usual OLS standard errors therefore underestimate the real uncertainty of the parameter estimates. (Wooldridge 2013, p. 419, 425)
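To see where the larger standard errors come from, the cluster-robust variance-covariance matrix can also be computed by hand in base R. The sketch below uses invented data in which both the regressor and the error share a component within each cluster. The small-sample correction factor is one common choice and is an assumption here; felm() may scale slightly differently.

```r
set.seed(1)
G <- 40       # number of clusters
n_per <- 25   # observations per cluster
cluster <- rep(seq_len(G), each = n_per)

# Regressor and error both contain a cluster-level component
x <- rnorm(G)[cluster] + rnorm(G * n_per)
u <- rnorm(G)[cluster] + rnorm(G * n_per)
y <- 1 - 0.5 * x + u

fit <- lm(y ~ x)
X <- model.matrix(fit)
e <- residuals(fit)

# Sandwich estimator: bread %*% meat %*% bread
bread <- solve(crossprod(X))
scores <- rowsum(X * e, group = cluster)  # one row per cluster: X_g' e_g
meat <- crossprod(scores)                 # sum over clusters of score outer products

# Small-sample correction (assumed variant)
N <- nrow(X)
K <- ncol(X)
adj <- (G / (G - 1)) * ((N - 1) / (N - K))
vcov_cl <- adj * bread %*% meat %*% bread

se_ols <- sqrt(diag(vcov(fit)))
se_cl <- sqrt(diag(vcov_cl))
names(se_cl) <- colnames(X)
print(rbind(ols = se_ols, clustered = se_cl))
```

In this setup the clustered standard error of x is clearly larger than the usual OLS one, because observations within a cluster carry less independent information than OLS assumes.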

Analyzing all Subsistence strategies

Of course there are more subsistence strategies than just animal husbandry. To get the whole picture we also analyze the effect of TseTse on hunting, gathering, agriculture and fishing. The formula is as follows:

Regression equation (1):

$$Outcome_j = \alpha + \delta TSI_j + X'_j \Omega + \epsilon_j$$

The dependent variable $Outcome_j$ is one of the subsistence strategies we want to relate to the TSI. Once again we use the same multiple regression with clustered standard errors as before.

reg_hunting_clus = felm(hunting ~ TSI + prop_tropics + meantemp + meanrh + itx + malaria + coast + river + lon + abslat + meanalt + SI | 0 | 0 | province , data = pre)

reg_gathering_clus = felm(gathering ~ TSI + prop_tropics + meantemp + meanrh + itx + malaria + coast + river + lon + abslat + meanalt + SI | 0 | 0 | province , data=pre)

reg_agriculture_clus = felm(agriculture ~ TSI + prop_tropics + meantemp + meanrh + itx + malaria + coast + river + lon + abslat + meanalt + SI | 0 | 0 | province , data=pre)

reg_fishing_clus = felm(fishing ~ TSI + prop_tropics + meantemp + meanrh + itx + malaria + coast + river + lon + abslat + meanalt + SI | 0 | 0 | province , data=pre)

After computing the regressions we want to print out the results. For this we use the function stargazer() from the package of the same name. First, we pass the regressions calculated above to the function. Second, we set type to "html", which determines the format of the produced output. In a last step we specify the argument title, which determines the heading of the output.

! addonquizsubsistence strategies

Let us test your answer and print out the regression coefficients. Just click check.

library(stargazer)

stargazer(reg_husbandry_clus , reg_hunting_clus , reg_gathering_clus , reg_agriculture_clus , reg_fishing_clus , type = "html", title = "Relationship between TSI and subsistence patterns" , column.sep.width = "10pt")

Interpretation of the regression output

Let us describe and analyze the regression results now in detail.

The regressions indicate that the TseTse fly had a significant impact on some of the food production strategies. We observe that a one standard deviation increase in TSI is related to a statistically significant increase in hunting and gathering and a decline in husbandry. The author suggests that hunting and gathering are food production techniques which complement each other. Ethnic groups with a high TseTse suitability index relied on hunting and gathering since both are easy to combine because they require the same spatial flexibility. The negative effect on husbandry can be explained by the fact that the TseTse mostly bites animals. Livestock has a higher risk of infection than wildlife, so husbandry in these regions was not very effective.

We do not find a significant correlation between TSI and agriculture. The author assumes that the TSI mainly influenced the way groups farmed. She states that groups with high TSI values relied on forms of slash-and-burn agriculture, whereas groups in areas with low TSI values practiced intensive farming. In exercise 5 we will look at development variables in detail and get a better understanding of how TseTse influenced the way a group practiced agriculture.

Fishing is also not correlated with TSI. For fishing a group needs access to the sea, a lake, or a river. Hence the access to waters and not the TseTse defines if a group can perform fishing. It is reassuring that we find a significant correlation of fishing with coast and river in the regression output.

The Influence of Malaria on the subsistence strategy

The author repeats the regressions with the malaria index, so we can compare the correlation between malaria and each food production strategy with the significant results we found for the TSI.

info("Malaria ecological index") # Run this line (Ctrl-Enter) to show info

! addonquizmalaria impact

To test if your answer is correct, we compute the multiple regressions with each subsistence strategy as the dependent variable and malaria as the explanatory variable of interest.

reg_husbandry_m = felm(husbandry ~ malaria + prop_tropics + meantemp + meanrh + itx + TSI + coast + river + lon + abslat + meanalt + SI | 0 | 0 | province , data=pre)

reg_hunting_m = felm(hunting ~ malaria + prop_tropics + meantemp + meanrh + itx + TSI + coast + river + lon + abslat + meanalt + SI | 0 | 0 | province , data=pre)

reg_gathering_m = felm(gathering ~ malaria + prop_tropics + meantemp + meanrh + itx + TSI + coast + river + lon + abslat + meanalt + SI | 0 | 0 | province , data=pre)

reg_fishing_m = felm(fishing ~ malaria + prop_tropics + meantemp + meanrh + itx + TSI + coast + river + lon + abslat + meanalt + SI | 0 | 0 | province , data=pre)

reg_agriculture_m = felm(agriculture ~ malaria + prop_tropics + meantemp + meanrh + itx + TSI + coast + river + lon + abslat + meanalt + SI | 0 | 0 | province , data=pre)

Printing the results. This time it is your turn to nicely display the regression results with stargazer. If you get confused, just adapt the code from the task before.

# ???(reg_husbandry_m , reg_h??_m , reg_g??_m , reg_f???_m , reg_a???_m , type = "html" , title = "Relationship between malaria and subsistence patterns")

This fits our theory that the TseTse played a special role in African development. Both are tropical diseases transmitted by insects, but malaria, in contrast to sleeping sickness, did not affect livestock in the same way.

This exercise refers to page 6 - 7 and 14 - 15 of the paper.

Exercise 5 -- Regression: Correlation between development variables and the TSI

In this exercise, we want to find out: Is there a connection between the TseTse population and variables measuring indicators of development? In the last exercise, we got an overview of the impact the TSI had on overall subsistence strategies. Now we go down one level and analyze specific variables that influenced historical development.

pre = read.dta("precolonial.dta")

We will discuss the development variables one after another when we interpret them. Nevertheless, the info box contains a short overview of all new variables.

info("Relevant variables") # Run this line (Ctrl-Enter) to show info

Multiple regression with clustered robust standard errors

To analyze the correlation we compute multiple regressions with the TSI as the independent variable and the development variables, one after another, as the dependent variable. As before we use clustered robust standard errors and the control variables discussed in the previous chapter: climate, malaria, waterways and geography.

Before we compute the regressions, make a guess about selected correlations.

! addonquizlarge domesticated animals

! addonquizIntensive agriculture

! addonquizPlow

To test your assumption we calculate the multiple regressions with each development indicator as the dependent variable and TSI as the independent variable.

reg_animals = felm(animals ~ TSI + prop_tropics + meantemp + meanrh + itx + malaria + coast + river + lon + abslat + meanalt + SI | 0 | 0 | province , data = pre)

reg_intensive = felm(intensive ~ TSI + prop_tropics + meantemp + meanrh + itx + malaria + coast + river + lon + abslat + meanalt + SI | 0 | 0 | province , data = pre)

reg_plow = felm(plow ~ TSI + prop_tropics + meantemp + meanrh + itx + malaria + coast + river + lon + abslat + meanalt + SI | 0 | 0 | province , data = pre)

reg_female = felm(female_ag ~ TSI + prop_tropics + meantemp + meanrh + itx + malaria + coast + river + lon + abslat + meanalt + SI | 0 | 0 | province , data = pre)

reg_popd = felm(ln_popd ~ TSI + prop_tropics + meantemp + meanrh + itx + malaria + coast + river + lon + abslat + meanalt + SI | 0 | 0 | province , data = pre)

reg_slavery = felm(slavery ~ TSI + prop_tropics + meantemp + meanrh + itx + malaria + coast + river + lon + abslat + meanalt + SI | 0 | 0 | province , data = pre)

reg_central = felm(central ~ TSI + prop_tropics + meantemp + meanrh + itx + malaria + coast + river + lon + abslat + meanalt + SI | 0 | 0 | province , data = pre)

After we computed the regressions we want to print out the results. Once again we use the package stargazer.

stargazer(reg_animals , reg_intensive , reg_plow , reg_female , reg_popd , reg_slavery , reg_central, type = "html" , title = "Relationship between historical African development and TseTse suitability")

Interpretation of the regressions

Now let us go over the regression coefficients one after another and describe and interpret them. For this we concentrate on the first row, which refers to the TSI.

An increase of one standard deviation in the TSI is related to a statistically significant fall of 23.1 percentage points in the probability that the community kept large domesticated animals. Animals are infected with the sleeping sickness to a greater extent than humans for two reasons. First, the fly prefers to bite animals over humans. Second, more forms of trypanosomiasis infect animals than humans.

To put these percentage points into perspective, we calculate the mean() of the variable animals and compare it with the regression coefficient.

The dataset contains rows where the content is NA, which means that no value is available for this ethnic group. To calculate the mean we have to omit these values first. The right R command here is na.omit(). Insert the correct code below.
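How na.omit() and mean() work together can be tried on a made-up vector first. The vector animals_toy is invented and only mimics a dummy variable with missing values. The coefficient is the value quoted in the text above.

```r
# Invented dummy variable with two missing values
animals_toy <- c(1, 0, NA, 1, 0, 0, NA, 1)

# Without removing the NAs the mean is NA as well
mean(animals_toy)

# na.omit() drops the missing values before the mean is taken
mean_toy <- mean(na.omit(animals_toy))
print(mean_toy)

# Coefficient of TSI quoted in the text, set in relation to the toy mean
coef_tsi <- -0.231
print(coef_tsi / mean_toy)
```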


! addonquizpercentage

This is quite a high value and suggests that the TseTse had a big impact on animal husbandry.

The next variable measures whether the group farmed intensively. It is also a dummy variable: 1 describes groups that farmed intensively, 0 stands for shifting agriculture or no agriculture at all. We observe a negative correlation between intensive agriculture and the TSI. A one standard deviation increase in the TseTse suitability index reduces the probability that a group practiced intensive agriculture by 9 percentage points, which is roughly one-third of the sample mean.

The variable intensive agriculture is logically connected to the possession of large animals. A group with domesticated animals can use them to pull the plow. The animal dung can also be used as fertilizer. Access to fertilizer is important for repeated cultivation, because otherwise the soil is exhausted after several years and the lack of nutrients makes farming impossible. In this case it would be necessary to leave the soil fallow for several harvests and to shift agriculture to another area during this time. A second argument why the TseTse hindered intensive agriculture is that shifting agriculture is less labor-intensive than farming the same area repeatedly. Hence shifting agriculture is also the easier option if there are fewer large animals whose power can be used.

To measure historical population density, we use the data from Murdock's map (1959). The variable, which measures the inhabitants per square kilometer, is log-transformed and negatively correlated with the TSI. But how exactly do we interpret the effect of TSI on $\log(\text{population density})$? For an overview of how to interpret regressions with logs, have a look at the info box below. We observe the first scenario described in the info block. So, a one standard deviation increase in TseTse suitability is related to a statistically significant decrease of approximately 75 % in population density. But in this case we must be careful with the interpretation: because the change is large, the value can only be viewed as a rough estimate.

info("Interpreting regressions with log") # Run this line (Ctrl-Enter) to show info
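Why a large coefficient in a regression with a log outcome gives only a rough percentage can be checked with a short calculation. Reading a coefficient b as a change of 100 * b percent is an approximation that works well for small b; the exact change is 100 * (exp(b) - 1) percent. The value b = -0.75 below is an assumption read off the "approximately 75 %" in the text, not a number taken from the regression output.

```r
# Assumed coefficient on TSI in the regression of log population density
b <- -0.75

# Usual approximation: coefficient times 100
approx_pct <- 100 * b

# Exact percentage change implied by a log outcome
exact_pct <- 100 * (exp(b) - 1)

print(c(approximation = approx_pct, exact = exact_pct))
```

With this assumed value the exact decrease is noticeably smaller than the approximation suggests, which is the reason for the caution expressed above.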

The next analyzed variable is female participation in agriculture. The variable is also a dummy variable that is 1 if women did most of the agricultural tasks and 0 if not.

! addonquizfemale participation

The author explains this connection with the help of the theories developed by Boserup. In the regressions before we saw that the TseTse is negatively correlated with plow use and population density. Following Boserup's theory, this together with easy access to land caused a division of labor: some tasks, like clearing the land and plowing, were done by men, and others, like caring for the subsistence crops, by women. The low population density made it necessary for both genders to participate in agriculture. (Beneria and Sen 1981) Alesina, Giuliano and Nunn (2013) even found a strong positive correlation between historical plow use and unequal gender roles. Men have more upper-body strength, which is necessary to pull the plow by hand or to control the harnessed animals. Soil preparation accounts for about one third of all tasks performed in agriculture. Hence men had a comparative advantage in societies that practiced intensive instead of shifting agriculture, whereas women specialized in tasks done at home, which led to a gender division. These gender norms did not disappear quickly with the invention of new technology or with the shift of most economic activity away from agriculture. Instead, even nowadays we observe that different societies have different ideas about the role of women. This is an indicator that the TseTse even had an indirect impact on culture through pre-colonial agricultural practices.

The dummy variable central is a simplification of the variable "jurisdictional hierarchy beyond the local authority" from the Ethnographic Atlas written by Murdock (1967). 0 stands for groups who do not have a form of centralized state. 1 codes any other form like small chiefdoms, large and predominant chiefdoms, minor and large states.

! addonquizcentral

An increase of one standard deviation in the TSI has a negative effect of 7.5 percentage points on the probability that an ethnic group was centralized. To find an explanation for this we can think about which conditions must be fulfilled for a chiefdom to be built. According to Bairoch (1988) there must be an agricultural surplus and a transportation network. In the previous regression analysis we saw that intensive agriculture and TSI are negatively correlated, which indicates that such groups had a lower farming output. A transportation network is easier to build if the group possesses large animals like horses or camels. But large animals and TSI are negatively correlated, as we showed in the previous regression. We can summarize that the lack of a good transportation network and of a large agricultural surplus to feed a ruling class hindered centralization in TseTse-infested regions. This connection is an important finding because political centralization before colonization is positively correlated with development in present-day Africa. Another reason given by the author is connected to the subsistence strategy. In the exercises before we found that a high TSI is connected to a society relying on hunting and gathering. These forms of subsistence imply that the group moved around without a permanent residence and that all group members were involved in maintaining a livelihood. To avoid fights over material goods, foraging groups split into smaller subgroups without broad authority. This practice hindered centralization. (Gennaioli and Rainer 2007; Michalopoulos and Papaioannou 2013, 2014)

The dummy variable indigenous slavery is coded 1 for all forms of beginning or recorded slavery and slavery transmitted as a heritage to the next generation. 0 stands for no forms of slavery. We observe a positive correlation between TSI and slavery.

! addonquizslavery

An explanation for this empirical result is offered by Nieboer (2013) and Domar (1970). They found that a low population density was historically positively correlated with slavery. In the regression before we found that TSI and population density are negatively correlated. Glasgow (1963) assumed that the TseTse had an indirect effect on slavery through the lack of large domesticated animals. The TseTse prevented groups from keeping draft or pack animals to transport goods. As a result, transportation and farming tasks had to be performed by humans. He conjectured that this lack boosted the expansion of slave labor. A similar point is made by Bonnassie (2009, p. 40). He found that in Western Europe technical change reduced the use of slaves. Because of technical adaptation animals could be used more efficiently, and slavery became comparatively less attractive.

Visualization of regression results

After computing and interpreting the regression results we now want to plot some of them. For this we use the function effectplot() from the package regtools. It helps us to compare the effects that a normalized change in each independent variable has on the dependent variable. We already used it in exercise 4.1. If you want a detailed explanation, please have a look at the last task of that exercise. I selected the regressions which investigate the correlation of TSI with slavery and plow. It is your turn to write the code that displays the effectplots of these two regressions.


! addonquizeffectplot

Are the signs of the coefficients and the order as we expected?

For slavery we observe a high positive correlation with the variables TSI, absolute latitude and river. The correlation with the other variables, like the climate variables or malaria, is a lot lower. We already discussed the reasons for the high correlation with TSI above. The correlation with river could be significant because slaves were used to transport goods. If a region has a river within its boundaries, this simplifies transport and slave labor might no longer be necessary.

For the variable plow we observe a negative correlation with TSI, as discussed above. But TSI is not the variable with the biggest magnitude. The climate variables show a higher magnitude, but they are not significant, so we refrain from interpreting them.

This exercise refers to page 7 - 8 and 10 - 14 of the paper.

Exercise 6 -- Placebo test: Correlation between TSI and development in the tropics outside Africa

Discussion of the new dataset placebo

Until now we focused on the effect of TSI on development inside Africa. Now we take a step further and analyze the impact of TSI in the tropics outside of Africa. This broader view is necessary to make sure that the TSI measures the effect of sleeping sickness on African development and not only the connection between climate factors and farming.

For this purpose, we load a new dataset called placebo.

pla = read.dta("placebo.dta")

To get a first impression of the loaded data we print out some of the over 700 rows at random. For this we use the command sample_n(name of the data, sample size) from the package dplyr. The difference between sample_n() and the head() command we used in exercise 2 is that it does not necessarily select the first six rows. This gives us a broader picture of the dataset, especially if the data follows a specific order.
First, load the new package dplyr. Second, use the command to display 6 rows of the dataset placebo.
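What random row sampling does can be tried on an invented data frame before applying it to placebo. The sketch shows a base R equivalent and, if dplyr is installed, the sample_n() call described above. The data frame toy is made up.

```r
set.seed(123)

# Invented data frame with 700 ordered rows
toy <- data.frame(id = 1:700, value = rnorm(700))

# head() always returns the first six rows
head(toy)

# Base R equivalent of random row sampling: draw 6 row numbers
rows_base <- toy[sample(nrow(toy), 6), ]
print(rows_base)

# The same idea with dplyr, if the package is available
if (requireNamespace("dplyr", quietly = TRUE)) {
  print(dplyr::sample_n(toy, 6))
}
```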


Some of the variables in this dataset are equivalent to the ones from the previous dataset precolonial. The difference is that ethnic groups from outside Africa which lived in an area partly or completely inside the tropics are included, whereas African groups which lived outside the tropics are removed. So, the data contains all groups in and outside Africa that lived entirely between the Tropics of Capricorn and Cancer.

The dataset also includes new variables which start with the prefix africa. As a result, the TSI and the control variables, for example for climate or malaria, appear twice in the dataset: once as a main effect and a second time with the prefix africa as the interaction between the variable and the binary variable Africa, corresponding to $I_j^{Africa} X'_j T$ in the regression formula below. $I_j^{Africa}$ is a dummy variable which equals 1 for groups inside Africa and 0 for ethnic populations outside Africa. Hence the whole term $I_j^{Africa} X'_j T$ is zero for ethnic groups outside of Africa. Inside Africa it equals the value of the corresponding variable we know from precolonial.dta.

Why the dataset is structured like this will become clear once we understand the concept behind the so-called placebo test.
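How such interaction variables are built can be illustrated with a tiny made-up data frame. The column names africa, TSI, meantemp, africa_tsetse and africa_temp follow the naming used in the dataset; the values are invented.

```r
# Four invented groups: two inside Africa, two outside
toy <- data.frame(
  africa = c(1, 1, 0, 0),
  TSI = c(0.8, -0.3, 1.1, -0.6),
  meantemp = c(25, 27, 24, 26)
)

# Interaction = dummy times variable:
# original value inside Africa, zero outside
toy$africa_tsetse <- toy$africa * toy$TSI
toy$africa_temp <- toy$africa * toy$meantemp

print(toy)
```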

Placebo test

Theoretical background

info("Placebo test") # Run this line (Ctrl-Enter) to show info

The following test is called placebo test because the groups which lived in the tropics but outside Africa act as a placebo group. The groups living in the tropics inside Africa are comparable to the group receiving the real treatment.

The structure of the placebo test is shown in the following regression formula:

Equation (2):

$$Outcome_j = \alpha + \beta TSI_j +\delta TSI_j I_j^{Africa} + X'_j \Sigma + I_j^{Africa} X'_j T + \gamma I_j^{Africa} + \epsilon_j$$

Remember:
$I_{j}^{Africa}$ is a binary variable that equals 1 if the ethnic group lived in Africa and 0 if not.
$X'_j$ contains the control variables for geography and climate. To understand equation (2) it is easiest to write it out for the groups inside and outside Africa:
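Setting $I_j^{Africa}$ to 0 and to 1 in equation (2) gives the two cases. This is a direct substitution into the formula above and not an additional result from the paper.

Outside Africa ($I_j^{Africa} = 0$):

$$Outcome_j = \alpha + \beta TSI_j + X'_j \Sigma + \epsilon_j$$

Inside Africa ($I_j^{Africa} = 1$):

$$Outcome_j = (\alpha + \gamma) + (\beta + \delta) TSI_j + X'_j (\Sigma + T) + \epsilon_j$$

So $\beta$ is the relationship between TSI and the outcome outside Africa, and $\delta$ is the additional relationship that only applies inside Africa.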

The important thing to understand about this regression formula is that $X'_j$ appears twice in the regression. So, we allow the ethnic groups inside Africa to differ from the groups outside Africa in more characteristics than just the TSI.

With the help of the placebo test we compare two regions which have similar climate conditions because both lie in the tropics. But the TseTse only exists in Africa, so outside Africa there is no vector to transmit sleeping sickness, which therefore remains restricted to Africa. Nevertheless, we apply the TseTse population model to the tropics outside of Africa to test our hypothesis that sleeping sickness and not the underlying climate variables affected African development.
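The logic of the test can be tried out on simulated data. The sketch below is not based on placebo.dta: all values are invented, the outcome is continuous for simplicity, and the data-generating process assumes that TSI matters only inside Africa. It uses lm() without clustering to keep the example in base R.

```r
set.seed(2024)
n <- 4000

africa <- rbinom(n, 1, 0.5)   # 1 = inside Africa, 0 = outside
TSI <- rnorm(n)
temp <- rnorm(n)              # stands in for the climate controls

# Assumed truth: beta = 0 (no effect outside Africa), delta = -0.3
y <- 1 + 0 * TSI - 0.3 * TSI * africa +
  0.2 * temp + 0.1 * temp * africa + 0.5 * africa + rnorm(n)

# Interaction variables, built like the africa_ variables in the dataset
africa_tsetse <- africa * TSI
africa_temp <- africa * temp

fit <- lm(y ~ TSI + africa_tsetse + temp + africa_temp + africa)

beta_hat <- coef(fit)[["TSI"]]             # main effect: outside Africa
delta_hat <- coef(fit)[["africa_tsetse"]]  # additional effect inside Africa
print(c(beta = beta_hat, delta = delta_hat))
```

Under these assumptions the main effect of TSI is close to zero, while the interaction picks up the negative effect inside Africa. A clearly non-zero main effect would instead suggest that the index captures climate rather than the fly.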

! addonquizexpectation

In the analysis before we clustered based on cultural provinces. But we do not have this information for ethnic groups outside of Africa so we use a broader category. The new cluster category for our standard errors is now language family which describes linguistic affiliation. Through this we control for cultural and geographical relatedness.

Regression outside Africa

Now we want to compute $\beta$ of regression (2) with robust standard errors clustered by language family.

info("psdef=false") # Run this line (Ctrl-Enter) to show info

I already wrote all necessary commands; you just have to click check.

reg_animals_out = felm(animals ~ TSI + meantemp + meanrh + itx + abslat + lon + malaria + coast + river + meanalt + SI + africa + africa_rhum + africa_temp + africa_itx + africa_malaria + africa_SI + africa_alt + africa_coast + africa_abslat + africa_rivers + africa_lon + africa_tsetse | 0 | 0 | language , data=pla, psdef = FALSE)

reg_plow_out = felm(plow ~ TSI + meantemp + meanrh + itx + abslat +lon + malaria + coast + river + meanalt + SI + africa + africa_rhum + africa_temp + africa_itx + africa_malaria + africa_SI + africa_alt + africa_coast + africa_abslat + africa_rivers + africa_lon + africa_tsetse | 0 | 0 | language, data=pla, psdef = FALSE)

reg_female_out = felm(female_ag ~ TSI + meantemp + meanrh + itx + abslat + lon + malaria + coast + river + meanalt + SI + africa + africa_rhum + africa_temp + africa_itx + africa_malaria + africa_SI + africa_alt + africa_coast + africa_abslat + africa_rivers + africa_lon + africa_tsetse | 0 | 0 | language, data=pla, psdef = FALSE)

reg_intensive_out = felm(intensive ~ TSI + meantemp + meanrh + itx + abslat + lon + malaria + coast + river + meanalt + SI + africa + africa_rhum + africa_temp + africa_itx + africa_malaria + africa_SI + africa_alt + africa_coast + africa_abslat + africa_rivers + africa_lon + africa_tsetse | 0 | 0 | language, data=pla, psdef = FALSE)

reg_slavery_out = felm(slavery ~ TSI + meantemp + meanrh + itx + abslat + lon + malaria + coast + river + meanalt + SI + africa + africa_rhum + africa_temp + africa_itx + africa_malaria + africa_SI + africa_alt + africa_coast + africa_abslat + africa_rivers + africa_lon + africa_tsetse | 0 | 0 | language, data=pla, psdef = FALSE)

reg_central_out = felm(central ~ TSI + meantemp + meanrh + itx + abslat + lon + malaria + coast + river + meanalt + SI + africa + africa_rhum + africa_temp + africa_itx + africa_malaria + africa_SI + africa_alt + africa_coast + africa_abslat + africa_rivers + africa_lon + africa_tsetse | 0 | 0 | language, data=pla, psdef = FALSE)

Now print out the result. You just need to press check.

stargazer(reg_animals_out , reg_intensive_out , reg_plow_out , reg_female_out , reg_slavery_out , reg_central_out , type = "html" , title = "Placebo Test: Main effect TSI (beta)")

Interpretation

The output describes the relationship between the TSI and the development variables for groups living outside Africa. Looking at the regression coefficient $\beta$ in the first row, we see that, except for plow use, no stars are displayed next to the coefficients, which means they are no longer significant. The coefficients are also very small and have the opposite sign to the one we would expect from logical considerations.

info("Why is plow use outside Arica significant?") # Run this line (Strg-Enter) to show info

Regression inside Africa

Now we want to estimate the relationship between the TSI and the development variables for ethnic groups inside Africa. To do so, we use the variable africa_tsetse, which is the interaction between TSI and the dummy variable africa.
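
The role of an interaction term can be illustrated with a small base-R sketch on simulated data. The variable names mirror the problem set, but the data and the assumed slopes (0 outside Africa, -0.3 inside) are made up for illustration. In a regression that contains tsi, the dummy and their interaction, the coefficient on tsi is the slope for the group with dummy 0, and the coefficient on the interaction is the difference in slopes between the two groups.

```r
set.seed(42)
n <- 2000
d <- data.frame(
  africa = rep(c(0, 1), each = n / 2),  # dummy: 1 = inside Africa
  tsi    = rnorm(n)                     # simulated, standardized index
)
d$africa_tsetse <- d$tsi * d$africa     # interaction term

# Assumed data-generating process (illustration only):
# slope of tsi is 0 outside Africa and -0.3 inside Africa.
d$y <- 0.5 + 0 * d$tsi - 0.3 * d$africa_tsetse + 0.2 * d$africa +
  rnorm(n, sd = 0.1)

fit   <- lm(y ~ tsi + africa_tsetse + africa, data = d)
beta  <- unname(coef(fit)["tsi"])            # slope outside Africa
delta <- unname(coef(fit)["africa_tsetse"])  # additional slope inside Africa

# The slope from an Africa-only regression equals beta + delta.
slope_in <- unname(coef(lm(y ~ tsi, data = d, subset = africa == 1))["tsi"])
print(c(beta = beta, delta = delta, beta_plus_delta = beta + delta,
        slope_in = slope_in))
```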

reg_animals_in = felm(animals ~ africa_tsetse + meantemp + meanrh + itx + abslat + lon + malaria + coast + river + meanalt + SI + africa + africa_rhum + africa_temp + africa_itx + africa_malaria + africa_SI + africa_alt + africa_coast + africa_abslat + africa_rivers + africa_lon + TSI | 0 | 0 | language , data = pla, psdef = FALSE)

reg_plow_in = felm(plow~africa_tsetse + meantemp + meanrh + itx + abslat + lon + malaria + coast + river + meanalt + SI + africa + africa_rhum + africa_temp + africa_itx + africa_malaria + africa_SI + africa_alt + africa_coast + africa_abslat + africa_rivers + africa_lon + TSI | 0 | 0 | language , data = pla, psdef = FALSE)

reg_female_in = felm(female_ag~africa_tsetse + meantemp + meanrh + itx + abslat + lon + malaria + coast + river + meanalt + SI + africa + africa_rhum + africa_temp + africa_itx + africa_malaria + africa_SI + africa_alt + africa_coast + africa_abslat + africa_rivers + africa_lon + TSI | 0 | 0 | language , data = pla, psdef = FALSE)

reg_intensive_in = felm(intensive ~ africa_tsetse + meantemp + meanrh + itx + abslat + lon + malaria + coast + river + meanalt + SI + africa + africa_rhum + africa_temp + africa_itx + africa_malaria + africa_SI + africa_alt + africa_coast + africa_abslat + africa_rivers + africa_lon + TSI | 0 | 0 | language , data = pla, psdef = FALSE)

reg_slavery_in = felm(slavery ~ africa_tsetse  +meantemp+ meanrh + itx + abslat + lon + malaria + coast + river + meanalt + SI + africa + africa_rhum + africa_temp + africa_itx + africa_malaria + africa_SI + africa_alt + africa_coast + africa_abslat + africa_rivers + africa_lon + TSI | 0 | 0 | language, data=pla, psdef = FALSE)

reg_central_in= felm(central~africa_tsetse + meantemp + meanrh + itx + abslat + lon + malaria + coast + river + meanalt + SI + africa + africa_rhum + africa_temp + africa_itx + africa_malaria + africa_SI + africa_alt + africa_coast + africa_abslat + africa_rivers + africa_lon + TSI | 0 | 0 | language , data = pla, psdef = FALSE)

Before we print out the result, take a guess:

! addonquizinside Africa

Printing out the results:

stargazer(reg_animals_in , reg_intensive_in , reg_plow_in , reg_female_in , reg_slavery_in , reg_central_in , type = "html", title = "Placebo Test: Africa interaction TSI (delta)")

Looking at the table, we see that the coefficients are significant and the signs are as expected.

To summarize this comparison between groups inside and outside Africa: the TSI does not merely pick up a general relationship between climate factors and development.

This exercise refers to pages 18-20 of the paper.

Exercise 7 -- Simulation of Africa without the TseTse and archeological evidence illustrated by the example of Great Zimbabwe

Loading results from previous exercise

As at the beginning of the previous exercises, we have to load the data.
Just press check.

pla = read.dta("placebo.dta")

This exercise builds on the regression coefficients from the previous exercise. To work with the results, we have to run the regressions again. You do not have to type anything; just press check.

reg_animals_out = felm(animals ~ TSI + meantemp + meanrh + itx + abslat + lon + malaria + coast + river + meanalt + SI + africa + africa_rhum + africa_temp + africa_itx + africa_malaria + africa_SI + africa_alt + africa_coast + africa_abslat + africa_rivers + africa_lon + africa_tsetse | 0 | 0 | language , data = pla, psdef = FALSE)

reg_plow_out = felm(plow ~ TSI + meantemp + meanrh + itx + abslat + lon + malaria + coast + river + meanalt + SI + africa + africa_rhum + africa_temp + africa_itx + africa_malaria + africa_SI + africa_alt + africa_coast + africa_abslat + africa_rivers + africa_lon + africa_tsetse | 0 | 0 | language , data = pla, psdef = FALSE)

reg_female_out = felm(female_ag ~ TSI + meantemp + meanrh + itx + abslat + lon + malaria + coast + river + meanalt + SI + africa + africa_rhum + africa_temp + africa_itx + africa_malaria + africa_SI + africa_alt + africa_coast + africa_abslat + africa_rivers + africa_lon + africa_tsetse | 0 | 0 | language , data = pla, psdef = FALSE)

reg_intensive_out = felm(intensive ~ TSI + meantemp + meanrh + itx + abslat + lon + malaria + coast + river + meanalt + SI + africa + africa_rhum + africa_temp + africa_itx + africa_malaria + africa_SI + africa_alt + africa_coast + africa_abslat + africa_rivers + africa_lon + africa_tsetse | 0 | 0 | language , data = pla, psdef = FALSE)

reg_slavery_out = felm(slavery ~ TSI + meantemp + meanrh + itx + abslat + lon + malaria + coast + river + meanalt + SI + africa + africa_rhum + africa_temp + africa_itx + africa_malaria + africa_SI + africa_alt + africa_coast + africa_abslat + africa_rivers + africa_lon + africa_tsetse | 0 | 0 | language , data = pla, psdef = FALSE)

reg_central_out = felm(central ~ TSI + meantemp + meanrh + itx + abslat + lon + malaria + coast + river + meanalt + SI + africa + africa_rhum + africa_temp + africa_itx + africa_malaria + africa_SI + africa_alt + africa_coast + africa_abslat + africa_rivers + africa_lon + africa_tsetse | 0 | 0 | language , data = pla, psdef = FALSE)

! addonquizhypothesis

Our analysis supports the assumption that the TseTse had a great impact on the development of historical Africa. This raises the question: how would Africa have evolved if the TseTse had not existed? Would it be more advanced?

Archaeological evidence: Great Zimbabwe

To approach this question, we go back in history and take a closer look at Great Zimbabwe. What makes this region interesting for us? Archaeologists discovered numerous ruined monuments that testify to the political and economic importance of Zimbabwe in the past. The complex buildings were the largest south of the Sahara in precolonial times. Also, Great Zimbabwe was free of the TseTse because its position on a plateau between two rivers created a natural protection against the fly, which can only exist in lower-lying areas (Ampim 2004). Notably, the boundaries marked by the ruins of Great Zimbabwe correspond to the climatically determined boundaries of TseTse occurrence, which we adopt from the work of Rogers and Randolph (1986). Because of these observations, we take Great Zimbabwe as an example to investigate how Africa would have developed without the TseTse.

How did people live in historical Zimbabwe?
Their subsistence strategy was diverse. During excavations, archaeologists found skeletons of livestock and concluded that the inhabitants relied on animal husbandry and mainly kept cattle. They also grew cereals and traded gold and ivory with regions as far away as China and Arabia (Huffman 2009). Comparing these observations with the results of the previous regressions reveals a big difference: there, the TseTse prevented ethnic groups from using a plow, possessing large animals, and practicing intensive agriculture.

So much for the archaeological evidence. In the following, we use the dataset placebo to check whether a simulation points in the same direction.

Simulation with a lower level of TSI

Prediction of the Baseline

We will work with the dataset placebo. If you want detailed information about the dataset, have a look at the previous task.

In the next step we prepare the data. We use the command filter() to select all rows of the dataset where the dummy variable africa equals 1. In R, a test for equality is written with two equal signs (==). Have a look at the info box if you are not familiar with the new command.

info("filter()") # Run this line (Strg-Enter) to show info

Please generate a new variable called pla_africa which contains all rows in which the variable africa has the value 1. Use the command filter() to create a subset of the dataset pla.


Now the dataset only contains ethnic groups that lived in Africa and the tropics.

In the next step, we want to predict the development variables. To do so, we use the regressions calculated above together with the function predict.felm() from the package regtools written by Kranz (2016).

We assign the prediction to a variable called v1_ followed by the name of the development variable we aim to predict, like animals or intensive. Then predict.felm() is applied to the regression describing the relationship between the development indicators and the TSI for groups outside Africa. As a second argument we pass the filtered dataset pla_africa. Finally, we take the mean of the predictions.
Just press check.

# Prediction
v1_animals = mean(predict.felm(reg_animals_out , newdata = pla_africa))
v1_plow = mean(predict.felm(reg_plow_out , newdata = pla_africa))
v1_female = mean(predict.felm(reg_female_out , newdata = pla_africa))
v1_intensive = mean(predict.felm(reg_intensive_out , newdata = pla_africa))
v1_slavery = mean(predict.felm(reg_slavery_out , newdata = pla_africa))
v1_central = mean(predict.felm(reg_central_out , newdata = pla_africa))

Presenting the results in a table, just press check:

Africa_Baseline_TseTse = round(c(v1_animals , v1_plow , v1_female , v1_intensive , v1_slavery , v1_central) , 2)

# defining table captions
development = c("Large domesticated animals" , "Plow use" , "Female participation in agriculture" , "Intensive agriculture" , "Indigenous slavery" , "Centralization")

table1 = data.frame(development , Africa_Baseline_TseTse)

# printing out the table
table1

This table shows the average predicted values of the development variables. We will use it as a baseline for comparison with a simulation of Africa with a lower level of TseTse. There is not much to say about this table on its own; its meaning becomes clear when we compare it with the simulation in the next task.

Prediction of the Simulation with a lower level of TSI

In the following we simulate Africa with a lower TseTse burden. First, we have to modify the filtered data once again: we subtract one from the variable africa_tsetse for all observations. Because the index is standardized, this corresponds to a reduction of one standard deviation in the TSI. With this reduction we can analyze how the development variables change with a lower level of TseTse-transmitted disease. The solution is already given; just press check.

pla_v2 = pla_africa
pla_v2$africa_tsetse = pla_v2$africa_tsetse-1
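
Why subtracting one from a standardized regressor moves the mean prediction by exactly the negative of its coefficient can be checked with a small base-R sketch. It uses lm() and predict() on simulated data as a stand-in for felm() and predict.felm(); the variable names and numbers are assumptions made for illustration.

```r
set.seed(7)
n <- 200
d <- data.frame(
  tsi_std = as.numeric(scale(rnorm(n, mean = 3, sd = 2))),  # mean 0, sd 1
  temp    = rnorm(n)
)
d$outcome <- 0.6 - 0.2 * d$tsi_std + 0.1 * d$temp + rnorm(n, sd = 0.1)

fit <- lm(outcome ~ tsi_std + temp, data = d)

# Baseline: mean prediction on the observed data
baseline <- mean(predict(fit, newdata = d))

# Simulation: lower the standardized index by one (= one standard deviation)
d_reduced <- d
d_reduced$tsi_std <- d_reduced$tsi_std - 1
reduced <- mean(predict(fit, newdata = d_reduced))

# In a linear model the difference equals minus the coefficient of tsi_std.
print(c(difference = reduced - baseline,
        minus_coef = -unname(coef(fit)["tsi_std"])))
```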

As before, we predict the development variables, but this time with lower values for the TSI.
Just click check.

# Africa Reduced TseTse
v2_animals = mean(predict.felm(reg_animals_out , newdata = pla_v2))
v2_plow = mean(predict.felm(reg_plow_out , newdata = pla_v2))
v2_female = mean(predict.felm(reg_female_out , newdata = pla_v2))
v2_intensive = mean(predict.felm(reg_intensive_out , newdata = pla_v2))
v2_slavery = mean(predict.felm(reg_slavery_out , newdata = pla_v2))
v2_central = mean(predict.felm(reg_central_out , newdata = pla_v2))

In the next step, we create the table. Click check to create the table.

Africa_Reduced_TseTse = round(c(v2_animals , v2_plow , v2_female , v2_intensive , v2_slavery , v2_central), 2)

table2 = data.frame(table1 , Africa_Reduced_TseTse)

Comparing simulation and baseline

Now we want to compare the baseline and the simulation. Please print out table2, which contains both.


What do we observe when comparing them?
The predicted values for keeping large animals, intensive agriculture, plow use, and centralization increase, whereas the predictions for female participation in agriculture and slavery decrease. This is consistent with our hypothesis that groups with a lower TseTse burden reached a higher level of development.

But we have to be careful with the interpretation. Based on this analysis, we cannot conclude that present-day Africa would be more advanced because of this historical legacy. There are endogenous responses, such as colonization, which we did not consider. You can find more information in the info block below.

info("Colonialization and the TseTse") # Run this line (Strg-Enter) to show info

This exercise refers to pages 20-22 of the paper.

Exercise 8 -- Impact of the TseTse on modern African development

So much for the past. But what effect does the TseTse have on Africa today? We investigate this question in this chapter.
The challenge is that sleeping sickness affects the political and economic structure in two ways. First, it has a direct impact on health today because animals fall ill. Second, it had an indirect effect through the development of institutions in the past, which results in a higher or lower level of development today. To estimate the historical impact of the TseTse on development, we have to disentangle these two effects.

Extermination campaigns

The first approach that comes to mind is to investigate extermination campaigns. The idea is to analyze development in regions that are now free of the TseTse fly. This would let us exclude the direct impact on health and learn more about the historical effect.
However, no sizable eradication campaign has managed to create a TseTse-free area.

info("TseTse extermination campaigns") # Run this line (Strg-Enter) to show info

Climate change

The second approach is to search for areas that used to be populated by the TseTse fly but, because of a change in temperature, are no longer suitable for it. The change can also run the other way: through climate change, a region that used to be TseTse free is now populated by the fly.

! addonquizTseTse temperature change

We have a better chance of finding such climate changes if we search at the geographic limits of the TseTse region. If we found such areas, we would perform a regression discontinuity study. The problem with this approach is a lack of data: we have neither sufficiently detailed climate data nor observations of the development variables over several years.

Discussion of the new dataset subnational

Our data on present-day Africa is stored in the dataset subnational.dta. Let us load it and briefly discuss the variables.

# loading the data
sub = read.dta("subnational.dta")

Once again we want to use the command sample_n to randomly select 6 rows of the new dataset. It is your turn to type in the right command.


Most variables correspond to those in the earlier datasets but are calculated with present-day data. Here is a short description of the new variables:

The variable frcn_central is calculated according to this formula:

$$\text{Historical Centralization}_{d,c} = \frac{\sum_{j} L_{j,d,c} \cdot I_{j}}{L_{d,c}}$$

So the variable frcn_central is the population-weighted mean of a district's centralization before colonization.
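
A small worked example may help with the formula. The sketch below computes the weighted mean for one hypothetical district with three ethnic groups. The weights L_j and indicators I_j are invented numbers, and the sketch assumes that the denominator $L_{d,c}$ is simply the sum of the group weights in the district.

```r
# One hypothetical district with three ethnic groups (made-up numbers)
L_j <- c(50, 30, 20)  # weight of group j in the district
I_j <- c(1, 0, 1)     # 1 if group j was historically centralized, else 0

# Formula from the text: sum_j(L_j * I_j) / L_district,
# assuming L_district = sum of the group weights
hist_central <- sum(L_j * I_j) / sum(L_j)
print(hist_central)

# The same number via the built-in weighted mean
print(weighted.mean(I_j, w = L_j))
```

Here groups one and three are centralized and carry 70 of the 100 weight units, so the district's value is 0.7.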

There are two differences from the previously used datasets precolonial.dta and placebo.dta: the tsi is calculated with modern climate data, and the dataset is organized by district rather than by ethnic group.

Regression: Present economic outcome on TSI calculated with modern climate data

In the following we estimate regression equation (1) to find out more about the relationship between two development indicators and the TSI. The indicators are luminosity and the number of cattle. Step by step we add control variables for climate and geography as well as country fixed effects. We are particularly interested in what happens when we control for historical centralization, measured by the variable frcn_central. Remember that in the previous exercises we found a correlation between the historical TSI and historical centralization. Will the coefficient of the modern TSI on luminosity or the number of cattle lose its significance once we control for it?

Regression equation (1):

$$Outcome_j = \alpha + \delta TSI_j + X'_j \Omega + \epsilon_j$$

The dependent variables are log(mean luminosity + 0.01) and log(number of cattle + 1).
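
The small constants inside the logarithms are there because the logarithm of zero is not finite. A short base-R illustration with made-up values (not taken from the dataset):

```r
lights <- c(0, 0.05, 1, 20)   # hypothetical mean luminosity values
cattle <- c(0, 10, 1000)      # hypothetical cattle counts

# Without an offset, districts with a value of zero would drop out:
print(log(lights))            # the first element is -Inf

# With the offsets used in the text, every district keeps a finite value:
ln_lights <- log(lights + 0.01)
ln_cattle <- log(cattle + 1)  # identical to log1p(cattle)
print(ln_lights)
print(ln_cattle)
```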

Luminosity as a measure for economic outcome

info("Light density and relationship with development") # Run this line (Strg-Enter) to show info

In the regressions below, standard errors are clustered at the country level, using the variable adm0_code.

Additionally, we add country fixed effects to our regression as a proxy for present-day differences in institutions and policies.

info("Fixed effects") # Run this line (Strg-Enter) to show info
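
As a rough illustration of what country fixed effects do, the following base-R sketch compares three estimates on simulated data: a pooled regression, a regression with one dummy per country, and a regression on variables that were demeaned within each country. All names and numbers are assumptions for illustration; felm() is not used here.

```r
set.seed(3)
n_countries <- 5
per_country <- 40
country <- factor(rep(seq_len(n_countries), each = per_country))

# Each country has its own unobserved level (e.g. institutions),
# which is also correlated with the regressor x.
country_effect <- c(2, -1, 0.5, 3, -2)[as.integer(country)]
x <- rnorm(n_countries * per_country) + 0.5 * country_effect
y <- 1 + 0.8 * x + country_effect + rnorm(n_countries * per_country, sd = 0.2)

# Pooled regression: ignores the country levels and is biased here
coef_pooled <- unname(coef(lm(y ~ x))["x"])

# Fixed effects via one dummy per country
coef_dummies <- unname(coef(lm(y ~ x + country))["x"])

# Fixed effects via the within transformation (demeaning by country)
x_within <- x - ave(x, country)
y_within <- y - ave(y, country)
coef_within <- unname(coef(lm(y_within ~ x_within))["x_within"])

print(c(pooled = coef_pooled, dummies = coef_dummies, within = coef_within))
```

The dummy and within estimates coincide, and both use only the variation within countries, which is why differences between countries no longer drive the coefficient.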

In the following code we add the control variables and fixed effects one after another. Just click check.

# added climate controls, proportion of land area in the tropics, and malaria control
reg_light1 = felm(ln_lights ~ tsi + meantemp + meanrh + itx + prop_tropics + malaria | 0 | 0 | adm0_code , data=sub)

# added other geographic controls
reg_light2 = felm(ln_lights ~ tsi + meantemp + meanrh + itx + abslat + prop_tropics + malaria + near_inlandwater + coast + lon + meanalt + SI | 0 | 0 | adm0_code , data = sub)

# added country fixed effects
reg_light3 = felm(ln_lights ~ tsi + meantemp + meanrh + itx + abslat + prop_tropics + malaria + near_inlandwater + coast + lon + meanalt + SI| adm0_code | 0 | adm0_code , data = sub)

# added control for historical centralization
reg_light4 = felm(ln_lights ~ tsi + meantemp + meanrh + itx + abslat + prop_tropics + malaria + near_inlandwater + coast + lon + meanalt + SI + frcn_central | adm0_code | 0 | adm0_code , data = sub)

# regression of luminosity on historical centralization
reg_central5 = felm(ln_lights ~ frcn_central + meantemp + meanrh + itx + abslat + prop_tropics + malaria + near_inlandwater + coast + lon + meanalt + SI | adm0_code | 0 | adm0_code , data = sub)

Printing out the result. Just press check.

stargazer(reg_light1 , reg_light2 , reg_light3 , reg_light4 , reg_central5 , type = "html" , title = "Relationship between modern economic development (log mean luminosity) and the TseTse suitability")

Interpretation of the regression output

Step by step we added control variables and fixed effects to the regression. Now we want to analyze how the regression coefficient changes. To make this more interesting, you work out the answers yourself by solving the following quizzes.

Let us have a look at the first regression we performed.

! addonquizinterpretation luminosity

Note:
In column two we included geographic control variables like absolute latitude. We also include the control variables coast and near_inlandwater. This is very important when working with luminosity data. Light is reflected by water; this effect is called blooming. Imagine standing next to a lake on a cloudless night with a full moon: you will see the reflection of nearby lights, the moon, and the stars. With this picture in mind, we can see why regions near water show higher luminosity values regardless of their stage of development.

We observe in column two that the correlation between luminosity and tsi became stronger and the significance level increased.

In the next column, we add country fixed effects based on the variable adm0_code.

! addonquiztsi coefficient significance level

In column 4 we control for historical centralization by adding frcn_central to our regression.

! addonquizhistorical centralization

This finding is consistent with the research of other scientists (Michalopoulos and Papaioannou 2013, 2014).

So, what is the effect of the TSI on modern economic development?
Based on the regressions, we conclude that today there is no direct impact of the TseTse on African development. In previous regressions, we found a correlation between historical centralization and the TSI, and between historical centralization and present-day development. If you want more details, have a look at the info box. We conclude that the effect of the TSI on present-day economic performance is not direct but indirect, operating through historical institutions.

info("Relationship between historical centralization and nowadays economic development") # Run this line (Strg-Enter) to show info

Number of cattle as a measure for animal husbandry

The author also runs the regression with the log number of cattle in a district as the dependent variable. Compute the regressions by clicking check and have a look at the results.

# added climate controls, proportion of land area in the tropics, and malaria control
reg_cattle1 = felm(ln_livestock ~ tsi + meantemp + meanrh + itx + prop_tropics + malaria | 0 | 0 | adm0_code , data = sub)

# added other geographic controls
reg_cattle2 = felm(ln_livestock ~ tsi + meantemp + meanrh + itx + abslat + prop_tropics + malaria + near_inlandwater + coast + lon + meanalt + SI | 0 | 0 | adm0_code , data = sub)

# added country fixed effects
reg_cattle3 = felm(ln_livestock ~ tsi + meantemp + meanrh + itx + abslat + prop_tropics + malaria + near_inlandwater + coast + lon + meanalt + SI | adm0_code | 0 | adm0_code , data = sub)

# added control for historical centralization
reg_cattle4 = felm(ln_livestock ~ tsi + meantemp + meanrh + itx + abslat + prop_tropics + malaria + near_inlandwater + coast + lon + meanalt + SI + frcn_central | adm0_code | 0 | adm0_code , data = sub)

# regression of livestock on historical centralization
reg_central4 = felm(ln_livestock ~ frcn_central + meantemp + meanrh + itx + abslat + prop_tropics + malaria + near_inlandwater + coast + lon + meanalt + SI | adm0_code | 0 | adm0_code , data = sub)

Printing out the result:

stargazer(reg_cattle1 , reg_cattle2 , reg_cattle3 , reg_cattle4 , reg_central4, type = "html" , title = "Relationship between modern economic development (log number of cattle) and the TseTse suitability")

For this development indicator, significance decreases slightly as we add further control variables and fixed effects, but in the end the coefficient remains significant at the 10% level. The difference is that we do not find a correlation between the number of cattle and historical centralization: the coefficient in row frcn_central, column 5 of the stargazer output above has no stars. These findings indicate that animal husbandry in present-day Africa is still hampered by the TseTse.

This exercise refers to pages 22-25 of the paper.

Exercise 9 -- Robustness tests

In the following sections we consider confounding factors that could threaten our empirical results. We go through possible threats to validity one after another and discuss their relevance. To do so, we use the precolonial dataset. Once again, the first step is to load the data.

pre = read.dta("precolonial.dta")

Climate factors

The biggest concern is that our regression results do not (only) show the effect of the TseTse on development. Instead, climate factors like humidity and temperature might have a causal effect on the African development variables, and we might be measuring this effect. If this were true, climate factors and not the TseTse would be the reason that some ethnic groups did not develop settlements, use the plow, or keep farm animals.

Correlation between agricultural suitability and TSI

To get an impression of the correlation between agricultural suitability and the TSI we print a scatterplot. If you want to find out more about the variable SI describing agricultural suitability, just open the info block.

info("Variable SI, measuring agricultural suitability") # Run this line (Strg-Enter) to show info

For the plotting, we will use the package ggplot2. To find out more about the package read the info section below.

info("ggplot2") # Run this line (Strg-Enter) to show info

To analyze the correlation between TSI and agricultural suitability graphically, we create a scatterplot and assign it to a variable called scatter. The command is ggplot(data, aes(x=variable on the x axis, y=variable on the y axis)). With geom_point we define the shape of the dots; in our case, we want hollow circles. With position_jitter we prevent overplotting. In our plot, many dots lie on top of each other, so in some cases we see only one dot although several are hidden behind it. We therefore jitter the points, which means adding random noise to make the plot easier to read. position_jitter(width = 0.1, height = 0.1) is the right command here; width and height define the amount by which the points are randomly shifted in the horizontal and vertical direction, respectively. With the command labs(title="", x="", y="") we add a heading and name the x and y axes.
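
What jittering does can be imitated in base R without any plotting. The sketch below uses made-up data and assumes that the noise is drawn uniformly from the interval [-0.1, 0.1] in each direction, which is my understanding of position_jitter(width = 0.1, height = 0.1) and not something the problem set states.

```r
set.seed(11)
# 150 observations that fall on only a few distinct positions
x <- rep(c(1, 2, 3), each = 50)
y <- rep(c(0, 1), times = 75)
n_positions_before <- nrow(unique(cbind(x, y)))

# Add a small amount of uniform noise in both directions
x_jit <- x + runif(length(x), min = -0.1, max = 0.1)
y_jit <- y + runif(length(y), min = -0.1, max = 0.1)
n_positions_after <- nrow(unique(cbind(x_jit, y_jit)))

# Before jittering many points overlap; afterwards each point is visible.
print(c(before = n_positions_before, after = n_positions_after))
```

In the problem set itself, position_jitter() takes care of this step inside geom_point().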

Just press check.

# assign the scatterplot to the variable scatter
scatter = ggplot(pre , aes(x = TSI, y = SI)) +
  geom_point(shape = 1, position = position_jitter(width = 0.1 , height = 0.1)) +    
  geom_smooth(method = lm) +
  labs(title = "Scatterplot: agricultural suitability vs. TSI", x = "TseTse suitability index" , y = "Suitability for rainfed agriculture")

Please print out the plot.


How to interpret the plot?
We have three elements: dots, a blue regression line, and the grey shaded confidence band. The 95% confidence interval is included by default. Every observation in the dataset corresponds to a dot in the scatterplot. The regression line is chosen so that it fits the dots best, which means that the sum of squared vertical distances between the dots and the line is as small as possible.
The regression line has a positive slope, which tells us that TSI and SI are positively correlated. In exercise four we made the same observation based on TSI and SI plotted on a map of Africa. This reassures us that our regressions do not only measure the negative effects of climate: regions suitable for the fly are also more fertile than average.
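
The statement that the regression line makes the squared distances as small as possible can be verified numerically. The following base-R sketch uses simulated data (an assumption for illustration) and shows that moving the fitted line slightly in either direction increases the sum of squared residuals.

```r
set.seed(5)
x <- rnorm(100)
y <- 0.3 + 0.4 * x + rnorm(100)

fit <- lm(y ~ x)
a_hat <- unname(coef(fit)[1])  # fitted intercept
b_hat <- unname(coef(fit)[2])  # fitted slope

# Sum of squared vertical distances for an arbitrary line a + b * x
ssr <- function(a, b) sum((y - a - b * x)^2)

ssr_best         <- ssr(a_hat, b_hat)
ssr_shifted_up   <- ssr(a_hat + 0.1, b_hat)
ssr_flatter_line <- ssr(a_hat, b_hat - 0.1)

print(c(best = ssr_best, shifted = ssr_shifted_up, flatter = ssr_flatter_line))
```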

Alternative TseTse indices

In the following robustness tests we focus on the way the TSI is calculated, choose different approaches and analyze the results.

Minor Perturbation of TSI

The idea behind this robustness test is to slightly modify the TseTse birth rate obtained from laboratory experiments and to run the regressions for the main development variables again. In the following we replace the TSI with a shifted version, called perturb_TSI1 for a shift to the left and perturb_TSI2 for a shift to the right. Compared to the values observed in the laboratory, the birth and death rates of the fly are changed slightly. The birth rate is shifted by one standard deviation in each direction. The death rate includes a critical threshold that marks the temperature at which the fly falls into a chill coma. This critical value is raised by one standard deviation, which corresponds to about 3 °C.

info("Chill coma") # Run this line (Strg-Enter) to show info

First, we want to plot the original TSI and the shifted TSI indices to get an overview. The argument alpha defines the transparency of the fill color. Please press check to run the code.

ggplot(pre) +
  geom_histogram(aes(x = TSI) , fill = "black" , alpha = 0.2) +
  geom_histogram(aes(x = perturb_TSI1) , fill = "red" , alpha = 0.2) +
  geom_histogram(aes(x = perturb_TSI2) , fill = "blue" , alpha = 0.2) +
  labs(title = "Histogram of TSI, the left and right perturbation")

The histogram of the original TSI is grey, the left perturbation light red, and the right perturbation light blue. The mean is always 0 because all three indices are standardized.
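
Why a standardized index always has mean 0 can be seen in a few lines of base R. The raw values below are simulated and only serve as an illustration; how the author standardized the actual indices is not shown here.

```r
set.seed(9)
raw <- rexp(500, rate = 0.5)   # some skewed, strictly positive raw index

# Standardize: subtract the mean, divide by the standard deviation
z <- (raw - mean(raw)) / sd(raw)

print(c(mean = mean(z), sd = sd(z)))

# scale() performs the same transformation
z_scale <- as.numeric(scale(raw))
```

After this transformation a change of one unit corresponds to one standard deviation of the raw index.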

After getting a feeling for the perturbation we run the regressions with the TSI modified to the left side. Please click check.

reg_animals_lshift = felm(animals ~ perturb_TSI1 + prop_tropics + meantemp + meanrh +itx + malaria + coast + river + lon + abslat + meanalt + SI | 0 | 0 | province , data = pre , psdef = FALSE)

reg_intensive_lshift = felm(intensive ~ perturb_TSI1 + prop_tropics + meantemp + meanrh + itx + malaria + coast + river + lon + abslat + meanalt + SI | 0 | 0 | province , data = pre , psdef = FALSE)

reg_plow_lshift = felm(plow ~ perturb_TSI1 + prop_tropics + meantemp + meanrh + itx + malaria + coast + river + lon + abslat + meanalt + SI | 0 | 0 | province , data = pre , psdef = FALSE)

reg_female_lshift = felm(female_ag ~ perturb_TSI1 + prop_tropics + meantemp + meanrh + itx + malaria + coast + river + lon + abslat + meanalt + SI | 0 | 0 | province , data = pre , psdef = FALSE)

reg_popd_lshift = felm(ln_popd ~ perturb_TSI1 + prop_tropics + meantemp + meanrh + itx + malaria + coast + river + lon + abslat + meanalt + SI | 0 | 0 | province , data = pre , psdef = FALSE)

reg_slavery_lshift = felm(slavery~perturb_TSI1 + prop_tropics + meantemp + meanrh +itx + malaria + coast + river + lon + abslat + meanalt +SI | 0 | 0 | province , data=pre , psdef = FALSE)

reg_central_lshift = felm(central ~ perturb_TSI1 + prop_tropics + meantemp + meanrh + itx + malaria + coast + river + lon + abslat + meanalt + SI | 0 | 0 | province , data = pre , psdef = FALSE)

Let us show the results. Once again we use the package stargazer. Just press check.

stargazer(reg_animals_lshift , reg_intensive_lshift , reg_plow_lshift , reg_female_lshift , reg_popd_lshift , reg_slavery_lshift , reg_central_lshift , type = "html", title = "Robustness test: Left perturbation") 

We see that even a slight perturbation in the TSI calculation renders the regression results insignificant. The perturbations are small but have a big impact on the physiology of the fly. This reassures us that the TSI is a good measure that captures the effect of sleeping sickness and not only climate conditions.

We now apply the same perturbation to the right side.

reg_animals_rshift = felm(animals ~ perturb_TSI2 + prop_tropics + meantemp + meanrh + itx + malaria + coast + river + lon + abslat + meanalt + SI | 0 | 0 | province , data = pre , psdef = FALSE)

reg_intensive_rshift = felm(intensive ~ perturb_TSI2 + prop_tropics + meantemp + meanrh + itx + malaria + coast + river + lon + abslat + meanalt + SI | 0 | 0 | province , data = pre , psdef = FALSE)

reg_plow_rshift = felm(plow ~ perturb_TSI2 + prop_tropics + meantemp + meanrh + itx + malaria + coast + river + lon + abslat + meanalt + SI | 0 | 0 | province , data = pre , psdef = FALSE)

reg_female_rshift = felm(female_ag ~ perturb_TSI2 + prop_tropics + meantemp + meanrh + itx + malaria + coast + river + lon + abslat + meanalt + SI | 0 | 0 | province , data = pre , psdef = FALSE)

reg_popd_rshift = felm(ln_popd ~ perturb_TSI2 + prop_tropics + meantemp + meanrh + itx + malaria + coast + river + lon + abslat + meanalt + SI | 0 | 0 | province , data = pre , psdef = FALSE)

reg_slavery_rshift = felm(slavery ~ perturb_TSI2 + prop_tropics + meantemp + meanrh + itx + malaria + coast + river + lon + abslat + meanalt + SI | 0 | 0 | province , data = pre , psdef = FALSE)

reg_central_rshift = felm(central ~ perturb_TSI2 + prop_tropics + meantemp + meanrh + itx + malaria + coast + river + lon + abslat + meanalt + SI | 0 | 0 | province , data = pre , psdef = FALSE)

Having calculated the regressions with the perturbed TSI, we want to show the results with stargazer. Just press check to create a table.

stargazer(reg_animals_rshift , reg_intensive_rshift , reg_plow_rshift , reg_female_rshift , reg_popd_rshift , reg_slavery_rshift , reg_central_rshift , type = "html", title = "Robustness test: Right perturbation") 

In this case - the perturbation to the right - we see the same changes as before. The regression results are also no longer significant.

Intrinsic growth rate

The second robustness test concerns the way the TSI is calculated. To estimate the number of flies in historical Africa we use climate data on temperature and humidity.

Critics might argue that the formula describing the relationship between the climate input variables and the TseTse density was manipulated to produce a correlation in the regressions. To counter this argument, we repeat the regressions using the intrinsic growth rate. The corresponding formula is:

$$ \Lambda = \max(B - M, 0) $$

So, to get the growth rate ($\Lambda$), we simply subtract the death rate (M) from the birth rate (B), with the restriction that the result cannot be negative: a population that dies faster than it reproduces is assigned a growth rate of zero.
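As a small illustration of this formula, the following sketch applies it to made-up numbers. The vectors birth_rate and death_rate are hypothetical and are not taken from the dataset pre; only the formula itself comes from the text above.

```r
# Hypothetical birth and death rates for five locations (illustrative values only).
birth_rate <- c(0.030, 0.025, 0.010, 0.020, 0.015)
death_rate <- c(0.010, 0.030, 0.010, 0.005, 0.040)

# Lambda = max(B - M, 0), applied element-wise with pmax().
growth_rate <- pmax(birth_rate - death_rate, 0)
print(growth_rate)
```

Note that pmax() compares element by element, whereas max() would collapse the whole vector into a single number.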

Next, we replace the TSI with the calculated growth rate and repeat the regressions. The author has already calculated the growth rate and saved it in the dataset as the variable r. Just click check.

reg_animals_growth = felm(animals ~ r + prop_tropics + meantemp + meanrh + itx + malaria + coast + river + lon + abslat + meanalt + SI | 0 | 0 | province , data = pre , psdef = FALSE)

reg_intensive_growth = felm(intensive ~ r + prop_tropics + meantemp + meanrh + itx + malaria + coast + river + lon + abslat + meanalt + SI | 0 | 0 | province , data = pre , psdef = FALSE)

reg_plow_growth = felm(plow ~ r + prop_tropics + meantemp + meanrh + itx + malaria + coast + river + lon + abslat + meanalt + SI | 0 | 0 | province , data = pre , psdef = FALSE)

reg_female_growth = felm(female_ag ~ r + prop_tropics + meantemp + meanrh + itx + malaria + coast + river + lon + abslat + meanalt + SI | 0 | 0 | province , data = pre , psdef = FALSE)

reg_popd_growth = felm(ln_popd ~ r + prop_tropics + meantemp + meanrh + itx + malaria + coast + river + lon + abslat + meanalt + SI | 0 | 0 | province , data = pre , psdef = FALSE)

reg_slavery_growth = felm(slavery ~ r + prop_tropics + meantemp + meanrh + itx + malaria + coast + river + lon + abslat + meanalt + SI | 0 | 0 | province , data = pre , psdef = FALSE)

reg_central_growth = felm(central ~ r + prop_tropics + meantemp + meanrh + itx + malaria + coast + river + lon + abslat + meanalt + SI | 0 | 0 | province , data = pre , psdef = FALSE)

Now we print the regression output. Press check to display the table.

stargazer(reg_animals_growth , reg_intensive_growth , reg_plow_growth , reg_female_growth , reg_popd_growth , reg_slavery_growth , reg_central_growth , type = "html", title = "Robustness test: Intrinsic growth rate")

We observe that the regression results are still significant. Also, the signs of the coefficients match those of the regressions performed with the TSI. These results confirm that we are not only picking up a physiological relationship between climate and TSI.

You may be confused because the regression coefficients differ so much in size from those of the regressions with the TSI. This is because the TSI is normalized and describes the steady-state fly population, whereas the variable r describes the growth rate.
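To see why normalization alone already changes the size of a coefficient, here is a minimal sketch on simulated data. All variables (x_raw, x_std, y) are hypothetical and unrelated to the dataset pre. The sketch illustrates only the rescaling part of the argument; the TSI and r also differ in what they measure.

```r
set.seed(1)
n <- 200
x_raw <- runif(n, 0, 0.05)                 # hypothetical regressor on a small scale
y <- 1 - 8 * x_raw + rnorm(n, sd = 0.1)    # simulated outcome

# z-score normalization: mean 0, standard deviation 1
x_std <- as.numeric(scale(x_raw))

fit_raw <- lm(y ~ x_raw)
fit_std <- lm(y ~ x_std)

coef_raw <- coef(summary(fit_raw))["x_raw", ]
coef_std <- coef(summary(fit_std))["x_std", ]

# The estimate changes by the factor sd(x_raw); the t value stays the same.
print(rbind(raw = coef_raw, standardized = coef_std))
```

The coefficient on the normalized regressor equals the raw coefficient multiplied by sd(x_raw), while the t value and hence the significance are unchanged.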

Optimal TseTse conditions

The last concern we want to test is whether the TSI is based on cherry-picked parameters. If you have not heard of cherry picking in data analysis before, please open the info block below. The concern is that the parameters used to calculate the TSI were manipulated to get the desired result, that is, that the underlying formula was tuned to yield significant regression results between the TSI and the development variables.

info("Cherry picking") # Run this line (Ctrl-Enter) to show info

To address this concern, we no longer predict the potential TseTse population with a method based on laboratory data. Instead we use climate data collected through field research by Rogers and Randolph (1986) to predict the TseTse distribution. We calculate an index called optimum, which simulates the optimal fly survival rate by converting the climate conditions into a dummy variable.

info("Rogers and Randolph: optimal fly survival") # Run this line (Ctrl-Enter) to show info

As before, we use the felm() command to calculate the regressions with clustered standard errors and control variables. The variable optimum measures the optimal fly survival. Just click check to run the regressions.

reg_animals_opt = felm(animals ~ optimum + prop_tropics + meantemp + meanrh + itx + malaria + coast + river + lon + abslat + meanalt + SI | 0 | 0 | province , data = pre , psdef = FALSE)

reg_intensive_opt = felm(intensive ~ optimum + prop_tropics + meantemp + meanrh + itx + malaria + coast + river + lon + abslat + meanalt + SI | 0 | 0 | province , data = pre , psdef = FALSE)

reg_plow_opt = felm(plow ~ optimum + prop_tropics + meantemp + meanrh + itx + malaria + coast + river + lon + abslat + meanalt + SI | 0 | 0 | province , data = pre , psdef = FALSE)

reg_female_opt = felm(female_ag ~ optimum + prop_tropics + meantemp + meanrh + itx + malaria + coast + river + lon + abslat + meanalt + SI | 0 | 0 | province , data = pre , psdef = FALSE)

reg_popd_opt = felm(ln_popd ~ optimum + prop_tropics + meantemp + meanrh + itx + malaria + coast + river + lon + abslat + meanalt + SI | 0 | 0 | province , data = pre , psdef = FALSE)

reg_slavery_opt = felm(slavery ~ optimum + prop_tropics + meantemp + meanrh + itx + malaria + coast + river + lon + abslat + meanalt + SI | 0 | 0 | province , data = pre , psdef = FALSE)

reg_central_opt = felm(central ~ optimum + prop_tropics + meantemp + meanrh + itx + malaria + coast + river + lon + abslat + meanalt + SI | 0 | 0 | province , data = pre , psdef = FALSE)

Printing out the results:

stargazer(reg_animals_opt , reg_intensive_opt , reg_plow_opt , reg_female_opt , reg_popd_opt , reg_slavery_opt , reg_central_opt , type = "html" , title = "Robustness test: Optimal TseTse conditions (field research) ")

Looking at the regressions, we do not see big changes in the outcome. Only the variable central, which measures the degree of historical centralization, is no longer significant, and population density and slavery are significant only at weaker levels. But overall the results reassure us that our previous regressions did not rest on unintentional cherry picking.

The author also performs a sensitivity analysis to test for the fallacy of incomplete evidence, and a Box-Cox transformation because the TSI is negatively skewed. To keep the problem set short and interesting we will not discuss these in detail, but the results are reassuring that the TSI is a good way to predict development outcomes.

Alternative clustering

In this chapter, we repeat the regressions on historical agriculture and development outcomes, using different approaches to calculate the standard errors.
Remember: In our benchmark regression we used standard errors clustered by cultural relatedness.
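To make the idea of clustered standard errors more concrete, here is a minimal base R sketch on simulated data. The variables group, x and y are hypothetical and not part of the dataset pre. The sketch uses the plain cluster-robust sandwich formula without small-sample corrections, so it is an illustration of the principle and will not reproduce the numbers reported by felm().

```r
set.seed(42)
n_groups <- 30
group <- rep(seq_len(n_groups), each = 10)   # hypothetical cluster ids
group_shock <- rnorm(n_groups)[group]        # shock shared within each cluster
x <- rnorm(length(group)) + group_shock
y <- 0.5 * x + group_shock + rnorm(length(group))

fit <- lm(y ~ x)
X <- model.matrix(fit)
u <- residuals(fit)

bread <- solve(crossprod(X))                 # (X'X)^-1

# Sum the scores x_i * u_i within each cluster, then form the "meat".
scores <- rowsum(X * u, group)               # one row per cluster
meat <- crossprod(scores)

vcov_cluster <- bread %*% meat %*% bread     # no small-sample correction
se_cluster <- sqrt(diag(vcov_cluster))
se_classic <- sqrt(diag(vcov(fit)))

print(rbind(classic = se_classic, clustered = se_cluster))
```

The point estimates are identical in both cases; only the estimated uncertainty changes, because observations within one cluster are no longer treated as independent.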

Standard errors clustered by country

In this section, we choose an alternative way to cluster the standard errors. We no longer cluster by province; instead we use the variable isocode to cluster by country. For each ethnic group, the variable contains a country abbreviation that reflects the group's geographic position in Africa.

Before we use the new cluster, we want to get a better understanding of the variable isocode, which is part of the dataset pre. To do so, we use the command table(name of the variable), which prints out the distinct values together with the frequency with which they occur in the dataset. Please fill in the right command in the field below.


! addonquizisocode
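As a toy illustration of what table() returns, the following sketch uses a short made-up vector. The country codes in codes are hypothetical and are not read from the dataset pre.

```r
# Hypothetical country codes for eight ethnic groups (not taken from the dataset pre).
codes <- c("NGA", "KEN", "NGA", "TZA", "KEN", "NGA", "TZA", "NGA")

# table() lists each distinct value together with its frequency.
code_counts <- table(codes)
print(code_counts)

# Number of distinct clusters such a variable would define.
n_clusters <- length(code_counts)
print(n_clusters)
```

The number of distinct values matters for clustering, because each value defines one cluster.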

Next, we want to calculate the regressions clustering the standard errors by country. Run the regression by clicking check.

reg_animals_country = felm(animals ~ TSI + prop_tropics + meantemp + meanrh + itx + malaria + coast + river + lon + abslat + meanalt + SI | 0 | 0 | isocode , data = pre , psdef = FALSE)

reg_intensive_country = felm(intensive ~ TSI + prop_tropics + meantemp + meanrh + itx + malaria + coast + river + lon + abslat + meanalt + SI | 0 | 0 | isocode , data = pre , psdef = FALSE)

reg_plow_country = felm(plow ~ TSI + prop_tropics + meantemp + meanrh + itx + malaria + coast + river + lon + abslat + meanalt + SI | 0 | 0 | isocode , data = pre , psdef = FALSE)

reg_female_country = felm(female_ag ~ TSI + prop_tropics + meantemp + meanrh + itx + malaria + coast + river + lon + abslat + meanalt + SI | 0 | 0 | isocode , data = pre , psdef = FALSE)

reg_popd_country = felm(ln_popd ~ TSI + prop_tropics + meantemp + meanrh + itx + malaria + coast + river + lon + abslat + meanalt + SI | 0 | 0 | isocode , data = pre , psdef = FALSE)

reg_slavery_country = felm(slavery ~ TSI + prop_tropics + meantemp + meanrh + itx + malaria + coast + river + lon + abslat + meanalt + SI | 0 | 0 | isocode , data = pre , psdef = FALSE)

reg_central_country = felm(central ~ TSI + prop_tropics + meantemp + meanrh + itx + malaria + coast + river + lon + abslat + meanalt + SI | 0 | 0 | isocode , data = pre , psdef = FALSE)

Printing out the results:

stargazer(reg_animals_country , reg_intensive_country , reg_plow_country , reg_female_country , reg_popd_country , reg_slavery_country , reg_central_country , type = "html" , title = "Robustness test: Country cluster ")

What are the differences from the benchmark regression?
We see that the standard errors did not change much. For example, the standard error for intensive agriculture is 0.028 with cultural relatedness clusters compared to 0.03 with country clusters. Also, the regression results are still significant at a low level. This reassures us that the selected province cluster captures spatial relatedness well.

Multiway Clustering

In this chapter, we no longer use only one cluster as we did before. To calculate the standard errors we now cluster by both cultural province and country. Technically, we use the felm() function again and combine the two selected clusters with +. Please run the code by clicking check.

reg_animals_multic = felm(animals ~ TSI + prop_tropics + meantemp + meanrh + itx + malaria + coast + river + lon + abslat + meanalt + SI | 0 | 0 | province + isocode , data = pre , psdef = FALSE)

reg_intensive_multic = felm(intensive ~ TSI + prop_tropics + meantemp + meanrh + itx + malaria + coast + river + lon + abslat + meanalt + SI | 0 | 0 | province + isocode , data = pre , psdef = FALSE)

reg_plow_multic = felm(plow ~ TSI + prop_tropics + meantemp + meanrh + itx + malaria + coast + river + lon + abslat + meanalt + SI | 0 | 0 | province + isocode , data = pre , psdef = FALSE)

reg_female_multic = felm(female_ag ~ TSI + prop_tropics + meantemp + meanrh + itx + malaria + coast + river + lon + abslat + meanalt + SI | 0 | 0 | province + isocode , data = pre , psdef = FALSE)

reg_popd_multic = felm(ln_popd ~ TSI + prop_tropics + meantemp + meanrh + itx + malaria + coast + river + lon + abslat + meanalt + SI | 0 | 0 | province + isocode , data = pre , psdef = FALSE)

reg_slavery_multic = felm(slavery ~ TSI + prop_tropics + meantemp + meanrh + itx + malaria + coast + river + lon + abslat + meanalt + SI | 0 | 0 | province + isocode , data = pre , psdef = FALSE)

reg_central_multic = felm(central ~ TSI + prop_tropics + meantemp + meanrh + itx + malaria + coast + river + lon + abslat + meanalt + SI | 0 | 0 | province + isocode , data = pre , psdef = FALSE)

Now please print out the result by clicking check.

stargazer(reg_animals_multic , reg_intensive_multic , reg_plow_multic , reg_female_multic , reg_popd_multic , reg_slavery_multic , reg_central_multic , type = "html" , title = "Robustness test: Multiway Clustering (province and isocode)") 

! addonquizmultiway cluster

These results confirm our benchmark regression.
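For readers curious how two cluster dimensions can be combined, one common construction (Cameron, Gelbach and Miller 2011) adds the two one-way cluster-robust covariance matrices and subtracts the one clustered on the intersection of both dimensions. The sketch below shows this on simulated data in base R. The cluster variables a and b and the helper cluster_vcov() are hypothetical, no small-sample corrections are applied, and the sketch is not meant to reproduce the internals or the exact numbers of felm().

```r
set.seed(7)
n <- 300
a <- sample(1:20, n, replace = TRUE)   # hypothetical first cluster dimension
b <- sample(1:15, n, replace = TRUE)   # hypothetical second cluster dimension
x <- rnorm(n) + rnorm(20)[a]
y <- 0.3 * x + rnorm(20)[a] + rnorm(15)[b] + rnorm(n)

fit <- lm(y ~ x)
X <- model.matrix(fit)
u <- residuals(fit)
bread <- solve(crossprod(X))           # (X'X)^-1

# Hypothetical helper: one-way cluster-robust covariance, no small-sample correction.
cluster_vcov <- function(cluster) {
  scores <- rowsum(X * u, cluster)     # summed scores per cluster
  bread %*% crossprod(scores) %*% bread
}

v_a  <- cluster_vcov(a)
v_b  <- cluster_vcov(b)
v_ab <- cluster_vcov(interaction(a, b, drop = TRUE))   # intersection of a and b

# Two-way covariance: add both dimensions, remove the double-counted overlap.
v_twoway <- v_a + v_b - v_ab
print(v_twoway)
```

Subtracting v_ab avoids counting twice those pairs of observations that share both a cluster in a and a cluster in b.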

Negative selection

Another aspect that we should consider is what happened before Murdock's map, which we use for our analysis, was drawn. How did the groups interact? Did more advanced groups force less developed groups into TseTse-infested regions? If this often happened, the TseTse suitability index would not only measure the direct biological effect of the transmitted sleeping sickness. It would also include evolutionary selection.

To control for this effect of negative selection, we use fixed effects for cultural relatedness. Cultural relatedness acts here as a proxy for group strength.
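What fixed effects do can be sketched in base R on simulated data: including a dummy for every group gives the same slope as first subtracting the group means from all variables (the within transformation). The variables group, group_level, x and y below are hypothetical and not part of the dataset pre.

```r
set.seed(3)
n_groups <- 12
group <- factor(rep(seq_len(n_groups), each = 8))   # hypothetical provinces
group_level <- rnorm(n_groups)[group]                # unobserved group strength
x <- rnorm(length(group)) + group_level              # regressor correlated with it
y <- -0.4 * x + 2 * group_level + rnorm(length(group))

# (a) Fixed effects as group dummies.
fit_dummies <- lm(y ~ x + group)

# (b) Within transformation: subtract group means from y and x.
y_within <- y - ave(y, group)
x_within <- x - ave(x, group)
fit_within <- lm(y_within ~ x_within - 1)

# (c) Pooled regression without fixed effects, for comparison.
fit_pooled <- lm(y ~ x)

slopes <- c(dummies = unname(coef(fit_dummies)["x"]),
            within  = unname(coef(fit_within)["x_within"]),
            pooled  = unname(coef(fit_pooled)["x"]))
print(slopes)
```

The first two slopes coincide, because both use only the variation within groups. The pooled slope also absorbs differences between groups, which is exactly what the fixed effects are meant to remove.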

info("Negative Selection") # Run this line (Ctrl-Enter) to show info

Please click check.
Note: The standard errors are still clustered by province.

reg_animals_fe = felm(animals ~ TSI + prop_tropics + meantemp + meanrh + itx + malaria + coast + river + lon + abslat + meanalt + SI | province | 0 | province , data = pre , psdef = FALSE)

reg_intensive_fe = felm(intensive ~ TSI + prop_tropics + meantemp + meanrh + itx + malaria + coast + river + lon + abslat + meanalt + SI | province | 0 | province , data = pre , psdef = FALSE)

reg_plow_fe = felm(plow ~ TSI + prop_tropics + meantemp + meanrh + itx + malaria + coast + river + lon + abslat + meanalt + SI | province | 0 | province , data = pre , psdef = FALSE)

reg_female_fe = felm(female_ag ~ TSI + prop_tropics + meantemp + meanrh + itx + malaria + coast + river + lon + abslat + meanalt + SI | province | 0 | province , data = pre , psdef = FALSE)

reg_popd_fe = felm(ln_popd ~ TSI + prop_tropics + meantemp + meanrh + itx + malaria + coast + river + lon + abslat + meanalt + SI | province | 0 | province , data = pre , psdef = FALSE)

reg_slavery_fe = felm(slavery ~ TSI + prop_tropics + meantemp + meanrh + itx + malaria + coast + river + lon + abslat + meanalt + SI | province | 0 | province , data = pre , psdef = FALSE)

reg_central_fe = felm(central ~ TSI + prop_tropics + meantemp + meanrh + itx + malaria + coast + river + lon + abslat + meanalt + SI | province | 0 | province , data = pre , psdef = FALSE)

Now please print out the result by clicking check.

stargazer(reg_animals_fe , reg_intensive_fe , reg_plow_fe , reg_female_fe , reg_popd_fe , reg_slavery_fe , reg_central_fe , type = "html" , title = "Robustness test: Fixed effects (province) ")

The significance levels weakened a little, but the regressions remain significant. Also, the standard errors did not get much bigger. These results reassure us that the TSI does capture the direct biological effect.

This exercise refers to pages 15-17 of the paper.

Exercise 10 -- Conclusion and Outlook

In this chapter, we want to briefly summarize our methods and findings and discuss approaches to TseTse control.

Summary

In this problem set we analyzed the effect of the disease transmitted by the TseTse on the historical and modern development of Africa. The distinctive feature of the paper by Marcella Alsan is the method she uses to measure the TseTse distribution. The TseTse suitability index builds on laboratory experiments that describe the physiology of the TseTse and on insect growth models that use climate data to define a steady-state population of the fly. We then used the TSI to run regressions on precolonial ethnographic data. We analyzed, among other things, the correlation with subsistence strategies, societies, and centralization.

What did we find out concerning the precolonial development? In Africa, a higher TSI calculated with historical climate data is correlated with agricultural practices classified as less advanced, a stronger slave labor system, and a lower population density. It is reassuring for our hypothesis that the same regressions with data from groups living outside Africa did not reach significance.

In the next test we simulated the African development variables with a lower TSI level. The results are moderate increases in the precolonial outcome variables measuring political and institutional centralization as well as intensive farming. But we must be careful in interpreting this simulation, because the results do not take into account endogenous responses to the elimination of the TseTse.

Subsequently we investigated archaeological findings of more advanced societies. These civilizations developed mainly in the regions of Africa with a low TSI. This is consistent with the theory that the TseTse slowed down African development.

What are our findings regarding the effect of the TseTse on modern African development?

To find out, we performed regressions first on luminosity as a measure of economic and political outcomes and second on the number of cattle, both with modern data. The TseTse still appears to affect today's development in Africa, mostly through historical centralization. The theory is that the TseTse hindered the development of advanced societies, and that this has a negative effect on the long-term development perspective. When regressing on the number of cattle we find a negative correlation even when controlling for precolonial institutions and using country fixed effects. This finding indicates that the TseTse still has a direct impact on animal husbandry in today's Africa. So, it is an important key to understanding and enhancing African development and animal farming.

! addonquizTseTse economic deficit and animal loss

This number was estimated by experts from the IAEA in 2002. The IAEA (2002) also estimated that Nagana, transmitted by the TseTse, kills 3 million cattle per year.

Discussion

As a last step we want to briefly discuss the pros and cons of approaches aiming to control Trypanosomiasis, and how promising they are for eliminating this sickness from Africa.

If we think about a solution, two possibilities come to mind. Either we eradicate the TseTse so there is no longer a vector to transmit Trypanosomiasis, or we vaccinate all animals before they can get infected.

What about medication? Treatments for infected animals do exist but they are expensive. In a few countries the sales of trypanosomiasis treatments account for over 50 % of the total sales of veterinary drugs. Also, the diagnosis costs money that many farmers cannot afford, so most drugs are given without a diagnosis. This practice makes the treatment less effective because drug resistance occurs increasingly often (Feldmann & Hendrichs 2001; De Deken n.d., p. 5). So, at the current state of research this is not the optimal way to fight trypanosomiasis (Kroubi et al. 2011).

We already discussed several eradication campaigns in an earlier info block. But it is difficult to completely eradicate the TseTse from the whole continent. Also, exterminating the fly might have a negative impact on biodiversity: intensive agriculture and an increase in livestock farming would replace historically developed, sustainable systems of land use that co-exist with the fauna (Anderson et al. 2015).

Some experts see vaccination as the only long-lasting, effective, and safe way to fight the sleeping sickness. A vaccine does not exist yet. We do find a natural immunity in wildlife. On this basis it might be possible to develop a vaccine against the sleeping sickness that creates immune protection. Research has focused on a vaccine that targets the surface of the parasite, which is composed of millions of proteins. But this surface is constantly renewed, which makes it hard to create a vaccine that fits all variants (La Greca and Magez 2011).

In summary, much work is still required to find an optimal way to control the sleeping sickness.

Thank you!

Now we have already reached the end of our economic journey. But do not be sad: there are more problem sets on various topics that cannot wait to be solved by you. Just right click here and open a new tab to get an overview.

Thanks for staying till the end!

To see the number of awards you earned while working through the exercises together with a description you can click check:

awards(as.html=TRUE)

Exercise Bibliography -- References

Books, Papers, and Websites:

R Packages:

Images:

Code and Data:

License:

Creative Commons License
This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.

Author: Vanessa Schoeller


