In IBAHCM/RPiR: Reproducible Programming in R

library(learnr)

knitr::opts_chunk$set(error = TRUE)
options(knitr.kable.NA = '')

set.seed(123)

Overview

In this practical we will write some code to run a simple epidemiological model -- the Susceptible-Infected-Susceptible or SIS model. From these results we will report on the transmission of E. coli O157 in cattle.

Background

We have now built a simple population dynamics model, but the framework we are using is capable of handling much more. We will start with the simplest model of infectious disease dynamics -- the SIS model -- that we covered in the epidemiology lecture.

The SIS model requires two populations (of susceptible and infected animals) to replace the single population we have used so far. Susceptibles, $S$, and infecteds, $I$, are updated as follows:

$$S(t+1) = S(t) - \beta \times \frac{S(t) \times I(t)}{N} + \sigma \times I(t)$$ $$I(t+1) = I(t) + \beta \times \frac{S(t) \times I(t)}{N} - \sigma \times I(t)$$

where the total population, $N = S(t) + I(t)$, the sum of the number of susceptibles and infecteds is constant.

As we said earlier, it is a good idea to use descriptive names in your code to avoid confusion, so we will call the two populations susceptibles and infecteds - these will be the column names in your data frame. The two epidemiological parameters are $\beta$ (beta, the transmission rate) and $\sigma$ (sigma, the recovery rate) -- these Greek letters are almost universally used as the names of basic transmission and recovery rates in epidemiology. However, unless you're doing a lot of epidemiology, this isn't obvious. In any event, in our function definition we will give them descriptive names again, calling them generically transmission.rate and recovery.rate in the function definition, and in the script we will be more specific and call them ecoli.transmission and ecoli.recovery since this is the disease we are going to be modelling (it also avoids reusing the same variable names). The reason we're doing it this way round is that the function can work for any disease model, but the script is specifically for E Coli.

Tasks

Technical setup

The first thing to do is to create a new GitHub repository and RStudio project for this practical. Do this following the how to instructions on the GitHub documentation pages that you used for the last practical, putting the repository in the SBOHVM organisation again, as before. Call this project githubusernameSeries02 (obviously use your real GitHub username!). Remember that if your GitHub username has dashes or underscores, don't include them in the repository name however, as they are illegal in R package names. You'll use this single project for all of the exercises in this practical series. If you are already in a coding group, then add the specific people that need to be added to your repository on GitHub in Settings -> Manage access -> Add people and just type in their GitHub usernames (no @ sign). Give them read access. If you don't have a group yet, ask the instructors which one you should join.

Coding

Create a new R file, using File > New > R Script. You should adapt the code from Practical 1-5 (remember you now have a project with all of this code in, and it is easy to have two RStudio sessions so you can copy and refer to earlier code easily), creating 2 files -- 0201-run-SIS.R and 0201-step-SIS.R -- but where in 1-5 the data frame had only a count column, now you will need to have two -- susceptibles and infecteds. You do not need to have the total population size (N, above) in there, because at every step it is just susceptibles + infecteds. In fact, with no death in this model, it is just constant. You could for instance start things up with:

num.cattle <- 100
initial.infecteds <- 2
initial.susceptibles <- num.cattle - initial.infecteds

This allows us to easily keep the things we want to fixed, in this case the total population size, while allowing us to vary the things we are investigating. Writing the code to make this as easy as possible makes it harder to make mistakes, and that is always a good thing!

Once we've done that, we can create the data frame with those values -- this is the initial state of the population when we start the simulation:

herd.df <- data.frame(susceptibles = initial.susceptibles,
                      infecteds = initial.infecteds)

Then we call the new step function with the current population and the parameters:

next.population <- step_deterministic_SIS(latest = tail(herd.df, 1), 
                                          transmission.rate = ecoli.transmission,
                                          recovery.rate = ecoli.recovery)

If you are finding writing this code difficult, just ask, but remember that your function should:

take in a data frame with the latest population sizes and the parameters of the model;
calculate the next numbers of susceptibles and infecteds from the latest (most recent) numbers and the parameters using the formulae above; and then
return a data frame with these updated values to the main script.

Running the code

Run the code in 0201-run-SIS.R. Remember that the source() command means that it automatically loads in 0201-step-SIS.R, meaning that you don't need to load these files yourself. Don't forget to check that your function is not using any global variables. Check that you get the same results as you saw in the lecture. Adding the argument col = c("green", "red") to plot.populations() will allow you to change the colours of the (now) two populations in the plot to match those from the lecture.

The science

Remember from the lecture that the basic reproduction number, $R_0$, is given by

$$R_0 = \frac{\beta}{\sigma}$$

By changing the transmission rate, $\beta$, try running simulations for different values of $R_0$. What happens when $R_0 > 1$ and when $R_0 < 1$? Run the simulations for different values of the transmission rate (as suggested in the table below) with the default value of the recovery rate, $\sigma$, set to $\frac{1}{3}$. Calculate and record the corresponding value of $R_0$ using the formula, which we can rewrite as:

R0 <- ecoli.transmission / ecoli.recovery

Either using your plots or by asking R to print out the data frame population work out the proportion of the population who are susceptible at equilibrium -- when the lines are flat -- and compare it with the value of $\frac{1}{R_0}$ for one example value of the transmission rate $\beta = 2/3$.

tmp <- vapply(2/3, 
              function(x) as.character(MASS::fractions(x)),
              character(1))
table <- data.frame(tmp, NA, NA, NA)
col <- c("Transmission <br> rate, $\\beta$", 
         "Basic reproduction number, $R_0$",
         "Inverse $R_0$, <br> $\\frac{1}{R_0}$",
         "Proportion susceptible <br> at equilibrium, $\\frac{S}{N}$")
col <- gsub("&lt;", "<", col)
col <- gsub("&gt;", ">", col)
colnames(table) <- col
knitr::kable(table, escape = FALSE, align = c(rep('c', times = 4)))

Remember that throughout this series of exercises, the total population size, $N$ is fixed, but can be easily calculated from the different parts of the population so you never need to pass it as an argument or store it in the data frame.

Report

Adapt your script to generate a report which shows you running the model with at least a couple of different transmission rates (including $2/3$) and perhaps try calculating different values for $R_0$ from the results with some explanation of what's going on. Note that for example for E. coli O157 infections in cattle, for timesteps in weeks, the recovery rate is about $\frac{1}{3}$ and the transmission rate is 1.

GitHub

In this exercise, we want you to get a couple of other people to check your code and make sure it works for them, and we want you to check other people's code too.

Getting help with your code

Once you're happy with your code, commit your changes using the Git pane in RStudio (just commit the R files - not, for instance, any html files that were created by generating reports, though you may decide to commit a spreadsheet with the table in as well). Don't forget then to push the changes to GitHub and check on the website that it contains your new code. Notify the partners in your group that you have something for them to check. If you're not sitting next to them, then you can create an issue in your repository asking for their review - you can contact them by tagging them in the message with @theirusername, and they should receive an email and a notification on the website. They should then create an issue, telling you if they had any problems running your version of the practical.

Checking other people's code

Likewise, when you hear from someone else that a repo is available to check, then open it using File > New Project..., then Version Control, Git, and then put in the URL from GitHub and tick Open in New Session. In the new RStudio session that is opened, you can then run their code and make sure it works. If it does not, just respond in the Issue that they opened on GitHub explaining what went wrong (though it might be easier to do this in person as well!). If it works, then you also need to respond saying so. If you have already checked their repo once already, then see the instructions in 2-2 for how to update your copy of it.

Remember that interacting like this through GitHub to help each other will count as your engagement marks for the course.

IBAHCM/RPiR documentation built on Jan. 12, 2023, 7:41 p.m.

rdrr.io home R language documentation Run R code online

CRAN packages Bioconductor packages R-Forge packages GitHub packages

Note that we can't provide technical support on individual packages. You should contact the package authors for that.

Tweet to @rdrrHQ

GitHub issue tracker

ian@mutexlabs.com