knitr::opts_chunk$set(echo = T, eval = T, message = F, warning = F, out.extra = 'class="plot"', fig.align = "center")
The hrp2malaRia
package was written by OJ Watson, a PhD Student with the Imperial College of London Mathematical Modeling group. This package was used in his recent Elife Publication, Watson OJ et al. 2017, Modelling the drivers of the spread of Plasmodium falciparum hrp2 gene deletions in sub-Saharan Africa. Many of the ideas and theories that we will be discussing today were spear-headed by UNC Physician-Scientist, Jonathan Parr, along with OJ Watson and Steve Meshnick.
Additional thanks to Dayne Filer, Mike Fliss, and Sara Levintow for hacks.
show
button on the right after the coderemotes
. tidyerverse
, RColorBrewer
, and cowplot
. We will be using remotes
to install some packages from Github as well. P. falciparum malaria is an infectious disease with a huge global burden, primarily concentrated in Africa. For an interactive map of P. falciparum prevalence, see the Malaria Atlas Project.
The malaria life cycle is complex with several different stages. For the purposes of this exercise, it is only important to note that the malaria parasite is transmitted by a mosquito vector that infects the human-host.
In clinical and field settings, malaria is often diagnosed with either the use of light microscopy or rapid diagnostic tests (RDTs). In both cases, a finger-prick of blood is taken from patients and used to screen for malaria. Confirmation of infection by microscopy is made with direct visualization of the parasite by an expert microscopist while RDTs provide a "yes/no" result and can be performed by relatively anyone.
RDTs detect parasite antigens (e.g. a parasite-specific marker) from the patient's peripheral blood. In the case of the P. falciparum, the vase majority of Pf-RDTs detect a P. falciparum specific protein, histidine-rich protein 2 (hrp2).
However, recent evidence has accumulated to suggest that P. falciparum parasites have undergone selection for hrp2-gene deletions, making them no longer detectable by many RDTs ("stealth parasites"). In theory, by avoiding detection, parasites can then remain untreated and have a higher likelihood of spreading.
Note, for the purposes of our discussions, we will ignore differences between the lower-limits of detection (and sensitivity/specificity) of these two diagnostic modalities.
You are an Officer in the CDC's elite Epidemic Intelligence Service (EIS) Malaria Branch. One day, you are called by a concerned ministry of health official from a Sub-Saharan Africa country stating that their hospitals have seen several patients that had a classic clincial picutre of malaria but a negative rapid diagnostic test (RDT) at admission. Moreover, these same patients were positive for malaria by microscopy. The ministry of health official has already contacted the WHO and confirmed that their RDT stock was not defective. In addition, they have read recent reports of hrp2 deletions in South America and are now concerned that the problem may have spread to Sub-Saharan Africa. They ask for your help to predict where they should be on the lookout for hrp2 deletions.
Before we began, discuss the following points with your neighbors (or as a group):
If you were asked to a design an antigen-based detection method, what type of antigen would you pick (e.g. from an evolutionary point of view, what type of proteins/markers would be best)? _________
Do you expect for parasite transmission intensity to affect the prevalence of hrp2 deletions (hint: think about superinfections)?
How does virulence affect survelliance and detection efforts (assuming that increasing levels of virulence as associated with increasingly worse symptoms)? _______
In the previous exercise led by Dr. Nunn, you first fit a deterministic SIR model to analyze the spread of a "childhood disease" through a susceptible population. You then extended your deterministic SIR model to include fluctuations in the population size through births and deaths (e.g. a demographic model). Next, you modeled the uncertainty and the variation in your model through simulation using a stochastic approach. As a reminder, stochastic individual-based models essentially follow individuals through the different model compartments and record the amount of time the spend in each state as they move through the simulation. In short, a deterministic model is a "closed-system" where we will arrive at the same answer every time (given the same parameters, initial conditions, etc), while, a stochastic model allows us to simulate many possible iterations of our model and directly express the randomness (or really uncertainty) of our parameters and outcome.
In this exercise, we are going to use an individual-based, stochastic malaria transmission model that is an extension of the "Imperial Model" (see OJ's Material & Methods) for a full description that was based on Griffin et al. 2014, 2015, 2016).
Refer to Figure 5 from OJ's manuscript. What kind of compartment model is the "hrp2-Imperial Model"? ____
Note, for the purpose of this exercise, we are going to ignore some of the (beautiful) complexities of the model and instead talk about this model in a Susceptible-Infected-Recovered-Susceptible framework (i.e. the U,A,D compartments are collapsed into the "Infected" category and Recovered corresponds to the period of active treatment/prophylaxis -- the T,P compartments -- when individuals are "immune"). Can you think of another scenario where you would potentially use a SIRS model (hint: think about your tetanus vaccines)? ______
Please now open the hrp2malaRia
package README and read the Input Variables to Change and the Output Variables of Interest.
For the purpose of this exercise, we are going to make some additional simplifying assumptions in order to use a subset of the parameters that were used in Watson et al. 2017.
First, we are going to assume that if an RDT is negative due to hrp2-deletions, that individual will not receive treatment (NB, in a clinical setting, treatments are often empiric even if a test result is negative). Second, we are going to assume that there is no cross-reactivity of other hrp antigens. These simplifying assumptions will be expressed in our model by setting the following parameters to 0: rdt.nonadherence = 0, rdt.det = 0
.
For the purpose of this exercise, we are always going to perform simulations of a population of 2,000 indiviudals over a 20-year timespan. In addition, we will assume that at the start of the simulation, there is a 10:1 ratio of wild-type P. falciparum strains to hrp2-deleted strains. Finally, all outputs will be considered for individuals under 5-years of age, as those individuals carry the heaviest burden of malaria infections.
We are going to be focusing on a few parameters of the "hrp2-Imperial Model", namely:
The hrp2_Simulation
function returns many results that allow for many different questions to be investigated. For the purpose of our exercise, we will focus on the changes in hrp2-deletion frequency (S.N.Dels.05
), incidence (S.Incidence.05
), and proportion of infections that contain only hrp2-deleted parasites (S.Prev.Mono.D.05
) among children under 5-years of age over time (S.Times
)
First you will need to make sure that you have the several packages installed. The command below installs any of those needed packages that you may be missing. It then uses the remotes
package (an excellent resource for R developers) to install OJ's package from GitHub. In addition, we will install some functions that I made for this lab (EMSI2019hrp2lab
) and the furrr
package, which will give us some needed computational-speed later. Note, I set a seed below so that my script reproduces the same results every time. For the purpose of this exercise, do not set a seed and instead appreciate the variation in the stochastic models.
# check to make sure user has the needed packages; if missing, install for them list.of.packages <- c("tidyverse", "remotes", "cowplot", "RColorBrewer") new.packages <- list.of.packages[!(list.of.packages %in% installed.packages()[,"Package"])] if(length(new.packages)) install.packages(new.packages) remotes::install_github("OJWatson/hrp2malaRia") remotes::install_github("nickbrazeau/EMSI2019hrp2lab") # hacky way to download this tutorial as a package remotes::install_github("DavisVaughan/furrr") # we are going to use this for speed later library(hrp2malaRia) library(EMSI2019hrp2lab) library(furrr) library(tidyverse) library(RColorBrewer) library(cowplot) set.seed(42) # please don't run this line!
Imagine that there are three populations with a constant EIR of 1 and the hrp2-deleted and WT strains have equal fitness. In addition, in all three populations cases seek care 50% of the time. However, the populations differ by:
Population 1:
Population 2:
Population 3:
Let's run the code (once) and plot the result:
# Simulations with N = 2000 take ~1 minute to run on my MacBook Pro (3.1 GHz Intel Core i7) # Population 1 senario1.pop1 <- hrp2malaRia::hrp2_Simulation(N=2000, years=20, strains.0 = 10, microscopy.use = 0, EIR=1/365, ft = 0.5, fitness = 1, rdt.det = 0, rdt.nonadherence = 0 ) # Population 2 senario1.pop2 <- hrp2malaRia::hrp2_Simulation(N=2000, years=20, microscopy.use = 0.5, EIR=1/365, ft = 0.5, fitness = 1, strains.0 = 10, rdt.det = 0, rdt.nonadherence = 0 ) # Population 3 senario1.pop3 <- hrp2malaRia::hrp2_Simulation(N=2000, years=20, microscopy.use = 1, EIR=1/365, ft = 0.5, fitness = 1, strains.0 = 10, rdt.det = 0, rdt.nonadherence = 0 ) # Now, let's plot these - all variables that begin with S. are series variables collected over time. # let's plot the first population plot(senario1.pop1$S.Times,senario1.pop1$S.N.Dels, xlab = "time (days)", ylab = "hrp2 deletion Frequency", ylim=c(0,1), col="red", type="l", lwd=2) # and add the second population lines(senario1.pop2$S.Times, senario1.pop2$S.N.Dels,lwd=2, col="blue") # and add the third population lines(senario1.pop3$S.Times, senario1.pop3$S.N.Dels,lwd=2, col="green") # and add our legend legend(2000, .25, legend=c("Only RDT use", "50% RDT use", "0% RDT Use"), col=c("red", "blue", "green"), lty=1, cex=0.8, box.lty=2, box.lwd=2)
Given that this is a stochastic model, there will be run-to-run variability of each model iteration. This variability is important in quantifying our uncertainty of the stochastic processes that we are modeling.
To perform multiple runs of the model, you can simply copy and paste the code above (and rename the output objects [e.g. r1.2
], as to not overwrite your previous results). Or, we can make a map dataframe, where we specify the different parameters we want for each run and use this dataframe as the "captain" calling the "plays".
In this map dataframe, each row will be one iteration of our model. Columns will contain our parameter options and the cells will be the conditions, or options that we set, for each parameter within each model. Using a map file is a cornerstone of functional programming and a high-yield concept for reproducible research. Plus, as Dr. Nunn noted earlier, stochastic models are typically slower to run than deterministic models. As a result, we now have a need for speed, which is made easier with a map dataframe that we can "loop" through.
Why do you think stochastic models are typically slower to run than deterministic models? ___
We are going to use the furrr
package to take advantage of the multiple processors on your computer (e.g. parallelize the simulations). This is simply for speed. If you are having trouble running the furrr
package, you can change the furrr::future_pmap
commands below to purrr::pmap
and will get the same result (albeit at a slower pace). Or, you can just write out each model individually using the code above.
Let's try our map dataframe out below:
future::plan(multiprocess) # call forward for furrr hrp2map <- data.frame(run = c(1,1,1, 2,2,2, 3,3,3), N=2000, years=20, microscopy.use = c(0,0.5,1, 0,0.5,1, 0,0.5,1), EIR=1/365, ft = 0.5, fitness = 1, strains.0 = 10, rdt.nonadherence = 0, rdt.det = 0)
Let's look at what we just made:
dplyr::glimpse(hrp2map)
OK, let's run our models and store them in our map dataframe.
hrp2map$results <- furrr::future_pmap(hrp2map[, !colnames(hrp2map) %in% "run"], # note, we need to drop the run column as that is not an option for hrp2_Simulation hrp2malaRia::hrp2_Simulation)
Now let's visualize our plots of the proportion of hrp2-deleted parasites over time.
# note, I wrote this function for you, it returns a ggplot plot.scenario1.run1 <- plothrp2models(pop1 = hrp2map$results[[1]], pop2 = hrp2map$results[[2]], pop3 = hrp2map$results[[3]], labels = c("Only RDT use", "50% RDT use", "0% RDT use")) # you need to manually add the labels # you can manually add a title like this too plot.scenario1.run1 <- plot.scenario1.run1 + ggtitle("hrp2 Simulation Scenario 1, Run 1") # now consider run 2 plot.scenario1.run2 <- plothrp2models(pop1 = hrp2map$results[[4]], pop2 = hrp2map$results[[5]], pop3 = hrp2map$results[[6]], labels = c("Only RDT use", "50% RDT use", "0% RDT use")) plot.scenario1.run2 <- plot.scenario1.run2 + ggtitle("hrp2 Simulation Scenario 1, Run 2") # now consider run 3 plot.scenario1.run3 <- plothrp2models(pop1 = hrp2map$results[[7]], pop2 = hrp2map$results[[8]], pop3 = hrp2map$results[[9]], labels = c("Only RDT use", "50% RDT use", "0% RDT use")) plot.scenario1.run3 <- plot.scenario1.run3 + ggtitle("hrp2 Simulation Scenario 1, Run 3") # Plot them together cowplot::plot_grid(plot.scenario1.run1, plot.scenario1.run2, plot.scenario1.run3, ncol=1)
Again, imagine that there are three populations that all only use RDTs for malaria diagnosis and the hrp2-deleted and WT strains have equal fitness. In addition, in all three populations cases seek care 50% of the time. However, the populations differ by:
Population 1:
Population 2:
Population 3:
# going to overwrite our old hrp2map and make a new one hrp2map <- data.frame(run = c(1,1,1,2,2,2), N=2000, years=20, microscopy.use = 0, EIR=c(1/365, 10/365, 100/365, 1/365, 10/365, 100/365), ft = 0.5, strains.0 = 10, fitness = 1, rdt.nonadherence = 0, rdt.det = 0 ) hrp2map$results <- furrr::future_pmap(hrp2map[, !colnames(hrp2map) %in% "run"], hrp2malaRia::hrp2_Simulation) plot.scenario2.run1 <- plothrp2models(pop1 = hrp2map$results[[1]], pop2 = hrp2map$results[[2]], pop3 = hrp2map$results[[3]], labels = c("EIR 1", "EIR 10", "EIR 100")) plot.scenario2.run1 <- plot.scenario2.run1 + ggtitle("hrp2 Simulation Scenario 2, Run 1") plot.scenario2.run2 <- plothrp2models(pop1 = hrp2map$results[[4]], pop2 = hrp2map$results[[5]], pop3 = hrp2map$results[[6]], labels = c("EIR 1", "EIR 10", "EIR 100")) plot.scenario2.run2 <- plot.scenario2.run2 + ggtitle("hrp2 Simulation Scenario 2, Run 2") cowplot::plot_grid(plot.scenario2.run1, plot.scenario2.run2,ncol=1)
Again, imagine that there are three populations that all only use RDTs for malaria diagnosis and have a constant EIR of 1. In addition, all populations have hrp2-deleted and WT parasites with equal fitness. However, the populations differ by:
Population 1:
Population 2:
Population 3:
# going to overwrite our old hrp2map and make a new one hrp2map <- data.frame(run = c(1,1,1, 2,2,2), N=2000, years=20, microscopy.use = 0, EIR=1/365, ft = c(0,0.5,1, 0,0.5,1), strains.0 = 10, fitness = 1, rdt.nonadherence = 0, rdt.det = 0 ) hrp2map$results <- furrr::future_pmap(hrp2map[, !colnames(hrp2map) %in% "run"], hrp2malaRia::hrp2_Simulation) plot.scenario3.run1 <- plothrp2models(pop1 = hrp2map$results[[1]], pop2 = hrp2map$results[[2]], pop3 = hrp2map$results[[3]], labels = c("No Care-Seeking", "50% Care-Seeking", "All Care-Seeking")) plot.scenario3.run1 <- plot.scenario3.run1 + ggtitle("hrp2 Simulation Scenario 3, Run 1") plot.scenario3.run2 <- plothrp2models(pop1 = hrp2map$results[[4]], pop2 = hrp2map$results[[5]], pop3 = hrp2map$results[[6]], labels = c("No Care-Seeking", "50% Care-Seeking", "All Care-Seeking")) plot.scenario3.run2 <- plot.scenario3.run2 + ggtitle("hrp2 Simulation Scenario 3, Run 2") cowplot::plot_grid(plot.scenario3.run1, plot.scenario3.run2,ncol=1)
For the last time, imagine that there are three populations with a constant EIR of 1 and all populations only use RDTs for diagnosis. Further, imagine that 50% of the cases seek treatment in each population. However, the populations differ by:
Population 1:
Population 2:
Population 3:
* Our group has hypothesized that hrp2-deleted parasites are less virulent and therefore more likely to contribute to the asymptomatic reservoir. We have modeled that here as a "boost" in fitness, as OJ's model codes fitness as "the probability that a strain is selected from the infectious pool". As such, we can increase or decrease the probability that the hrp2-deleted strain or the WT strain is propagated forward-in-time by our model (NB, this is a classic definition of fitness). Note, this is a bit of an abstraction from OJ's original model/fitness parameter.
# going to overwrite our old hrp2map and make a new one hrp2map <- data.frame( run = c(1,1,1,2,2,2), N=2000, years=20, microscopy.use = 0, EIR=1/365, ft = 0.5, fitness = c(0.8,1,1.2, 0.8,1,1.2), strains.0 = 10, rdt.nonadherence = 0, rdt.det = 0 ) hrp2map$results <- furrr::future_pmap(hrp2map[, !colnames(hrp2map) %in% "run"], hrp2malaRia::hrp2_Simulation) plot.scenario4.run1 <- plothrp2models(pop1 = hrp2map$results[[1]], pop2 = hrp2map$results[[2]], pop3 = hrp2map$results[[3]], labels = c("hrp2 Fitness Cost", "hrp2 No Fitness", "hrp2 Fitness Benefit")) plot.scenario4.run1 <- plot.scenario4.run1 + ggtitle("hrp2 Simulation Scenario 4, Run 1") plot.scenario4.run2 <- plothrp2models(pop1 = hrp2map$results[[4]], pop2 = hrp2map$results[[5]], pop3 = hrp2map$results[[6]], labels = c("hrp2 Fitness Cost", "hrp2 No Fitness", "hrp2 Fitness Benefit")) plot.scenario4.run2 <- plot.scenario4.run2 + ggtitle("hrp2 Simulation Scenario 4, Run 2") cowplot::plot_grid(plot.scenario4.run1, plot.scenario4.run2,ncol=1)
If you have time, run a few models that have multiple parameters that are different from those above. I recommend varying "fitness"" and "proportion of cases seeking treatment"" in order to see how they two variables collectively affect the overall frequency of hrp2-deleted parasites.
By now, you have started to answer our Ministry of Health Official's question, as we have shown that populations with a high proportion of RDT use, low malaria transmission (low EIR), and a high proportion of cases seeking care are predictive of increases in hrp2-deletion frequency (Watson et al. 2017). In addition, we have shown that a large fitness cost is associated with the loss of hrp2-deleted parasites, but that in other cases of comparative fitness, hrp2-deleted parasites persist (or thrive) in the population (Watson et al. 2017, see Supp. Fig. 2). As a result, we can tell our Ministry of Health Official that regions with low-endemicity and large access to care are areas potentially most concerning for hrp2-deletions (of note, OJ's 2017 predictions have been pretty accurate to date, particularly in Eritrea and other regions in Northern Africa).
However, our Ministry of Health Official asks the next logical question: "Should I continue using RDTs"?
It is now a good time to note that this question mixes prediction and causal inference, both vast fields of study (the latter being the UNC Epi Department's forte). We are going to glaze over a huge number of assumptions and caveats (which I apologize for).
Up until this point, we have been modeling the frequency (or proportion) of hrp2 parasites in each population. However, our question now asks about the utility of RDTs in a setting of hrp2 deletions. As such, we will take the "worst" possible scenario and compare two populations in a low EIR setting, where one population only uses RDT and all cases seek treatment versus a seperate population that only uses microscopy and all cases seek treatment. We will then focus in on differences in incidence between these two populations.
# I am only going to do one run, but feel free to do more incidenceexmpl <- data.frame(N=2000, years=20, microscopy.use = c(0,1), EIR=1/365, ft = 1, fitness = 1, strains.0 = 10, rdt.det = 0, rdt.nonadherence = 0) incidenceexmpl$results <- furrr::future_pmap(incidenceexmpl, hrp2malaRia::hrp2_Simulation)
Let's now pull out the incidence data.
ret <- EMSI2019hrp2lab::extracthrp2results(pop1 = incidenceexmpl$results[[1]], pop2 = incidenceexmpl$results[[2]])
We can open it up to see what we extracted. Note, the pfinc column is from the hrp2malaRia
.$S.Incidence.05 column.
glimpse(ret)
ret %>% ggplot(.) + geom_line(aes(x=time, y=pfinc, group = factor(pop), colour=factor(pop))) + xlab("Times (Days)") + ylab("Incidence") + scale_color_manual(name = "", labels = c("RDT/100% Care", "Microscopy/100% Care"), values = RColorBrewer::brewer.pal(8, "Set1")) + theme_minimal() + theme( plot.title = element_text(hjust=0.5), legend.position = "bottom" )
To answer this question, it may help to subset to just a year of "time". You could do this by adding the following to the code chunk above:
ret %>% dplyr::filter(time < 365) %>% # this would be the first year ggplot(.) + ...
Without going into too much detail, incidence is indexed by time (this is an "open population"", so we can't use "risk"). As a result, we can model the incidence difference between these two populations at each time-point in order to get a sense of how many more infections the "Only RDT Use, 100% Seek Care" population has as compared to "Only Microscopy Use, 20% Seek Care" population.
days <- seq(0, (365*20)) pop1 <- ret %>% dplyr::filter(pop == 1) pop2 <- ret %>% dplyr::filter(pop == 2) # quick local linear interpolation to get an incidence for every single day of the 20 years (not just the S.times) pop1 <- approx(x = pop1$time, y = pop1$pfinc, xout = days, method = "linear") %>% cbind.data.frame() %>% dplyr::rename(time = x, pfinc = y) %>% dplyr::mutate(pop = 1) pop2 <- approx(x = pop2$time, y = pop2$pfinc, xout = days, method = "linear") %>% cbind.data.frame() %>% dplyr::rename(time = x, pfinc = y) %>% dplyr::mutate(pop = 2) # now let's rejoin and plot the difference left_join(pop1, pop2, by = "time") %>% dplyr::mutate(diff = pfinc.x - pfinc.y) %>% dplyr::filter(!is.na(diff)) %>% # get rid of boundary issues ("cheating, we could do better") ggplot() + geom_line(aes(x=time, y=diff), color = "blue") + geom_hline(yintercept = 0, color = "red") + xlab("Times (Days)") + ylab("Incidence") + ggtitle("Incidence Difference between Population 1 and 2") + theme_minimal() + theme( plot.title = element_text(hjust=0.5), legend.position = "bottom" )
On your own, alter the simulation (e.g. vary EIR, etc.). How does the incidence difference plot change? What should you tell your Ministry of Health Official friend?
Although we can compare incidence differences directly (above), we should also consider the "cost" of a missed infection (e.g. an individual with hrp2 deleted parasites that was misdiagnosed by RDT, a "false-negative"). As a rough approximation, let's calculate the number of cases that we are missing in population 1 by using RDTs instead of microscopy. We will assume that population 2 has zero false-negatives. As such, the proportion of missed cases in population 1 from RDT-use can be approximated by multiplying incidence by the proportion of cases with only hrp2-deleted parasites minus the (baseline) of population 1.
ret %>% dplyr::filter(pop == 1) %>% # assume microscpy had no missed cases dplyr::mutate(missedprop = pfinc * hrp2prevmono) %>% ggplot() + geom_line(aes(x=time, y=missedprop), color = "blue") + xlab("Times (Days)") + ylab("hrp2-only Incidence") + ggtitle("Incidence of hrp2 only infected individuals (missed case proportion)") + theme_minimal() + theme( plot.title = element_text(hjust=0.5), legend.position = "bottom" )
Now taking into account the "cost" of a potential missed infection, does your response to our Ministry of Health friend change?
Thank you for completing this lab!!
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.