#load various variable definitions that are the same for each app source('startup_script.R') currentrmdfile = knitr::current_input() #currentrmdfile = "basicsir_documentation.Rmd" # For debugging appsettings = get_settings(currentrmdfile,appdocdir,packagename)
This app implements the basic SIR (susceptible-infected-recovered) model and allows you to explore a very basic infectious disease simulation. The main goal is to provide familiarity with the overall setup and ideas behind using these simulations, and how to run them. Read about the model in the "Model" tab. Make sure to read the 'general notes' section. Then do the tasks described in the "What to do" tab. Finally, check out the "Further Information" tab to learn where you can find some background information on this (and many of the other) apps.
This model is a compartmental SIR (susceptible-infected-recovered) model. Compartmental means that we place individuals into distinct compartments, according to some characteristics. We then only track the total number of individuals in each of these compartments. In the simplest model, the only characteristic we track is a person's infection status. We allow for 3 different stages/compartments:
The SIR model is very basic. It could be extended by introducing further compartments. For instance, we could stratify according to gender, which would give us 2 sets of SIR compartments, one for males and one for females. Some of these extensions are implemented in other apps.
In addition to specifying the compartments of a model, we need to specify the processes/mechanisms determining the changes for each compartment. Broadly speaking, there are processes that increase the number of individuals in a given compartment/stage, and processes that lead to a reduction. Those processes are sometimes called inflows and outflows.
For our system, we specify only 2 processes/flows:
As with the compartments, we could extend the model and allow other processes to occur. For instance, we could allow for natural births and deaths, waning immunity, deaths due to disease, etc. Some of that will be included in other apps.
For compartmental models (and also often other types of models), it is useful to show a graphical schematic representation of the compartments and processes included in the model. For compartmental models, such a diagram/figure is usually called a flow diagram. Such a diagram consists of a box for each compartment, and arrows pointing in and out of boxes to describe flows and interactions. For the simple SIR model, the flow diagram looks as follows:
knitr::include_graphics( system.file(figuredir,appsettings$modelfigname,package=packagename))
To allow us to simulate this model, we need to implement it on the computer. For that purpose, it is often useful to write the model as mathematical equations (this is not strictly needed, some computer simulation models are never formulated as mathematical models). A very common way (but not the only one) to implement compartmental models such as the simple SIR model is a set of ordinary differential equations. Each compartment/variable gets an equation. The right side of each equation specifies the processes going on in the system and how they change the numbers in each compartment via inflows and outflows. For the model described above, the equations look like this:
$$ \begin{aligned} \dot S & = - b S I \ \dot I & = b S I - g I \ \dot R & = g I \end{aligned} $$
Note: If you don't see equations but instead gibberish, try opening the app with a different browser. I have found that occasionally, on some computers/browsers, the math is not shown properly.
Continuous time models implemented as ordinary differential equations are the most common types of models. However, other implementations of the above model are possible. One alternative formulation is a discrete-time deterministic equivalent to the ODE model. For such an implementation, the equations are:
$$ \begin{aligned} S_{t+dt} & = S_t + dt * \left( - b I_t S_t \right) \ I_{t+dt} & = I_t + dt * \left( b I_t S_t - g I_t \right) \ R_{t+dt} & = R_t + dt * \left( g I_t \right) \end{aligned} $$
In words, the number of susceptible/infected/recovered at a time step dt in the future is given by the number at the current time, t, plus/minus the various inflow and outflow processes. The latter need to be multiplied by the time step, since less of these events can happen if the time step is smaller. As the time-step gets small, this discrete-time model approximates the continuous-time model above. In fact, when we implement a continuous-time model on a computer, the underlying simulator runs a "smart" version of a discrete-time model and makes sure the steps taken are so small that the numerical simulation is a good approximation of the continuous-time model. If you want to learn more about that, you can check out the 'deSolve' R package documentation, which we use to run our simulations.
For all tasks in this and the other apps, you are given a list of quantities to report (write down). If you use this software as part of a course, you might be asked to submit the answers to these quantities for grading. Unless otherwise specified, you can round each number to the closest integer.
Some of the tasks below (and in future apps) are fairly open-ended. The idea is that these tasks give you something to get started, but you should feel free to explore the simulations any way you want. Play with them, query them, go through iterations of thinking what you expect, observing it, and if discrepancies occur, figure out why. Essentially, the best way to use these apps is to do your own science/research with them.
You might find for some parameter settings that numbers in specific compartments (e.g. number of infected) drop below 1 (and possibly rebound later). This of course is biologically not reasonable. Numbers less than 1 are an artifact of the underlying differential equation model. All ODE models (and the discrete time model we consider here) have that problem. A ‘hacky’ solution is to monitor the simulation and if a quantity drops below 1, set it to 0 or stop the simulation. Some apps use this approach. A cleaner solution is to treat all variables as discrete units and allow them to only change in integer steps (in a probabilistic manner). This approach will be discussed in the apps under the Stochastic models section. For most apps, neither approach is taken, so you’ll see numbers less than 1. That’s ok for the purpose we use the apps here. Just be careful if you use these kinds of models in your research you might need to pay attention to this issue.
While the list of tasks for each app always states what time units you should run the model in, this and most other simulations/apps in DSAIDE do not have natural time units (unless specifically stated). You could, therefore, assume that your model runs in units of days or weeks/months/years, based on what's most suitable for the disease you want to study. You have to make sure that all your parameters are in the right time units. Always make sure to check if a given simulation can handle different time units or assumes specific ones. Again, for the tasks the time units are specified.
The app lets you choose the maximum simulation time. Some apps also let you set the starting time and time step. Starting time generally does not matter and is almost always set to 0. Sometimes an app doesn't even allow you to change this. For models like that are based on differential equations, the time step does not affect the internal time steps taken by the numeric solver, only the times for which results are returned. Thus, for some apps, it is also not an input. It is here, so it can run together with a discrete-time model. Only for discrete-time models is this the actual time step that's being taken.
The tasks below are described in a way that assumes everything is in units of days (rate parameters, therefore, have units of inverse days). If any quantity is not given in those units, you need to convert it first (e.g. if it says a week, you need to convert it to 7 days).
# save all tasks, outcomes, etc. into an R data frame, then print later. # this data frame is used to automate shiny solutions and quiz generation # Explanation for each of the columns in the R data frame # QuizID: Used by the grading app. naming structure "dsaide_appid" # AppTitle: Title used for the app in the dsaide app, from appsettings$apptitle # AppID: App ID (short version of title), from appsettings$appid # TaskID: Number for each task the text belongs to # TaskText: The text that explains what to do for the task # RecordID: Identifies if it is the first, second,... item to record, labeling is T1R1, T1R2, etc. # Record: Text explaining what value from the model that nee too be recorded # Type: type of entry: # Character (single letter - for multiple choice), # Text (any text - should be avoided, hard to match/grade), # Numeric # Rounded_Integer # Rounded_Numeric # Integer # Logical - could be Yes/No or True/False. Will be evaluated the same in grading app. # None - if nothing is to be reported for a given task # Note: Used by students taking quiz. Makes it clear what type of value to enter in "Answers" #this is the running counter for the records which starts at 1 rc=1 #empty object, will hold all outcomes alloutcomes = NULL # Task 1 tid = 1 tasktext = "Start with 1000 susceptible individuals and 1 initially infected host, no recovered. Set simulation duration to 100 days, start time and time step can remain at 0 and 0.1. Set recovery rate _g_=0.5 per day, and infectiousness _b_=0.001. You will get an outbreak of some - currently unspecified - infectious disease. If you did it correctly, your outbreak should end with around 203 uninfected individuals still remaining. Also take a look at the final number of infected. You'll see that it is at _3E-9_, i.e. much less than 1 but not 0. That's the issue discussed above in the notes, that for ODE models, numbers can drop below 1, and then can slowly go to 0 but not fully reach 0. Often, this biologically unreasonable feature of these kinds of models can be ignored, but sometimes it matters. You will learn more about this when you work through the stochastic model apps." # Record for task 1 nrec = 1 # number of items to record out_records = c("Total number of recovered at end of simulation") out_types = rep("Rounded_Integer",nrec) out_notes = rep("Report the rounded integer",nrec) outcomes = data.frame( TaskID = rep(tid,nrec), TaskText = rep(tasktext,nrec), RecordID = paste0('T',tid,'R',(1:nrec)), Record = out_records, Type = out_types, Note = out_notes) alloutcomes = rbind(alloutcomes,outcomes) rc = rc + nrec #increment record counter by number of outcomes to record for this task # Task 2 tid = 2 tasktext = "Switch the plotting to have x-axis, y-axis or both plotted on a log scale. Leave all other settings as before. Note that while the look of the plot changes a lot, nothing has changed about the underlying simulation. The results are exactly the same in each case, only plotted differently. This is something to be aware of when you see plots in papers or produce your own. The best plot to use is the one that shows results of interest in the clearest form. Usually, the x-axis is linear and the y-axis is either linear or logarithmic. Try again switching between ggplot and plotly as plotting engine. Play around a bit with plotly. It has a number of interactive features, such as zooming, reading off values from a graph, turning on/off specific lines, etc. Those features will likely be useful as you go through the different apps in DSAIDE, thus I encourage you to explore the plotly functionality somewhat." # Record for task 2 nrec = 1 # number of items to record out_records = c("Nothing") out_types = rep("None",nrec) out_notes = c("") outcomes = data.frame( TaskID = rep(tid,nrec), TaskText = rep(tasktext,nrec), RecordID = paste0('T',tid,'R',(1:nrec)), Record = out_records, Type = out_types, Note = out_notes) alloutcomes = rbind(alloutcomes,outcomes) rc = rc + nrec #increment record counter by number of outcomes to record for this task ### Task 3 tid = 3 tasktext = "From the graph, get a (rough) estimate of the day at which the outbreak peaks. That's easiest done when you make a plotly plot. Contemplate the fact that the outbreak ends even though there are still susceptible remaining, i.e. not everyone got infected. Do you find that surprising? How could you maybe explain that? This topic is discussed in more detail in the reproductive number app. You have no infected left at the end of the outbreak. Figure out how you can use the information from either the susceptibles or recovered left at the end of the outbreak to figure out how many people in total got infected during the outbreak. To that end, think about how individuals move through the different compartments in this model. What can happen to those in the S/I/R compartments as the outbreak progresses? Where are they at the end of the outbreak? Use either information about the number of individuals in S or R at the beginning and end of the outbreak to determine the total (cumulative) number of individuals that got infected during the outbreak. Rerun the simulation, with the same values you just had. Does anything change? Why (not)?" # Record for task 3 nrec = 2 # number of items to record out_records = c("Day the outbreak peaked", "Total/cumulative number infected (compute as described in text)") out_types = rep("Rounded_Integer",nrec) out_notes = rep("Report the rounded integer",nrec) outcomes = data.frame( TaskID = rep(tid,nrec), TaskText = rep(tasktext,nrec), RecordID = paste0('T',tid,'R',(1:nrec)), Record = out_records, Type = out_types, Note = out_notes) alloutcomes = rbind(alloutcomes,outcomes) rc = rc + nrec ### Task 4 tid = 4 tasktext = "Set _Models to run_ to _both_, which runs and shows both the continuous-time and discrete-time models. Start with a discrete-time step of 0.01. Leave all other settings as before. Run the simulation, see what you get. You should see the results from the 2 models essentially on top of each other and barely distinguishable." # Record for task 4 nrec = 2 # number of items to record out_records = c("Number of susceptible at end of simulation for continuous-time model", "Number of susceptible at end of simulation for discrete-time model") out_types = rep("Rounded_Integer",nrec) out_notes = rep("Report the rounded integer",nrec) outcomes = data.frame( TaskID = rep(tid,nrec), TaskText = rep(tasktext,nrec), RecordID = paste0('T',tid,'R',(1:nrec)), Record = out_records, Type = out_types, Note = out_notes) alloutcomes = rbind(alloutcomes,outcomes) rc = rc + nrec ### Task 5 tid = 5 tasktext = "Now try different values for _dt_. Leave all other settings as before. You should notice that as _dt_ gets larger, the differences between discrete-time and continuous-time models increase. At some point when the time-step gets too large, the discrete-time simulation 'crashes' and you get an error message. This doesn't impact the continuous/ODE simulation since it chooses its time-step during the simulation internally and _dt_ only affects the times for which the results are returned." # Record for task 5 nrec = 2 # number of items to record out_records = c("Number of susceptible left at end of simulation for continuous-time model dt=1", "Number of susceptible left at end of simulation for discrete-time model dt=1") out_types = rep("Rounded_Integer",nrec) out_notes = rep("Report the rounded integer",nrec) outcomes = data.frame( TaskID = rep(tid,nrec), TaskText = rep(tasktext,nrec), RecordID = paste0('T',tid,'R',(1:nrec)), Record = out_records, Type = out_types, Note = out_notes) alloutcomes = rbind(alloutcomes,outcomes) rc = rc + nrec ### Task 6 tid = 6 tasktext = "Go back to running the ODE model only. Set all parameters as described in task 1. Then increase the rate of recovery by a factor of four, i.e. set it to g=2. (What duration of the infectious period does that translate to?) Contemplate what you expect to find for the increased recovery rate before you run the simulation. Then run the simulation. Compare your expectations with the results. How do they agree/disagree? Does it make sense? Anything surprising happening?" # Record for task 6 nrec = 3 # number of items to record out_records = c("Number of susceptible at end of simulation", "Number of recovered at end of simulation", "Do you see an outbreak? (Yes/No)") out_types = c("Rounded_Integer", "Rounded_Integer", "Logical") out_notes = c("Report the rounded integer", "Report the rounded integer", "Report Yes or No") outcomes = data.frame( TaskID = rep(tid,nrec), TaskText = rep(tasktext,nrec), RecordID = paste0('T',tid,'R',(1:nrec)), Record = out_records, Type = out_types, Note = out_notes) alloutcomes = rbind(alloutcomes,outcomes) rc = rc + nrec ### Task 7 tid = 7 tasktext = "Increase the transmission rate by a factor of 4, leave everything else as previously. How do you expect the results to change? (Try to make your prediction as precise/quantitative as you can.) Run the simulation with these new parameter settings. Compare your expectations with the results. How do they agree/disagree? Does it make sense? Compare the values for number susceptible/recovered at end of simulation and the peak of the outbreak to the values you found in the initial tasks. Hint: You should see that the _magnitude_ of the outbreak is essentially the same, but the _timing_ has changed." # Record for task 7 nrec = 4 # number of items to record out_records = c("Number of susceptible at end of simulation", "Number of recovered at end of simulation", "Do you see an outbreak? (Yes/No)", "Day at which outbreak peaked") out_types = c("Rounded_Integer", "Rounded_Integer", "Logical", "Rounded_Integer") out_notes = c("Report the rounded integer", "Report the rounded integer", "Report Yes or No", "Report the rounded integer") outcomes = data.frame( TaskID = rep(tid,nrec), TaskText = rep(tasktext,nrec), RecordID = paste0('T',tid,'R',(1:nrec)), Record = out_records, Type = out_types, Note = out_notes) alloutcomes = rbind(alloutcomes,outcomes) rc = rc + nrec ### Task 8 tid = 8 tasktext = "Double the number of susceptibles, _S~0~_, leave everything else as you just had. How do you expect the results to change? Run the simulation with these new parameter settings. Compare your expectations with the results. How do they agree/disagree? Does it make sense? Anything surprising happening?" # Record for task 8 nrec =4 # number of items to record out_records = c("Number of susceptible at end of simulation", "Number of recovered at end of simulation", "Do you see an outbreak? (Yes/No)", "Day at which outbreak peaked") out_types = c("Rounded_Integer", "Rounded_Integer", "Logical", "Rounded_Integer") out_notes = c("Report the rounded integer", "Report the rounded integer", "Report Yes or No", "Report the rounded integer") outcomes = data.frame( TaskID = rep(tid,nrec), TaskText = rep(tasktext,nrec), RecordID = paste0('T',tid,'R',(1:nrec)), Record = out_records, Type = out_types, Note = out_notes) alloutcomes = rbind(alloutcomes,outcomes) rc = rc + nrec # Task 9 tid = 9 tasktext = "Keep playing around with changing any of the parameters and starting conditions. Every time, think about what you expect to get, then run the simulation, compare your expectations with the results. Then make sense of it. What is the minimum and maximum number of outbreaks you can get? Why is that?" # Record for task 9 nrec = 2 # number of items to record out_records = c("Minimum number of outbreaks you can get", "Maximum number of outbreaks you can get") out_types = rep("Integer",nrec) out_notes = rep("Report an integer",nrec) outcomes = data.frame( TaskID = rep(tid,nrec), TaskText = rep(tasktext,nrec), RecordID = paste0('T',tid,'R',(1:nrec)), Record = out_records, Type = out_types, Note = out_notes) alloutcomes = rbind(alloutcomes,outcomes) rc = rc + nrec
#save the fully filled task table to a tsv file alloutcomes$QuizID = paste0(packagename,"_",appsettings$appid) alloutcomes$AppTitle = appsettings$apptitle alloutcomes$AppID = appsettings$appid #remove a few variables from the data frame savedoutcomes <- dplyr::select(alloutcomes,QuizID,AppID,AppTitle,TaskID,TaskText,RecordID,Record,Type,Note) write.table(savedoutcomes, paste0(appsettings$appid,"_tasktable.tsv"), append = FALSE, sep = "\t", row.names = F, col.names = TRUE)
# Take all the text stored in the table and print the tasks and items to record write_tasktext(alloutcomes)
r appsettings$simfunction[1]
and r appsettings$simfunction[2]
. You can call them directly, without going through the shiny app. Use the help()
command for more information on how to use the functions directly. If you go that route, you need to use the results returned from this function and produce useful output (such as a plot) yourself. vignette('DSAIDE')
into the R console.Any scripts or data that you put into this service are public.
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.