#load various variable definitions that are the same for each app
source('startup_script.R')
sapply(files_to_source, source) #source some helper files defined in the files_to_source variable
currentrmdfile = knitr::current_input()  #get current file name
appsettings = get_settings(currentrmdfile,appdocdir,packagename) #get settings for current app

Overview {#shinytab1}

This app allows exploration of the concept of uncertainty and sensitivity analysis. For this purpose, we use the Basic Bacteria Model introduced in the app of that name. If you haven't done so yet, familiarize yourself and work through that app first. It might also be a good idea to go through the Bacteria Model Exploration app to familiarize yourself further with the model and the idea of recording outcomes of interest as a function of changing parameters. Read about the model in The Model tab. Then, work through the tasks described in the What To Do tab.

Learning Objectives

The Model {#shinytab2}

Model Overview

The underlying model is the continuous-time, ODE version of the model from the Basic Bacteria Model app. Details can be found there.

For convenience, here is a quick summary:

We model 2 compartments:

We specify the following processes/flows:

  1. Bacteria grow/divide at some maximum rate (which we label g) and saturate as they approach some maximum carrying capacity, B~max~.
  2. Bacteria die at a natural death rate (which we label d~B~).
  3. Bacteria are killed by the immune response at some rate k.
  4. The immune response grows proportional to the number of bacteria and itself at some rate r.
  5. The immune response decays at some rate (which we label d~I~).

Model Diagram

The diagram illustrating this compartmental model is shown in the figure.

knitr::include_graphics(path = paste0("../media/",appsettings$modelfigname))

Model Equations

The model is implemented as a set of ordinary differential equations:

\begin{aligned} \dot B & = gB(1-\frac{B}{B_{max}})-d_B B - k BI \ \dot I & = r B I - d_I I \end{aligned}

Model Concepts

For the system we want to study and model, we often have only rough estimates for the model parameters and starting values. Instead of specifying fixed values (which results in a single time-series), we can instead specify parameter ranges, choose sets of parameter values from these ranges, and run the model for multiple sets of parameters.

The aggregate of all possible values for parameters, i.e., the parameter space, can be explored a couple different ways. One way is to explicitly specify values for all parameters and exhaustively run models using all combinations of the parameter values. This grid search is computationally expensive and becomes rather intractable given even relatively few parameters. For example, specifying ten values for each of the Basic Bacteria Model parameters leads to 10^8^ unique simulations to run.

Alternatively, we could explore the parameter space using probability distributions. The simplest way of of doing this is to set an upper and lower bound (based on what we know about the biology of the system) and randomly choose any value within those bounds with equal probability (i.e., sample from a uniform distribution). We can almost always set bounds even if we know very little about a system. Assume we want to model the death rate of some cell type (e.g., NK cells) in humans. We might not know anything, but we can still be fairly confident that their lifespan is at least 1 second and less than 100 years. That's of course a wide range and we should and usually can narrow ranges further, based on biological knowledge of a given system.

Instead of specifying a uniform distribution, we can choose other probability distributions that better represent our knowledge and understanding of the system. For example, if we are fairly certain that values are close to some quantity, then we can choose a distribution that peaks at some central value and minutely varies around it. A popular choice may be the normal distribution, but it is not ideal since it allow negative values, a characteristic that doesn't make sense for our parameters. The gamma distribution may be a better idea since it leads to only positive values.

Once appropriate probability distributions are chosen, we still need to sample from the distributions for each run of the simulation. We could simply draw samples completely randomly, but that would lead to inefficient sampling (i.e., the parameter space is not thoroughly explored). A smarter method exists known as Latin Hypercube sampling (LHS). It essentially ensures that we sample the full range of possible parameter combinations in an efficient manner. For more technical details, see e.g. [@saltelli04]. For this app, we use LHS.

To run the model for this app, we need to specify values for the 2 initial conditions, B~0~ and I~0~, and the 6 model parameters g, B~max~, d~B~, k, r, d~I~. All initial conditions and parameters are sampled uniformly between the specified upper and lower bound, apart from the bacteria growth rate, which is given by a gamma distribution with user-specified mean and variance. For this teaching app, there is no biological reason for making bacterial growth different, I just picked one parameter and decided to make it non-uniformly distributed to show you different ways one can implement distributions from which to draw parameter samples.

Once we specify the ranges for each parameter, the sampling method, and the number of samples, the simulation draws that many samples, runs the model for each sample, and records outcomes of interest. While the underlying simulation returns a time-series for each sample, we are usually not interested in the full time-series. Instead, we are interested in some summary quantity. For instance in this model, we might be interested in the maximum/peak level of bacteria during the infection, the level of bacteria at the end (the steady state) of the infection, and the level of the immune response at steady state. This app records and reports those 3 quantities as B~peak~, B~steady~ and I~steady~.

Results from such simulations for multiple samples can be analyzed in different ways. The most basic one, called Uncertainty Analysis only asks what level of uncertainty we have in our outcomes of interest given the amount of uncertainty in our model parameter values. This can be graphically represented with a boxplot and is one of the plot options for this app.

In a next step, we can ask 'how sensitive is the outcome(s) of interest to variation in specific parameters'; that part is the Sensitivity Analysis. When you run the simulations, you essentially do both uncertainty and sensitivity analyses at the same time, it's just a question of how you further process the results. We can graphically inspect the relation between outcome and some parameter with scatterplots. If we find that there is a monotone up or down (or neither) trend between parameter and outcome, we can also summarize the finding using a correlation coefficient. For this type of analysis, using the Spearman rank correlation coefficient is useful and the app calculates and shows this correlation coefficient below the figures.

Notes

What To Do {#shinytab3}

The model is assumed to run in units of days.

#this is the running counter for the records which starts at 1 
rc=1

#empty object, will hold all outcomes
alloutcomes = NULL

#########################
# Task 1
#########################
tid = 1
tasktext = "The creation of parameter samples involves some element of uncertainty, and we need to make use of random numbers. We still want results to be reproducible. That's where the random number seed comes in. As long as the seed is the same, the code should produce the same pseudo-random numbers each time, thus ensuring reproducibility. 

\nLet's explore this. Leave all settings at their default values. Run the simulation. You should find that the median for __B~steady~__ is around 10736 (as printed below the plot). Run the simulation again with the same settings, you should get exactly the same results. 

\nNow change the random number seed (rngseed) to 101. Run the simulation again. The median value should be lower. Note that the only thing that is different is that the computer drew different random samples for the parameter values. Keep exploring other rngseed values. You should notice that sometimes results change a bit, sometimes a lot, and that it doesn't matter if you change the seed by just a bit or a lot."
nrec = 1 # number of items to record
out_records = c("Median for __B~steady~__ with _rngseed_ = 101.")
out_types = rep("Rounded_Integer",nrec)
out_notes = rep("Report the rounded integer",nrec)
outcomes = data.frame( TaskID = rep(tid,nrec),
                       TaskText = rep(tasktext,nrec),
                      RecordID = paste0('T',tid,'R',(1:nrec)),
                      Record = out_records, 
                      Type = out_types, 
                      Note = out_notes)
alloutcomes = rbind(alloutcomes,outcomes)
rc = rc + nrec #increment record counter by number of outcomes to record for this task 

#########################
# Task 2
#########################
tid = tid + 1
tasktext = "The more samples you have, the more robust the results are to changes in the underlying sample generation (determined by the random number seed). Try checking this by running 100 samples, instead of 10, with the two random number seeds from above. Be patient, this might take a good while. Since each sample means a run of the underlying simulation model, things start to go slow for increased sample size. 

\nFor _rngseed_ = 100, the median of __B~steady~__ is around 9306, for the second _rngseed_ value it is somewhat higher. You might have noticed that the variability shrinks in the central quantities (mean, median) for the larger sample size, not much in the extremes (min/max). You might have also noticed that some __B~peak~__ is inherently more variable than __B~steady~__ and __I~steady~__. Also note the 'system might not have reached steady state' message. If for too many of the samples steady state has not been reached, the results for __B~steady~__ and __I~steady~__ are not correct. In that case you need to increase the simulation time to allow the system to settle into steady state. For some parameter combinations, that can take very long."
nrec = 1 # number of items to record
out_records = c("Median for __B~steady~__ with rngseed = 101, 100 samples.")
out_types = rep("Rounded_Integer",nrec)
out_notes = rep("Report the rounded integer",nrec)
outcomes = data.frame( TaskID = rep(tid,nrec),
                       TaskText = rep(tasktext,nrec),
                      RecordID = paste0('T',tid,'R',(1:nrec)),
                      Record = out_records, 
                      Type = out_types, 
                      Note = out_notes)
alloutcomes = rbind(alloutcomes,outcomes)
rc = rc + nrec #increment record counter by number of outcomes to record for this task 

#########################
# Task 3
#########################
tid = tid + 1
tasktext = "Recall the underlying model and its behavior. If you can't, revisit the ___Basic Bacteria Model___ app. Use your understanding of the model to predict what happens if you increase both lower and upper bound for the immune response activation rate, _r_. Increase both lower and upper bounds (_rmin_ and _rmax_) by a factor of 10. Set seed to 100, with 50 samples. Leave everything else at the default values. Run the simulations and see how results change. 

\nNow go the opposite direction, lower the initial lower/upper bounds by a factor of 10 from their original value. Run simulations, see how results change. Especially for the steady states, you know from previous apps how things should change as you change _r_. Compare your simulation results with your expectations."
nrec = 1 # number of items to record
out_records = c("Median for __B~steady~__, _rmin_ = 0.001 and _rmax_ = 0.002.")
out_types = rep("Rounded_Integer",nrec)
out_notes = rep("Report the rounded integer",nrec)
outcomes = data.frame( TaskID = rep(tid,nrec),
                       TaskText = rep(tasktext,nrec),
                      RecordID = paste0('T',tid,'R',(1:nrec)),
                      Record = out_records, 
                      Type = out_types, 
                      Note = out_notes)
alloutcomes = rbind(alloutcomes,outcomes)
rc = rc + nrec #increment record counter by number of outcomes to record for this task 


#########################
# Task 4
#########################
tid = tid + 1
tasktext = "Continue this exploration as you change ranges for different parameters. Note that while you change ranges for some parameters, every time you get samples for all parameters in those ranges. This is different from the ___Basic Bacteria Exploration___ app where only the parameter we scanned over changed and every other parameter remainded fixed. Usually, you have a parameter you are most interested in, and you'll explore that parameter in detail, while allowing for variability in the others. As you play around, it is likely that for some settings you'll see warning or error messages on the `R` console. That generally means that the parameters for a given simulation are such that the differential equation solver can't properly run the model. That usually corresponds to biologically unrealistic parameter settings. We'll ignore them, but if you did a research project and you got such warning or error messages, you'd have to figure out why you get them and only once you fully understand why is it maybe ok to ignore them."
nrec = 1 # number of items to record
out_records = c("Nothing")
out_types = rep("None",nrec)
out_notes = c("")
outcomes = data.frame( TaskID = rep(tid,nrec),
                       TaskText = rep(tasktext,nrec),
                      RecordID = paste0('T',tid,'R',(1:nrec)),
                      Record = out_records, 
                      Type = out_types, 
                      Note = out_notes)
alloutcomes = rbind(alloutcomes,outcomes)
rc = rc + nrec #increment record counter by number of outcomes to record for this task 


#########################
# Task 5
#########################
tid = tid + 1
tasktext = "So far, we only looked at the total variability in outcomes as parameters are being varied/sampled (i.e. the boxplots and summary measures printed underneath the plots). The above approach of exploring the impact of a parameter on results by varying bounds and repeatedly looking at boxplots or summary measures is somewhat tedious. This is where sensitivity analysis comes in. By running the same simulations, but plotting outcomes as scatterplots, we can explore how variation in a paramter leads to variation in outcomes. Let's explore this. 

\nSet all input values to their defaults. Then set sample size to 50 and switch the plot type from boxplot to scatterplot. Run the simulation. You'll see how the 3 outcomes of interest vary with one of the varied inputs. Here this is the initial starting value for the bacteria. Which in some sense is not a model parameter, but an initial condition. However, for the purpose of U/S analysis, either is considered a varied input/parameter. Also look at the text below the plots. For each parameter-output pair, the code computes a rank correlation coefficient. Numbers close to 0 mean there is essentially no correlation, close to 1 or -1 means a large positive or negative correlation. (One could compute p-values for these correlations, but they are somewhat meaningless since the values will get smaller the more samples you use, so you can basically produce any p-value you want.). 

\nLet's explore this correlation for a parameter we are familiar with from previous apps, the bacteria growth rate. Set all inputs to their defaults, then set samples to 100. Also switch the plot type to _Scatterplot_ and _Parameter for scatterplot_ to _g_. Run the simulation. We found earlier that __B__ at steady state does not depend on _g_, while __I__ does. The correlation coefficients show this as well. Note that they are not exactly 0 and 1 since there is additional variability due to the other parameters being sampled as well."
nrec = 1 # number of items to record
out_records = c("Rank Correlation Coefficient between _g_ and __I~steady~__ (2 decimal places).")
out_types = rep("Rounded_Numeric",nrec)
out_notes = rep("Report the value rounded to two significant digits",nrec)
outcomes = data.frame( TaskID = rep(tid,nrec),
                       TaskText = rep(tasktext,nrec),
                      RecordID = paste0('T',tid,'R',(1:nrec)),
                      Record = out_records, 
                      Type = out_types, 
                      Note = out_notes)
alloutcomes = rbind(alloutcomes,outcomes)
rc = rc + nrec #increment record counter by number of outcomes to record for this task 


#########################
# Task 6
#########################
tid = tid + 1
tasktext = "Let's explore in more detail how variability in different parameters impacts results. We'll pretend that we happen to know several model inputs with certainty. To do so, we'll impose no variability for some parameters. For the following parameters, set *both* their lower and upper bound to the specified value: initial value for bacteria and immune response = 1,  _B~max~_ = 1e5,  _d~B~_ = 1, _k_ = _r_ = 10^-4^. Let _d~I~_ vary between 1 and 2 and give _g_ a mean of 5 and variance of 1. 100 samples, seed at 100, scatterplot for _g_. Run the simulation. You'll see a very clear pattern relating the outcomes to _g_. Switch to plotting _d~I~_, you'll again see nice patterns. 

\nConfirm that, as expected from the steady state equations, __I~steady~__ depends positively on _g_ and negatively on _d~I~_. Both the scatterplots and the correlation coefficient should give you that information. Also note the distribution for _g_ and _d~I~_. The former has more points around its mean and less for lower/higher values, while _d~I~_ values are uniformly distributed along the x-axis. This comes from the underlying assumption about how the parameters are distributed, gamma distribution versus uniform distribution."
nrec = 1 # number of items to record
out_records = c("Rank Correlation Coefficient between _g_ and __I~steady~__ (2 decimal places).")
out_types = rep("Rounded_Numeric",nrec)
out_notes = rep("Report the value rounded to two significant digits",nrec)
outcomes = data.frame( TaskID = rep(tid,nrec),
                       TaskText = rep(tasktext,nrec),
                      RecordID = paste0('T',tid,'R',(1:nrec)),
                      Record = out_records, 
                      Type = out_types, 
                      Note = out_notes)
alloutcomes = rbind(alloutcomes,outcomes)
rc = rc + nrec #increment record counter by number of outcomes to record for this task 

#########################
# Task 7
#########################
tid = tid + 1
tasktext = "In the ___Bacteria Model Exploration___ app, we investigated the fact that for some value of _d~I~_, we get a switch from a steady state to no steady state (if you don't remember, I suggest you revisit that app first). This behavior can also (accidentally) happen when you sample over parameters. That's not ideal since it means you combine to qualitatively different results in a single distribution. Let's explore this. 

\nReset all values to their defaults. Then, set ranges for _d~I~_ to be from 1 to 5, 100 samples, a scatterplot with _d~I~_ on the x-axis. Also change to plotly as the plot engine. Run the simulation. You should see mostly non-zero values for __I~steady~__ (you might need to click the auto-scale button on the plot). Now change the range for _d~I~_ to be from 10 to 14. You'll now see a large number of zeros for __I~steady~__, especially for high _d~I~_ values. 

\nThe important take-home message from this is that the influence of a parameter on some outcome can be different over different ranges. For instance in range A-B, the parameter might have a major influence, but once the parameter value goes above B, the parameter does not further influence the result. If you have large uncertainty in your parameters, it might be worth considering both the full range, and dividing the range into smaller areas to see how the parameter behaves."
nrec = 1 # number of items to record
out_records = c("Rank Correlation Coefficient between _d~I~_ and __I~steady~_ for range 10-14 (2 decimal places).")
out_types = rep("Rounded_Numeric",nrec)
out_notes = rep("Report the value rounded to two significant digits",nrec)
outcomes = data.frame( TaskID = rep(tid,nrec),
                       TaskText = rep(tasktext,nrec),
                      RecordID = paste0('T',tid,'R',(1:nrec)),
                      Record = out_records, 
                      Type = out_types, 
                      Note = out_notes)
alloutcomes = rbind(alloutcomes,outcomes)
rc = rc + nrec #increment record counter by number of outcomes to record for this task 
#save the fully filled task table to a tsv file
alloutcomes$QuizID = paste0(packagename,"_",appsettings$appid)
alloutcomes$AppTitle = appsettings$apptitle
alloutcomes$AppID = appsettings$appid
#remove a few variables from the data frame
savedoutcomes <- dplyr::select(alloutcomes,QuizID,AppID,AppTitle,TaskID,TaskText,RecordID,Record,Type,Note)     
write.table(savedoutcomes, paste0(appsettings$appid,"_tasktable.tsv"), append = FALSE, sep = "\t", row.names = F, col.names = TRUE)
# Take all the text stored in the table and print the tasks and items to record
write_tasktext(alloutcomes)

Further Information {#shinytab4}

For this app, the underlying function running the simulation is called r appsettings$simfunction. That function repeatedly calls r appsettings$underlying_function.

This app (and all others) are structured such that the Shiny part (the graphical interface you are using) calls one or several underlying R functions which run the simulation for the model of interest and return the results. You can call them directly, without going through the shiny app. Use the help() command for more information on how to use the functions directly. If you go that route, you need to use the results returned from this function and produce useful output (such as a plot) yourself.

You can also download all simulator functions and modify them for your own purposes. Of course to modify these functions, you'll need to do some coding.

For examples on using the simulators directly and how to modify them, read the package vignette by typing vignette('DSAIRM') into the R console.

Good papers explaining uncertainty and sensitivity analysis in a bit more detail are [@hoare08; @marino08].

References



ahgroup/DSAIRM documentation built on Oct. 4, 2023, 6:50 a.m.