#load various variable definitions that are the same for each app
source('startup_script.R')
sapply(files_to_source, source) #source some helper files defined in the files_to_source variable
currentrmdfile = knitr::current_input()  #get current file name
appsettings = get_settings(currentrmdfile,appdocdir,packagename) #get settings for current app

Overview {#shinytab1}

This app fits a simple drug treatment model to some influenza data. This aims to reinforce understanding of model comparisons in the context of fitting to data. While this app does model fitting, the topic of fitting is not discussed here. To learn more about fitting, you should go through the Basic Virus Model Fitting app (and if you want, the other ones in the Model Fitting Topics section.)

Read about the model in The Model tab. Then, work through the tasks described in the What To Do tab. Learn more about the model and its origins in the Further Information tab.

Learning Objectives

The Model {#shinytab2}

Model Overview

This app uses the same model and follows the same overall setup as the Antiviral Treatment Model app. You should go through that app first before exploring this one; there, you can find a more detailed description of the model. The major difference for this app is that, here, instead of comparing model results to data in a qualitative manner, we fit the model to data. This provides a more rigorous, statistical way of comparing models and thus hypotheses.

Model Diagram

The model is represented by this flow diagram:

knitr::include_graphics(path = paste0("../media/",appsettings$modelfigname))

Model Equations

The model is implemented as a set of ordinary differential equations:

\begin{align} \dot U & = n - d_U U - (1-f)bUV \ \dot I & = (1-f)bUV - d_I I \ \dot V & = (1-e)pI - d_V V - (1-f)gbUV \end{align}

Model Concepts

Data

For this app, viral load data from volunteers who were infected with influenza is being fit. The data is average log viral titer on days 1-8 post infection. The data comes from [@hayden96]. Three treatment conditions are shown. One group of individuals did not receive treatment, one group received drug treatment (oseltamivir) early (around 24 hours post infection) and one group late (around 48 hours).

Drug Treatment

The setup for this model is a little bit different from the Antiviral Treatment Model. Instead of supplying values for both e or f, this app is implemented such that there is only one parameter, called k here, that sets the strength of the drug (between 0 and 1). Another input can be set to switch between model 1 (reduction of cell infection, i.e., k -> f in the model) and model 2 (reduction of virus production, i.e., k -> e in the model). For either model, the value provided for k is only a starting value. The fit to the data determines the value which best describes the data.

Fitting Approach

This app fits the log viral titer of the data to the virus kinetics produced by the model simulation. The fit is evaluated by computing the sum of square residuals / errors (SSR / SSE) between data and model for all data points.

Model parameters that are being fit are b, p, g and k (which maps to either e or f depending on the model that's run). The other parameters are assumed to be known and kept fixed. Otherwise, given the sparse amount of data, the model would likely lead to overfitting.

For this app, the simulation is being run three times. During one run, treatment (parameters f or e in the model) are not turned on. This simulation is fit to the no-treatment data. Similarly, treatment is turned on by setting the treatment parameters to non-zero values early or late, and fit to the data for the corresponding scenario. The results combine all three simulations to all the data for the three treatment conditions giving an overall fit.

Notes

What to do {#shinytab3}

This simulation can take time to run, so be patient. The tasks below are described in a way that assumes everything is in units of days.

#this is the running counter for the records which starts at 1 
rc=1

#empty object, will hold all outcomes
alloutcomes = NULL

#########################
# Task 1
#########################
tid = 1
tasktext = "Set the number of uninfected cells to 1e7, 1 virion, and no infected cells at start. Assume that infected cells have an average lifespan of 12 hours and virus of 6 hours. Those parameters are fixed and not fitted. Set virus production rate and infection rate both to 0.002, conversion factor to _g=0_. Set upper and lower bounds for each of these parameters to 0 and 100. For a discussion on parameter bounds, see some of the other fitting apps. The bounds for the treatment strength are hard coded to be 0 and 1. 

\nStart with the drug turned off (k=0). Run 1 iteration of model 1. I suggest you choose a log scale for the y-axis and plotly as the plot engine. Using plotly allows you to click on the rather busy figure and turn on/off specific model components. You should see all 3 curves (no treatment, early treatment, late treatment) on top of each other and an SSR of around 39. 

\nNow set drug efficacy to 1, run model 1 for 1 iteration, and then repeat for model 2. You should cleary see the impact of the drug on the virus load curves. You'll find an SSR a bit above 7 for model 1 and a slightly lower SSR and AIC for model 2 (as you learn in another fitting app, if the models have the same number of parameters, which is the case here, SSR and AIC move together and contain the same information, namely lower value means a better fitting model).

\nNow, fit model 1 for 500 iterations. Be patient, this might take a little while. SSR will have decreased to a bit over 4. Take a look at the numbers below the plot. Those are the estimated values for the parameters at the final iteration of the model (this might or might not be the overall best fit, we'll get back to this). For now, focus on the value for drug efficacy (_f_ for model 1). Remember that for the input, you set drug efficacy with the parameter _k_, which then is mapped to either _f_ or _e_ based on the model. Therefore you will see the best reported for either _f_ or _e_, based on the model you run. You should find it to be 1, i.e. a perfect drug. Next, fit model 2 for 500 iterations. You'll find that it performs slightly better than model 1, and the best fit estimate for the drug efficacy is again 1.

\nIf your computer is fast enough, you can increase to 1000 or more iterations for both models. Results should not change. Based on this, we might conclude that the mechanism of the drug preventing virus production is more likely, given the data and model, and that our estimate for the drug efficacy is that it's a perfect drug. However, as we'll see next, this is not quite a robust result."
nrec = 4 # number of items to record
out_records = c("SSR for model 1, _k_ = 1, 1 iteration",
                "SSR for model 2, _k_ = 1, 1 iteration",
                "SSR for model 1, _k_ = 1, 500 iterations",
                "SSR for model 2, _k_ = 1, 500 iterations")
out_types = rep("Rounded_Numeric",nrec)
out_notes = rep("Round to two significant digits",nrec)
outcomes = data.frame( TaskID = rep(tid,nrec),
                       TaskText = rep(tasktext,nrec),
                      RecordID = paste0('T',tid,'R',(1:nrec)),
                      Record = out_records, 
                      Type = out_types, 
                      Note = out_notes)
alloutcomes = rbind(alloutcomes,outcomes)
rc = rc + nrec #increment record counter by number of outcomes to record for this task 


#########################
# Task 2
#########################
tid = tid + 1
tasktext = "Leave everything as you just had, but now chose the starting value for drug efficacy to be _k=0.5_. Note that this is only a starting value; _k_ is being fit thus, in theory, we should get the same best fit model and best fit estimate for _k_, no matter where we start. Test that by running models 1 and 2 for 500 iterations with this changed starting value of _k_. You'll find that the final iteration for model 1 produces a worse SSR and a different estimate for drug efficacy (compared to starting at _k_ = 1), while model 2 leads to a better fit than before, and also a reduced drug efficacy. If your computer is fast enough, you can again increase the number of iterations. You'll find that the results don't change."
nrec = 4 # number of items to record
out_records = c("SSR for model 1, _k_ = 0.5, 500 iterations",
                "Drug efficacy estimate for model 1",
                "SSR for model 2, _k_ = 0.5, 500 iterations",
                "Drug efficacy estimate for model 2")
out_types = rep("Rounded_Numeric",nrec)
out_notes = rep("Round to two significant digits",nrec)
outcomes = data.frame( TaskID = rep(tid,nrec),
                       TaskText = rep(tasktext,nrec),
                      RecordID = paste0('T',tid,'R',(1:nrec)),
                      Record = out_records, 
                      Type = out_types, 
                      Note = out_notes)
alloutcomes = rbind(alloutcomes,outcomes)
rc = rc + nrec #increment record counter by number of outcomes to record for this task 


#########################
# Task 3
#########################
tid = tid + 1
tasktext = "The finding from the previous task shows another example of the practical difficulty of obtaining the best fit. As discussed in the other apps, for a real research problem, we would need to try different solvers/optimizers, different starting values for the fitted parameters, and fitting to simulated data. This might give us some confidence that we are not overfitting and that our results are robust. If we do that, we might be able to trust our finding, which will tell us which model (and thus mechanism) is more consistent with the data and what the estimate for the drug efficacy is. You can explore this to see what the best fit is you can find for different starting values of the fitted parameters, and different models and iterations. (I found one for model 2 with SSR = 3.0. There might be even better ones for either model.)"
nrec = 1 # number of items to record
out_records = c("Nothing")
out_types = rep("None",nrec)
out_notes = c("")
outcomes = data.frame( TaskID = rep(tid,nrec),
                       TaskText = rep(tasktext,nrec),
                      RecordID = paste0('T',tid,'R',(1:nrec)),
                      Record = out_records, 
                      Type = out_types, 
                      Note = out_notes)
alloutcomes = rbind(alloutcomes,outcomes)
rc = rc + nrec #increment record counter by number of outcomes to record for this task 

#########################
# Task 4
#########################
tid = tid + 1
tasktext = "Note that changing starting values of the fitted parameters as you did in the previous task should, in theory, not change the best fit. If you change values for the fixed parameters, you change your assumptions about the biological problem and, therefore, results are expected to be different. Let's test that briefly. 

\nReset inputs to default values. That should produce an average lifespan for infected cells of 24 hours and for virus of 12 hours (twice what we used above). Set drug efficacy to _k = 1_ and run both models 1 and 2 for one iteration. You'll see that the starting values are worse compared to those in the first task. Remember, starting values are not that meaningful, what matters is the final fit result. Run both models 1 and 2 for 500 iterations.

\nYou should find an SSR somewhat above 7 for model 1 and above 4 for model 2. You can try more iterations and different starting values for the fitted parameters to see if you can find better fits. I explored a bit (not exhaustively) and found an SSR of 4.03 for model 2, model 1 always results in a higher value. So, it seems if we change the underlying assumptions of the lifespan of infected cells and virus (which needs to come from external knowledge), model 2 is still better for this specific setup and estimates of drug efficacy are similar. This is not generally the case. If the biological setup changes, model performance can change. All model parameters (and the model structure) need to either be justified based on biological knowledge or through fitting to data. Usually, it is a combination of both."
nrec = 2 # number of items to record
out_records = c("SSR for model 1, _k_ = 1, 500 iterations",
                "SSR for model 2, _k_ = 1, 500 iterations")
out_types = rep("Rounded_Numeric",nrec)
out_notes = rep("Round to two significant digits",nrec)
outcomes = data.frame( TaskID = rep(tid,nrec),
                       TaskText = rep(tasktext,nrec),
                      RecordID = paste0('T',tid,'R',(1:nrec)),
                      Record = out_records, 
                      Type = out_types, 
                      Note = out_notes)
alloutcomes = rbind(alloutcomes,outcomes)
rc = rc + nrec #increment record counter by number of outcomes to record for this task 
#save the fully filled task table to a tsv file
alloutcomes$QuizID = paste0(packagename,"_",appsettings$appid)
alloutcomes$AppTitle = appsettings$apptitle
alloutcomes$AppID = appsettings$appid
#remove a few variables from the data frame
savedoutcomes <- dplyr::select(alloutcomes,QuizID,AppID,AppTitle,TaskID,TaskText,RecordID,Record,Type,Note)     
write.table(savedoutcomes, paste0(appsettings$appid,"_tasktable.tsv"), append = FALSE, sep = "\t", row.names = F, col.names = TRUE)
# Take all the text stored in the table and print the tasks and items to record
write_tasktext(alloutcomes)

Further Information {#shinytab4}

For this app, the underlying function running the simulation is called r appsettings$simfunction. That function repeatedly calls r appsettings$underlying_function.

This app (and all others) are structured such that the Shiny part (the graphical interface you are using) calls one or several underlying R functions which run the simulation for the model of interest and return the results. You can call them directly, without going through the shiny app. Use the help() command for more information on how to use the functions directly. If you go that route, you need to use the results returned from this function and produce useful output (such as a plot) yourself.

You can also download all simulator functions and modify them for your own purposes. Of course to modify these functions, you'll need to do some coding.

For examples on using the simulators directly and how to modify them, read the package vignette by typing vignette('DSAIRM') into the R console.

A model like this is used in [@handel20] to illustrate one of the ways models can be used. The same data and similar models were also fit in [@handel07].

References



ahgroup/DSAIRM documentation built on Oct. 4, 2023, 6:50 a.m.