#load various variable definitions that are the same for each app source('startup_script.R') sapply(files_to_source, source) #source some helper files defined in the files_to_source variable currentrmdfile = knitr::current_input() #get current file name appsettings = get_settings(currentrmdfile,appdocdir,packagename) #get settings for current app
This app shows some basic fitting of data to 2 simple infection models. This shows the concept of model/hypothesis testing. Read about the model in The Model tab. Then, work through the tasks described in the What To Do tab.
This app fits 2 different models to virus load data from humans. The idea is that by comparing both models to the data, we can determine which one fits the data better, and, therefore, tentatively suggest that the mechanisms which are part of the better-fitting model might be more realistic descriptions of the underlying biological system.
This model consists of 4 compartments. The following entities are modeled:
The following processes/flows are included in the model:
This model also consists of 4 compartments. The following entities are modeled:
The following processes/flows are included in the model:
The diagram illustrates both compartmental models, with colors to indicate mechanisms that are part of either model 1 or model 2.
knitr::include_graphics(path = paste0("../media/",appsettings$modelfigname))
Implementing the models as continuous-time, deterministic systems leads to the following set of ordinary differential equations for model 1:
$$ \begin{aligned} \dot U & = - bUV \ \dot I & = bUV - d_I I - kIX \ \dot V & = pI - d_V V - gbUV \ \dot X & = a V + r X \end{aligned} $$
Model 2 is described by these equations: $$ \begin{aligned} \dot U & = - bUV \ \dot I & = bUV - d_I I \ \dot V & = pI - d_V V - k V X - gbUV \ \dot X & = a V X - d_X X \end{aligned} $$
The data being used in this app comes from [@hayden96]. Specifically, the data being fit is the 'no intervention' viral load data shown in Figure 2 of the paper.
The basic fitting approach is the one described in the Basic Virus Model Fitting app, i.e., we minimize the sum of squares of the log differences between model predictions and virus load data, taking censored values into account.
There are different ways to evaluate how well a model fits to data, and to compare between different models. This app shows the approach of using Akaike's "An Information Criterion" (AIC), or more precisely, we'll use the one corrected for small sample size, AICc. If we fit by minimizing the sum of squares (SSR), as we do here, the formula for the AICc is:
$$ AICc = N \log(\frac{SSR}{N})+2(K+1)+\frac{2(K+1)(K+2)}{N-K} $$
where N is the number of data points, SSR is the sum of squares residual at the final fit, and K is the number of parameters being fit. A lower value for AICc means a better model.
This combination of SSR (or more generally, some measure of goodness of fit) with the number of parameters (or more generally, model complexity) tries to find a trade-off between how well a model fits and how big it is. Generally speaking, bigger models fit better. But, this can lead to overfitting. We want to find a good balance between model complexity and fit quality. AIC is one way to do that. For AIC, it means a bigger model with lower SSR might still have a higher AIC since model complexity gets penalized. (In our case, since both models have the same number of parameters, the two quantities move in the same direction.)
One nice feature of the AIC is that one can compare as many models as one wants without having issues with p-values or correcting for multiple comparison, and the models do not need to be nested (i.e., each smaller model being contained inside the larger models / a criterion for likelihood ratio tests). That said, AIC has its drawbacks. Nowadays, if one has enough data available, the best approach is to evaluate model performance by a method like cross-validation [@hastie11], and/or if available, building models with one dataset and testing them on a different dataset.
For evaluation of models with different AIC, there is fortunately no arbitrary, magic value (e.g., p=0.05). A rule of thumb is that if models differ by AIC of more than 2, the one with the smaller one is considered statistically better supported (don't use the word significant since that is usually associated with a p-value<0.05, which we don't have here). I tend to be more conservative and want AIC differences to be at least >10 before I'm willing to favor a given model. Also, I think that visual inspection of the fits is useful. If one model has a lower AIC, but the fit doesn't look that convincing biologically (e.g., very steep increases or decreases in some quantity), I'd be careful drawing very strong conclusions.
In the second model, B-cells grow proportional to both virus and B-cells at rate a, and die at rate d~X~. They remove virus at rate k.
Both models are rather simple and, arguably, somewhat unrealistic.
For both models, we ignored birth/death of uninfected cells, which is possibly okay for an acute infection (but needs to be determined based on known biology for a given system).
In general, models need to be carefully tailored to the question and setting. This is true for models used in fitting even more than for models that are used without trying to perform data fitting. The more data you have available, the more complex a model you can usually justify (as long as you know enough about the system to postulate reasonable hypothetical mechanisms/processes).
The absolute value of the AIC is unimportant and varies from dataset to dataset. Only relative differences matter. And it goes without saying that we can only compare models that are fit to exactly the same data.
The tasks below are described in a way that assumes everything is in units of days (rate parameters, therefore, have units of inverse days). If any quantity is not given in those units, you need to convert it first (e.g. if it says a week, you need to convert it to 7 days).
#this is the running counter for the records which starts at 1 rc=1 #empty object, will hold all outcomes alloutcomes = NULL ######################### # Task 1 ######################### tid = 1 tasktext = "Take a look at the inputs and outputs for the app. It is similar to the previous fitting apps. To keep things simple, the confidece interval computation part is not present. Each model has 3 parameters which are being estimated (_a_, _b_ and and _r_ for model 1 and _a_, _b_ and _dX_ for model 2). You might wonder why the other parameters are fixed, especially since in previous examples we fitted parameters such as _p_ and _dV_. The only reason here is to keep things simple, and given the small amount of data, it is not possible to estimate more parameters. For a real problem, you would need good outside scientific knowledge to justify fixing some parameters (e.g., those were measured fairly precisely in separate experiments). Otherwise, you would need to fit those parameters, or, if you don't have enough data (i.e., you might be overfitting), you'll have to simplify your model. For this example, we'll just assume we know the other parameters. Run model 1 for 1 iteration with the default settings (you might want to plot the result with a log y-axis). You should get an SSR of 5.8 and an AICc of 10.7. Now switch to model 2, also run for 1 step. Both SSR and AICc are a bit lower. Those starting values don't really mean anything, it's just an initial check that model and data are reasonably close and we can start fitting. Now run model 1 for 500 iterations. AICc should have dropped to around -7.08. Do the same 500 iterations for model 2. You'll find it fits better, with a lower AICc (remember, the actual value of the AICc doesn't matter, all you can do is compare differences in AICc between models fit to the same data, and lower is better). If your computer allows, you can run more iterations for each model to check that these results hold. In this example, the model with the lower AICc also have the lower SSR. Why is that so? Is that always true for any 2 models? (It's not: The 2 models here are in some sense special, understand how/why.) If you have a hard time figuring this out, I suggest you take the AICc equation and compute it manually for both models by inserting SSR and K (number of parameters) and data points (here N = 8). " nrec = 2 # number of items to record out_records = c("AICc for model 2, 500 interation (round to 2 digits).", "TRUE or FALSE: AICc and SSR agree in ranking different models if the models have the same number of parameters.") out_types = c("Rounded_Numeric","Logical") out_notes = c("Round to two significant digits","Report True/False") outcomes = data.frame( TaskID = rep(tid,nrec), TaskText = rep(tasktext,nrec), RecordID = paste0('T',tid,'R',(1:nrec)), Record = out_records, Type = out_types, Note = out_notes) alloutcomes = rbind(alloutcomes,outcomes) rc = rc + nrec #increment record counter by number of outcomes to record for this task ######################### # Task 2 ######################### tid = tid + 1 tasktext = "Based on what we just did and found, we would conclude that model 2 better matches the data, and thus B-cells are more likely to important in this system compared to T-cells. Of course the major caveat is: **Given all other assumptions**. In this case, the assumptions are both how we build the models, and which parameters we assumed to be fixed and the values we gave them. We can't easily explore different model implementations here, but we can explore how other choices for the fixed model parameters might impact our conclusions. Reset all inputs, then set dI=1, dV=8. We are basically now assuming that infected cells live twice as long (24h instead of 12h) and virus is cleared twice as fast. The choices for fixed parameters need to come from the literature, but there is usually some range of reasonable values. Run a single iteration for model 1, you should find AICc = 18.95. Now run it for 500 iterations. The AICc will have gone down (though is still positive). Then run model 2 for 500 iterations. You'll find that again, model 2 is better, though not surprisingly, the best fit estimates for the parameters are different than those found in the previous task. You can increase the number of iterations or the starting values for the parameters. I think (but have not exhaustively tested) that model 2 will continue to be able to produce a better fit compared to model 1. This suggests that model 2 might be robustly better than model 1. However, it could be that there are some settings for the fixed parameters for which model 1 performs better." nrec = 2 # number of items to record out_records = c("AICc for model 1, 500 interations (round to 2 digits).", "AICc for model 2, 500 interations (round to 2 digits).") out_types = rep("Rounded_Numeric",nrec) out_notes = rep("Round to two significant digits",nrec) outcomes = data.frame( TaskID = rep(tid,nrec), TaskText = rep(tasktext,nrec), RecordID = paste0('T',tid,'R',(1:nrec)), Record = out_records, Type = out_types, Note = out_notes) alloutcomes = rbind(alloutcomes,outcomes) rc = rc + nrec #increment record counter by number of outcomes to record for this task ######################### # Task 3 ######################### tid = tid + 1 tasktext = "Keep exploring. For instance change the bounds for the parameters. As you learned, this shouldn't affect the best fit (as long as the best fit estimate is not at one of the bounds), but you'll see that it might still do so. In practice, you would need to try multiple solvers with multiple starting values to determine which model is better. You could also explore if the model ranking remains if you change assumptions for your non-fitted parameters. You could for instance systematically vary them over ranges, or sample them - using approaches you saw in previous apps. The amount of exploration you do is some kind of cost-benefit trade-off. You could always do more, but at some point you have to decide it's enough to satisfy you (and the reviewers and your audience). The overall point to take away from this app is that while model comparison can be informative and useful, you always need to keep in mind that it relies on other assumptions you made about model type and structure, and choices of fixed parameter values (if you fixed any)." nrec = 1 # number of items to record out_records = c("Nothing") out_types = rep("None",nrec) out_notes = c("") outcomes = data.frame( TaskID = rep(tid,nrec), TaskText = rep(tasktext,nrec), RecordID = paste0('T',tid,'R',(1:nrec)), Record = out_records, Type = out_types, Note = out_notes) alloutcomes = rbind(alloutcomes,outcomes) rc = rc + nrec #increment record counter by number of outcomes to record for this task
#save the fully filled task table to a tsv file alloutcomes$QuizID = paste0(packagename,"_",appsettings$appid) alloutcomes$AppTitle = appsettings$apptitle alloutcomes$AppID = appsettings$appid #remove a few variables from the data frame savedoutcomes <- dplyr::select(alloutcomes,QuizID,AppID,AppTitle,TaskID,TaskText,RecordID,Record,Type,Note) write.table(savedoutcomes, paste0(appsettings$appid,"_tasktable.tsv"), append = FALSE, sep = "\t", row.names = F, col.names = TRUE)
# Take all the text stored in the table and print the tasks and items to record write_tasktext(alloutcomes)
r appsettings$simfunction
. You can call them directly, without going through the shiny app. Use the help()
command for more information on how to use the functions directly. If you go that route, you need to use the results returned from this function and produce useful output (such as a plot) yourself. vignette('DSAIRM')
into the R console.Any scripts or data that you put into this service are public.
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.