Model Comparison Fit

#load various variable definitions that are the same for each app
source('startup_script.R')
sapply(files_to_source, source) #source some helper files defined in the files_to_source variable
currentrmdfile = knitr::current_input()  #get current file name
appsettings = get_settings(currentrmdfile,appdocdir,packagename) #get settings for current app

Overview {#shinytab1}

This app shows basic fitting of data to 2 simple infection models and illustrates the concept of model/hypothesis testing. Read about the models in The Model tab, then work through the tasks described in the What To Do tab.

Learning Objectives

The Model {#shinytab2}

Model Overview

This app fits 2 different models to virus load data from humans. The idea is that by comparing both models to the data, we can determine which one fits the data better, and, therefore, tentatively suggest that the mechanisms which are part of the better-fitting model might be more realistic descriptions of the underlying biological system.

Model 1

This model consists of 4 compartments. The following entities are modeled:

* **U** - uninfected cells
* **I** - infected cells
* **V** - (free) virus
* **X** - T-cells

The following processes/flows are included in the model:

  1. Virus infects cells at rate b.
  2. Infected cells produce new virus at rate p, are killed by T-cells at rate k and die due to other causes at rate d~I~.
  3. Free virus is removed at rate d~V~ or goes on to infect further uninfected cells at rate b, with unit conversion factor g.
  4. T-cells are activated proportional to virus at rate a and undergo exponential growth at rate r.

Model 2

This model also consists of 4 compartments. The following entities are modeled:

* **U** - uninfected cells
* **I** - infected cells
* **V** - (free) virus
* **X** - B-cells/antibodies

The following processes/flows are included in the model:

  1. Virus infects cells at rate b.
  2. Infected cells produce new virus at rate p and die at rate d~I~.
  3. Free virus is removed by antibodies at rate k, and by other mechanisms at rate d~V~. It can also infect new cells at rate b, with unit conversion factor g.
  4. B-cells/antibodies grow exponentially proportional to virus at rate a and decay at rate d~X~.

Model Diagram

The diagram illustrates both compartmental models, with colors to indicate mechanisms that are part of either model 1 or model 2.

knitr::include_graphics(path = paste0("../media/",appsettings$modelfigname))

Model Equations

Implementing the models as continuous-time, deterministic systems leads to the following set of ordinary differential equations for model 1:

$$
\begin{aligned}
\dot U & = - bUV \\
\dot I & = bUV - d_I I - kIX \\
\dot V & = pI - d_V V - gbUV \\
\dot X & = a V + r X
\end{aligned}
$$

Model 2 is described by these equations:

$$
\begin{aligned}
\dot U & = - bUV \\
\dot I & = bUV - d_I I \\
\dot V & = pI - d_V V - kVX - gbUV \\
\dot X & = aVX - d_X X
\end{aligned}
$$
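To make the structure of the equations concrete, here is a minimal sketch of model 1 implemented with the deSolve package. This is an illustration only, not the DSAIRM simulator code; the initial conditions and parameter values are placeholders rather than the app defaults. Model 2 differs only in the _I_, _V_ and _X_ equations.

```r
# minimal sketch of model 1 as an ODE system (illustration, not DSAIRM code)
library(deSolve)

model1_ode <- function(t, y, parms) {
  with(as.list(c(y, parms)), {
    dUdt <- -b * U * V                       # uninfected cells lost to infection
    dIdt <-  b * U * V - dI * I - k * I * X  # infection, natural death, killing by T-cells
    dVdt <-  p * I - dV * V - g * b * U * V  # production, clearance, loss through infection
    dXdt <-  a * V + r * X                   # activation by virus, exponential growth
    list(c(dUdt, dIdt, dVdt, dXdt))
  })
}

y0    <- c(U = 1e7, I = 0, V = 1, X = 1)               # placeholder initial conditions
parms <- c(b = 1e-6, dI = 2, dV = 4, p = 100,
           g = 1, k = 1e-5, a = 1e-4, r = 0.1)         # placeholder parameter values
out   <- deSolve::ode(y = y0, times = seq(0, 10, by = 0.1),
                      func = model1_ode, parms = parms)
```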

Model Concepts

Data Source

The data used in this app come from [@hayden96]. Specifically, we fit the 'no intervention' viral load data shown in Figure 2 of that paper.

Fitting Approach

The basic fitting approach is the one described in the Basic Virus Model Fitting app, i.e., we minimize the sum of squares of the log differences between model predictions and virus load data, taking censored values into account.
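As a rough illustration of that idea, the sketch below computes the sum of squared residuals on the log scale. The exact treatment of censored observations is described in the Basic Virus Model Fitting app; the rule shown here (values at or below the limit of detection only contribute if the model predicts above that limit) is one common convention and is an assumption of this sketch, not necessarily the exact DSAIRM implementation.

```r
# sketch of a log-scale SSR with a simple censoring rule (assumption, see text)
logscale_ssr <- function(logdata, logmodel, lod) {
  # logdata, logmodel: log10 virus load from data and model at the same times
  # lod: log10 limit of detection
  resid <- logmodel - logdata
  censored <- logdata <= lod
  # censored points only penalize predictions that exceed the detection limit
  resid[censored] <- pmax(0, logmodel[censored] - lod)
  sum(resid^2)
}
```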

Model Comparison

There are different ways to evaluate how well a model fits the data, and to compare different models. This app shows the approach of using Akaike's "An Information Criterion" (AIC); more precisely, we use the version corrected for small sample sizes, the AICc. If we fit by minimizing the sum of squared residuals (SSR), as we do here, the formula for the AICc is:

$$ AICc = N \log(\frac{SSR}{N})+2(K+1)+\frac{2(K+1)(K+2)}{N-K} $$

where N is the number of data points, SSR is the sum of squared residuals at the final fit, and K is the number of parameters being fit. A lower AICc value means a better model.
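The formula is easy to compute by hand or in a few lines of R. The helper below implements it exactly as written above and can be used to re-check the values the app reports; the SSR values in the example call are made up for illustration, not results from the app.

```r
# AICc from the formula above: N data points, K fitted parameters, final SSR
aicc <- function(ssr, n, k) {
  n * log(ssr / n) + 2 * (k + 1) + (2 * (k + 1) * (k + 2)) / (n - k)
}

# comparing two fits to the same data; the lower AICc is better supported
# (illustrative SSR values; N = 8 data points, K = 3 fitted parameters)
aicc(ssr = 1.2, n = 8, k = 3) - aicc(ssr = 0.9, n = 8, k = 3)
```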

This combination of SSR (or more generally, some measure of goodness of fit) with the number of fitted parameters (or more generally, model complexity) tries to strike a trade-off between how well a model fits and how big it is. Generally speaking, bigger models fit better, but this can lead to overfitting. We want a good balance between model complexity and fit quality, and AIC is one way to find it: a bigger model with a lower SSR might still have a higher AIC, since model complexity gets penalized. (In our case, since both models have the same number of fitted parameters, SSR and AICc always rank them the same way.)

One nice feature of the AIC is that one can compare as many models as one wants without running into issues with p-values or corrections for multiple comparisons, and the models do not need to be nested (i.e., each smaller model contained within the larger one, which is a requirement for likelihood ratio tests). That said, AIC has its drawbacks. Nowadays, if one has enough data available, the best approach is to evaluate model performance by a method like cross-validation [@hastie11], and/or, if possible, building models with one dataset and testing them on a different dataset.

For evaluation of models with different AIC, there is fortunately no arbitrary, magic cut-off value (e.g., p=0.05). A rule of thumb is that if models differ in AIC by more than 2, the one with the lower AIC is considered statistically better supported (don't use the word significant, since that is usually associated with a p-value<0.05, which we don't have here). I tend to be more conservative and want AIC differences to be at least >10 before I'm willing to favor a given model. Visual inspection of the fits is also useful: if one model has a lower AIC but the fit doesn't look biologically convincing (e.g., very steep increases or decreases in some quantity), I'd be careful about drawing strong conclusions.

Notes

What to do {#shinytab3}

The tasks below are described in a way that assumes everything is in units of days (rate parameters, therefore, have units of inverse days). If any quantity is not given in those units, you need to convert it first (e.g. if it says a week, you need to convert it to 7 days).

#this is the running counter for the records which starts at 1 
rc=1

#empty object, will hold all outcomes
alloutcomes = NULL

#########################
# Task 1
#########################
tid = 1
tasktext = "Take a look at the inputs and outputs for the app. It is similar to the previous fitting apps. To keep things simple, the confidece interval computation part is not present. Each model has 3 parameters which are being estimated (_a_, _b_ and and _r_ for model 1 and _a_, _b_ and _dX_ for model 2). You might wonder why the other parameters are fixed, especially since in previous examples we fitted parameters such as _p_ and _dV_. The only reason here is to keep things simple, and given the small amount of data, it is not possible to estimate more parameters. For a real problem, you would need good outside scientific knowledge to justify fixing some parameters (e.g., those were measured fairly precisely in separate experiments). Otherwise, you would need to fit those parameters, or, if you don't have enough data (i.e., you might be overfitting), you'll have to simplify your model.

For this example, we'll just assume we know the other parameters. Run model 1 for 1 iteration with the default settings (you might want to plot the result with a log y-axis). You should get an SSR of 5.8 and an AICc of 10.7. Now switch to model 2 and also run it for 1 iteration. Both SSR and AICc are a bit lower. Those starting values don't really mean anything; they are just an initial check that model and data are reasonably close and that we can start fitting. Now run model 1 for 500 iterations. The AICc should have dropped to around -7.08. Do the same 500 iterations for model 2. You'll find it fits better, with a lower AICc (remember, the actual value of the AICc doesn't matter; all you can do is compare differences in AICc between models fit to the same data, and lower is better). If your computer allows, you can run more iterations for each model to check that these results hold.

In this example, the model with the lower AICc also has the lower SSR. Why is that so? Is that always true for any 2 models? (It's not: the 2 models here are in some sense special; make sure you understand how/why.) If you have a hard time figuring this out, I suggest you take the AICc equation and compute it manually for both models by inserting the SSR, K (number of parameters), and the number of data points (here N = 8). "

nrec = 2 # number of items to record
out_records = c("AICc for model 2, 500 interation (round to 2 digits).",
                "TRUE or FALSE: AICc and SSR agree in ranking different models if the models have the same number of parameters.")
out_types = c("Rounded_Numeric","Logical")
out_notes = c("Round to two significant digits","Report True/False")
outcomes = data.frame( TaskID = rep(tid,nrec),
                       TaskText = rep(tasktext,nrec),
                      RecordID = paste0('T',tid,'R',(1:nrec)),
                      Record = out_records, 
                      Type = out_types, 
                      Note = out_notes)
alloutcomes = rbind(alloutcomes,outcomes)
rc = rc + nrec #increment record counter by number of outcomes to record for this task 

#########################
# Task 2
#########################
tid = tid + 1
tasktext = "Based on what we just did and found, we would conclude that model 2 better matches the data, and thus B-cells are more likely to important in this system compared to T-cells. Of course the major caveat is: **Given all other assumptions**. In this case, the assumptions are both how we build the models, and which parameters we assumed to be fixed and the values we gave them. We can't easily explore different model implementations here, but we can explore how other choices for the fixed model parameters might impact our conclusions. Reset all inputs, then set dI=1, dV=8. We are basically now assuming that infected cells live twice as long (24h instead of 12h) and virus is cleared twice as fast. The choices for fixed parameters need to come from the literature, but there is usually some range of reasonable values. Run a single iteration for model 1, you should find AICc = 18.95. Now run it for 500 iterations. The AICc will have gone down (though is still positive). Then run model 2 for 500 iterations. You'll find that again, model 2 is better, though not surprisingly, the best fit estimates for the parameters are different than those found in the previous task. 

You can increase the number of iterations or change the starting values for the parameters. I think (but have not exhaustively tested) that model 2 will continue to produce a better fit than model 1. This suggests that model 2 might be robustly better than model 1. However, it could be that there are some settings for the fixed parameters for which model 1 performs better."

nrec = 2 # number of items to record
out_records = c("AICc for model 1, 500 interations (round to 2 digits).",
                "AICc for model 2, 500 interations (round to 2 digits).")
out_types = rep("Rounded_Numeric",nrec)
out_notes = rep("Round to two significant digits",nrec)
outcomes = data.frame( TaskID = rep(tid,nrec),
                       TaskText = rep(tasktext,nrec),
                      RecordID = paste0('T',tid,'R',(1:nrec)),
                      Record = out_records, 
                      Type = out_types, 
                      Note = out_notes)
alloutcomes = rbind(alloutcomes,outcomes)
rc = rc + nrec #increment record counter by number of outcomes to record for this task 


#########################
# Task 3
#########################
tid = tid + 1
tasktext = "Keep exploring. For instance change the bounds for the parameters. As you learned, this shouldn't affect the best fit (as long as the best fit estimate is not at one of the bounds), but you'll see that it might still do so. In practice, you would need to try multiple solvers with multiple starting values to determine which model is better. You could also explore if the model ranking remains if you change assumptions for your non-fitted parameters. You could for instance systematically vary them over ranges, or sample them - using approaches you saw in previous apps. The amount of exploration you do is some kind of cost-benefit trade-off. You could always do more, but at some point you have to decide it's enough to satisfy you (and the reviewers and your audience).

The overall point to take away from this app is that while model comparison can be informative and useful, you always need to keep in mind that it relies on other assumptions you made about model type and structure, and choices of fixed parameter values (if you fixed any)."

nrec = 1 # number of items to record
out_records = c("Nothing")
out_types = rep("None",nrec)
out_notes = c("")
outcomes = data.frame( TaskID = rep(tid,nrec),
                       TaskText = rep(tasktext,nrec),
                      RecordID = paste0('T',tid,'R',(1:nrec)),
                      Record = out_records, 
                      Type = out_types, 
                      Note = out_notes)
alloutcomes = rbind(alloutcomes,outcomes)
rc = rc + nrec #increment record counter by number of outcomes to record for this task 
#save the fully filled task table to a tsv file
alloutcomes$QuizID = paste0(packagename,"_",appsettings$appid)
alloutcomes$AppTitle = appsettings$apptitle
alloutcomes$AppID = appsettings$appid
#remove a few variables from the data frame
savedoutcomes <- dplyr::select(alloutcomes,QuizID,AppID,AppTitle,TaskID,TaskText,RecordID,Record,Type,Note)     
write.table(savedoutcomes, paste0(appsettings$appid,"_tasktable.tsv"), append = FALSE, sep = "\t", row.names = F, col.names = TRUE)
# Take all the text stored in the table and print the tasks and items to record
write_tasktext(alloutcomes)

Further Information {#shinytab4}

References


