knitr::opts_chunk$set( collapse = TRUE, comment = "#>" )
library(biorisk) library(tidyverse)
The idea of this document is to give a short introduction to biorisk. This package is to facilitate the implementation, simulation and analysis of QMRA models based on Monte Carlo simulations. It is currently in pre-alpha. This means that the overall structure and philosophy is the package is well-defined, but we still need to refine some features and add additional ones. This document provides a general overview of how models are built, simulated and analysed/visualized.
In biorisk, everything is a element. You want to define a normal distribution? You use a element. You want a linear inactivation model? You use a element. You want a constant value? You use a element. You want a growth element with correlation between storage time and temperature? You guessed it right; you need a element! This has two disadvantages. The first one is that, if you just want to fix some value to 2
in your model, you do not write
x <- 2
You have to write
x <- Constant$new("x", 2)
which is a bit more convoluted. But the advantage of this element-based philosophy is that everything has the same complexity. If you want to define a log-linear inactivation model, you write
inactivation <- LogLinInactivation$new("inactivation")
Or a secondary growth model
mu <- Ratkowsky_model$new("mu")
A dose-response model
dr <- DoseResponse_BetaPoisson$new("dose-response")
Or a distribution
x <- GammaPoisson$new("something")
So, regardless of what you define (distributions, microbial processes, constants, dose-response models...), you will be using elements. This means that learning how to work with biorisk means learning how to work with elements. You may have caught the hint that biorisk uses Object Oriented programming. Namely, it uses R6 classes. You can still use the packaging without understanding much of OO programming, as biorisk does all the heavy lifting on the background. Nonetheless, a quick look at the documentation of R6 (https://r6.r-lib.org) will help you understand some of the "odd" things the package does (such as the $new()
).
Elements in biorisk have two parts. The first one comprises the mathematical calculations (e.g. solving an equation or getting a random value according to some distribution). This is done inside the element and you don't have to worry about it. Your only worry is choosing the right element. This is done by writing the name of the element followed "$new" (the name of every element is given at the end of the document). Then, adding within quotes ("
) the name assigned to the element (this is just used for output results, so it is not relevant for the calculations). Moreover, due to the way R works, it is often a good idea to assign elements to a variable. We do that writing the name of the variable followed by <-
.
This may sound complicated, but it is quite simple. So, we are going to create a element describing a normal distribution. We are going to call this element "logN0" and we are going to assign it to a variable named logN0
(names do not have to match, but at this point it may be easier to understand if they do).
logN0 <- Normal$new("logN0")
And that's a normal distribution!
The second part of the element are its input/outputs. Every calculations in QMRA requires a number of model parameters (e.g. the D-value for microbial inactivation) and variables describing environmental/logistic factors (e.g. the temperature). In a similar way, probability distributions also need some parameters (e.g. the mean and variance of a normal distribution). All of these are called inputs in biorisk.
Depending on the type of calculation, each element has different inputs. To know their names, you just have to type $inputs
after the name of a variable storing a element. So, because our normal distribution is called logN0
...
logN0$inputs
This means that the element describing normal distribution needs a mean value (mu
) and a standard deviation (sigma
). One key aspect of biorisk is that elements are assigned to inputs of other elements. This gives a lot of flexibility because the input elements can be practically anything: a probability distribution, a growth model, a dose-response model... or a constant. Indeed, regardless of the complexity of the QMRA model, we will always have a constant on top of the chain of dependencies (e.g., constant parameters for a distribution).
In this first example, we are going two define two constants for our normal distribution: one for the mean and one for the standard deviation. They are defined in a very similar way as the normal distribution (they are also a element!). We write the name of the element (Constant
) plus $new
and, finally, a name within brackets. The only difference is that, for constants, we need to assign a value by putting a coma after the name.
So, we are going to define a constant value of 0
for the expected value
mu_logN0 <- Constant$new("mu_logN0", 0)
And a constant value of 0.5
for its standard deviation
sigma_logN0 <- Constant$new("sigma_logN0", .5)
So, right now we have defined a element with a normal distribution, a element for its (constant) mean and a third element for its (constant) standard deviation. However, right now, they are totally independent. So, in order to do the calculations, we need to assign the constant elements to the inputs of the normal distribution. We do that using the $map_input
method. I think it is easier to show it directly with an example...
This bit of code:
logN0$map_input("mu", mu_logN0)
maps the input named "mu"
of element logN0
(our normal distribution) to another element saved as mu_logN0
(the constant we defined before with value 0
).
Regardless of the type of elements or inputs, we will always use the $map_input
method for mappings. Simple enough, right? So, to map the input named "sigma"
to our constant 0.5
we do
logN0$map_input("sigma", sigma_logN0)
Because models are often hard to see just with code, biorisk includes several functions to make life easier. For instance, plot_model()
plots the structure of the elements. So, if we pass our logN0
to this function
plot_model(logN0)
we can see how the elements are mapped. In this case, our element called "logN0"
depends on elements "mu_logN0"
and "sigma_logN0"
. In this plot, the shapes and colours show the type of element. A grey triangle is a constant, whereas a blue circle is a distribution. There are other shapes included in the package (e.g. for inactivation, growth, dose-response...) -- in the near future, I will also add the name of the input--.
The biorisk package is implemented in a way that allows several shortcuts. For instance, it allows method chaining, a coding style where you can chain operations. This means you can include the mapping right after the definition of the element. In other words, this bit of code:
mu_logN0 <- Constant$new("mu_logN0", 0) sigma_logN0 <- Constant$new("sigma_logN0", .5) logN0 <- Normal$new("logN0")$ map_input("mu", mu_logN0)$ map_input("sigma", sigma_logN0)
Is equivalent to what we wrote before
mu_logN0 <- Constant$new("mu_logN0", 0) sigma_logN0 <- Constant$new("sigma_logN0", .5) logN0 <- Normal$new("logN0") logN0$map_input("mu", mu_logN0) logN0$map_input("sigma", sigma_logN0)
The second shortcut is that you do not need to assign every element to a variable. Instead, they can be defined directly within $map_input()
This is quite useful for constant because they are quite uninteresting (they do not have dependencies, they do not have distributions...). So, this bit of code is also equivalent to the previous ones
logN0 <- Normal$new("logN0")$ map_input("mu", Constant$new("mu_logN0", 0) )$ map_input("sigma", Constant$new("sigma_logN0", .5) )
Nonetheless, if you are adamantly against method chaining, you can use the package without it (although the code will be longer and harder to read).
Normal distributions are nice but boring. So, let's illustrate the package with something a bit more exciting: a (very) simple QMRA model. First, we are going to use a log-linear inactivation model. How do we define it? Just as before, using the name defined within the package (LogLinInactivation
) followed $new
and a name we assign within quotations
inact <- LogLinInactivation$new("Inactivation")
And how do we check its inputs? By writing $inputs
after the variable name (note that the variable name inact
is now different than the element name "Inactivation"
)
inact$inputs
So, we need 3 inputs: the treatment time (t
), the D-value (D
) and the initial count (logN0
).
To logN0
, we will assign the normal distribution we defined above.
inact$map_input("logN0", logN0)
To the treatment time, we will assign a constant value of 30
.
inact$map_input("t", Constant$new("treat_time", 30) )
And to the D-value we are going to assign a log-normal distribution. The good news here is that biorisk includes a LogNormal
element:
D <- LogNormal$new("D")$ map_input("mu_log10", Constant$new("mu_logD", 1))$ map_input("sigma_log10", Constant$new("sigma_logD", 0.2))
And now we assign this distribution to the inact
element.
inact$map_input("D", D)
That was a lot to take in one go. That's why, it is often a good idea to call plot_model()
to see what we just did:
plot_model(inact)
Note that plot_model()
uses the name of the element, not the name of the variable where the element is saved. For instance, the inactivation element is saved in a variable called inact
, but its name is "Inactivation" (inact <- LogLinInactivation$new("Inactivation")
). Remember that the name within quotation is the one biporisk uses for every output. It is often a good idea that every element within a model have unique names (e.g., not having 2 elements named "time") because some plotting functions will not work correctly.
Once we have built the model, doing MC simulations is really simple. We just need to call the $simulate()
method with the number of Monte Carlo simulations. Just like this
set.seed(1241) inact$simulate(1000)
and biorisk takes care of doing 1,000 simulations of the element and all its dependencies. By default, the method does not return anything. Instead, the results of each MC iteration are stored within each element
inact$simulations %>% head()
Note that biorisk takes care of doing the calculations for all the dependencies of the element. For instance, it updates the simulations for the distribution of the D-value:
D$simulations %>% head()
Also note that biorisk only does the calculation for the upstream dependencies. This makes sense from a computational point of view, because it allows analyzing parts for the model separately. For instance, one can simulate the distribution of the D-value without having to analyze the complete model.
Looking at the results of individual MC simulations is not very useful For that reason, biorisk includes several tools to visualize and summarize the results.
Every element has a $histogram()
, $boxplot()
and $density_plot()
element that allows checking its output
inact$histogram()
D$histogram()
logN0$density_plot()
However, in most situations, we are not just intersted in the output of a single element. For that reason, plot_outputs()
compares the distribution of the output variable of every element in the QMRA model
plot_outputs(inact)
Nonetheless, most of these outputs have different units. For that reason, the function includes the chosen
argument that allows choosing what to plot.
plot_outputs(inact, chosen = c("logN0", "Inactivation"))
Besides boxplots, the type
argument allows doing violin plots
plot_outputs(inact, chosen = c("logN0", "Inactivation"), type = "violin")
or density plots
plot_outputs(inact, chosen = c("logN0", "Inactivation"), type = "density")
Although plots can help us understand the output of a QMRA model, we often need also numeric values. For that reason, we can use quantile_table()
to create a table with the quantiles of the outputs of every element
quantile_table(inact)
Again, you can specify what outputs to show in the table
quantile_table(inact, chosen = c("logN0", "Inactivation") )
By default, the function plots the maximum and minimum value, the median and the 25th and 75th percentile. You can modify this using the probs
argument. For instance, to look at the median, 90th and 99th percentile, we pass probs = c(.5, .9, .99)
:
quantile_table(inact, chosen = c("logN0", "Inactivation"), probs = c(.5, .9, .99))
The package also includes tornado plots for sensitivity analysis with the tornado_plot()
function.
tornado_plot(inact)
Again, it allows choosing what element is included in the calculation (not too useful in this example because there are not enough levels)
tornado_plot(inact, chosen = "logN0")
In biorisk, we have also included functions for doing 2D Monte Carlo analysis. In order to define a 2D model, the only difference is that we need to pass the level
argument to the elements describing distributions. As a very simple example, this is a model with 2 normal distributions, one on the variability level (level 0) and another one in the uncertainty level (level 1):
bb <- Normal$new("mu_mu", level = 1)$ map_input("mu", Constant$new("mu_mu_mu", 3))$ map_input("sigma", Constant$new("mu_sigma_mu", .1)) aa <- Normal$new("aa", level = 0)$ map_input("mu", bb )$ map_input("sigma", Constant$new("sigma", .2)) plot_model(aa)
Now, to do 2-d Monte Carlo, instead of using the $simulate()
method, we use the simulate_2D()
method. The only difference is that, instead of passing a single number, we need to pass 2: the number of simulations in the variability level (50 here) and the number of iterations in the uncertainty level (10 here)
aa$simulate_2D(50, 10)
Again, the results of the simulation are stored within the elements themselves. They are stored a list where each element is a simulation in the uncertainty level. Then, each element has a table where rows are the iterations in the variability level.
To illustrate this, let's look at the results of our simulations. The distribution bb
is defined in the uncertainty level. This means that within each variability simulation (each element in the list), it remains constant (i.e., every row is the same). However, it still varies between elements of the list.
bb$simulations_multi[1:3] %>% map(head)
On the other hand, aa
is defined in the variability level, so it varies both between elements of the list and between rows. Note here that the mu
input of aa
matches the output of bb
.
aa$simulations_multi[1:3] %>% map(head)
The biorisk package also includes functions specific for the visualization of the output of 2D Monte Carlo models. They can be visualized as a density plot, where the steel blue area is the variation in the variability level, and the grey one the total variation (variability + uncertainty):
aa$density_plot_2D()
It can also be visualized as a cumulative density plot, where the black solid line is the variation in the variability level and the shaded are the additional one due to uncertainty:
aa$cummulative_plot_2D()
Finally, the results can also be shows as a table of quantiles:
quantile_table_2D(aa, probs = c(.1, .9), chosen = c("aa", "mu_mu"))
In this case, for each quantile selected (in probs
), the function returns two values: the quantile on the uncertainty level (level 0) and the one on the variability level (level 1). Note that, because element bb
("mu_mu"
) is defined on the uncertainty level, the quantiles for both levels are the same.
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.