knitr::opts_chunk$set( collapse = TRUE, comment = "#>" )
library(tidyverse) library(biorisk)
Despite being the go-to method for including variability and uncertainty in QMRA, 2D-Monte Carlo simulations can be hard to define and interpret. This document explains the approach that biorisk uses for this methodology. In order to avoid surprises, we also set up the seed of the p-rng.
set.seed(412)
As a first step, let's define a normal distribution. For simplicity, we will consider that this defines some type of variability. So, we will call it just "variability":
normal_variability <- Normal$new("variability")
This distribution has two parameters (mu
and sigma
). For starters, we will
assign constant to both of them.
normal_variability$ map_input("sigma", Constant$new("sigma", .1))$ map_input("mu", Constant$new("mu", 0))
Once we have defined the model, we can get 10,000 Monte Carlo simulations
normal_variability$simulate(10000)
and visualize them as a density plot
normal_variability$density_plot()
We will now define our uncertainty distribution by assuming that the expected value
of variability
follows a normal distribution. First, we just define the uniform
distribution, which we will call "uncertainty".
uniform_uncertainty <- Uniform$new("uncertainty")
This distribution has two parameters. We assign constants to both of them:
uniform_uncertainty$ map_input("min", Constant$new("min", -.5))$ map_input("max", Constant$new("max", .5))
We can simulate this distribution and visualize it as a density plot.
uniform_uncertainty$simulate(1000) uniform_uncertainty$density_plot()
Right now, both our distributions are independent. That is not what we want. We want
"uncertainty" to describe the uncertainty of the expected value of "variability" (mu
).
We can do that by "mapping the input" mu
to our uniform_uncertainty
normal_variability$ map_input("mu", uniform_uncertainty)
so we now have a linked model
plot_model(normal_variability)
We can now simulate the model as we did before to get 10,000 Monte Carlo simulations that combine the variability (normal distribution) with the uncertainty (uniform distribution for the expected value).
normal_variability$simulate(10000)
If we do a density plot
normal_variability$density_plot()
we observe that both distributions have been combined. Instead of the typical bell curve, we see that there is a "flat top". This is due to the uncertainty in the expected value.
Therefore, these simulations combine the variability and the uncertainty. However, we cannot separate between them.
This separation is accomplished using 2D Monte Carlo. In biorisk, this is done
by adding a level
to the definition of the probability distribution. Distributions
on the "uncertainty" level are level = 1
and those in the variability level are
level = 0
.
It is more convenient to define the uncertainty distributions first. Therefore,
we define our uniform_uncertainty
. Please note the addition of level = 1
:
uniform_uncertainty <- Uniform$new("uncertainty", level = 1)
Apart from that, the mapping of the inputs is identical as for the 1D Monte Carlo.
uniform_uncertainty$ map_input("min", Constant$new("min", -.5))$ map_input("max", Constant$new("max", .5))
Next, we define our variability distributions. Please note the addition of level = 0
:
normal_variability <- Normal$new("variability", level = 0)
Again, the mapping of the inputs is identical regardless of whether calculations are 1D or 2D:
normal_variability$ map_input("sigma", Constant$new("sigma", .1))$ map_input("mu", uniform_uncertainty)
We can plot the model to see that we have built the same dependencies:
plot_model(normal_variability)
In fact, even if we have defined variability and uncertainty separately, we can still make the simulations as 1D Monte Carlo (small differences in the contour are due to lack of convergence):
normal_variability$simulate(10000) normal_variability$density_plot()
But, again, those simulations do not separate between variability and uncertainty.
To make that separation, we need to use $simulate_2D()
. This method takes two
arguments. The first one is the number of MC iterations in level 0 and the second
the number of iterations in level 1. This difference will be clear in a second, so
we first call the function with 500 and 200 iterations.
normal_variability$simulate_2D(500, 200)
So... what did just happen? When we use $simulate()
, biorisk creates a large table where each row is a MC iteration:
normal_variability$simulations %>% head()
When we use $simulate_2D()
, biorisk creates instead an array. That is just a fancy word for a table with extra dimensions. The results are saved in $simulations_multi
and we can see that each one of its entries looks like a normal Monte Carlo simulation. For instance, this is the first one:
normal_variability$simulations_multi[[1]] %>% head()
and this is the second one:
normal_variability$simulations_multi[[2]] %>% head()
If we play a closer look, we see that each one of those tables has 500 rows
normal_variability$simulations_multi[[2]] %>% nrow()
and if we count them, we see there are 200
normal_variability$simulations_multi %>% length()
Those numbers sound familiar, right? That's because they are the number of iterations
that we defined when we called $simulate_2D(500, 200)
.
So, what is happening behind the curtains?
For each simulation in the uncertainty level, we can simulate values from each distribution defined at (level = 1
). Then, using those values, we do a whole Monte Carlo simulations for those distributions at (level = 0
), considering the dependencies. This is clear if we look at the first uncertainty iteration for uniform_uncertainty
:
uniform_uncertainty$simulations_multi[[1]] %>% head()
Even though it is a distribution, every row has exactly the same value (0.43).
That is due to the rows within the table representing the variability iterations.
Because uniform_uncertainty
is defined on the uncertainty level (level = 1
),
it does not vary in that dimension.
Now, if we look at the second uncertainty
uniform_uncertainty$simulations_multi[[2]] %>% head()
The value also remains constant between rows, but it is a different one (-0.43). So,
uniform_uncertainty
varies between tables (uncertainty dimension), but not between
rows (variability dimension).
On the other hand, normal_variability
varies in both dimensions. If we look at the
first uncertainty iteration
normal_variability$simulations_multi[[1]] %>% head()
we see that x
(the output of the normal distribution) varies between rows, as is
normal in every Monte Carlo simulation. Please note that both mu
and sigma
remain
constant between rows (variability dimension). Why is that the case if mu
is defined
as a distribution? Because that distribution is in the uncertainty dimension (level = 1
).
In fact, the value remains constant at 0.43, which was the output of uniform_uncertainty$simulations_multi[[1]]
.
In fact, if we look at the second uncertainty iteration
normal_variability$simulations_multi[[2]] %>% head()
it matches uniform_uncertainty$simulations_multi[[2]]
.
The results of the 2D Monte Carlo simulation can be visualized using 2D cummulative plot
normal_variability$cummulative_plot_2D()
In this plot, the solid line is the "variability distribution". Then, the gray ribbon introduces the additional uncertainty due to the "uncertainty distributions". This shows somehow the main hypothesis of 2D Monte Carlo. What we observe is a combination of variability and uncertainty. If we remove uncertainty, we have only variability, so we have "narrower" distributions. Theoretically, we could follow the opposite approach and remove variability to observe only uncertainty. However, for QMRA, we care more about variability than uncertainty, so we define models in this way because it is more convenient.
This philosophy is also evident if we look at the quantile table.
quantile_table_2D(normal_variability, probs = c(.5, .9), chosen = c("variability", "uncertainty"))
The uncertainty
element does not make a difference between variability and uncertainty dimensions.
Therefore, its quantiles do not vary between both levels.
However, the variability
element makes a difference. For that reason, the quantiles
at level = 0
are different from those at level = 1
. In particular, the quantiles at
level = 1
are calculated as the quantiles of every simulation together. That is, binding every
Monte Carlo simulation in the uncertainty dimension and calculating quantiles. On the
other hand, the quantiles at level = 0
are calculated independently for each iteration in the
uncertainty dimension. Then, the values reported are the median of each quantile. Therefore,
as a general rule, quantiles at level = 1
should be wider than those at level = 0
.
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.