knitr::opts_chunk$set( collapse = TRUE, comment = "#>" )
library(EpiModel)
This vignette discusses some recent updates in EpiModel
on working with attributes and summary statistics within the stochastic network model class. This material is oriented towards custom models with extension modules and functions. Further details are provided in the "New Network Models with EpiModel" section of the EpiModel tutorials.
In network simulations with netsim
, we store all data in the Main List Object (referred to as dat
). In this vignette we will discuss with three types of data on dat
:
Attributes are characteristics of the nodes (e.g., persons) in the model at the current time-step. By default, every node has a unique_id
and an active
attribute used to track each node, as well as three attributes used in the epidemic modules: status
for disease status, entrTime
for the time the node entered the population, and exitTime
for the time the node left the population. Other attributes can be added of any name and value, like age
, but must be stored as a scalar values.
We work with attribute vectors that are all of the same size of the number of nodes in the model. In the attribute vectors, a node is identified by its Positional ID or posit_id
(i.e., the position in vector). The default attribute unique_id
created for each node gives a unique identification number allowing us to refer to nodes no longer in the model. Deaths or other forms of exit from the network disrupt the positional ID but do not impact the unique ID.
The EpiModel::get_attr
function will extract the vector of a given attribute. In its simplest form, we can pull a complete attribute vector from dat
like so:
active <- EpiModel::get_attr(dat, "active")
The above call will pull from dat
the active
attribute for all nodes. active
is a vector of size current number of nodes. With values being either 1 or 0 depending on whether a node is active or not.
Trying to extract an attribute that does not exist will cause EpiModel::get_attr
to throw an error.
In custom extension modules, we usually want to modify some of the attributes. Below is a minimal module that increments the age of all the nodes by 1.
aging_module <- function(dat, at) { # Extract current attribute age <- EpiModel::get_attr(dat, "age") # Aging process new_age <- age + 1 # Output updated attributes dat <- EpiModel::set_attr(dat, "age", new_age) return(dat) }
Let's break down this very simple yet perfectly valid module.
age
attribute vector as we did in the previous section.new_age
incrementing all ages by one.dat
object with the EpiModel::set_attr
function.dat
object.We can see that EpiModel::set_attr
takes as arguments:
dat
object to update.new_age
).When using EpiModel::set_attr
, there are several things to note:
dat
object, it merely returns a modified version of it to be assigned back to dat
. (Nicely, this does not cause performance issues due to the way R handles shallow copies since version 3.1)new_age
size is not equal to the number of nodes in the network, the function will throw an error.The above example describes the recommended way to work with attribute:
dat
object with the revised local vectors.dat
object at the end of the function.These functions have other arguments that are described in the documentation: see help("net-accessor", package = "EpiModel")
for further details.
The Attributes described above refer to the state of each node in the network at the current time-step. However, it is sometimes useful to track of what happened to nodes over time. Because saving the full history of the attributes would consume too much memory and is rarely necessary for full-scale research models, EpiModel offers a way to record specific attribute for specific nodes at different time-steps. These may be useful for diagnostic purposes (e.g., to see that a dynamic process is coded correctly) or for tracking a limited set of individual-level outcomes for further analysis.
The Attribute History is an efficient collection of recorded attributes at different time-steps. EpiModel has functions to record these elements and to access them in a convenient manner once the netsim
simulation is complete.
The following is a module that records the viral load of infected nodes every 10 time steps. We assume that this module is part of a model that defines and updates two attributes over time:
status
with possible values being "infected" or "susceptible".viral_load
as a continuous number for "infected" nodes and NA
otherwise.viral_load_logger_module <- function(dat, at) { # Run every 10 time steps if (at %% 10 == 0) { # Attributes status <- EpiModel::get_attr(dat, "status") viral_load <- EpiModel::get_attr(dat, "viral_load") infected <- which(status == "infected") dat <- EpiModel::record_attr_history( dat, at, "viral_load", infected, viral_load[infected] ) } # Output return(dat) }
The steps in the code are as follows:
infected
the posit_id
s of the infected nodes.viral_load
of infected
nodes at time at
under the label "viral_load".dat
object.EpiModel::record_attr_history
takes five arguments:
dat
object.at
, the current time-step).posit_id
s referring to the nodes of interest (here infected
).viral_load[infected]
)Note that EpiModel::record_attr_history
requires a set of posit_id
s. Internally, the function will convert them to unique_id
s so the Attribute History will not be affected by nodes entering or leaving the population over time.
When recording some attribute histories, one must ensure that we record as many values as there are posit_id
s. Otherwise, the function will throw an error. It however possible to use only one value for that attribute even if we record a value for multiple nodes. This last situation actually uses less RAM. In any case, we would want to record attribute histories sparingly as the storage can be large, especially for open population models with many nodes that run over many time steps.
The Attribute History is meant to be accessed once the netsim
simulation is complete. At that point, we can use the EpiModel::get_attr_history
function to access the histories that we have recorded, like so:
sim <- netsim(est, param, init, control) attr_history <- EpiModel::get_attr_history(sim)
The attr_history
object is a list of data.frame
s. One for each attribute history that was recorded (we use this flexible list structure because each history may be of different dimension). Assume that we were running two simulations, using the module defined above and another one (not shown) recording when a node switched from infected to recovered.
get_attr_history(sim) # $viral_load # sim step attribute uids values # 1 1 10 viral_load 1001 2000 # 2 1 10 viral_load 1002 1878 # 3 1 20 viral_load 1001 1500 # 4 1 20 viral_load 1002 300 # 5 2 10 viral_load 401 2500 # 6 2 10 viral_load 402 1378 # 7 2 20 viral_load 401 1200 # 8 2 20 viral_load 402 100 # ... # # $status # sim step attribute uids values # 1 1 22 status 1001 infected # 2 1 64 status 1002 infected # 3 1 110 status 1001 recovered # 4 1 220 status 1002 recovered # 5 2 7 status 401 infected # 6 2 15 status 402 infected # 7 2 20 status 401 recovered # 8 2 120 status 402 recovered # ...
We would get a named list of two data.frame
's:
value
column being the viral loads.value
column being whether the node became "infected" or "recovered" at the given time-step.The next type of data stored in dat
is called an Epidemic Tracker. This refers to some summary information about the population at a specific time step. This information is created and stored for each time step; therefore epidemic trackers are historical summary statistics.
One example of an Epidemic Trackers is num
, present in any model, which stores the size of the population at each time step.
Inside a module, Epidemic Trackers are accessed and modified with the functions EpiModel::get_epi
and EpiModel::set_epi
. Below is an updated new version of our aging_module
above with the addition of epidemic trackers.
aging_track_module <- function(dat, at) { # Attributes age <- EpiModel::get_attr(dat, "age") # Aging process new_age <- age + 1 # Calculate summary statistics mean_age <- mean(new_age) prev_mean_age <- EpiModel::get_epi(dat, at - 1, "mean_age") age_change <- mean_age - prev_mean_age # Update nodal attributes dat <- EpiModel::set_attr(dat, "age", new_age) # Update epidemic trackers dat <- set_epi(dat, "mean_age", at, mean_age) dat <- set_epi(dat, "age_change", at, age_change) return(dat) }
In this new module, in addition to incrementing the age of each node by one, we also record two values as Epidemic Trackers: the mean age of the population and the change in mean age compared to the previous step (we could have calculated the latter after the simulation was complete with mutate_epi
, but here we do it on the fly to demonstrate get_epi
).
We extract the mean age at the previous time step using EpiModel::get_epi
and set the second argument as at - 1
. After all the calculations are complete, we store mean_age
and age_change
in dat
using EpiModel::set_epi
.
EpiModel::set_attr
, dat
is not modified directly and need to be assigned back to itself. Also, the value we store must be a scalar.Epidemic Trackers are usually the main quantities of interest after a simulation has completed. We access them simply by calling as.data.frame
on a netsim
object or by using the plot or summary functions within by EpiModel. Note also that you can perform derived summary statistic calculations after a netsim
call is complete with EpiModel::mutate_epi
.
It can be useful to create small Epidemic Trackers outside of the modules and use them only when we need them. EpiModel will automatically run the custom trackers passed to the .tracker.list
argument of control.net
.
We call a tracker function a function
that takes dat
as arguments and outputs a scalar value. Every tracker function is run by EpiModel::netsim
at each time step.
epi_s_num <- function(dat) { needed_attributes <- c("status") output <- with(get_attr_list(dat, needed_attributes), { sum(status == "s", na.rm = TRUE) }) return(output) }
The epi_s_num
function defined above is a tracker function. It calculates at each time step the number of susceptible nodes in the network.
The next example is a tracker function that calculates the proportion of the population infected at each time step. Let's look what each element does:
epi_prop_infected <- function(dat) { # we need two attributes for our calculation: `status` and `active` needed_attributes <- c("status", "active") # we use `with` to simplify code output <- with(EpiModel::get_attr_list(dat, needed_attributes), { pop <- active == 1 # we only look at active nodes cond <- status == "i" # which are infected # how many are `infected` among the `active` sum(cond & pop, na.rm = TRUE) / sum(pop, na.rm = TRUE) }) return(output) }
We recommend that you write your tracker functions using this format:
needed_attributes
variable containing a vector of attribute names: in this example: "status" and "active".with
and EpiModel::get_attr_list(dat, needed_attributes)
to work in an environment with only the objects you need. This helps clarify what the tracker does and simplify any debugging. We save the result of this call into output
.with
expression is what will be stored in output
. This must be a scalar.return(output)
, what was calculated inside the with
construct.This functionality is enabled by passing the tracker functions as a named list in control.net
: .tracker.list
. Let's make a simple SI epidemic model with some added trackers:
# Create the `tracker.list` list some.trackers <- list( prop_infected = epi_prop_infected, s_num = epi_s_num ) control <- EpiModel::control.net( type = "SI", nsims = 1, nsteps = 50, verbose = FALSE, .tracker.list = some.trackers ) param <- EpiModel::param.net( inf.prob = 0.3, act.rate = 0.1 ) nw <- network_initialize(n = 50) nw <- set_vertex_attribute(nw, "race", rbinom(50, 1, 0.5)) est <- EpiModel::netest( nw, formation = ~edges, target.stats = 25, coef.diss = dissolution_coefs(~offset(edges), 10, 0), verbose = FALSE ) init <- EpiModel::init.net(i.num = 10) sim <- EpiModel::netsim(est, param, init, control) d <- as.data.frame(sim) knitr::kable(tail(d, n = 15))
Each function must be named in the .tracker.list
list. The name given there will be used to identify the tracker function in the epi
list and will be the name of the corresponding column of the data.frame
produced by as.data.frame(sim)
where sim
is a netsim
object.
Note: in the some.trackers
list, we put epi_prop_infected
and epi_s_num
without parentheses at the end. This is because we store the function
and not the result of calling the function.
Important: The trackers are Always run at the end of a simulation step. The value outputted reflect the state of the simulation after all the modules performed their tasks.
When working with complex research-level models, we sometimes want to inspect the state of an object without stopping the simulation. The function EpiModel::record_raw_object
allows the user to save any object during the simulation.
introspect_module <- function(dat, at) { # Attributes age <- get_attr(dat, "age") if (mean(age, na.rm = TRUE) > 50) { obj <- data.frame( age = age, status = EpiModel::get_attr(dat, "status") ) dat <- EpiModel::record_raw_object(dat, at, "old pop", obj) } return(dat) }
In this module, we look at the age of the population and if the mean age is more than 50, we create a data.frame
called obj
containing the age
and status
attribute vectors and store it in a Raw Record.
This Raw Record can be accessed in the final netsim
object for debugging purposes.
sim <- netsim(est, param, init, control) sim[["raw.records"]][[1]]
Any scripts or data that you put into this service are public.
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.