knitr::opts_chunk$set(message = FALSE) library(tidyverse) devtools::load_all() library(glue)
Let $S, I,$ and $R$ represent the number of susceptible, infected, and removed individuals in a population of $N$ individuals. In the basic SIR model, the force of infection (FOI), which is the rate of new infections (per susceptible, per unit time) is given by
$$ \lambda = \beta \frac{I}{N} $$ While this is often simply explained as the product of the transmission rate (per susceptible, per unit time) with the probability of encountering an infective in the population (under the assumption of homogeneous mixing), it will help us to build up an age-structured model if we break the FOI down a bit further. Namely, we want to specifically reference "contacts" between individuals in the population, because this will be the primary source of heterogeneity between age groups.
Defining a contact is a bit of a tricky endeavor, but I like to think about it as an interaction between two individuals where disease \emph{could} pass between individuals, but may not for a variety of reasons (e.g., the pair states of the of individuals isn't susceptible-infected, the duration of contact wasn't sufficient for the disease to pass, the viral load of the infected individual wasn't large enough to pass the disease under the circumstances, etc.).
The FOI can be thought of as the product of three different terms:
$$ \begin{align} \lambda = &\left ( \frac{\text{average number of contacts}}{\text{per susceptible, per unit time}} \right ) \ &\times \mathbb{P}\text{(transmission | contact with an infective)} \ &\times \mathbb{P}\text{(contact with infective | contact with any individual in the population)}, \end{align} $$ where $\mathbb{P}(A | B)$ is the conditional probability of event $A$ given event $B$ has occurred.
Going back to the form of $\lambda$ in the basic SIR model, $\mathbb{P}\text{(contact with infective | contact with any individual in the population)}$ is modeled by $I/N$, and so the product of the remaining two terms is equal to $\beta$.
Now imagine the population is subdivided into $n$ age groups, indexed by $i = 1, ..., n$, and the time scale is short enough relative to the age group size that the aging process can be ignored. We want to incorporate age-based heterogeneity in (1) the average number of contacts per susceptible, per unit time, and (2) the frequency of contacts between age groups.
The first point affects the first term of the FOI (encoded in $\beta$), while the second point affects the conditional probability of contact with an infective (given contact with any individual in the population). To disentangle the average number of contacts per susceptible from the probability of transmission given contact with an infective, let's factor $\beta = \sigma \tau c$, where $\sigma$ is susceptibility (unitless), $\tau$ is infectivity (unitless), and $c$ is the average number of contacts per individual per unit time.
For age group $i$, the force of infection is then given by
$$ \lambda_i = (\sigma_i \tau_j) c_i \sum_{j = 1}^{n}P_{ij}\frac{I_j}{N_j}, $$
where
For the moment, assume there isn't any heterogeneity in susceptibility and infectivity, so without loss of generality set $\sigma_i = \tau_j = 1$.
Since $P_{ij}$ is a probability distribution for fixed $i$, the term $\sum_{j = 1}^{n}P_{ij}\frac{I_j}{N_j}$ is essentially a weighted average of the probability of contacting an infective across age groups, i.e. it's the average probability of contacting an infective over the entire population. Thus, it's exactly analogous to $I/N$ in the basic (homogenous) model.
The age-structured model reduces to the homogeneous model if the following three assumptions are made:
The way the model is set up is so that $c_i$ specifies the average number of contacts per susceptible of age $i$ per day while $P_{ij}$ determines the proportion these contacts that occur with age group $j$. In parameterizing these quantities, we want to ensure that that contacts are "balanced", i.e., that a contact between $i$ and $j$ counts as a contact for $i$ and a contact for $j$. In other words, the total number of contacts between age groups $i$ and $j$ implied by the model parameters must be conserved, no matter which age group is in the susceptible role and which is in the infective role. In the notation of the model, the equation we want to satisfy is
$$ c_iN_iP_{ij} = c_jN_jP_{ji}. $$ for all age classes $i$ and $j$. We call this the "balance" condition.
Mistry et. al. provide synthetic contact matrices that are constructed as follows:
Let's check that we can get recover the symmetric setting-specific $\Gamma$ matrix with a specific example:
popbyage <- read_csv( system.file("params", "mistry-cmats", "ON_age_distribution_85.csv", package = "McMasterPandemic"), col_names = c("age_group", "mistry_value") ) household <- mk_mistry_pmat(weights = c(household = 1, school = 0, work = 0, community = 0)) gamma_household <- household*popbyage$mistry_value isSymmetric(gamma_household)
The force of infection used in the Mistry paper is
$$ \lambda_i = \beta \sum_j M_{ij} \frac{I_j}{N_j}, $$
where $\beta$, "the transmisibility of the infection", must be a unitless quantity since $M_{ij}$ already has units of contacts between ages $i$ and $j$ per susceptible of age $i$ per unit time.
The balance condition (in the notation of this model) is $N_i M_{ij} = N_j M_{ji}$. We can see this balance is satisfied for any weights $\omega_k$:
$$ N_i M_{ij} = N_i \sum_k \omega_k F_{ij}^k = N_i \sum_k \omega_k \frac{\Gamma_{ij}^k}{N_i} \ = \sum_k \omega_k \Gamma_{ij}^k = \sum_k \omega_k \Gamma_{ji}^k \frac{N_j}{N_j} = N_j \sum_k \omega_k \frac{\Gamma_{ji}^k}{N_j} = N_j M_{ji} $$ (by the symmetry in $\Gamma^k$). This equation implies that if we multiply the rows of $M$ with the relevant age-specific population, we should get a symmetric matrix, and we see that we do:
M <- mk_mistry_pmat() isSymmetric(M*popbyage$mistry_value)
The Mistry et. al. force of infection is
$$ \lambda_i = \beta \sum_j M_{ij} \frac{I_j}{N_j}, $$
while ours is
$$ \lambda_i = c_i \sum_{j = 1}^{n}P_{ij}\frac{I_j}{N_j}, $$
We can make the following change of variables to go from one to the other:
$$ c_i = \beta \bar{M_{i}} \ P_{ij} = M_{ij}/\bar{M_{i}} $$ where $\bar{M_{i}}$ is the $i$th row sum of $M$. In other words, we just need to row-normalize $M_{ij}$.
While $\beta$ can be anything, it's the same constant multiple across age groups, which means that the $c_i$ must be proportional to the vector of $\bar{M_i}$ in order for the balance condition to be satisfied:
$$ c_i N_i P_{ij} = \beta \bar{M_i} N_i M_{ij}/\bar{M_i} = \beta N_i M_{ij} \ = \beta N_j M_{ji} \quad \text{(by balance shown above)} \ = \beta \bar{M_j}/\bar{M_j} N_j M_{ji} = \beta \bar{M_j} N_j P_{ji} = c_j N_j P_{ji} $$ In order to use the Mistry matrices in our framework while still satisfying the balance criterion, we have three options for calibration:
Idea: can I take the already-calibrated $\beta(t)$ values (for the base model) and translate them into this framework to look at the fit between the age-stratified observations and this simple model?
How do the Mistry et al. population distributions compare with the Statcan population projections for 2020 that I've been using?
popproj <- ( read_csv("../../covid_age/data/statcan-pop/age_dist.csv", ) %>% filter(projection == "Projection scenario M1: medium-growth") %>% select(-projection) %>% mutate(age_group = case_when( age < 84 ~ age, T ~ 84 )) %>% group_by(age_group) %>% summarise(statcan_value = sum(value*1000), .groups = "drop") ) popbyage <- ( left_join(popbyage, popproj, by = "age_group") %>% pivot_longer(-age_group) ) (ggplot(popbyage, aes(x = age_group, y = value, colour = name, group = name)) + geom_point() + geom_line() ) (popbyage %>% pivot_wider() %>% mutate(diff = statcan_value-mistry_value) %>% mutate(abs_diff = abs(diff)) %>% mutate(rel_diff = abs_diff/statcan_value) ) -> popbyage glue("total difference: {popbyage %>% summarise(tot = sum(diff)) %>% pull(tot)}") glue("total (absolute) difference between mistry and statcan pops: {popbyage %>% summarise(tot = sum(abs_diff)) %>% pull(tot)}") (ggplot(popbyage, aes(x = age_group, y = rel_diff*100)) + geom_point() + geom_line() + labs(y = "% difference from statcan") )
OK, not terrible, not great. Since the synthetic network that generates the $\Gamma_{ij}^{k(s)}$s was built with this population distribution, it would probably be best to just use this one in the model so at least it's consistent? I'm a little worried about the large discrepancy in the 20-40 population, which makes up a bulk of the known infections.
The alternative would be to rescale the $F_{ij}^k$ matrices using the updated population distribution,
$$ \tilde{F_{ij}^k} = F_{ij}^k\frac{N_i}{\tilde{N_i}}, $$ where $\tilde{\cdot}$ denotes parameters based on updated population values.
I'm not sure if this is the smart thing to do... I suppose I could compare simulation results between the two cases? Maybe best to just stick to the Mistry population data for now.
household_ontario <- mk_mistry_pmat(weights = c(household = 1, school = 0, work = 0, community = 0)) household_pei <- mk_mistry_pmat(weights = c(household = 1, school = 0, work = 0, community = 0), province = "Prince_Edward_Island") pop_ontario <- read_csv( system.file("params", "mistry-cmats", "ON_age_distribution_85.csv", package = "McMasterPandemic"), col_names = c("age", "pop") ) pop_pei <- read_csv( system.file("params", "mistry-cmats", "PE_age_distribution_85.csv", package = "McMasterPandemic"), col_names = c("age", "pop") ) gamma_household_ontario <- household_ontario*pop_ontario$pop gamma_household_pei <- household_pei*pop_pei$pop
Ontario households (contact "abundance"):
Matrix::image(Matrix(gamma_household_ontario))
PEI households (contact "abundance"):
Matrix::image(Matrix(gamma_household_pei))
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.