knitr::opts_chunk$set( collapse = TRUE, comment = "#>" )
library(conmat) library(socialmixr) library(ggplot2) library(dplyr) library(tidyr) library(mgcv) library(patchwork)
The goal of conmat is to simplify the process of generating synthetic contact matrices for a given age population.
What is a contact matrix?
Contact matrices describe the degree of contact between individuals of given age groups. For example, this matrix describes the number of contacts between individuals.
#| label: show-contact #| echo: FALSE name_vec <- c("0-4", "5-9", "10-14") cmat <- matrix( data = rep(NA, 9), nrow = 3, ncol = 3, byrow = TRUE, dimnames = list( name_vec, name_vec ) ) diag(cmat) <- c(10, 11, 13) cmat[upper.tri(cmat)] <- 3:5 cmat[lower.tri(cmat)] <- 3:5 cmat
The rows and columns represent the age groups of the people. On the main diagonal we see that we have a higher number of contacts - showing that people of similar ages tend to interact more with one another. We can use the information in these matrices to model how diseases such as COVID-19 spread in a population through social contact.
Why do we need synthetic contact matrices?
Contact matrices are produced from empirical data resulting from a contact survey, which requires individuals to diary the amount and manner of contact a person has in a day. However, these surveys are highly time-consuming and expensive to run, meaning that only a handful of these empirical datasets exist globally.
We can use statistical methods to create synthetic contact matrices, which are new contact matrices that have been generalised to new countries based on existing surveys.
Why do we need conmat
?
Existing methods only provide outputs of the contact matrices for each country, or at best, for urban and rural areas for a given country. We need methods that allow for flexibly creating synthetic contact matrices for a specified age population. This is because the age population distribution of many countries (e.g., Australia), are quite heterogeneous, and assuming it is homogeneous would result in inaccurate representation of community infection in many regions.
Suppose we want to get a contact matrix for a given region in Australia, let's say the city of Perth. We can get that from a helper function, abs_age_lga
.
perth <- abs_age_lga("Perth (C)") perth
(You can learn more about the data sources we provide in data-sources.Rmd
)
We can get a contact matrix made for perth
using the extrapolate_polymod
function:
perth_contact <- extrapolate_polymod( population = perth ) perth_contact
We can plot this with autoplot
autoplot(perth_contact)
And you can see each contact matrix in a setting by referring to its name - for example, the "home" setting contact matrix:
perth_contact$home
These contact matrices could then be used in subsequent modelling, such as the input in a SIR (Susceptible, Infected, Recovered) model.
Similarly to above, we can access some world data from another data source - we have some helpers to pull data from the world population:
world_data <- socialmixr::wpp_age() head(world_data)
We can tidy the data up, filtering down to a specified location and year with the age_population
function:
nz_2015 <- age_population( data = world_data, location_col = country, location = "New Zealand", age_col = lower.age.limit, year_col = year, year = 2015 ) nz_2015
Then we could create a contact matrix for NZ population data for 2015 like so:
nz_contact <- extrapolate_polymod( population = nz_2015 ) autoplot(nz_contact) nz_contact$home
From here you might want to:
See the vignette, "example pipeline" for more details.
The above example showed how we might extract a contact matrix based on the polymod data - this example now shows all the steps that can be taken for full flexibility, and provides more detail on the initial datasets that could be used.
First we want to fit the model to the POLYMOD data, which contains various survey and population data.
library(conmat) polymod_contact_data_home <- get_polymod_contact_data(setting = "home") polymod_survey_data <- get_polymod_population()
The contact data is a data frame containing the age from and to, and the number of contacts for each of the specified settings, "home", "work", "school", "other", or "all" as well as the number of participants. By default, polymod_contact_data
contains data from "all", but we're going to use the "work" set of data, as it produces an interesting looking dataset. Each row contains survey information of the number of contacts. Specifically, the number of contacts from one age group to another age group, and then the number of participants in that age group.
The survey data, polymod_survey_data
contains the lower age limit and the population in that age group.
polymod_survey_data
We also provide control over the POLYMOD data retrieved from get_polymod_contact_data()
via the arguments, setting
, country
, and ages
. These allow you to specify the data to be only from certain settings or countries or ages. See ?get_polymod_contact_data
for more details. Below is a brief example of this:
polymod_contact_data_belgium_0_10 <- get_polymod_contact_data( setting = "work", countries = "Belgium", ages = c(0, 5, 10) ) polymod_contact_data_belgium_0_10
Similarly, you can control the population data, retrieving it only for certain countries:
get_polymod_population(countries = "Belgium") get_polymod_population(countries = "Finland")
You can see the available countries in the helpfile with ?get_polymod_population
.
We can create a model of the contact rate with the function fit_single_contact_model
. We're first going to use some smaller sets of the data, to save on computation time.
set.seed(2022 - 10 - 04) polymod_contact_data_home_small <- polymod_contact_data_home %>% filter( age_from <= 30, age_to <= 30 ) polymod_survey_data_small <- polymod_survey_data %>% filter(lower.age.limit <= 30) contact_model <- fit_single_contact_model( contact_data = polymod_contact_data_home_small, population = polymod_survey_data_small )
This fits a generalised additive model (GAM), predicting the contact rate, based on a series of prediction terms that describe various features of the contact rates.
contact_model
We can use this contact model to then predict the contact rate in a new population.
As a demonstration, let's take an age population from a given LGA in Australia (this was the initial motivation for the package, so there are some helper functions for Australian specific data).
fairfield <- abs_age_lga("Fairfield (C)") fairfield
We can then pass the contact model through to predict_contacts
, along with the fairfield age population data, and some age breaks that we want to predict to. Note that these age breaks could be any size, we just have them set to 5 year age brackets in most of the examples, but these could be 1 year, 2 year, or even sub year.
set.seed(2022 - 10 - 04) synthetic_contact_fairfield <- predict_contacts( model = contact_model, population = fairfield, age_breaks = c(seq(0, 30, by = 5), Inf) ) synthetic_contact_fairfield
Let's visualise the matrix to get a sense of the predictions with autoplot
. First we need to transform the predictions to a matrix:
synthetic_contact_fairfield %>% predictions_to_matrix() %>% autoplot()
It is worth noting that the contact matrices created using this package are transposed when compared to the contact matrices discussed by Prem and Mossong. That is, the rows are "age group to", and the columns are "age group from".
Our experience has been that we would be fitting the same models to each setting when doing data analysis when using conmat. Accordingly, you can also fit a model for all of the settings all at once with the functions, fit_setting_contacts()
, and predict_setting_contacts()
. This means we can do the above, but for each setting, "home", "work", "school", "other", and "all", at once. If we want to use all of the POLYMOD data, we can also use the extrapolate_polymod()
function.
We can create a model for each of the settings with fit_setting_contacts()
.
set.seed(2021 - 09 - 24) polymod_setting_data <- get_polymod_setting_data() polymod_setting_data_small <- polymod_setting_data %>% lapply(FUN = function(x) x %>% filter(age_from <= 20, age_to <= 20)) |> new_setting_data() setting_models <- fit_setting_contacts( contact_data_list = polymod_setting_data_small, population = polymod_survey_data )
This contains a list of models, one for each setting. We can look at one, and get summary information out:
names(setting_models) setting_models$home
So this gives us our baseline model of a contact model. We have fit this model to the existing contact survey data. We can now predict to another age population, to create our "synthetic" contact matrix.
Then we take the model we had earlier, and extrapolate to the fairfield data with predict_setting_contacts
, also providing some age breaks we want to predict to
set.seed(2021 - 10 - 04) synthetic_settings_5y_fairfield <- predict_setting_contacts( population = fairfield, contact_model = setting_models, age_breaks = c(seq(0, 20, by = 5), Inf) )
This then returns a list of synthetic matrices, "home", "work", "school", "other", and "all", which is the sum of all matrices.
str(synthetic_settings_5y_fairfield) synthetic_settings_5y_fairfield$home synthetic_settings_5y_fairfield$all
We can use autoplot
to plot all at once
# this code is erroring for the moment - something to do with rendering a large plot I think. autoplot( synthetic_settings_5y_fairfield, title = "Setting-specific synthetic contact matrices (fairfield 2020 projected)" )
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.