In njtierney/conmat: Builds Contact Matrices using GAMs and Population Data

knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)

library(conmat)
library(socialmixr)
library(ggplot2)
library(dplyr)
library(tidyr)
library(mgcv)
library(patchwork)

The goal of conmat is to simplify the process of generating synthetic contact matrices for a given age population.

What is a contact matrix?

Contact matrices describe the degree of contact between individuals of given age groups. For example, this matrix describes the number of contacts between individuals.

#| label: show-contact
#| echo: FALSE

name_vec <- c("0-4", "5-9", "10-14")
cmat <- matrix(
  data = rep(NA, 9),
  nrow = 3,
  ncol = 3,
  byrow = TRUE,
  dimnames = list(
    name_vec,
    name_vec
  )
)

diag(cmat) <- c(10, 11, 13)
cmat[upper.tri(cmat)] <- 3:5
cmat[lower.tri(cmat)] <- 3:5

cmat

The rows and columns represent the age groups of the people. On the main diagonal we see that we have a higher number of contacts - showing that people of similar ages tend to interact more with one another. We can use the information in these matrices to model how diseases such as COVID-19 spread in a population through social contact.

Why do we need synthetic contact matrices?

Contact matrices are produced from empirical data resulting from a contact survey, which requires individuals to diary the amount and manner of contact a person has in a day. However, these surveys are highly time-consuming and expensive to run, meaning that only a handful of these empirical datasets exist globally.

We can use statistical methods to create synthetic contact matrices, which are new contact matrices that have been generalised to new countries based on existing surveys.

Why do we need conmat?

Existing methods only provide outputs of the contact matrices for each country, or at best, for urban and rural areas for a given country. We need methods that allow for flexibly creating synthetic contact matrices for a specified age population. This is because the age population distribution of many countries (e.g., Australia), are quite heterogeneous, and assuming it is homogeneous would result in inaccurate representation of community infection in many regions.

Quick example using Australian data

Suppose we want to get a contact matrix for a given region in Australia, let's say the city of Perth. We can get that from a helper function, abs_age_lga.

perth <- abs_age_lga("Perth (C)")
perth

(You can learn more about the data sources we provide in data-sources.Rmd)

We can get a contact matrix made for perth using the extrapolate_polymod function:

perth_contact <- extrapolate_polymod(
  population = perth
)

perth_contact

We can plot this with autoplot

autoplot(perth_contact)

And you can see each contact matrix in a setting by referring to its name - for example, the "home" setting contact matrix:

perth_contact$home

These contact matrices could then be used in subsequent modelling, such as the input in a SIR (Susceptible, Infected, Recovered) model.

Quick example using world data

Similarly to above, we can access some world data from another data source - we have some helpers to pull data from the world population:

world_data <- socialmixr::wpp_age()

head(world_data)

We can tidy the data up, filtering down to a specified location and year with the age_population function:

nz_2015 <- age_population(
  data = world_data,
  location_col = country,
  location = "New Zealand",
  age_col = lower.age.limit,
  year_col = year,
  year = 2015
)

nz_2015

Then we could create a contact matrix for NZ population data for 2015 like so:

nz_contact <- extrapolate_polymod(
  population = nz_2015
)
autoplot(nz_contact)
nz_contact$home

What next?

From here you might want to:

Create a next generation matrix (NGM)
Apply vaccination to an NGM

See the vignette, "example pipeline" for more details.

A More in depth example

The above example showed how we might extract a contact matrix based on the polymod data - this example now shows all the steps that can be taken for full flexibility, and provides more detail on the initial datasets that could be used.

First we want to fit the model to the POLYMOD data, which contains various survey and population data.

library(conmat)
polymod_contact_data_home <- get_polymod_contact_data(setting = "home")
polymod_survey_data <- get_polymod_population()

The contact data is a data frame containing the age from and to, and the number of contacts for each of the specified settings, "home", "work", "school", "other", or "all" as well as the number of participants. By default, polymod_contact_data contains data from "all", but we're going to use the "work" set of data, as it produces an interesting looking dataset. Each row contains survey information of the number of contacts. Specifically, the number of contacts from one age group to another age group, and then the number of participants in that age group.

The survey data, polymod_survey_data contains the lower age limit and the population in that age group.

polymod_survey_data

We also provide control over the POLYMOD data retrieved from get_polymod_contact_data() via the arguments, setting, country, and ages. These allow you to specify the data to be only from certain settings or countries or ages. See ?get_polymod_contact_data for more details. Below is a brief example of this:

polymod_contact_data_belgium_0_10 <- get_polymod_contact_data(
  setting = "work",
  countries = "Belgium",
  ages = c(0, 5, 10)
)

polymod_contact_data_belgium_0_10

Similarly, you can control the population data, retrieving it only for certain countries:

get_polymod_population(countries = "Belgium")
get_polymod_population(countries = "Finland")

You can see the available countries in the helpfile with ?get_polymod_population.

Predicting the contact rate

We can create a model of the contact rate with the function fit_single_contact_model. We're first going to use some smaller sets of the data, to save on computation time.

set.seed(2022 - 10 - 04)
polymod_contact_data_home_small <- polymod_contact_data_home %>%
  filter(
    age_from <= 30,
    age_to <= 30
  )

polymod_survey_data_small <- polymod_survey_data %>%
  filter(lower.age.limit <= 30)

contact_model <- fit_single_contact_model(
  contact_data = polymod_contact_data_home_small,
  population = polymod_survey_data_small
)

This fits a generalised additive model (GAM), predicting the contact rate, based on a series of prediction terms that describe various features of the contact rates.

contact_model

We can use this contact model to then predict the contact rate in a new population.

As a demonstration, let's take an age population from a given LGA in Australia (this was the initial motivation for the package, so there are some helper functions for Australian specific data).

fairfield <- abs_age_lga("Fairfield (C)")
fairfield

We can then pass the contact model through to predict_contacts, along with the fairfield age population data, and some age breaks that we want to predict to. Note that these age breaks could be any size, we just have them set to 5 year age brackets in most of the examples, but these could be 1 year, 2 year, or even sub year.

set.seed(2022 - 10 - 04)
synthetic_contact_fairfield <- predict_contacts(
  model = contact_model,
  population = fairfield,
  age_breaks = c(seq(0, 30, by = 5), Inf)
)

synthetic_contact_fairfield

Plotting

Let's visualise the matrix to get a sense of the predictions with autoplot. First we need to transform the predictions to a matrix:

synthetic_contact_fairfield %>%
  predictions_to_matrix() %>%
  autoplot()

Note

It is worth noting that the contact matrices created using this package are transposed when compared to the contact matrices discussed by Prem and Mossong. That is, the rows are "age group to", and the columns are "age group from".

Applying the model across all settings.

Our experience has been that we would be fitting the same models to each setting when doing data analysis when using conmat. Accordingly, you can also fit a model for all of the settings all at once with the functions, fit_setting_contacts(), and predict_setting_contacts(). This means we can do the above, but for each setting, "home", "work", "school", "other", and "all", at once. If we want to use all of the POLYMOD data, we can also use the extrapolate_polymod() function.

Fit to all settings

We can create a model for each of the settings with fit_setting_contacts().

set.seed(2021 - 09 - 24)

polymod_setting_data <- get_polymod_setting_data()

polymod_setting_data_small <- polymod_setting_data %>%
  lapply(FUN = function(x) x %>% filter(age_from <= 20, age_to <= 20)) |> 
  new_setting_data()

setting_models <- fit_setting_contacts(
  contact_data_list = polymod_setting_data_small,
  population = polymod_survey_data
)

This contains a list of models, one for each setting. We can look at one, and get summary information out:

names(setting_models)
setting_models$home

So this gives us our baseline model of a contact model. We have fit this model to the existing contact survey data. We can now predict to another age population, to create our "synthetic" contact matrix.

Predict to all settings

Then we take the model we had earlier, and extrapolate to the fairfield data with predict_setting_contacts, also providing some age breaks we want to predict to

set.seed(2021 - 10 - 04)
synthetic_settings_5y_fairfield <- predict_setting_contacts(
  population = fairfield,
  contact_model = setting_models,
  age_breaks = c(seq(0, 20, by = 5), Inf)
)

This then returns a list of synthetic matrices, "home", "work", "school", "other", and "all", which is the sum of all matrices.

str(synthetic_settings_5y_fairfield)
synthetic_settings_5y_fairfield$home
synthetic_settings_5y_fairfield$all

We can use autoplot to plot all at once

# this code is erroring for the moment - something to do with rendering a large plot I think.
autoplot(
  synthetic_settings_5y_fairfield,
  title = "Setting-specific synthetic contact matrices (fairfield 2020 projected)"
)

njtierney/conmat documentation built on April 17, 2025, 10:27 p.m.

rdrr.io home R language documentation Run R code online

CRAN packages Bioconductor packages R-Forge packages GitHub packages

Note that we can't provide technical support on individual packages. You should contact the package authors for that.

Tweet to @rdrrHQ

GitHub issue tracker

ian@mutexlabs.com