conmat: Builds Contact Matrices using GAMs and Population Data

---
title: 'conmat: generate synthetic contact matrices for a given age-stratified population'
authors:
  - affiliation: 1
    name: Nicholas Tierney
    orcid: 0000-0003-1460-8722
  - affiliation: 1,3
    name: Chitra Saraswati
    orcid: 0000-0002-8159-0414
  - affiliation: 1,3
    name: Aarathy Babu
    orcid:
  - affiliation: 4
    name: Michael Lydeamore
    orcid: 0000-0001-6515-827X
  - affiliation: 1,2
    name: Nick Golding
    orcid: 0000-0001-8916-5570
date: today
bibliography: references.bib
cite-method: biblatex
tags:
  - epidemiology
  - R
  - infectious disease
affiliations:
  - index: 1
    name: Telethon Kids Institute
  - index: 2
    name: Curtin University
  - index: 3
    name:
  - index: 4
    name: Monash University
execute:
  echo: true
  cache: false
format:
  pdf:
    keep-md: true
    fig-height: 4
    fig-align: center
    fig-format: png
    dpi: 300
  html:
    keep-md: true
    fig-height: 4
    fig-align: center
    fig-format: png
    dpi: 300
---


Summary

Contact matrices describe the number of contacts between individuals. They are used to create models of infectious disease spread. conmat is an R package which generates synthetic contact matrices for arbitrary input demography, ready for use in infectious disease modelling.

There are currently few options for a user to access synthetic contact matrices [@socialmixr; @prem2017]. The existing code from @prem2017 for generating synthetic contact matrices is not designed for replicability, is restricted to select countries, and provides no sub-national demographic estimates.

The conmat package exposes model fitting and prediction separately to the user. Users can fit a model based on a contact survey such as POLYMOD [@mossong2008], then predict from this model to their own demographic data. This means users can generate synthetic contact matrices for any region, with any contact survey.

We demonstrate a use case for conmat by creating contact matrices at a sub-national level (in this case, a state) in Australia.

For users who do not wish to run the entire conmat pipeline, we have pre-generated synthetic contact matrices for 200 countries, based on a list of countries from the United Nations, using a model fit to the POLYMOD contact survey. The resulting synthetic contact matrices, and the associated code, can be found in the syncomat analysis pipeline (GitHub, Zenodo) [@syncomat].

Statement of need

Infectious diseases like influenza and COVID-19 spread via social contact. If we can understand patterns of contact (that is, which individuals are more likely to be in contact with each other), then we can create models of how disease spreads. Epidemiologists and public policy makers can use these models to make decisions to keep a population safe and healthy.

Empirical estimates of social contact are provided by social contact surveys. These provide samples of the frequency and type of social contact across different settings (home, work, school, other).

A prominent contact survey is the POLYMOD study by @mossong2008, which surveyed 8 European countries: Belgium, Germany, Finland, Great Britain, Italy, Luxembourg, The Netherlands, and Poland.

These social contact surveys can be projected on to a given demographic structure to produce estimated daily contact rates between age groups. These are known as 'synthetic' contact matrices. A widely used approach by @prem2017 [@prem2021] produced synthetic contact matrices for 177 countries at 'urban' and 'rural' levels for each country.
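To make the idea of projection concrete, the sketch below rescales a toy contact matrix from a surveyed population to a target population with a different age structure. It is a deliberately simplified proportional rescaling, not the method of @prem2017 and not the GAM-based approach used by conmat. All numbers and object names (`survey_contacts`, `survey_pop`, `target_pop`) are invented for illustration.

```r
# Toy example: every number here is made up for illustration only.
age_groups <- c("0-19", "20-64", "65+")

# Mean daily contacts that a person in row group i has with people
# in column group j, as measured in the surveyed population.
survey_contacts <- matrix(
  c(8, 4, 1,
    3, 7, 2,
    1, 3, 2),
  nrow = 3, byrow = TRUE,
  dimnames = list(from = age_groups, to = age_groups)
)

# Age-stratified population counts (hypothetical).
survey_pop <- c(2000, 6000, 2000)
target_pop <- c(1500, 5500, 3000)

# Share of each age group in each population.
survey_share <- survey_pop / sum(survey_pop)
target_share <- target_pop / sum(target_pop)

# Assumption: contacts with group j scale with how common group j is
# in the target population relative to the surveyed population.
availability_ratio <- target_share / survey_share

# Multiply each column j by its availability ratio.
projected_contacts <- sweep(survey_contacts, 2, availability_ratio, `*`)

print(round(projected_contacts, 2))
```

In this toy setting, contacts with the "65+" group increase because that group makes up a larger share of the target population, while contacts with the "0-19" group decrease.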

However, there were major limitations with the methods in @prem2021. First, not all countries were included in their analyses. Second, the contact matrices only covered broad population groups within entire countries. This presents challenges for decision makers who are often working at a sub-national geographical scale, with differing demographic structure in different sub-populations. Third, the code provided by Prem et al. was not designed for replicability and easy modification with user-defined inputs.

The conmat package was developed to fill the specific need of creating contact matrices for arbitrary age categories and populations (as shown in the example below) to inform infectious disease models. We developed the method primarily to output synthetic contact matrices. We also provide methods to create next generation matrices for modelling.

Example

We will generate a contact matrix for Tasmania, a state in Australia, using a model fitted to the POLYMOD contact survey. We can get the age-stratified population data for Tasmania from the Australian Bureau of Statistics (ABS) with the helper function `abs_age_state()`:

::: {.cell}

```{.r .cell-code}
tasmania <- abs_age_state("TAS")
head(tasmania)
```

::: {.cell-output .cell-output-stdout}

```
# A tibble: 6 × 4 (conmat_population)
 - age: lower.age.limit
 - population: population
   year state lower.age.limit population
1  2020 TAS                 0      29267
2  2020 TAS                 5      31717
3  2020 TAS                10      33318
4  2020 TAS                15      31019
5  2020 TAS                20      31641
6  2020 TAS                25      34115
```



:::
:::
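Because the population is stored as one row per age band, it can be regrouped into coarser bands with ordinary data frame operations. The base R sketch below retypes the six rows printed above by hand and collapses them into 10-year bands. The object names (`tas_head`, `tas_10y`) and the choice of 10-year bands are assumptions made for illustration, and no conmat functions are used.

```r
# The six rows printed above, retyped by hand (not the full Tasmanian data).
tas_head <- data.frame(
  lower.age.limit = c(0, 5, 10, 15, 20, 25),
  population = c(29267, 31717, 33318, 31019, 31641, 34115)
)

# Assign each 5-year band to a 10-year band: [0, 10), [10, 20), [20, 30).
tas_head$band <- cut(
  tas_head$lower.age.limit,
  breaks = c(0, 10, 20, 30),
  right = FALSE,
  labels = c("0-9", "10-19", "20-29")
)

# Sum the population within each 10-year band.
tas_10y <- aggregate(population ~ band, data = tas_head, FUN = sum)
print(tas_10y)
```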



We can then generate a synthetic contact matrix for Tasmania, by extrapolating the contact patterns between age groups learned from the POLYMOD study, using `extrapolate_polymod()`.



::: {.cell}

```{.r .cell-code}
tasmania_contact <- extrapolate_polymod(population = tasmania)
tasmania_contact