Introduction to socialmixr


socialmixr is an R package for deriving social mixing matrices from survey data. These are particularly useful for age-structured infectious disease models. For background on age-specific mixing matrices and what data inform them, see, for example, the paper on the POLYMOD study by Mossong et al.


The latest stable version of the socialmixr package can be installed from CRAN via

install.packages("socialmixr")

The latest development version of the socialmixr package can be installed from its source repository, for example with the install_github function of the devtools package.


To load the package, use

library(socialmixr)

At the heart of the socialmixr package is the contact_matrix function. This extracts a contact matrix from survey data. You can use the R help to find out about usage of the contact_matrix function, including a list of examples:

?contact_matrix

The POLYMOD data are included with the package and can be loaded using

data(polymod)

An example use would be

contact_matrix(polymod, countries = "United Kingdom", age.limits = c(0, 1, 5, 15))

This generates a contact matrix from the UK part of the POLYMOD study, with age groups 0-1, 1-5, 5-15 and 15+ years. It contains the mean number of contacts that each member of an age group (row) has reported with members of the same or another age group (column).


The contact_matrix function requires a survey given as a list of two elements, both given as data.frames: participants and contacts. They must be linked by an ID column that refers to the identity of the queried participants (by default global_id, but this can be changed using the id.column argument). The participants data frame, as a minimum, must have the ID column and a column denoting participant age (which can be set by the part.age.column argument, by default participant_age). The contacts data frame, similarly, must have the ID column and a column denoting age (which can be set by the contact.age.column argument, by default cnt_age_mean).

The function then either randomly samples participants (if bootstrap is set to TRUE) or takes all participants in the survey and determines the mean number of contacts in each age group given the age group of the participant. The age groups can be set using the age.limits argument, which should be set to the lower limits of the age groups (e.g., age.limits=c(0, 5, 10) for age groups 0-5, 5-10, 10+). If these are not given, the narrowest age groups possible given survey and demographic data are used.
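The calculation described above can be illustrated in base R without the package. This is only a minimal sketch on an invented toy survey: the column names follow the defaults mentioned in the text, while the values, the age_limits vector and the group helper are hypothetical, and contact_matrix itself does considerably more (for example weighting and handling of missing ages).

```r
# Hypothetical toy survey: two data frames linked by an ID column.
# Column names follow the defaults described in the text; values are invented.
participants <- data.frame(
  global_id = 1:4,
  participant_age = c(3, 8, 12, 40)
)
contacts <- data.frame(
  global_id = c(1, 1, 2, 3, 3, 3, 4),
  cnt_age_mean = c(4, 35, 9, 2, 11, 50, 7)
)

# Lower limits of the age groups, as in the age.limits argument.
age_limits <- c(0, 5, 10)
labels <- c("0-5", "5-10", "10+")

# Hypothetical helper: map ages to age groups via their lower limits.
group <- function(age) {
  factor(labels[findInterval(age, age_limits)], levels = labels)
}

participants$part_group <- group(participants$participant_age)
merged <- merge(contacts, participants, by = "global_id")
merged$cnt_group <- group(merged$cnt_age_mean)

# Contacts counted by participant group (rows) and contact group (columns),
# divided by the number of participants in each row group.
counts <- table(merged$part_group, merged$cnt_group)
n_part <- as.vector(table(participants$part_group))
toy_matrix <- unclass(counts) / n_part
toy_matrix
```

Each row of toy_matrix holds the mean number of contacts reported by participants of that age group with each contact age group.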


The key argument to the contact_matrix function is the survey that it is supposed to use. The socialmixr package includes the POLYMOD survey, which will be used if no survey is specified. It also provides access to all surveys in the Social contact data community on Zenodo. The available surveys can be listed (if an internet connection is available) with

list_surveys()

A survey can be downloaded using the get_survey command. This will get the relevant data of a survey given its Zenodo DOI (as returned by list_surveys). All other relevant commands in the socialmixr package accept a DOI, but if a survey is to be used repeatedly it is worth downloading it and storing it locally to avoid the need for a network connection and speed up processing.

peru_survey <- get_survey("")
saveRDS(peru_survey, "peru.rds")

This way, the peru data set can be loaded in the future without the need for an internet connection using

peru_survey <- readRDS("peru.rds")

Some surveys may contain data from multiple countries. To check this, use the survey_countries function, for example

survey_countries(polymod)

If one wishes to get a contact matrix for one or more specific countries, a countries argument can be passed to contact_matrix. If this is not done, the different surveys contained in a dataset are combined as if they were one single sample (i.e., not applying any population-weighting by country or other correction).

By default, socialmixr uses the POLYMOD survey. A reference for any given survey can be obtained using cite, e.g.

cite(polymod)


To get an idea of the uncertainty of the contact matrices, a bootstrap can be used. If an argument n greater than 1 is passed to contact_matrix, multiple samples of contact matrices are generated. For each sample, participants are sampled (with replacement, to get the same number of participants as in the original study), and contacts are sampled from the set of all the contacts of all the participants (again, with replacement). All resulting contact matrices are returned as the matrices field in the returned list. From these, derived quantities can be obtained, for example the mean

m <- contact_matrix(polymod, countries = "United Kingdom", age.limits = c(0, 1, 5, 15), n=5)
mr <- Reduce("+", lapply(m$matrices, function(x) {x$matrix})) / length(m$matrices)
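Other summaries than the mean can be derived from the bootstrap samples in the same way. The following base R sketch computes element-wise quantiles; it uses a hypothetical list mats of invented matrices as a stand-in for the matrices extracted from the matrices field, so that it runs without the package or the survey data.

```r
set.seed(1)  # make the invented values reproducible

# Hypothetical stand-in for the bootstrapped matrices:
# five 2x2 matrices with invented values.
mats <- replicate(5, matrix(runif(4, 1, 3), nrow = 2), simplify = FALSE)

# Element-wise mean, computed as in the example above.
mean_matrix <- Reduce("+", mats) / length(mats)

# Stack the matrices into a 2 x 2 x 5 array and take element-wise quantiles.
arr <- simplify2array(mats)
lower <- apply(arr, c(1, 2), quantile, probs = 0.025)
upper <- apply(arr, c(1, 2), quantile, probs = 0.975)
```

With only five samples the quantiles are very rough; in practice a much larger number of bootstrap samples would be needed for a stable interval.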


Obtaining symmetric contact matrices or splitting out their components (see below) requires information about the underlying demographic composition of the survey population. This can be passed to contact_matrix as the survey.pop argument, a data.frame with two columns, lower.age.limit (denoting the lower end of the age groups) and population (denoting the number of people in each age group). If survey.pop is not given, contact_matrix will try to obtain the age structure of the population (as per the countries argument) from the World Population Prospects of the United Nations, using estimates from the year that most closely matches the year in which the contact survey was conducted.

If demographic information is used, this is returned by contact_matrix as the demography field in the results list.
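As an illustration of the expected structure, a survey.pop data frame could be built as follows. The column names are those given in the text; the population numbers are invented and the object name survey_pop is hypothetical.

```r
# Hypothetical population by age group; the numbers are invented.
survey_pop <- data.frame(
  lower.age.limit = c(0, 1, 5, 15),
  population = c(700000, 2900000, 7300000, 49000000)
)

# Proportion of the population in each age group.
survey_pop$proportion <- survey_pop$population / sum(survey_pop$population)
survey_pop
```

Such a data frame would then be passed on as survey.pop = survey_pop in the call to contact_matrix.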

Symmetric contact matrices

Conceivably, contact matrices should be symmetric: the total number of contacts made by members of one age group with those of another should be the same as vice versa. Mathematically, if $c_{ij}$ is the mean number of contacts made by members of age group $i$ with members of age group $j$, and the total number of people in age group $i$ is $N_i$, then

$$c_{ij} N_i = c_{ji}N_j$$

Because of variation in the sample from which the contact matrix is obtained, this relationship is usually not fulfilled exactly. In order to obtain a symmetric contact matrix that fulfills it, one can use

$$c'_{ij} = \frac{1}{2N_i} (c_{ij} N_i + c_{ji} N_j)$$

To get this version of the contact matrix, use symmetric = TRUE when calling the contact_matrix function.

contact_matrix(polymod, countries = "United Kingdom", age.limits = c(0, 1, 5, 15), symmetric = TRUE)
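The symmetrisation formula can be checked by hand on a small example. The sketch below applies it in base R to an invented 2x2 matrix cm and invented population sizes N; it illustrates the formula only and is not taken from the package's source.

```r
# Hypothetical contact matrix (rows: participant group, columns: contact group).
cm <- matrix(c(2.0, 1.0,
               3.0, 4.0), nrow = 2, byrow = TRUE)
# Hypothetical population sizes N_i.
N <- c(100, 200)

# Total contacts from group i to group j: c_ij * N_i (N is applied row-wise).
total <- cm * N

# Average the two directions and divide by N_i again.
cm_sym <- (total + t(total)) / (2 * N)

# The totals c'_ij * N_i are now the same in both directions.
isSymmetric(cm_sym * N)
```

Here the off-diagonal elements become 3.5 and 1.75, so that 3.5 x 100 = 1.75 x 200 = 350 contacts in each direction.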

Splitting contact matrices

The contact_matrix function contains a simple model for the elements of the contact matrix, by which it is split into a global component, as well as three components representing contacts, assortativity and demography. In other words, the elements $c_{ij}$ of the contact matrix are modelled as

$$ c_{ij} = q d_i a_{ij} n_j $$

where $q d_i$ is the number of contacts that a member of group $i$ makes across age groups, $a_{ij}$ represents the assortativity of mixing between age groups $i$ and $j$, and $n_j$ is the proportion of the surveyed population in age group $j$. The constant $q$ is set to the value of the largest eigenvalue of the matrix $c_{ij}$; if used in an infectious disease model, it can be replaced by the basic reproduction number $R_0$.

To model the contact matrix in this way with the contact_matrix function, set split = TRUE. The components of the matrix are returned as elements normalisation ($q$), contacts ($d_i$), matrix ($a_{ij}$) and demography ($n_j$) of the resulting list.

contact_matrix(polymod, countries = "United Kingdom", age.limits = c(0, 1, 5, 15), split = TRUE)
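To make the decomposition concrete, the following base R sketch splits an invented 2x2 matrix according to the formula as written above and then recomposes it. The matrix cm and the proportions n are hypothetical, and the normalisation applied inside contact_matrix may differ in detail from this illustration.

```r
# Hypothetical contact matrix and population proportions n_j.
cm <- matrix(c(2.0, 1.0,
               3.0, 4.0), nrow = 2, byrow = TRUE)
n <- c(1 / 3, 2 / 3)

# q: largest eigenvalue of the contact matrix.
q <- max(Re(eigen(cm, only.values = TRUE)$values))

# q * d_i is the total number of contacts made by a member of group i.
d <- rowSums(cm) / q

# a_ij = c_ij / (q * d_i * n_j); n is expanded so that it applies column-wise.
n_cols <- rep(n, each = nrow(cm))
a <- cm / (q * d) / n_cols

# Recomposing q * d_i * a_ij * n_j gives back the original matrix.
recomposed <- q * d * a * n_cols
all.equal(recomposed, cm)
```

With this construction each row of a, weighted by the population proportions, sums to one, so a describes only how contacts are distributed across age groups relative to proportionate mixing.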


The contact matrices can be plotted, for example, using the geom_tile function of the ggplot2 package.

library(reshape2) # provides melt
library(ggplot2)
df <- melt(mr, varnames = c("age1", "age2"), value.name = "contacts")
ggplot(df, aes(x = age2, y = age1, fill = contacts)) + theme(legend.position = "bottom") + geom_tile()


socialmixr documentation built on Jan. 11, 2020, 9:31 a.m.