`epicontacts`: Manipulation, Visualisation and Analysis of Epidemiological Contact Data

library(knitr)
opts_chunk$set(message=FALSE, warning=FALSE, eval=TRUE, echo=TRUE)

Introduction

epicontacts aims to facilitate manipulation, visualisation and analysis of epidemiological contact data. Such datasets inherently have network components, in which nodes are typically cases and reported contacts or exposures are (directed or undirected) edges. This package provides a convenient data structure as well as functionality specific to handle these data.

library(outbreaks)
library(epicontacts)

Loading Data

epicontacts provides a convenient structure to store heterogeneous epidemiological contact network data (i.e. nodes and edges) in a single object. The epicontacts class must contain two components: a line list and a contact dataset.

Each row of the line list should represent unique observations of cases, and each row of the contact list should represent unique pairs of contacts. Each can include arbitrary features, but both datasets should share an identification scheme.

Example Dataset

The example that follows will use the mers_korea_2015, which is a dataset (in list format) distributed in the outbreaks package.

str(mers_korea_2015)

What features are in the line list?

colnames((mers_korea_2015$linelist))

What about the contact dataset?

colnames((mers_korea_2015$contacts))

Creating epicontacts Object

In order to create the epicontacts object, both the line list and contact data frames must be passed to make_epicontacts(). This function accommodates instances when the respective identifiers are not the first columns of these data frames (see the "id", "from" and "to" arguments). make_epicontacts() can also account for contact networks that have a direction (see "directed" argument).

merskor15 <- make_epicontacts(linelist = mers_korea_2015$linelist,
                              contacts = mers_korea_2015$contacts, 
                              directed = FALSE)
class(merskor15)
summary(merskor15)

Data Manipulation

Access Unique Identifiers

The summary() method above provided counts for the number unique cases in both the contact and line list. The get_id() function retrieves similar information but as vectors of identifiers. This can be parameterized as follows:

What are the first ten IDs in the contacts dataset?

contacts_ids <- get_id(merskor15, "contacts")
head(contacts_ids, n = 10)

How many IDs are common to both?

length(get_id(merskor15, "common"))

Subsetting

The subset() method for epicontacts objects allows for, among other things, pruning of networks based on values of node and edge attributes. These values must be passed as named lists to the respective argument.

subset(merskor15, node_attribute = list("outcome" = "Dead", "sex" = "M"), 
       edge_attribute = list("exposure" = "Emergency room"))

In addition to subsetting by node and edge attributes, networks can be pruned to only include components that are connected to certain nodes. The "id" argument takes a vector of nodes and returns the line list of individuals that "touch" those IDs.

nodes <- c("SK_14","SK_145")                  
subset(merskor15, cluster_id = nodes)

The subset() method for epicontacts objects also accepts cluster size parameters (see "cs", "cs_min" and "cs_max" arguments).

subset(merskor15, cs = 3)
subset(merskor15, cs_min = 10, cs_max = 100)

Visualisation

Default Plotting Method

One of the main features of epicontacts is its visualisation capabilities. As a default, the package uses interactive plotting based on the visNetwork package^2. This interactivity is particularly useful for visualising large datasets.

plot(merskor15) 

The above is a generic method based on the vis_epicontacts() and accepts a number of arguments to customize the plot appearance and functionality. For a full list of options use ?vis_epicontacts(). For instance, one can customize nodes using colors and icons:

plot(merskor15, "place_infect", node_shape = "sex",
     shapes = c(M = "male", F = "female")) 

See codeawesome to see available shapes.

Alternatively, the method used for plotting can be graph3D, in which case a 3-dimensional graph will be used (see below).

3D plots

epicontacts loads the threejs package to enable 3D visualisation tools with the graph3D() function^3.

graph3D(merskor15, node_color = "sex", g_title = "MERS Korea 2014")

To interact with the plot:

Analysis

Extract Characteristics of Pairwise Nodes

The get_pairwise() function allows processing of variable(s) in the line list according to each pair in the contact dataset. For the following example, date of onset of disease is extracted from the line list in order to compute the difference between disease date of onset for each pair. The value that is produced from this comparison represents the serial interval (si).

si <- get_pairwise(merskor15, "dt_onset")   
summary(si)
hist(si, col="grey", border="white", xlab="Days after symptoms",
     main="MERS Korea 2014\nSerial Interval")

The get_pairwise() will interpret the class of the column being used for comparison, and will adjust its method of comparing the values accordingly. For numbers and dates (like the si example above), the function will subtract the values. When applied to columns that are characters or categorical, get_pairwise() will paste values together. Because the function also allows for arbitrary processing (see "f" argument), these discrete combinations can be easily tabulated and analyzed.

head(get_pairwise(merskor15, "sex"), n = 10)
get_pairwise(merskor15, "sex", f=table)
fisher.test(get_pairwise(merskor15, "sex", f=table)) 

Identify Clusters

The get_clusters() function can be used for to identify connected components in an epicontacts object. Here, we illustrate its use to study contact patterns in a simulated Ebola outbreak. First, we use it to retrieve data.frame containing the cluster information:

x <- make_epicontacts(ebola_sim$linelist, ebola_sim$contacts,
                      id = "case_id", to = "case_id", from = "infector",
                      directed = TRUE)
x

clust <- get_clusters(x, output = "data.frame")
class(clust)
dim(clust)
table(clust$cluster_size)
barplot(table(clust$cluster_size),
        main = "Cluster size distribution",
    xlab = "Cluster size",
    ylab = "Frequency")

Let us look at the largest clusters. For this, we add cluster information to the epicontacts object, and then subset it:

x <- get_clusters(x)
x_14 <- subset(x, cs = 14)
plot(x_14, "cluster_member")

Calculate Degree

The degree of a node corresponds to its number of edges or connections to other nodes. get_degree() provides an easy method for calculating this value for epicontacts networks. A high degree in this context indicates an individual who was in contact with many others.

nb use of "type" argument depends on whether or not the network is directed.

deg_both <- get_degree(merskor15, "both", only_linelist = TRUE)

Which individuals have the ten most contacts?

head(sort(deg_both, decreasing = TRUE), 10)

What is the mean number of contacts?

mean(deg_both)

References



Try the epicontacts package in your browser

Any scripts or data that you put into this service are public.

epicontacts documentation built on May 2, 2019, 11:29 a.m.