library(knitr) opts_chunk$set(message=FALSE, warning=FALSE, eval=TRUE, echo=TRUE)
epicontacts
aims to facilitate manipulation, visualisation and analysis of
epidemiological contact data. Such datasets inherently have network components,
in which nodes are typically cases and reported contacts or exposures are
(directed or undirected) edges. This package provides a convenient data
structure as well as functionality specific to handle these data.
library(outbreaks) library(epicontacts)
epicontacts
provides a convenient structure to store heterogeneous
epidemiological contact network data (i.e. nodes and edges) in a single
object. The epicontacts
class must contain two components: a line list and a
contact dataset.
Each row of the line list should represent unique observations of cases, and each row of the contact list should represent unique pairs of contacts. Each can include arbitrary features, but both datasets should share an identification scheme.
The example that follows will use the mers_korea_2015
, which is a dataset (in
list format) distributed in the outbreaks package.
str(mers_korea_2015)
What features are in the line list?
colnames((mers_korea_2015$linelist))
What about the contact dataset?
colnames((mers_korea_2015$contacts))
epicontacts
ObjectIn order to create the epicontacts
object, both the line list and contact data
frames must be passed to make_epicontacts()
. This function accommodates
instances when the respective identifiers are not the first columns of these
data frames (see the "id", "from" and "to" arguments). make_epicontacts()
can
also account for contact networks that have a direction (see "directed"
argument).
merskor15 <- make_epicontacts(linelist = mers_korea_2015$linelist, contacts = mers_korea_2015$contacts, directed = FALSE) class(merskor15) summary(merskor15)
The summary()
method above provided counts for the number unique cases in both
the contact and line list. The get_id()
function retrieves similar information
but as vectors of identifiers. This can be parameterized as follows:
What are the first ten IDs in the contacts dataset?
contacts_ids <- get_id(merskor15, "contacts") head(contacts_ids, n = 10)
How many IDs are common to both?
length(get_id(merskor15, "common"))
The subset()
method for epicontacts
objects allows for, among other things,
pruning of networks based on values of node and edge attributes. These values
must be passed as named lists to the respective argument.
subset(merskor15, node_attribute = list("outcome" = "Dead", "sex" = "M"), edge_attribute = list("exposure" = "Emergency room"))
In addition to subsetting by node and edge attributes, networks can be pruned to only include components that are connected to certain nodes. The "id" argument takes a vector of nodes and returns the line list of individuals that "touch" those IDs.
nodes <- c("SK_14","SK_145") subset(merskor15, cluster_id = nodes)
The subset()
method for epicontacts
objects also accepts cluster size
parameters (see "cs", "cs_min" and "cs_max" arguments).
subset(merskor15, cs = 3) subset(merskor15, cs_min = 10, cs_max = 100)
One of the main features of epicontacts
is its visualisation capabilities. As
a default, the package uses interactive plotting based on the visNetwork
package^2. This interactivity is particularly useful for visualising large
datasets.
plot(merskor15)
The above is a generic method based on the vis_epicontacts()
and accepts a
number of arguments to customize the plot appearance and functionality. For a
full list of options use ?vis_epicontacts()
. For instance, one can customize
nodes using colors and icons:
plot(merskor15, "place_infect", node_shape = "sex", shapes = c(M = "male", F = "female"))
See codeawesome
to see available shapes.
Alternatively, the method
used
for plotting can be graph3D
, in which case a 3-dimensional graph will be used
(see below).
epicontacts
loads the threejs
package to enable 3D visualisation tools with
the graph3D()
function^3.
graph3D(merskor15, node_color = "sex", g_title = "MERS Korea 2014")
To interact with the plot:
The get_pairwise()
function allows processing of variable(s) in the line list
according to each pair in the contact dataset. For the following example, date
of onset of disease is extracted from the line list in order to compute the
difference between disease date of onset for each pair. The value that is
produced from this comparison represents the serial interval (si).
si <- get_pairwise(merskor15, "dt_onset") summary(si) hist(si, col="grey", border="white", xlab="Days after symptoms", main="MERS Korea 2014\nSerial Interval")
The get_pairwise()
will interpret the class of the column being used for
comparison, and will adjust its method of comparing the values accordingly. For
numbers and dates (like the si example above), the function will subtract
the values. When applied to columns that are characters or categorical,
get_pairwise()
will paste values together. Because the function also allows
for arbitrary processing (see "f" argument), these discrete combinations can be
easily tabulated and analyzed.
head(get_pairwise(merskor15, "sex"), n = 10) get_pairwise(merskor15, "sex", f=table) fisher.test(get_pairwise(merskor15, "sex", f=table))
The get_clusters()
function can be used for to identify connected components
in an epicontacts
object. Here, we illustrate its use to study contact
patterns in a simulated Ebola outbreak. First, we use it to retrieve
data.frame
containing the cluster information:
x <- make_epicontacts(ebola_sim$linelist, ebola_sim$contacts, id = "case_id", to = "case_id", from = "infector", directed = TRUE) x clust <- get_clusters(x, output = "data.frame") class(clust) dim(clust) table(clust$cluster_size) barplot(table(clust$cluster_size), main = "Cluster size distribution", xlab = "Cluster size", ylab = "Frequency")
Let us look at the largest clusters. For this, we add cluster information to the
epicontacts
object, and then subset it:
x <- get_clusters(x) x_14 <- subset(x, cs = 14) plot(x_14, "cluster_member")
The degree of a node corresponds to its number of edges or connections to other
nodes. get_degree()
provides an easy method for calculating this value for
epicontacts
networks. A high degree in this context indicates an individual
who was in contact with many others.
nb use of "type" argument depends on whether or not the network is directed.
deg_both <- get_degree(merskor15, "both", only_linelist = TRUE)
Which individuals have the ten most contacts?
head(sort(deg_both, decreasing = TRUE), 10)
What is the mean number of contacts?
mean(deg_both)
Any scripts or data that you put into this service are public.
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.