Overview of the incidence package

knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>", 
  fig.width=7, 
  fig.height=5
)

incidence implements functions and classes to compute, handle, visualise and model incidences from dates data. This vignette provides an overview of current features. It largely reproduces the content of REAME.md.


Installing the package

To install the current stable, CRAN version of the package, type:

install.packages("incidence")

To benefit from the latest features and bug fixes, install the development, github version of the package using:

devtools::install_github("reconhub/incidence")

Note that this requires the package devtools installed.


What does it do?

The main functions of the package include:

Worked example: simulated Ebola outbreak

Loading the data

This example uses the simulated Ebola Virus Disease (EVD) outbreak from the package outbreaks. We will compute incidence for various time steps, calibrate two exponential models around the peak of the epidemic, and analyse the results.

First, we load the data:

library(outbreaks)
library(ggplot2)
library(incidence)

dat <- ebola_sim$linelist$date_of_onset
class(dat)
head(dat)

Computing and plotting incidence

We compute the daily incidence:

i <- incidence(dat)
i
plot(i)

The daily incidence is quite noisy, but we can easily compute other incidence using larger time intervals:

# weekly, starting on Monday (ISO week, default)
i.7 <- incidence(dat, interval = "1 week")
plot(i.7)

# semi-weekly, starting on Saturday
i.14 <- incidence(dat, interval = "2 saturday weeks")
plot(i.14, border = "white")

## monthly
i.month <- incidence(dat, interval = "1 month")
plot(i.month, border = "white")

incidence can also compute incidence by specified groups using the groups argument. For instance, we can compute incidence by gender:

i.7.sex <- incidence(dat, interval = "1 week", groups = ebola_sim$linelist$gender)
i.7.sex
plot(i.7.sex, stack = TRUE, border = "grey")

We can do the same for hospitals, using the 'clean' version of the dataset, with some customization of the legend:

i.7.hosp <- with(ebola_sim_clean$linelist, 
     incidence(date_of_onset, interval = "week", groups = hospital))
i.7.hosp
head(get_counts(i.7.hosp))
plot(i.7.hosp, stack=TRUE) + 
    theme(legend.position= "top") + 
    labs(fill="")

Handling incidence objects

incidence objects can be manipulated easily. The [ operator implements subsetting of dates (first argument) and groups (second argument). For instance, to keep only the peak of the distribution:

i[100:250]
plot(i[100:250])

Or to keep every other week:

i.7[c(TRUE,FALSE)]
plot(i.7[c(TRUE,FALSE)])

Some temporal subsetting can be even simpler using subset, which permits to retain data within a specified time window:

i.tail <- subset(i, from=as.Date("2015-01-01"))
i.tail
plot(i.tail, border="white")

Subsetting groups can also matter. For instance, let's try and visualise the incidence based on onset of symptoms by outcome:

i.7.outcome <- incidence(dat, 7, groups=ebola_sim$linelist$outcome)
i.7.outcome
plot(i.7.outcome, stack = TRUE, border = "grey")

By default, incidence treats missing data (NA) as a separate group (see argument na_as_group). We could disable this to retain only known outcomes, but alternatively we can simply subset the object to exclude the last (3rd) group:

i.7.outcome[,1:2]
plot(i.7.outcome[,1:2], stack = TRUE, border = "grey")

Groups can also be collapsed into a single time series using pool:

i.pooled <- pool(i.7.outcome)
i.pooled
identical(i.7$counts, i.pooled$counts)

Modelling incidence

Incidence data, excluding zeros, can be modelled using log-linear regression of the form: log(y) = r x t + b

where y is the incidence, r is the growth rate, t is the number of days since a specific point in time (typically the start of the outbreak), and b is the intercept.

Such model can be fitted to any incidence object using fit. Of course, a single log-linear model is not sufficient for modelling our epidemic curve, as there is clearly an growing and a decreasing phase. As a start, we can calibrate a model on the first 20 weeks of the epidemic:

plot(i.7[1:20])
early.fit <- fit(i.7[1:20])
early.fit

The resulting objects (known as incidence_fit objects) can be plotted, in which case the prediction and its confidence interval is displayed:

plot(early.fit)

However, a better way to display these predictions is adding them to the incidence plot using the argument fit:

plot(i.7[1:20], fit = early.fit)

In this case, we would ideally like to fit two models, before and after the peak of the epidemic. This is possible using the following approach, if you know what date to use to split the data in two phases:

fit.both <- fit(i.7, split=as.Date("2014-10-15"))
fit.both
plot(i.7, fit=fit.both)

This is much better, but the splitting date is not completely optimal. To look for the best possible splitting date (i.e. the one maximizing the average fit of both models), we use:

best.fit <- fit_optim_split(i.7)
best.fit
plot(i.7, fit=best.fit$fit)

These models are very good approximation of these data, showing a doubling time of r round(get_info(best.fit$fit, "doubling"), 1) days during the first phase, and a halving time of r round(get_info(best.fit$fit, "halving"), 1) days during the second. To access these parameters, you can use the get_info() function.

The possible parameters are:

For "r", "doubling", and "halving", you can also add ".conf" to get the confidence intervals. Here's how you can get the doubling and halving times of the above epi curve:

get_info(best.fit$fit, "doubling")      # doubling time
get_info(best.fit$fit, "doubling.conf") # confidence interval
get_info(best.fit$fit, "halving")       
get_info(best.fit$fit, "halving.conf")       

Note that fit will also take groups into account if incidence has been computed for several groups:

best.fit2 <- fit_optim_split(i.7.sex)
best.fit2
plot(i.7.sex, fit=best.fit2$fit)

Using get_info() on this fit object will return all groups together:

get_info(best.fit2$fit, "doubling")      # doubling time
get_info(best.fit2$fit, "doubling.conf") # confidence interval
get_info(best.fit2$fit, "halving")       
get_info(best.fit2$fit, "halving.conf")       


Try the incidence package in your browser

Any scripts or data that you put into this service are public.

incidence documentation built on Nov. 8, 2020, 4:30 p.m.