To install the current stable, CRAN version of the package, type:
install.packages("incidence")
To benefit from the latest features and bug fixes, install the development, github version of the package using:
devtools::install_github("reconhub/incidence")
Note that this requires the package devtools to be installed.
The main features of the package include:
incidence: computes incidence from dates in various formats; any fixed time interval can be used; the returned object is an instance of the (S3) class incidence.
plot: this method (see ?plot.incidence for details) plots incidence objects, and can also add predictions of the model(s) contained in an incidence_fit object (or a list of such objects).
fit: fits one or two exponential models (i.e. linear regression on log-incidence) to an incidence object; two models are calibrated only if a date is provided to split the time series in two (argument split); this is typically useful to model the two phases of an outbreak, exponential growth and decrease; each model returned is an instance of the (S3) class incidence_fit, which contains various useful information (e.g. growth rate r, doubling/halving time, predictions and confidence intervals); results can be plotted using plot, or added to an existing incidence plot using the piping-friendly function add_incidence_fit.
fit_optim_split: finds the optimal date to split the time series in two, typically around the peak of the epidemic.
[: lower-level subsetting of incidence objects, permitting the user to specify which dates and groups to retain; uses a syntax similar to matrices, i.e. x[i, j], where x is the incidence object, i a subset of dates, and j a subset of groups.
subset: subsets an incidence object by specifying a time window.
pool: pools incidence from different groups into one global incidence time series.
cumulate: computes cumulative incidence over time from an incidence object.
as.data.frame: converts an incidence object into a data.frame containing dates and incidence values.
bootstrap: generates a bootstrapped incidence object by re-sampling, with replacement, the original dates of events.
find_peak: locates the peak time of the epicurve.
estimate_peak: uses bootstrap to estimate the peak time (and related confidence interval) of a partially observed outbreak.
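The resampling idea behind bootstrap and estimate_peak can be sketched in base R. This is only an illustration under stated assumptions, not the package's implementation: the onset dates are invented, and peak_day is a hypothetical helper that does not exist in incidence.

```r
set.seed(3)

# Toy onset dates, invented for illustration: 200 cases clustered around day 14
offsets <- sample(0:27, size = 200, replace = TRUE,
                  prob = dnorm(0:27, mean = 14, sd = 4))
onset <- as.Date("2014-04-07") + offsets
origin <- min(onset)

# Hypothetical helper (not part of incidence): day offset from `origin`
# of the first day with the highest number of cases
peak_day <- function(dates, origin) {
  days <- as.integer(dates - origin)
  tab <- table(days)
  as.integer(names(tab)[which.max(tab)])
}

# Peak of the observed data
observed_peak <- origin + peak_day(onset, origin)

# Resample the dates with replacement and recompute the peak each time
boot_days <- replicate(500, peak_day(sample(onset, replace = TRUE), origin))

# Percentile interval for the peak date (type = 1 returns observed values)
ci <- origin + quantile(boot_days, c(0.025, 0.975), names = FALSE, type = 1)

observed_peak
ci
```

Each bootstrap replicate keeps the same number of cases as the original data, so the spread of the resampled peaks reflects how sensitive the peak is to sampling variation in the dates.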
An overview of incidence is provided in the worked example below. More detailed tutorials are distributed as vignettes with the package:
vignette("overview", package="incidence")
vignette("customize_plot", package="incidence")
vignette("incidence_class", package="incidence")
vignette("incidence_fit_class", package="incidence")
vignette("conversions", package="incidence")
The following websites are available:
The official incidence website, providing an overview of the package's functionalities, up-to-date tutorials and documentation:
https://www.repidemicsconsortium.org/incidence
The incidence project on github, useful for developers, contributors, and users wanting to post issues, bug reports and feature requests:
https://github.com/reconhub/incidence
The incidence page on CRAN:
https://CRAN.R-project.org/package=incidence
Bug reports and feature requests should be posted on github using the issue system. All other questions should be posted on the RECON forum:
https://www.repidemicsconsortium.org/forum/
The following worked example provides a brief overview of the package's functionalities. See the vignettes section for more detailed tutorials.
This example uses the simulated Ebola Virus Disease (EVD) outbreak from the package outbreaks. We will compute incidence for various time steps, calibrate two exponential models around the peak of the epidemic, and analyse the results.
First, we load the data:
library(outbreaks)
library(ggplot2)
library(incidence)
dat <- ebola_sim$linelist$date_of_onset
class(dat)
head(dat)
We compute the weekly incidence:
i.7 <- incidence(dat, interval = 7)
i.7
plot(i.7)
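To make the binning explicit, here is a minimal base-R sketch of what counting cases per fixed 7-day interval amounts to. It is an illustration only: the toy onset vector and the helper count_by_interval are made up for this example, and bins here simply start at the first date, which may differ from how the package aligns intervals.

```r
# Hypothetical helper (not part of incidence): count events per fixed-width bin
count_by_interval <- function(dates, interval = 7L) {
  dates <- as.Date(dates)
  first <- min(dates)
  # Index of the bin each date falls into (0 = first bin)
  bin <- as.integer(dates - first) %/% interval
  n_bins <- max(bin) + 1L
  # tabulate() also reports empty bins as zero counts
  counts <- tabulate(bin + 1L, nbins = n_bins)
  data.frame(
    dates  = first + interval * (seq_len(n_bins) - 1L),
    counts = counts
  )
}

# Toy onset dates, invented for illustration
onset <- as.Date("2014-04-07") + c(0, 1, 1, 6, 7, 9, 20, 21)
weekly <- count_by_interval(onset, interval = 7L)
weekly
```

Every case lands in exactly one bin, so the counts always sum to the number of input dates.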
incidence can also compute incidence by specified groups using the groups argument. For instance, we can compute the weekly incidence by gender:
i.7.sex <- incidence(dat, interval = "week", groups = ebola_sim$linelist$gender)
i.7.sex
plot(i.7.sex, stack = TRUE, border = "grey")
incidence objects can be manipulated easily. The [ operator implements subsetting of dates (first argument) and groups (second argument). For instance, to keep only the first 20 weeks of the epidemic:
i.7[1:20]
plot(i.7[1:20])
Some temporal subsetting can be even simpler using subset, which retains data within a specified time window:
i.tail <- subset(i.7, from = as.Date("2015-01-01"))
i.tail
plot(i.tail, border = "white")
Subsetting groups can also matter. For instance, let's try and visualise the incidence based on onset of symptoms by outcome:
i.7.outcome <- incidence(dat, "week", groups = ebola_sim$linelist$outcome)
i.7.outcome
plot(i.7.outcome, stack = TRUE, border = "grey")
To visualise the cumulative incidence:
i.7.outcome.c <- cumulate(i.7.outcome)
i.7.outcome.c
plot(i.7.outcome.c)
Groups can also be collapsed into a single time series using pool:
i.pooled <- pool(i.7.outcome)
i.pooled
identical(i.7$counts, i.pooled$counts)
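As a rough illustration of what cumulating and pooling compute, the base-R sketch below applies the same two operations to a plain matrix of counts (rows are time steps, columns are groups). The numbers and object names are invented for this example, and incidence objects carry more information than a bare matrix.

```r
# Toy weekly counts for two groups (invented numbers)
counts <- cbind(Death = c(1, 3, 0, 2), Recover = c(0, 2, 4, 1))

# Cumulative incidence: running total down each column (one per group)
cum_counts <- apply(counts, 2, cumsum)

# Pooled incidence: sum across groups at each time step
pooled <- rowSums(counts)

cum_counts
pooled
```

The last row of the cumulative matrix equals the total number of cases per group, and the pooled series sums to the overall number of cases.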
Incidence data, excluding zeros, can be modelled using log-linear regression of the form:
log(y) = r × t + b
where y is the incidence, r is the growth rate, t is the number of days since a specific point in time (typically the start of the outbreak), and b is the intercept.
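The regression described above can be sketched in base R with lm on simulated counts. This is a minimal illustration, not the package's code: the data are simulated with an assumed growth rate of 0.1 per day, and all object names are made up. The doubling time follows from the model as log(2) / r when r is positive.

```r
set.seed(1)

# Simulated daily incidence with a known growth rate (values are invented)
day <- 0:29
r_true <- 0.1
y <- rpois(length(day), lambda = 5 * exp(r_true * day))

# Exclude zero counts, since log(0) is undefined
sim <- data.frame(day = day[y > 0], logy = log(y[y > 0]))

# Linear regression on log-incidence: log(y) = r * day + b
model <- lm(logy ~ day, data = sim)

r_hat <- unname(coef(model)["day"])          # estimated growth rate r
b_hat <- unname(coef(model)["(Intercept)"])  # estimated intercept b
doubling_time <- log(2) / r_hat              # days for incidence to double

c(r = r_hat, b = b_hat, doubling = doubling_time)
confint(model)                               # confidence intervals for b and r
```

A negative estimate of r would indicate a declining phase, in which case log(2) / -r gives the halving time instead.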
Such a model can be fitted to any incidence object using fit.
Of course, a single log-linear model is not sufficient for modelling our time series, as there is clearly a growing and a decreasing phase.
As a start, we can calibrate a model on the first 20 weeks of the epidemic:
plot(i.7[1:20])
early.fit <- fit(i.7[1:20])
early.fit
The resulting objects can be plotted, in which case the prediction and its confidence interval are displayed:
plot(early.fit)
However, a better way to display these predictions is adding them to the incidence plot using the argument fit:
plot(i.7[1:20], fit = early.fit)
Alternatively, these can be piped using:
library(magrittr)
plot(i.7[1:20]) %>% add_incidence_fit(early.fit)
In this case, we would ideally like to fit two models, before and after the peak of the epidemic. This is possible using the following approach, in which the best possible splitting date (i.e. the one maximizing the average fit of both models) is determined automatically:
best.fit <- fit_optim_split(i.7)
best.fit
plot(i.7, fit = best.fit$fit)
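The idea of searching for a splitting date can be sketched in base R. The sketch below is an assumption-laden illustration rather than the package's algorithm: it uses simulated counts with a peak at day 30, takes the mean adjusted R-squared of the two log-linear models as the measure of "average fit", and the helper mean_fit is hypothetical.

```r
set.seed(2)

# Simulated incidence rising until day 30, then declining (values invented)
day <- 0:59
lambda <- ifelse(day <= 30,
                 5 * exp(0.1 * day),
                 5 * exp(0.1 * 30) * exp(-0.08 * (day - 30)))
y <- rpois(length(day), lambda)
sim <- data.frame(day = day[y > 0], logy = log(y[y > 0]))

# Hypothetical helper: mean adjusted R-squared of two log-linear models,
# one fitted up to candidate day `s` and one fitted after it
mean_fit <- function(s, sim) {
  before <- lm(logy ~ day, data = sim[sim$day <= s, ])
  after  <- lm(logy ~ day, data = sim[sim$day >  s, ])
  mean(c(summary(before)$adj.r.squared, summary(after)$adj.r.squared))
}

# Candidate splits, leaving enough points on each side for a regression
candidates <- 10:50
scores <- vapply(candidates, mean_fit, numeric(1), sim = sim)

best_split <- candidates[which.max(scores)]
best_split
```

Restricting the candidates away from the ends of the series avoids fitting a regression on only a handful of points, where the fit measure becomes unstable.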
See details of contributions on:
https://github.com/reconhub/incidence/graphs/contributors
Contributions are welcome via pull requests.
Please note that this project is released with a Contributor Code of Conduct. By participating in this project you agree to abide by its terms.
Maintainer: Zhian N. Kamvar (zkamvar@gmail.com)