knitr::opts_chunk$set( collapse = TRUE, comment = "#>", fig.path = "man/figures/README-", out.width = "100%" )
This package implements small helper functions usefull in infectious disease modelling and epidemics analysis.
To install the current stable, CRAN version of the package, type:
install.packages("epitrix")
To benefit from the latest features and bug fixes, install the development, github version of the package using:
devtools::install_github("reconhub/epitrix")
Note that this requires the package devtools installed.
The main features of the package include:
gamma_shapescale2mucv
: convert shape and scale of a Gamma distribution
to mean and CV
gamma_mucv2shapescale
: convert mean and CV of a Gamma distribution to
shape and scale
gamma_log_likelihood
: Gamma log-likelihood using mean and CV
r2R0
: convert growth rate into a reproduction number
lm2R0_sample
: generates a distribution of R0 from a log-incidence linear
model
fit_disc_gamma
: fits a discretised Gamma distribution to data (typically
useful for describing delays)
clean_labels
: generate portable labels by removing non-standard characters or
replacing them with their closest alphanumeric matches, standardising
separators, etc.
hash_names
: generate unique, anonymised, reproducible labels from
various data fields (e.g. First name, Last name, Date of birth).
emperical_incubation_dist()
will estimate the empirical incubation
distribution if given a data frame with dates of onset and a range of
exposure dates.
fit_gamma_incubation_dist()
wraps empirical_incubation_dist()
and
fit_disc_gamma()
to fit a discretized gamma distribution to an empirical
incubation distribution
AR2R0()
calculates the R0 corresponding to a give attack rate
R02AR()
calculates the attack rate corresponding to a give R0
R02herd_immunity_threshold()
calculates the herd immunity threshold for
a given R0
sim_linelist()
simulates a simple linelist (with no epi model implied)
data.frame
which can be used for illustrating other functions
In this example, we simulate data which replicate the serial interval (SI), i.e. the delays between primary and secondary symptom onsets, in Ebola Virus Disease (EVD). We start by converting previously estimates of the mean and standard deviation of the SI (WHO Ebola Response Team (2014) NEJM 371:1481–1495) to the parameters of a Gamma distribution:
library(epitrix) mu <- 15.3 # mean in days days sigma <- 9.3 # standard deviation in days cv <- sigma / mu # coefficient of variation cv param <- gamma_mucv2shapescale(mu, cv) # convertion to Gamma parameters param
The shape and scale are parameters of a Gamma distribution we can use to
generate delays. However, delays are typically reported per days, which implies
a discretisation (from continuous time to discrete numbers). We use the package
distcrete to achieve this
discretisation. It generates a list of functions, including one to simulate data
($r
), which we use to simulate 500 delays:
si <- distcrete::distcrete("gamma", interval = 1, shape = param$shape, scale = param$scale, w = 0) si set.seed(1) x <- si$r(500) head(x, 10) hist(x, col = "grey", border = "white", xlab = "Days between primary and secondary onset", main = "Simulated serial intervals")
x
contains simulated data, for illustrative purpose. In practice, one would
use real data from an ongoing outbreaks. Now we use fit_disc_gamma
to estimate
the parameters of a dicretised Gamma distribution from the data:
si_fit <- fit_disc_gamma(x) si_fit
The package incidence can fit a
log-linear model to incidence curves (function fit
), which produces a growth
rate (r). This growth rate can in turn be translated into a basic reproduction
number (R0) using r2R0
. We illustrate this using simulated Ebola data from the
outbreaks package, and using the
serial interval from the previous example:
library(outbreaks) library(incidence) i <- incidence(ebola_sim$linelist$date_of_onset) i f <- fit(i[1:150]) # fit on first 150 days plot(i[1:200], fit = f, color = "#9fc2fc") r2R0(f$info$r, si$d(1:100)) r2R0(f$info$r.conf, si$d(1:100))
In addition, we can also use the function lm2R0_sample
to generate samples of
R0 values compatible with a model fit:
R0_val <- lm2R0_sample(f$model, si$d(1:100), n = 100) head(R0_val) hist(R0_val, col = "grey", border = "white")
If you want to use labels that will work across different computers, independent
of local encoding and operating systems, clean_labels
will make your life
easier. The function transforms character strings by replacing diacritic symbols
with their closest alphanumeric matches, setting all characters to lower case,
and replacing various separators with a single, consistent one.
For instance:
x <- " Thîs- is A wêïrD LäBeL .." x clean_labels(x) variables <- c("Date.of.ONSET ", "/ date of hôspitalisation /", "-DäTÈ--OF___DîSCHARGE-", "GEndèr/", " Location. ") variables clean_labels(variables)
hash_names
can be used to generate hashed labels from linelist data. Based on
pre-defined fields, it will generate anonymous labels. This system has the
following desirable features:
given the same input, the output will always be the same, so this encoding system generates labels which can be used by different people and organisations
given different inputs, the output will always be different; even minor differences in input will result in entirely different outputs
given an output, it is very hard to infer the input (it requires hacking skills); if security is challenged, the hashing algorithm can be 'salted' to strengthen security
first_name <- c("Jane", "Joe", "Raoul", "Raoul") last_name <- c("Doe", "Smith", "Dupont", "Dupond") age <- c(25, 69, 36, 36) ## detailed output by default hash_names(first_name, last_name, age) ## short labels for practical use hash_names(first_name, last_name, age, size = 8, full = FALSE)
The function empirical_incubation_dist()
computes the discrete probability
distribution by giving equal weight to each patient. Thus, in the case of N
patients, the n
possible exposure dates of a given patient get the overall
weight 1/(n*N)
. The function returns a data frame with column
incubation_period
containing the different incubation periods with a time step
of one day and their relative_frequency
.
Load environment:
library(magrittr) library(tibble) library(epitrix) library(distcrete) library(ggplot2)
Make a linelist object containing toy data with several possible exposure dates for each case:
ll <- sim_linelist(30) %>% tibble() x <- 0:15 y <- distcrete("gamma", 1, shape = 12, rate = 3, w = 0)$d(x) mkexposures <- function(i) i - sample(x, size = sample.int(5, size = 1), replace = FALSE, prob = y) exposures <- sapply(ll$date_of_onset, mkexposures) ll$dates_exposure <- exposures print(ll)
Empirical distribution:
incubation_period_dist <- empirical_incubation_dist(ll, date_of_onset, dates_exposure) print(incubation_period_dist) ggplot(incubation_period_dist, aes(incubation_period, relative_frequency)) + geom_col()
Fit discrete gamma:
fit <- fit_gamma_incubation_dist(ll, date_of_onset, dates_exposure) print(fit) x = c(0:10) y = fit$distribution$d(x) ggplot(data.frame(x = x, y = y), aes(x, y)) + geom_col(data = incubation_period_dist, aes(incubation_period, relative_frequency)) + geom_point(stat="identity", col = "red", size = 3) + geom_line(stat="identity", col = "red")
Note that if the possible exposure dates are consecutive for all patients then empirical_incubation_dist()
and fit_gamma_incubation_dist()
can take date ranges as inputs instead of lists of individual exposure dates (see help for details).
The overview vignette
essentially replicates the content of this README
. To request or contribute
other vignettes, see the section "getting help, contributing".
The estimate incubation vignette contains worked examples for the emperical_incubation_dist()
fit_gamma_incubation_dist()
.
Click here for the website dedicated to epitrix.
Bug reports and feature requests should be posted on github using the issue system. All other questions should be posted on the RECON forum.
Contributions are welcome via pull requests.
Please note that this project is released with a Contributor Code of Conduct. By participating in this project you agree to abide by its terms.
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.