knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>",
  fig.path = "man/figures/README-",
  out.width = "100%"
)

Welcome to the epitrix package!


This package implements small helper functions useful in infectious disease modelling and epidemic analysis.

Installing the package

To install the current stable, CRAN version of the package, type:

install.packages("epitrix")

To benefit from the latest features and bug fixes, install the development version of the package from GitHub using:

devtools::install_github("reconhub/epitrix")

Note that this requires the devtools package to be installed.

What does it do?

The main features of the package include:

Resources

Worked examples

Fitting a gamma distribution to delay data

In this example, we simulate data which replicate the serial interval (SI), i.e. the delays between primary and secondary symptom onsets, in Ebola Virus Disease (EVD). We start by converting previous estimates of the mean and standard deviation of the SI (WHO Ebola Response Team (2014) NEJM 371:1481–1495) to the parameters of a Gamma distribution:

library(epitrix)

mu <- 15.3 # mean in days
sigma <- 9.3 # standard deviation in days
cv <- sigma / mu # coefficient of variation
cv
param <- gamma_mucv2shapescale(mu, cv) # conversion to Gamma parameters
param
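The conversion rests on two standard Gamma identities: mean = shape * scale and coefficient of variation = 1 / sqrt(shape). As a minimal base-R sketch of that arithmetic (the helper name `mucv_to_shapescale` is hypothetical and is not part of epitrix), one could write:

```r
# Hypothetical helper, written for illustration; not the epitrix implementation.
# For a Gamma distribution: mean = shape * scale and cv = 1 / sqrt(shape).
mucv_to_shapescale <- function(mu, cv) {
  shape <- 1 / cv^2
  scale <- mu * cv^2
  list(shape = shape, scale = scale)
}

mu <- 15.3
sigma <- 9.3
p <- mucv_to_shapescale(mu, sigma / mu)

# Sanity check: recover the mean and standard deviation from shape and scale
p$shape * p$scale        # mean, should equal mu
sqrt(p$shape) * p$scale  # standard deviation, should equal sigma
```

The round trip back to the mean and standard deviation is a quick way to check that a parameterisation has not been mixed up (for instance scale versus rate).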

The shape and scale are parameters of a Gamma distribution we can use to generate delays. However, delays are typically reported in whole days, which implies a discretisation (from continuous time to discrete numbers). We use the package distcrete to achieve this discretisation. It generates a list of functions, including one to simulate data ($r), which we use to simulate 500 delays:

si <- distcrete::distcrete("gamma", interval = 1,
               shape = param$shape,
               scale = param$scale, w = 0)
si
set.seed(1)
x <- si$r(500)
head(x, 10)
hist(x, col = "grey", border = "white",
     xlab = "Days between primary and secondary onset",
     main = "Simulated serial intervals")
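To make the idea of discretisation concrete, here is a small base-R sketch that does not call distcrete. It assumes that each delay is assigned to the start of its one-day interval, which is our reading of `interval = 1, w = 0`; the shape and scale values are rounded, illustrative stand-ins for the ones computed earlier.

```r
# Illustrative, rounded parameter values (assumed for this sketch)
shape_ex <- 2.7
scale_ex <- 5.65

# Probability that a continuous delay falls in [k, k + 1), for k = 0, 1, 2, ...
days <- 0:200
pmf <- pgamma(days + 1, shape = shape_ex, scale = scale_ex) -
  pgamma(days, shape = shape_ex, scale = scale_ex)

sum(pmf)         # close to 1: almost all the mass lies below 200 days
sum(days * pmf)  # mean of the discretised delays
```

Because every delay is rounded down in this sketch, the discretised mean sits somewhat below the continuous mean `shape_ex * scale_ex`.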

x contains simulated data, for illustrative purposes. In practice, one would use real data from an ongoing outbreak. Now we use fit_disc_gamma to estimate the parameters of a discretised Gamma distribution from the data:

si_fit <- fit_disc_gamma(x)
si_fit

Converting a growth rate (r) to a reproduction number (R0)

The package incidence can fit a log-linear model to incidence curves (function fit), which produces a growth rate (r). This growth rate can in turn be translated into a basic reproduction number (R0) using r2R0. We illustrate this using simulated Ebola data from the outbreaks package, and using the serial interval from the previous example:

library(outbreaks)
library(incidence)

i <- incidence(ebola_sim$linelist$date_of_onset)
i
f <- fit(i[1:150]) # fit on first 150 days
plot(i[1:200], fit = f, color = "#9fc2fc")

r2R0(f$info$r, si$d(1:100))
r2R0(f$info$r.conf, si$d(1:100))

We can also use the function lm2R0_sample to generate samples of R0 values compatible with a model fit:

R0_val <- lm2R0_sample(f$model, si$d(1:100), n = 100)
head(R0_val)
hist(R0_val, col = "grey", border = "white")
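For intuition about how a growth rate maps to a reproduction number, the sketch below applies the usual relation R0 = 1 / sum(w(t) * exp(-r * t)), where w is a discrete serial interval distribution. The helper `r_to_R0` and the toy distribution are hypothetical and written for illustration; epitrix's r2R0 may differ in details such as which time point represents each interval.

```r
# Hypothetical helper for illustration; not the epitrix implementation.
r_to_R0 <- function(r, w, t = seq_along(w)) {
  w <- w / sum(w)           # normalise to a probability mass function
  1 / sum(w * exp(-r * t))  # inverse of the discounted sum of the serial interval
}

# Toy serial interval distribution over days 1 to 5 (made-up values)
w_toy <- c(0.1, 0.3, 0.3, 0.2, 0.1)

r_to_R0(0, w_toy)      # no growth: R0 equals 1
r_to_R0(0.05, w_toy)   # positive growth: R0 above 1
r_to_R0(-0.05, w_toy)  # decline: R0 below 1
```

The three calls illustrate the threshold behaviour: the sign of r determines whether R0 lies above or below 1, while the shape of w determines by how much.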

Standardising labels

If you want to use labels that will work across different computers, independent of local encoding and operating systems, clean_labels will make your life easier. The function transforms character strings by replacing diacritic symbols with their closest alphanumeric matches, setting all characters to lower case, and replacing various separators with a single, consistent one.

For instance:

x <- " Thîs- is A   wêïrD LäBeL .."
x
clean_labels(x)

variables <- c("Date.of.ONSET ",
               "/  date of hôspitalisation  /",
               "-DäTÈ--OF___DîSCHARGE-",
               "GEndèr/",
               "  Location. ")
variables
clean_labels(variables)

Anonymising data

hash_names can be used to generate hashed labels from linelist data. Based on pre-defined fields, it will generate anonymous labels. This system has the following desirable features:

first_name <- c("Jane", "Joe", "Raoul", "Raoul")
last_name <- c("Doe", "Smith", "Dupont", "Dupond")
age <- c(25, 69, 36, 36)

## detailed output by default
hash_names(first_name, last_name, age)

## short labels for practical use
hash_names(first_name, last_name, age,
           size = 8, full = FALSE)

Estimate incubation periods

The function empirical_incubation_dist() computes the discrete probability distribution by giving equal weight to each patient. Thus, with N patients, each of the n possible exposure dates of a given patient gets the overall weight 1/(n*N). The function returns a data frame with a column incubation_period, containing the different incubation periods with a time step of one day, and a column relative_frequency.
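The weighting can be illustrated with a toy calculation in base R. This sketch does not call epitrix; the day numbers are made up, and dates are represented as plain day counts to keep it short.

```r
# Toy data (made-up values): N = 2 patients
onset <- c(10, 12)             # onset day of each patient
exposures <- list(c(5, 7), 7)  # possible exposure days: n = 2, then n = 1
N <- length(onset)

# Incubation period for every possible exposure day of each patient
periods <- unlist(Map(function(o, e) o - e, onset, exposures))

# Each of a patient's n possible exposure days gets weight 1 / (n * N)
weights <- unlist(lapply(exposures, function(e) {
  rep(1 / (length(e) * N), length(e))
}))

# Relative frequency of each incubation period
rel_freq <- tapply(weights, periods, sum)
rel_freq  # 3 days -> 0.25, 5 days -> 0.75
```

Each patient contributes a total weight of 1/N, so the relative frequencies sum to 1 regardless of how many possible exposure dates each patient has.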

Load environment:

library(magrittr)
library(tibble)
library(epitrix)
library(distcrete)
library(ggplot2)

Make a linelist object containing toy data with several possible exposure dates for each case:

ll <- sim_linelist(30) %>%
  tibble()

x <- 0:15
y <- distcrete("gamma", 1, shape = 12, rate = 3, w = 0)$d(x)
mkexposures <- function(i) i - sample(x, size = sample.int(5, size = 1), replace = FALSE, prob = y)
exposures <- sapply(ll$date_of_onset, mkexposures)
ll$dates_exposure <- exposures

print(ll)

Empirical distribution:

incubation_period_dist <- empirical_incubation_dist(ll, date_of_onset, dates_exposure)
print(incubation_period_dist)

ggplot(incubation_period_dist, aes(incubation_period, relative_frequency)) +
  geom_col()

Fit discrete gamma:

fit <- fit_gamma_incubation_dist(ll, date_of_onset, dates_exposure)
print(fit)

x <- 0:10
y <- fit$distribution$d(x)
ggplot(data.frame(x = x, y = y), aes(x, y)) +
  geom_col(data = incubation_period_dist, aes(incubation_period, relative_frequency)) +
  geom_point(stat="identity", col = "red", size = 3) +
  geom_line(stat="identity", col = "red")

Note that if the possible exposure dates are consecutive for all patients, then empirical_incubation_dist() and fit_gamma_incubation_dist() can take date ranges as inputs instead of lists of individual exposure dates (see help for details).

Vignettes

The overview vignette essentially replicates the content of this README. To request or contribute other vignettes, see the section "getting help, contributing".

The estimate incubation vignette contains worked examples for the empirical_incubation_dist() and fit_gamma_incubation_dist() functions.

Websites

Click here for the website dedicated to epitrix.

Getting help, contributing

Bug reports and feature requests should be posted on GitHub using the issue system. All other questions should be posted on the RECON forum.

Contributions are welcome via pull requests.

Please note that this project is released with a Contributor Code of Conduct. By participating in this project you agree to abide by its terms.



reconhub/epitrix documentation built on Feb. 5, 2023, 7:39 a.m.