knitr::opts_chunk$set( collapse = TRUE, comment = "#>", warning = FALSE ) knitr::knit_hooks$set( source = function(x, options) { hook.r <- function(x, options) { fence <- "```" language <- tolower(options$engine) if (language == 'node') language <- 'javascript' if (!options$highlight) language <- 'text' if(!is.null(options$foldcode)) { paste0('\n\n', "<details><summary>Source</summary>\n", fence, language, '\n', x, fence, '\n\n', "</details>\n") } else { paste0('\n\n', fence, language, '\n', x, fence, '\n\n') } } x <- knitr:::hilight_source(x, 'markdown', options) hook.r( paste(c( x, '' ), collapse = '\n'), options ) } ) Sys.setlocale("LC_TIME", "C")
library(smidm) library(ggplot2) library(dplyr) library(hdrcde)
One person has become infected with COVID-19 on 28.12.2021. When will the contact person become ill and show the first symptoms?
get_serial_interval_density
The function get_serial_interval_density()
creates a dataframe containing the probability that a contact will start showing symptoms (serial interval) at a particular date/time.
Therefore, only the symptom begin date of the infected person is needed. Furthermore, the probability when the infected contacts of infected contacts show symptoms can be calculated, known as the second and also third generation of contacts.
Multiple arguments are needed for the function get_serial_interval_density()
:
First of all, the symptom_begin_date
has to be specified, which is defined as the date when the person started to show symptoms.
Furthermore, the max_serial_interval_days
is needed, which defines the interval length of the distribution output.
The remaining two inputs shape_serial
and rate_serial
are the parameters of the log-normal distribution, which models the serial interval.
symptom_begin_date <- as.Date("2021-12-28") max_serial_interval_days <- 20 shape_serial <- 2.154631545 rate_serial <- 0.377343528 serial_in_df_v1 <- get_serial_interval_density(symptom_begin_date, max_serial_interval_days, shape_serial, rate_serial)
The default values of the parameters for the distribution, when the infected contacts will show symptoms, are from the paper Najafi et al. [1], in which a gamma distribution for the serial interval between symptom onsets of infected persons and the symptom onset of infected contacts was estimated.
For the second generation of contacts the probability equals the summation of random variables because we assume that the infection from infected person to the first contact generation is independent from the first to the second generation. Thus, a convolution of two identical gamma distributions has to be conducted. For gamma distributions the convolutions of gamma distributions equals the summation of their first parameters [2]. Thus, the density for the second generation of infected persons showing symptoms is given by a gamma distribution with $2 \cdot$ shape_serial
and the same rate_serial
. In general, the serial interval for the $i$-th generation of contacts can be calculated with a gamma distribution with parameters $i \cdot$ shape_serial
and rate_serial
. This holds in an analogous way for the third generation.
The function call returns the following data set:
knitr::kable(serial_in_df_v1[100:109, ], caption = "values 100 to 109 of resulting data frame")
The data frame shows for each hour beginning at symptom_begin_date
until max_serial_interval_days
the resulting density of the gamma distribution. This density can be used for calculating the most probable period for a contact person start showing symptoms. The same data frame is obtained for the second generation. Here, the time interval of the distribution is larger.
symptom_begin_date <- as.Date("2021-12-28") max_serial_interval_days <- 20 shape_serial <- 2 * 2.154631545 rate_serial <- 0.377343528 serial_in_df_v2 <- get_serial_interval_density(symptom_begin_date, max_serial_interval_days, shape_serial, rate_serial)
knitr::kable(serial_in_df_v2[100:109, ], caption = "values 100 to 109 of resulting data frame")
get_serial_interval_density
The following code generates a plot with the gamma distribution of the illness probability of the first contact generation and the 80% and 95% high density intervals. In addition, the illness probability for the second generation is plotted in violet.
.calculate_qstart_qend <- function(probability, df) { hdr_df <- hdr(den = data.frame(x = 1:length(df$distribution), y = df$distribution), p = probability * 100)$hdr qstart <- (hdr_df[1, 1] - 1) / 24 qend <- (hdr_df[1, 2] - 1) / 24 return(list("qstart" = qstart, "qend" = qend)) }
.shade_curve <- function(df, qstart, qend, fill = "red", alpha = 0.4) { subset_df <- df[floor(qstart * 24):ceiling(qend * 24), ] geom_area(data = subset_df, aes(x = x, y = y), fill = fill, color = NA, alpha = alpha) }
symptom_begin_date <- as.Date("2021-12-28") df <- get_serial_interval_density(symptom_begin_date, max_serial_interval_days = 20, shape_serial = 2.154631545, rate_serial = 0.377343528) period_80 <- .calculate_qstart_qend(0.8, df) period_95 <- .calculate_qstart_qend(0.95, df) df_2 <- get_serial_interval_density(symptom_begin_date, max_serial_interval_days = 20, shape_serial = 2 * 2.154631545, rate_serial = 0.377343528) symp_date_posixct_start <- as.POSIXct(format(as.POSIXct(symptom_begin_date, tz = "CET"), "%Y-%m-%d")) symp_date_posixct_end <- as.POSIXct(format(as.POSIXct(symptom_begin_date + 1, tz = "CET"), "%Y-%m-%d")) symp_date_posixct_mid <- symp_date_posixct_start - as.numeric(difftime(symp_date_posixct_start, symp_date_posixct_end, units = "hours")) / 2 * 3600
g <- ggplot() + scale_x_datetime(breaks = scales::date_breaks("1 days"), labels = scales::date_format("%d %b")) + theme(axis.text.x = element_text(angle = 90)) + # scale_x_continuous(breaks = x_tick, # labels = x_label) + # theme(axis.ticks.x = element_line(color = c(rbind(rep("black", length(x_label) / 2), rep(NA, length(x_label) / 2))), linetype = 2, size = 1)) + geom_path(aes(x = df$dates, y = df$distribution), color = "red", size = 1) + geom_path(aes(x = df_2$dates, y = df_2$distribution), color = "purple", size = 1) + .shade_curve(df = data.frame(x = df$dates, y = df$distribution), period_80$qstart, period_80$qend) + .shade_curve(df = data.frame(x = df$dates, y = df$distribution), period_95$qstart, period_95$qend, alpha = 0.2) + geom_rect(data = data.frame(xmin = symp_date_posixct_start, xmax = symp_date_posixct_end, ymin = -Inf, ymax = Inf), aes(xmin = xmin, xmax = xmax, ymin = ymin, ymax = ymax), fill = "brown", alpha = 0.3) + geom_label(aes(x = symp_date_posixct_mid, y = 0.9*max(df$distribution), label = "symptom\nonset"), colour = "brown", fill = "white", size = 5, label.size = NA) + ylab("probability") + xlab("timeline") + labs(color = 'Verteilung') + # ggtitle("Visualization of get_infection_date_density ") + theme(legend.position = "none", text = element_text(size = 16*5/5)) + theme(axis.text.x = element_text(colour = "black", face = "bold", angle = 30, hjust = 1)) + theme(axis.title.x = element_text(colour = "black", face = "bold")) + theme(axis.text.y = element_text(colour = "gray50")) + theme(axis.title.y = element_text(colour = "gray50")) g
[1] Najafi F, Izadi N, Hashemi-Nazari S-S, Khosravi-Shadmani F, Nikbakht R, Shakiba E. Serial interval and time-varying reproduction number estimation for COVID-19 in western Iran. New Microbes and New Infections. 2020; 36: 100715.
[2] https://en.wikipedia.org/wiki/Gamma_distribution
Any scripts or data that you put into this service are public.
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.