Nothing
#' Estimate a static disease severity measure
#'
#' @description Calculates the severity of a disease, while optionally
#' correcting for reporting delays using an epidemiological delay distribution
#' of the time between symptom onset and death (onset-to-death).
#'
#' Other delay distributions may be passed to calculate different disease
#' severity measures such as the hospitalisation fatality risk.
#'
#' @param data A `<data.frame>` containing the outbreak data. A daily time
#' series with dates or some other absolute indicator of time (e.g. epiday or
#' epiweek) and the numbers of new cases and new deaths at each time point.
#' Note that the required columns are "date" (for the date), "cases" (for the
#' number of reported cases), and "deaths" (for the number of reported deaths)
#' on each day of the outbreak.
#'
#' Note that the `<data.frame>` is required to have an unbroken sequence of
#' dates with no missing dates in between. The "date" column must be of class
#' `Date` (see [as.Date()]).
#'
#' Note also that the total number of cases must be greater than the total
#' number of reported deaths.
#'
#' @param delay_density An optional argument that controls whether delay
#' correction is applied in the severity estimation.
#' May be `NULL`, for no delay correction, or a function that returns the
#' density function of a distribution to evaluate
#' density at user-specified values, e.g.
#' `function(x) stats::dgamma(x = x, shape = 5, scale = 1)`.
#'
#' @param poisson_threshold The case count above which to use Poisson
#' approximation. Set to 100 by default. Must be > 0.
#'
#' @details
#' # Details: Adjusting for delays between two time series
#'
#' The method used in `cfr_static()` follows Nishiura et al.
#' (2009).
#' The function calculates a quantity \eqn{u_t} for each day within the input
#' data, which represents the proportion of cases estimated to have
#' a known outcome on day \eqn{t}.
#' Following Nishiura et al., \eqn{u_t} is calculated as:
#' \deqn{u_t = \dfrac{\sum_{i = 0}^t
#' \sum_{j = 0}^\infty c_i f_{j - i}}{\sum_{i = 0} c_i}}
#' where \eqn{f_t} is the value of the probability mass function at time \eqn{t}
#' and \eqn{c_t}, \eqn{d_t} are the number of new cases and new deaths at time
#' \eqn{t}, (respectively).
#' We then use \eqn{u_t} at the end of the outbreak in the following likelihood
#' function to estimate the severity of the disease in question.
#' \deqn{{\sf L}({\theta \mid y}) = \log{\dbinom{u_tC}{D}} + D \log{\theta} +
#' (u_tC - D)\log{(1.0 - \theta)}}
#' \eqn{C} and \eqn{D} are the cumulative number of cases and deaths
#' (respectively) up until time \eqn{t}.
#' \eqn{\theta} is the parameter we wish to estimate, the severity of the
#' disease. We estimate \eqn{\theta} using simple maximum-likelihood methods,
#' allowing the functions within this package to be quick and easy tools to use.
#'
#' The precise severity measure — CFR, IFR, HFR, etc — that \eqn{\theta}
#' represents depends upon the input data given by the user.
#'
#' The epidemiological delay-distribution density function passed to
#' `delay_density` is used to evaluate the probability mass function
#' parameterised by time; i.e. \eqn{f(t)} which
#' gives the probability that a case has a known outcome (usually death) at time
#' \eqn{t}, parameterised with disease-specific parameters before it is supplied
#' here.
#'
#' ## Profile likelihood methods
#'
#' The naive CFR estimate (without delay correction) is the outcome of a
#' Binomial test on deaths and cases using [stats::binom.test()].
#' The confidence intervals around the estimate are also taken from the test.
#'
#' The delay-corrected CFR estimates are however obtained by generating a
#' profile likelihood over the sequence `seq(1e-4, 1.0, 1e-4)`. The method used
#' depends on the outbreak size and the initial expectation of disease severity.
#' This is implemented in the internal function `.estimate_severity()`.
#'
#' - **Delay correction, small outbreaks**: For outbreaks where the total cases
#' are below the user-specified 'Poisson threshold' (`poisson_threshold`,
#' default = 100), the CFR and uncertainty around it is taken from a profile
#' likelihood generated from a Binomial model of deaths (successes) and
#' estimated known outcomes (trials).
#'
#' - **Delay correction, large outbreaks with low severity**: For outbreaks
#' with total cases greater than the Poisson threshold (default = 100) and with
#' initial severity estimates < 0.05, the CFR and uncertainty are taken from a
#' Poisson approximation of the Binomial profile likelihood (taking
#' \eqn{\lambda = np} for \eqn{n} estimated outcomes and \eqn{p} as the severity
#' estimate).
#'
#' - **Delay correction, large outbreaks with higher severity**: For outbreaks
#' with total cases greater than the Poisson threshold (default = 100) and with
#' initial severity estimates \eqn{\geq} 0.05, the CFR and uncertainty are taken
#' from a Normal approximation of the Binomial profile likelihood.
#'
#' @references
#' Nishiura, H., Klinkenberg, D., Roberts, M., & Heesterbeek, J. A. P. (2009).
#' Early Epidemiological Assessment of the Virulence of Emerging Infectious
#' Diseases: A Case Study of an Influenza Pandemic. PLOS ONE, 4(8), e6852.
#' \doi{10.1371/journal.pone.0006852}
#'
#' @return A `<data.frame>` with the maximum likelihood estimate and 95%
#' confidence interval of the severity estimates, named "severity_estimate",
#' "severity_low", and "severity_high".
#'
#' @export
#'
#' @examples
#' # load package data
#' data("ebola1976")
#'
#' # estimate severity without correcting for delays
#' cfr_static(ebola1976)
#'
#' # estimate severity for each day while correcting for delays
#' # obtain onset-to-death delay distribution parameters from Barry et al. 2018
#' # The Lancet. <https://doi.org/10.1016/S0140-6736(18)31387-4>
#' cfr_static(
#' ebola1976,
#' delay_density = function(x) dgamma(x, shape = 2.40, scale = 3.33)
#' )
#'
cfr_static <- function(data,
delay_density = NULL,
poisson_threshold = 100) {
# input checking
checkmate::assert_data_frame(
data,
min.rows = 1, min.cols = 3
)
# check that input `<data.frame>` has columns date, cases, and deaths
checkmate::assert_names(
colnames(data),
must.include = c("date", "cases", "deaths")
)
# check for any NAs among data
checkmate::assert_data_frame(
data[, c("date", "cases", "deaths")],
types = c("Date", "integerish"),
any.missing = FALSE
)
# check that data$date is a date column
checkmate::assert_date(data$date, any.missing = FALSE, all.missing = FALSE)
# check for excessive missing date and throw an error
stopifnot(
"Input data must have sequential dates with none missing or duplicated" =
identical(unique(as.numeric(diff(data$date))), 1) # use numeric 1
# this solution works when df$date is `Date`
# this may need more thought for dates that are integers, POSIXct,
# or other units; consider the units package
)
checkmate::assert_count(poisson_threshold, positive = TRUE)
# NOTE: delay_density is checked in estimate_outcomes() if passed and not NULL
# calculating the total number of cases (without correcting) and deaths
# Calculate total cases and deaths to pass to secondary checking
total_cases <- sum(data$cases, na.rm = TRUE)
total_deaths <- sum(data$deaths, na.rm = TRUE)
# Add input checking for total cases and deaths
checkmate::assert_count(total_cases)
# use assert_number to set upper limit at total_cases
checkmate::assert_number(total_deaths, upper = total_cases, lower = 0)
# apply delay correction if a delay distribution is provided
if (!is.null(delay_density)) {
# calculating the corrected severity, corrected for delay between case
# detection and outcome estimating the number of cases with a known outcome,
# used as a replacement for total deaths in the original severity formula
df_corrected <- estimate_outcomes(
data = data,
delay_density = delay_density
)
# calculate total cases, deaths, and outcomes
total_outcomes <- sum(df_corrected$estimated_outcomes, na.rm = TRUE)
# NOTE: previous code used `u_t = total_outcomes / total_cases`
# which can be simplified in all operations to simply `total_outcomes`
# Get direct estimate of p, and throw warning if p < 1e-4
p_mid <- total_deaths / round(total_outcomes)
if (p_mid < 1e-4) {
warning(
"Ratio of total deaths to total cases with known outcome",
" is below 0.01%: CFR estimates may be unreliable.",
call. = FALSE
)
}
# calculating the maximum likelihood estimate and 95% confidence interval
# using the binomial likelihood function from Nishiura
severity_estimate <- .estimate_severity(
total_cases = total_cases,
total_deaths = total_deaths,
total_outcomes = total_outcomes,
poisson_threshold = poisson_threshold,
p_mid = p_mid
)
# .estimate_severity() returns vector; convert to list and then data.frame
severity_estimate <- as.data.frame(as.list(severity_estimate))
} else {
# calculating the central estimate
severity_estimate <- total_deaths / total_cases
# calculating the lower and upper 95% confidence interval using the exact
# binomial test
severity_conf <- stats::binom.test(round(total_deaths), total_cases, p = 1)
# extracting the lower and upper intervals respectively
severity_lims <- severity_conf$conf.int
severity_estimate <- data.frame(
severity_estimate = severity_estimate,
severity_low = severity_lims[[1]],
severity_high = severity_lims[[2]]
)
}
# return the severity estimate
severity_estimate
}
Any scripts or data that you put into this service are public.
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.