R/proportion_transmission.R

Defines functions .prop_transmission_t20 .prop_transmission_numerical .prop_transmission_analytical proportion_transmission

Documented in proportion_transmission

#' Estimate what proportion of cases cause a certain proportion of
#' transmission
#'
#' @description Calculates the proportion of cases that cause a certain
#' percentage of transmission.
#'
#' It is commonly estimated what proportion of cases cause 80% of transmission
#' (i.e. secondary cases).
#' This can be calculated using `proportion_transmission()` at varying values of
#' \eqn{R} and for different values of percentage transmission.
#'
#' There are two methods for calculating the proportion of transmission,
#' \eqn{p_{80}} (default) and \eqn{t_{20}}, see `method` argument or details for
#' more information.
#'
#' @details
#' Calculates the expected proportion of transmission from a given
#' proportion of infectious cases. There are two methods to calculate this with
#' distinct formulations, \eqn{p_{80}} and \eqn{t_{20}} these can be specified
#' by the `method` argument.
#'
#' `method = p_80` calculates relative transmission heterogeneity
#' from the offspring distribution of secondary cases, \eqn{Z}, where the upper
#' proportion of the distribution comprise \eqn{x%} of total number of cases
#' given R0 and k, where \eqn{x} is typically defined as 0.8 or 80%. e.g. 80%
#' of all transmissions are generated by the upper 20% of cases, or
#' `p_80 = 0.2`, per the 80/20 rule. In this formulation, changes in R can
#' have a significant effect on the estimate of \eqn{p_80} even when k is
#' constant. Importantly, this formulation **does not** allow for true
#' homogeneity when `k = Inf` i.e. \eqn{p_{80} = 0.8}.
#'
#' `method = t_20` calculates a similar ratio, instead in terms of
#' the theoretical individual reproductive number and infectiousness given
#' R0 and k. The individual reproductive number, 'v', is described in
#' Lloyd-Smith JO et al. (2005), "as a random variable representing the
#' expected number of secondary cases caused by a particular infected
#' individual. Values for v are drawn from a continuous gamma probability
#' distribution with population mean R0 and dispersion parameter k, which
#' encodes all variation in infectious histories of individuals, including
#' properties of the host and pathogen and environmental circumstances." The
#' value of k corresponds to the shape parameters of the gamma distribution
#' which encodes the variation in the gamma-poisson mixture aka the negative
#' binomial
#'
#' For `method = t_20`, we define the upper proportion of infectiousness,
#' which is typically 0.2 i.e. the upper 20% most infectious
#' cases, again per the 80/20 rule. e.g. the most infectious 20% of cases,
#' are expected to produce 80% of all infections, or `t_20 = 0.8`. Unlike
#' `method = p_80`, changes in R have no effect on the estimate
#' of t_80 when k is constant, but R is still required for the underlying
#' calculation. This formulation **does** allow for true homogeneity when
#' k = Inf i.e. t_20 = 0.2, or t_80 = 0.8.
#'
#' Multiple values of R and k can be supplied and a `<data.frame>` of
#' every combination of these will be returned.
#'
#' The numerical calculation for `method = p_80` uses random number generation
#' to simulate secondary contacts so the answers may minimally vary between
#' calls. The number of simulation replicates is fixed to `r NSIM`.
#'
#' @inheritParams proportion_cluster_size
#' @inheritParams probability_epidemic
#' @param percent_transmission A `number` of the percentage transmission
#' for which a proportion of cases has produced.
#' @param method A `character` string defining which method is used to calculate
#' the proportion of transmission. Options are `"p_80"` (default) or `"t_20"`.
#' See details for more information on each of these methods.
#' @param simulate A `logical` whether the calculation should be done
#' numerically (i.e. simulate secondary contacts) or analytically. Default is
#' `FALSE` which uses the analytical calculation.
#'
#' @return A `<data.frame>` with the value for the proportion of cases for a
#' given value of R and k.
#' @export
#'
#' @references
#'
#' The analytical calculation is from:
#'
#' Endo, A., Abbott, S., Kucharski, A. J., & Funk, S. (2020)
#' Estimating the overdispersion in COVID-19 transmission using outbreak
#' sizes outside China. Wellcome Open Research, 5.
#' \doi{10.12688/wellcomeopenres.15842.3}
#'
#' The \eqn{t_{20}} method follows the formula defined in section 2.2.5 of the
#' supplementary material for:
#'
#' Lloyd-Smith JO, Schreiber SJ, Kopp PE, Getz WM. Superspreading and the
#' effect of individual variation on disease emergence. Nature. 2005
#' Nov;438(7066):355–9. \doi{10.1038/nature04153}
#'
#' The original code for the \eqn{t_{20}} method is from ongoing work
#' originating from \url{https://github.com/dcadam/kt} and:
#'
#' Adam D, Gostic K, Tsang T, Wu P, Lim WW, Yeung A, et al.
#' Time-varying transmission heterogeneity of SARS and COVID-19 in Hong Kong.
#' 2022. \doi{10.21203/rs.3.rs-1407962/v1}
#'
#' @examples
#' # example of single values of R and k
#' percent_transmission <- 0.8 # 80% of transmission
#' R <- 2
#' k <- 0.5
#' proportion_transmission(
#'   R = R,
#'   k = k,
#'   percent_transmission = percent_transmission
#' )
#'
#' # example with multiple values of k
#' k <- c(0.1, 0.2, 0.3, 0.4, 0.5, 1)
#' proportion_transmission(
#'   R = R,
#'   k = k,
#'   percent_transmission = percent_transmission
#' )
#'
#' # example with vectors of R and k
#' R <- c(1, 2, 3)
#' proportion_transmission(
#'   R = R,
#'   k = k,
#'   percent_transmission = percent_transmission
#' )
proportion_transmission <- function(R, k,
                                    percent_transmission,
                                    method = c("p_80", "t_20"),
                                    simulate = FALSE,
                                    ...,
                                    offspring_dist,
                                    format_prop = TRUE) {
  missing_params <- missing(R) && missing(k)
  .check_input_params(
    missing_params = missing_params,
    missing_offspring_dist = missing(offspring_dist)
  )
  method <- match.arg(method)

  # check inputs
  chkDots(...)
  if (missing_params) {
    checkmate::assert_class(offspring_dist, classes = "epiparameter")
    R <- get_epiparameter_param(epiparameter = offspring_dist, parameter = "R")
    k <- get_epiparameter_param(epiparameter = offspring_dist, parameter = "k")
  }
  checkmate::assert_numeric(R, lower = 0, finite = TRUE)
  checkmate::assert_numeric(k, lower = 0)
  # checkmate lower is >= 0 but k must be > 0
  stopifnot("k must be greater than zero." = k != 0)
  # k must be finite for analytical p_80 and t_20 methods
  if (any(is.infinite(k))) {
    k[is.infinite(k)] <- FINITE_INF
    message(
      "Infinite values of k are being approximated by ",
      FINITE_INF, " for calculations."
    )
  }
  checkmate::assert_number(percent_transmission, lower = 0, upper = 1)
  stopifnot(
    "percent_transmission must be less than 1." = percent_transmission != 1,
    "percent_transmission must be greater than 0." = percent_transmission != 0
  )
  checkmate::assert_logical(simulate, any.missing = FALSE, len = 1)
  checkmate::assert_logical(format_prop, any.missing = FALSE, len = 1)

  if (simulate && method == "t_20") {
    stop(
      "The simulate argument must be FALSE when method = t_20.",
      call. = FALSE
    )
  }

  df <- expand.grid(R = R, k = k, NA_real_)
  colnames(df) <- c("R", "k", paste0("prop_", percent_transmission * 100))

  for (i in seq_len(nrow(df))) {
    if (simulate) {
      prop <- .prop_transmission_numerical(
        R = df[i, "R"],
        k = df[i, "k"],
        percent_transmission = percent_transmission
      )
    } else {
      if (method == "p_80") {
        prop <- .prop_transmission_analytical(
          R = df[i, "R"],
          k = df[i, "k"],
          percent_transmission = percent_transmission
        )
      } else {
        prop <- .prop_transmission_t20(
          R = df[i, "R"],
          k = df[i, "k"],
          percent_transmission = percent_transmission
        )
      }
    }

    if (format_prop) {
      prop <- paste0(signif(prop * 100, digits = 3), "%")
    }
    # df is ways i x 3 so insert value into col 3
    df[i, 3] <- prop
  }
  return(df)
}

#' Analytical calculation of proportion of cases that cause \eqn{x} percent
#' of transmission
#'
#' @return A numeric
#' @keywords internal
#' @noRd
.prop_transmission_analytical <- function(R, k, percent_transmission) {

  xm1 <- stats::qnbinom(1 - percent_transmission, k + 1, mu = R * (k + 1) / k)
  remq <- 1 - percent_transmission -
    stats::pnbinom(xm1 - 1, k + 1, mu = R * (k + 1) / k)
  remx <- remq / stats::dnbinom(xm1, k + 1, mu = R * (k + 1) / k)
  x <- xm1 + 1
  out <- 1 - stats::pnbinom(x - 1, k, mu = R) -
    stats::dnbinom(x, k, mu = R) * remx
  return(out)
}

#' Numerical calculation of proportion of cases that cause \eqn{x} percent
#' of transmission
#'
#' @return A numeric
#' @keywords internal
#' @noRd
.prop_transmission_numerical <- function(R, k, percent_transmission) {

  simulate_secondary <- stats::rnbinom(
    n = NSIM,
    mu = R,
    size = k
  )

  # percentage of cases
  percent_cases <- percent_transmission * sum(simulate_secondary)

  # cumulative sum of simulated secondary cases
  cumsum_secondary <- cumsum(sort(simulate_secondary, decreasing = TRUE))

  # proportion causing case
  out <- sum(cumsum_secondary <= percent_cases) / NSIM

  return(out)
}

#' Calculates the expected proportion of transmission from the upper `prop`%
#' of most infectious cases
#'
#' @return A `numeric`
#' @keywords internal
#' @noRd
.prop_transmission_t20 <- function(R, k, percent_transmission) {
  k <- k %gt% 1e7
  percent_transmission <- percent_transmission %gt% 0.99
  u <- solve_for_u(prop = percent_transmission, R = R, k = k)
  integral_result <- stats::integrate(
    function(u) u * fvx(u, R, k), lower = 0, upper = u
  )
  res <- integral_result$value / R
  return(1 - res)
}

Try the superspreading package in your browser

Any scripts or data that you put into this service are public.

superspreading documentation built on April 4, 2025, 3:18 a.m.