R/simulate.R

Defines functions simulate_chain_stats simulate_chains

Documented in simulate_chains simulate_chain_stats

#' Simulate transmission chains
#'
#' @description
#' It generates independent transmission chains starting with a single case
#' per chain, using a simple branching process model (See details for
#' definition of "chains" and assumptions). Offspring for each
#' chain are generated with an offspring distribution, and an optional
#' generation time distribution function.
#'
#' The individual chain simulations are controlled by customisable stopping
#' criteria, including a threshold chain size or length, and a generation time
#' cut off. The function also optionally accepts population related inputs
#' such as the population size (defaults to Inf) and percentage of the
#' population initially immune (defaults to 0).
#' @param n_chains Number of chains to simulate.
#' @param offspring_dist Offspring distribution: a `<function>` like the ones
#' provided by R to generate random numbers from given distributions (e.g.,
#' \code{\link{rpois}} for Poisson). More specifically, the function needs to
#' accept at least one argument, \code{n}, which is the number of random
#' numbers to generate. It can accept further arguments, which will be passed
#' on to the random number generating functions. Examples that can be provided
#' here are `rpois` for Poisson distributed offspring, `rnbinom` for negative
#' binomial offspring, or custom functions.
#' @param statistic The chain statistic to track as the
#' stopping criteria for each chain being simulated when `stat_threshold` is not
#' `Inf`; A `<string>`. It can be one of:
#' \itemize{
#'   \item "size": the total number of cases produced by a chain before it goes
#'   extinct.
#'   \item "length": the total number of generations reached by a chain before
#'   it goes extinct.
#' }
#' @param stat_threshold A stopping criterion for individual chain simulations;
#' a positive number coercible to integer. When any chain's cumulative statistic
#' reaches or surpasses `stat_threshold`, that chain ends. Defaults to `Inf`.
#' For example, if `statistic = "size"` and `stat_threshold = 10`, then any
#' chain that produces 10 or more cases will stop. Note that setting
#' `stat_threshold` does not guarantee that all chains will stop at the same
#' value.
#' @param pop Population size; An `<Integer>`. Used alongside `percent_immune`
#' to define the susceptible population. Defaults to `Inf`.
#' @param percent_immune Percent of the population immune to
#' infection at the start of the simulation; A `<numeric>` between 0 and 1.
#' Used alongside `pop` to initialise the susceptible population. Defaults to
#' 0.
#' @param generation_time The generation time function; the name
#' of a user-defined named or anonymous function with only one argument `n`,
#' representing the number of generation times to sample.
#' @param t0 Start time (if generation time is given); either a single value
#' or a vector of same length as `n_chains` (number of initial cases) with
#' corresponding initial times. Defaults to 0, meaning all cases started at
#' time 0.
#' @param tf A number for the cut-off for the infection times (if generation
#' time is given); Defaults to `Inf`.
#' @param ... Parameters of the offspring distribution as required by R.
#' @return An `<epichains>` object, which is basically a `<data.frame>`
#' with columns:
#' * `chain` - an ID for active/ongoing chains,
#' * `infectee` - a unique ID for each infectee.
#' * `infector` - an ID for the infector of each infectee.
#' * `generation` - a discrete time point during which infection occurs, and
#' optionally,
#' * `time` - the time of infection.
#' @author James M. Azam, Sebastian Funk
#' @export
#' @details
#' # Definition of a transmission chain
#' A transmission chain as used here is an independent case and all
#' the secondary cases linked to it through transmission. The chain starts with
#' a single case, and each case in the chain generates secondary cases
#' according to the offspring distribution. The chain ends when no more
#' secondary cases are generated.
#'
#' # Calculating chain sizes and lengths
#' The function simulates the chain size for chain \eqn{i} at time
#' \eqn{t}, \eqn{I_{t, i}}, as:
#' \deqn{I_{t, i} = \sum_{i}^{I_{t-1}}X_{t, i},}
#' and the chain length/duration for chain \eqn{i} at time \eqn{t},
#' \eqn{L_{t, i}}, as:
#' \deqn{L_{t, i} = {\sf min}(1, X_{t, i}), }
#' where \eqn{X_{t, i}} is the secondary cases generated by chain \eqn{i}
#' at time \eqn{t}, and \eqn{I_{0, i} = L_{0, i} = 1}.
#'
#' The distribution of secondary cases, \eqn{X_{t, i}} is modelled by the
#' offspring distribution (`offspring_dist`).
#'
#' # Specifying `generation_time`
#'
#' The argument `generation_time` must be specified as a function with
#' one argument, `n`.
#'
#' For example, assuming we want to specify the generation time as a random
#' log-normally distributed variable with `meanlog = 0.58` and `sdlog = 1.58`,
#' we could define a named function, let's call it "generation_time_fn",
#' with only one argument representing the number of generation times to
#' sample: \code{generation_time_fn <- function(n){rlnorm(n, 0.58, 1.38)}},
#' and assign the name of the function to `generation_time` in
#' the simulation function, i.e.
#' \code{`simulate_*`(..., generation_time = generation_time_fn)},
#' where `...` are the other arguments to `simulate_*()` and * is a placeholder
#' for the rest of simulation function's name.
#'
#' Alternatively, we could assign an anonymous function to `generation_time`
#' in the `simulate_*()` call, i.e.
#' \code{simulate_*(..., generation_time = function(n){rlnorm(n, 0.58, 1.38)})}
#' OR \code{simulate_*(..., generation_time = \(n){rlnorm(n, 0.58, 1.38)})},
#' where `...` are the other arguments to `simulate_*()`.
#' @examples
#' # Using a Poisson offspring distribution and simulating from an infinite
#' # population up to chain size 10.
#' set.seed(32)
#' chains_pois_offspring <- simulate_chains(
#'   n_chains = 10,
#'   statistic = "size",
#'   offspring_dist = rpois,
#'   stat_threshold = 10,
#'   generation_time = function(n) rep(3, n),
#'   lambda = 2
#' )
#' chains_pois_offspring
#'
#' # Using a Negative binomial offspring distribution and simulating from a
#' # finite population up to chain size 10.
#' set.seed(32)
#' chains_nbinom_offspring <- simulate_chains(
#'   n_chains = 10,
#'   pop = 100,
#'   percent_immune = 0,
#'   statistic = "size",
#'   offspring_dist = rnbinom,
#'   stat_threshold = 10,
#'   generation_time = function(n) rep(3, n),
#'   mu = 2,
#'   size = 0.2
#' )
#' chains_nbinom_offspring
#' @references
#' Jacob C. (2010). Branching processes: their role in epidemiology.
#' International journal of environmental research and public health, 7(3),
#' 1186–1204. \doi{https://doi.org/10.3390/ijerph7031204}
#'
#' Blumberg, S., and J. O. Lloyd-Smith. 2013. "Comparing Methods for
#' Estimating R0 from the Size Distribution of Subcritical Transmission
#' Chains." Epidemics 5 (3): 131–45.
#' \doi{https://doi.org/10.1016/j.epidem.2013.05.002}.
#'
#' Farrington, C. P., M. N. Kanaan, and N. J. Gay. 2003.
#' "Branching Process Models for Surveillance of Infectious Diseases
#' Controlled by Mass Vaccination.” Biostatistics (Oxford, England)
#' 4 (2): 279–95. \doi{https://doi.org/10.1093/biostatistics/4.2.279}.
simulate_chains <- function(n_chains,
                            statistic = c("size", "length"),
                            offspring_dist,
                            ...,
                            stat_threshold = Inf,
                            pop = Inf,
                            percent_immune = 0,
                            generation_time = NULL,
                            t0 = 0,
                            tf = Inf) {
  # Check offspring and population-related arguments
  .check_sim_args(
    n_chains = n_chains,
    statistic = statistic,
    offspring_dist = offspring_dist,
    stat_threshold = stat_threshold,
    pop = pop,
    percent_immune = percent_immune
  )
  # Check time-related arguments
  # Since tf is passed to .check_time_args, we need to check if it is specified
  # in this function environment. If tf is specified, we expect generation_time
  # to be specified too.
  tf_specified <- !missing(tf)
  .check_time_args(
    generation_time = generation_time,
    t0 = t0,
    tf_specified = tf_specified,
    tf = tf
  )
  # Gather offspring distribution parameters
  pars <- list(...)

  # Initialisations
  stat_track <- rep(1, n_chains) # track statistic (length or size)
  n_offspring <- rep(1, n_chains) # current number of offspring
  chains_active <- seq_len(n_chains) # track ongoing simulations
  new_infectors_ids <- rep(1, n_chains) # track infectors

  # initialise list of data frame to hold the transmission trees
  generation <- 1L
  sim_df <- list(
    data.frame(
      chain = seq_len(n_chains),
      infectee = 1L,
      infector = NA_integer_, # infectors are unknown in the first generation
      generation = generation
    )
  )
  # Initialise susceptible population
  susc_pop <- .init_susc_pop(pop, percent_immune, n_chains)

  # Add optional columns
  if (!missing(generation_time)) {
    sim_df[[generation]]$time <- t0
    times <- sim_df[[generation]]$time
  }
  # Simulate chains until stopping conditions are met
  while (length(chains_active) > 0 && susc_pop > 0) {
    # simulate the next possible offspring
    next_gen <- .sample_possible_offspring(
      offspring_func = offspring_dist,
      offspring_func_pars = pars,
      n_offspring = n_offspring,
      chains = chains_active
    )
    # from all possible offspring, get those that could be infected
    next_gen <- .get_susceptible_offspring(
      new_offspring = next_gen,
      susc_pop = susc_pop,
      pop = pop
    )
    # Adjust the infectibles if they exceed the susceptible population
    if (sum(next_gen) > susc_pop) {
      next_gen <- .adjust_next_gen(
        next_gen = next_gen,
        susc_pop = susc_pop
      )
    }
    # create ids linking active simulations to their offspring
    active_chain_ids <- rep(chains_active, n_offspring[chains_active])

    # initialise place holder for the number of offspring
    n_offspring <- rep(0, n_chains)
    # find the number of new offspring for each active simulation
    n_offspring[chains_active] <- tapply(next_gen, active_chain_ids, sum)
    # update the size/length statistic
    stat_track <- .update_chain_stat(
      stat_type = statistic,
      stat_latest = stat_track,
      n_offspring = n_offspring
    )
    # Adorn the new offspring with their information: their ids, their
    # infector's ids, and the generation they were infected in.
    # Also add the time of infection if generation_time was specified
    if (sum(n_offspring) > 0) {
      infector_ids <- rep(new_infectors_ids, next_gen)
      current_max_id <- unname(tapply(new_infectors_ids, active_chain_ids, max))
      active_chain_ids <- rep(chains_active, n_offspring[chains_active])

      # ids linking each new offspring to their original simulation
      sim_offspring_ids <- rep(current_max_id, n_offspring[chains_active]) +
        sequence(n_offspring[chains_active])

      # increment the generation
      generation <- generation + 1L
      # Update susceptible population
      susc_pop <- susc_pop - sum(n_offspring)

      # store new simulation results
      sim_df[[generation]] <-
        data.frame(
          chain = active_chain_ids,
          infectee = sim_offspring_ids,
          infector = infector_ids,
          generation = generation
        )

      # if a generation time model/function was specified, use it
      # to generate generation times for the cases
      if (!missing(generation_time)) {
        times <- rep(times, next_gen) + generation_time(sum(n_offspring))
        current_min_time <- unname(tapply(times, active_chain_ids, min))
        sim_df[[generation]]$time <- times
      }
    }

    # Find ongoing chains (those still infecting): those that have still
    # offspring and haven't reached the threshold statistic yet
    chains_active <- which(n_offspring > 0 & stat_track < stat_threshold)
    if (length(chains_active) > 0) {
      if (!missing(generation_time)) {
        ## only continue to simulate trees that don't go beyond tf
        unique_index_cases <- unique(active_chain_ids)
        chains_active <- intersect(
          chains_active,
          unique_index_cases[current_min_time < tf]
        )
        times <- times[active_chain_ids %in% chains_active]
      }
      # infectees of active chains become infectors in the next generation
      new_infectors_ids <-
        sim_offspring_ids[active_chain_ids %in% chains_active]
    }
  }

  # Combine the results
  sim_df <- do.call(rbind, sim_df)

  # time column only exists if tf was specified
  if (!missing(tf)) {
    sim_df <- sim_df[sim_df$time < tf, ]
  }

  # Post processing
  rownames(sim_df) <- NULL
  # We want to reorder the columns but that depends on whether "time" is
  # present or not, so we need to determine that first
  column_order <- if (missing(generation_time)) {
    c("chain", "infector", "infectee", "generation")
  } else {
    c("chain", "infector", "infectee", "generation", "time")
  }
  sim_df <- sim_df[, c(column_order)]
  out <- .epichains(
    sim_df,
    n_chains = n_chains,
    statistic = statistic,
    offspring_dist = offspring_dist,
    stat_threshold = stat_threshold,
    track_pop = !missing(pop)
  )
  return(out)
}

#' Simulate a vector of transmission chains statistics (sizes/lengths)
#'
#' @description
#' It generates a vector of transmission chain sizes or lengths using the
#' same model as [simulate_chains()] but without tracking details of the
#' individual chains. This function is useful when only the chain sizes or
#' lengths are of interest.
#'
#' It uses a simple branching process model that simulates independent
#' chains, using an offspring distribution for each chain. Each chain
#' uses a threshold chain size or length as the
#' stopping criterion especially where R0 > 1. The function also optionally
#' accepts population related inputs such as the population size (defaults
#' to Inf) and percentage of the population initially immune (defaults to 0).
#' @inheritParams simulate_chains
#' @param stat_threshold A stopping criterion for individual chain simulations;
#' a positive number coercible to integer. When any chain's cumulative statistic
#' reaches or surpasses `stat_threshold`, that chain ends. It also serves as a
#' censoring limit so that results above the specified value, are set to `Inf`.
#' Defaults to `Inf`.
#' @return An object of class `<epichains_summary>`, which is a numeric
#' vector of chain sizes or lengths with extra attributes for storing the
#' simulation parameters.
#' @inheritSection simulate_chains Definition of a transmission chain
#' @inheritSection simulate_chains Calculating chain sizes and lengths
#' @inherit simulate_chains references
#' @details
#' # `simulate_chain_stats()` vs `simulate_chains()`
#' `simulate_chain_stats()` is a time-invariant version of `simulate_chains()`.
#' In particular, it does not track the details of individual transmission
#' events but deals with eventual chain statistics, that is, the statistic
#' realised by a chain after dying out.
#'
#' It is useful for generating a vector of chain sizes or lengths for a given
#' number of chains, if details of who infected whom and the timing of
#' infection are not of interest.
#'
#' This function is used in `{epichains}` for calculating likelihoods in
#' the `likelihood()` function and for sampling from the borel
#' distribution (See ?epichains::rborel). It is used extensively in the
#' vignette on
# nolint start
#' [modelling disease control](https://epiverse-trace.github.io/epichains/articles/interventions.html),
# nolint end
#' where only data on observed chain sizes and lengths are available.
#' @author James M. Azam, Sebastian Funk
#' @examples
#' # Simulate chain sizes with a poisson offspring distribution, assuming an
#' # infinite population and no immunity.
#' set.seed(32)
#' simulate_chain_stats(
#'   n_chains = 20,
#'   statistic = "size",
#'   offspring_dist = rpois,
#'   stat_threshold = 10,
#'   lambda = 0.9
#' )
#' # Simulate chain sizes with a negative binomial distribution and assuming
#' # a finite population and 10% immunity.
#' set.seed(32)
#' simulate_chain_stats(
#'   pop = 1000,
#'   percent_immune = 0.1,
#'   n_chains = 20,
#'   statistic = "size",
#'   offspring_dist = rnbinom,
#'   stat_threshold = 10,
#'   mu = 0.9,
#'   size = 0.36
#' )
#' @export
simulate_chain_stats <- function(n_chains,
                                 statistic = c("size", "length"),
                                 offspring_dist,
                                 ...,
                                 stat_threshold = Inf,
                                 pop = Inf,
                                 percent_immune = 0) {
  # Check offspring and population-related arguments
  .check_sim_args(
    n_chains = n_chains,
    statistic = statistic,
    offspring_dist = offspring_dist,
    stat_threshold = stat_threshold,
    pop = pop,
    percent_immune = percent_immune
  )
  # Gather offspring distribution parameters
  pars <- list(...)

  # Initialisations
  stat_track <- rep(1, n_chains) ## track statistic
  n_offspring <- rep(1, n_chains) ## current number of offspring
  chains_active <- seq_len(n_chains) # track trees being simulated

  # Initialise susceptible population
  susc_pop <- .init_susc_pop(pop, percent_immune, n_chains)

  ## next, simulate outbreaks from the initial chains
  while (length(chains_active) > 0 && susc_pop > 0) {
    # simulate the possible next generation of offspring
    next_gen <- .sample_possible_offspring(
      offspring_func = offspring_dist,
      offspring_func_pars = pars,
      n_offspring = n_offspring,
      chains = chains_active
    )
    # from all possible offspring, get those that are infectible
    next_gen <- .get_susceptible_offspring(
      new_offspring = next_gen,
      susc_pop = susc_pop,
      pop = pop
    )
    # Adjust next_gen if the number of offspring is greater than the
    # susceptible population
    if (sum(next_gen) > susc_pop) {
      next_gen <- .adjust_next_gen(
        next_gen = next_gen,
        susc_pop = susc_pop
      )
    }
    # create ids linking active chains to their offspring
    active_chain_ids <- rep(chains_active, n_offspring[chains_active])

    ## initialise number of offspring
    n_offspring <- rep(0, n_chains)
    ## assign offspring sum to their corresponding chains
    n_offspring[chains_active] <- tapply(next_gen, active_chain_ids, sum)

    # track size/length
    stat_track <- .update_chain_stat(
      stat_type = statistic,
      stat_latest = stat_track,
      n_offspring = n_offspring
    )
    # Update susceptible population
    susc_pop <- susc_pop - sum(n_offspring)
    # only continue chains that have offspring and haven't reached
    # stat_threshold
    chains_active <- which(n_offspring > 0 & stat_track < stat_threshold)
  }

  stat_track[stat_track >= stat_threshold] <- Inf

  out <- .epichains_summary(
    chains_summary = stat_track,
    n_chains = n_chains,
    statistic = statistic,
    offspring_dist = offspring_dist,
    stat_threshold = stat_threshold
  )

  return(out)
}

Try the epichains package in your browser

Any scripts or data that you put into this service are public.

epichains documentation built on Oct. 14, 2024, 5:10 p.m.