R/get_si.R

Defines functions get_si

Documented in get_si

#' Compute the serial interval (si) from transmission trees
#'
#' The serial interval is the time between the onset of symptoms in an infector-infectee pair.
#' This function computes the serial interval statistics from a list of transmission trees.
#'
#' @param trees A list of data frames, generated by \code{\link{get_trees}}.
#' It should contain information about the dates of onset.
#'
#' @param date_suffix A string indicating the suffix for date of onset columns.
#' Default is "date", which means the columns should be named `from_date` and `to_date`.
#'
#' @param stats A list of functions to compute statistics. Default is:
#' \itemize{
#'   \item `mean`: the mean serial interval.
#'   \item `lwr`: the 2.5th percentile (lower quantile).
#'   \item `upr`: the 97.5th percentile (upper quantile).
#' }
#' Each function should take a numeric vector as input and return a single numeric value.
#'
#' @return A data frame with serial interval statistic
#'
#' @seealso \code{\link{get_trees}} for generating a list of transmission trees.
#'
#' @importFrom stats ecdf aggregate
#'
#' @examples
#' trees <- get_trees(out, date = linelist$onset)
#' si_stats <- get_si(trees)
#' str(si_stats)
#'
#' @export
get_si <- function(trees,
                   date_suffix = "date",
                   stats = list(
                     mean = mean,
                     lwr = function(x) quantile(x, 0.025, na.rm = TRUE),
                     upr = function(x) quantile(x, 0.975, na.rm = TRUE)
                   )) {

  # Construct column names
  from_col <- paste0("from_", date_suffix)
  to_col <- paste0("to_", date_suffix)

  # Calculate serial intervals for each tree
  si_list <- lapply(trees, function(x) {
    x <- x[!is.na(x[[from_col]]), ]
    x[[to_col]] - x[[from_col]]
  })

  # Get the range from all serial intervals
  all_si <- unlist(si_list)
  si_min <- floor(min(all_si, na.rm = TRUE))
  si_max <- ceiling(max(all_si, na.rm = TRUE))
  x_range <- seq(si_min, si_max, by = 1)

  # Calculate ECDF values using the pre-computed serial intervals
  si_ecdf <- do.call(rbind, lapply(si_list, function(si) {
    data.frame(x = x_range, y = stats::ecdf(si)(x_range))
  }))

  # Calculate requested statistics
  result <- data.frame(x = x_range)
  for (stat_name in names(stats)) {
    stat_values <- stats::aggregate(y ~ x, si_ecdf, stats[[stat_name]])
    result[[stat_name]] <- stat_values$y
  }

  return(result)
}

Try the o2ools package in your browser

Any scripts or data that you put into this service are public.

o2ools documentation built on June 8, 2025, 10:18 a.m.