surv_simulate: Simulate genomic surveillance data

View source: R/07-simulate.R

surv_simulate    R Documentation

Simulate genomic surveillance data

Description

Generates synthetic surveillance datasets with realistic features: multiple regions with unequal sequencing rates, multiple lineages with time-varying prevalence, configurable reporting delays, and multiple sample sources.

Usage

surv_simulate(
  n_regions = 5L,
  n_weeks = 26L,
  total_positive_per_week = 1000L,
  sequencing_rates = NULL,
  lineage_dynamics = NULL,
  delay_params = list(mu = 10, size = 3),
  sources = c("clinical", "wastewater", "sentinel"),
  source_weights = c(0.7, 0.2, 0.1),
  seed = NULL
)

Arguments

n_regions

Integer. Number of geographic regions. Default 5.

n_weeks

Integer. Number of epiweeks. Default 26.

total_positive_per_week

Integer. Mean total positive cases per week across all regions. Default 1000.

sequencing_rates

Numeric vector of length n_regions. Per-region sequencing probability. If NULL, generated from a Beta distribution with realistic inequality. Default NULL.
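
As a rough illustration of what unequal per-region sequencing probabilities can look like, the sketch below draws rates from a Beta distribution and thins a fixed number of positives per region. The shape parameters and the per-region case count are hypothetical, chosen only for the example; the values the package uses when sequencing_rates is NULL are not stated here.

```r
set.seed(1)
n_regions <- 5L

# Hypothetical shape parameters (assumption): most regions get low rates,
# a few get noticeably higher ones.
rates <- rbeta(n_regions, shape1 = 1, shape2 = 9)

# Hypothetical number of positive cases per region in one week (assumption).
positives <- rep(200L, n_regions)

# Each positive case is sequenced independently with its region's probability.
sequenced <- rbinom(n_regions, size = positives, prob = rates)

data.frame(region = seq_len(n_regions), rate = round(rates, 3), sequenced)
```

A vector like rates could be passed as sequencing_rates to fix the per-region probabilities instead of relying on the default.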

lineage_dynamics

Named list of functions, each taking a week number and returning a positive weight. If NULL, uses a default four-lineage scenario. Default NULL.
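
A minimal sketch of a lineage_dynamics argument, assuming three made-up lineages with constant, growing, and declining weights. The lineage names and growth rates are hypothetical. The normalisation step shows one plausible way positive weights can be turned into prevalences; whether surv_simulate normalises in exactly this way is an assumption.

```r
# Hypothetical lineage names and curves, for illustration only.
dynamics <- list(
  resident  = function(week) 1,                        # constant background weight
  emerging  = function(week) exp(0.25 * (week - 10)),  # exponential growth
  declining = function(week) exp(-0.15 * week)         # exponential decay
)

weeks <- 1:26

# Evaluate every function at every week: rows are weeks, columns are lineages.
weights <- sapply(dynamics, function(f) vapply(weeks, f, numeric(1)))

# Divide each row by its sum so the weights become proportions (assumed step).
prevalence <- weights / rowSums(weights)

round(prevalence[c(1, 13, 26), ], 3)

# The list could then be supplied as:
# sim <- surv_simulate(lineage_dynamics = dynamics, seed = 1)
```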

delay_params

List with elements mu (mean) and size (dispersion) of the negative binomial reporting-delay distribution. Default list(mu = 10, size = 3).
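
To get a feel for the default delay parameters, the sketch below draws delays with base R's rnbinom using the same mu/size parameterisation, under which the variance is mu + mu^2 / size. It assumes the package draws delays in this way; the time unit of the delay is not stated above, so none is implied here.

```r
set.seed(1)
delay_params <- list(mu = 10, size = 3)

# Draw 10,000 delays from a negative binomial with mean mu and dispersion size.
delays <- rnbinom(10000, size = delay_params$size, mu = delay_params$mu)

# Theoretical variance under the mu/size parameterisation.
theoretical_var <- delay_params$mu + delay_params$mu^2 / delay_params$size

c(sample_mean = mean(delays),
  sample_var = var(delays),
  theoretical_var = theoretical_var)
```

A smaller size gives a heavier right tail for the same mean, which is one way to mimic occasional very late reports.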

sources

Character vector of sample source types. Default c("clinical", "wastewater", "sentinel").

source_weights

Numeric vector of the same length as sources, giving the relative weight of each source type. Default c(0.7, 0.2, 0.1).
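
The pairing of sources and source_weights can be sketched with base R's sample, which accepts unnormalised probabilities through its prob argument. Treating the weights as sampling probabilities for each record's source is an assumption about how the package uses them.

```r
set.seed(1)
sources <- c("clinical", "wastewater", "sentinel")
source_weights <- c(0.7, 0.2, 0.1)

# The two vectors must line up element by element.
stopifnot(length(sources) == length(source_weights))

# Assign a source to each of 10,000 hypothetical records.
assigned <- sample(sources, size = 10000, replace = TRUE, prob = source_weights)

# Observed share of each source, in the original order.
observed_share <- prop.table(table(factor(assigned, levels = sources)))
round(observed_share, 3)
```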

seed

Integer or NULL. Random seed. Default NULL.

Value

A named list with elements:

sequences

Tibble of individual sequence records.

population

Tibble with one row per region.

truth

Tibble of true lineage prevalence by region and week.

parameters

List of all input parameters.

Examples

sim <- surv_simulate(n_regions = 3, n_weeks = 8, seed = 42)
head(sim$sequences)
sim$population


survinger documentation built on April 27, 2026, 9:10 a.m.