Design-adjusted inference for pathogen lineage surveillanceunder unequal sequencing and reporting delays
Genomic surveillance systems sequence unevenly. Denmark sequences 12% of cases; Romania sequences 0.3%. If you estimate lineage prevalence by counting sequences, the result is dominated by Denmark — regardless of what is actually circulating across Europe.
On real ECDC data, this produces up to 14 percentage points of error:
The red shaded area is the bias eliminated by design weighting. survinger corrects this using Horvitz-Thompson / Hajek estimators with Wilson score confidence intervals.
Each country's bias depends on its sequencing rate and its local prevalence, and both change over time. Poland (under-sequenced, high prevalence) is systematically underweighted by naive methods. A single correction factor cannot fix this — you need per-stratum, per-period weights.
In controlled simulation (50 replicates × 6 inequality levels), the Hajek estimator maintains 0.6–2.5 pp absolute bias while the naive estimator reaches 3.2–8.7 pp. The advantage holds across all levels of sequencing inequality.
# install.packages("remotes")
remotes::install_github("CuiweiG/survinger")
library(survinger)
# Simulate surveillance data (or use your own)
sim <- surv_simulate(n_regions = 5, n_weeks = 26, seed = 42)
# Create design from surveillance data
design <- surv_design(
data = sim$sequences, strata = ~ region,
sequencing_rate = sim$population[c("region", "seq_rate")],
population = sim$population
)
# Corrected prevalence (one line)
surv_lineage_prevalence(design, "BA.2.86")
# Or even simpler — single pipe-friendly call:
surv_estimate(
data = sim$sequences, strata = ~ region,
sequencing_rate = sim$population[c("region", "seq_rate")],
population = sim$population, lineage = "BA.2.86"
)
# Full pipeline with delay correction
delay <- surv_estimate_delay(design)
surv_adjusted_prevalence(design, delay, "BA.2.86")
# How should I allocate 500 sequences?
surv_optimize_allocation(design, "min_mse", total_capacity = 500)
# Is my system powerful enough?
surv_detection_probability(design, true_prevalence = 0.01)
# One-page diagnostic
surv_report(design)
| Function | Purpose |
|----------|---------|
| surv_design() | Create design with inverse-probability weights |
| surv_simulate() | Generate synthetic surveillance data |
| surv_filter() | Subset a design by filter criteria |
| surv_update_rates() | Update sequencing rates |
| surv_set_weights() | Override design weights |
| Function | Purpose |
|----------|---------|
| surv_lineage_prevalence() | Hajek / HT / post-stratified prevalence |
| surv_naive_prevalence() | Unweighted baseline prevalence |
| surv_prevalence_by() | Prevalence by subgroup (region, source, etc.) |
| surv_estimate() | Pipe-friendly one-call analysis |
| Function | Purpose |
|----------|---------|
| surv_estimate_delay() | Right-truncation-corrected delay fitting |
| surv_reporting_probability() | Cumulative reporting probability |
| surv_nowcast_lineage() | Delay-adjusted nowcast |
| surv_adjusted_prevalence() | Combined design + delay correction |
| Function | Purpose |
|----------|---------|
| surv_optimize_allocation() | Neyman allocation (3 objectives) |
| surv_compare_allocations() | Benchmark all allocation strategies |
| surv_required_sequences() | Sample size for target detection power |
| Function | Purpose |
|----------|---------|
| surv_detection_probability() | Variant detection power |
| surv_power_curve() | Detection probability across prevalence range |
| surv_compare_estimates() | Weighted vs naive side-by-side plot |
| surv_design_effect() | Design effect over time |
| surv_sensitivity() | Sensitivity analysis across all methods |
| surv_report() | Surveillance system diagnostic |
| surv_quality() | One-row quality metrics |
| Function | Purpose |
|----------|---------|
| tidy() / glance() | Broom-style tidying for all result objects |
| surv_bind() | Combine multiple prevalence estimates |
| surv_table() | Publication-ready formatted table |
| theme_survinger() | Publication-quality ggplot2 theme |
| | phylosamp | survey | epinowcast | survinger | |---|---|---|---|---| | Question | How many? | General surveys | Bayesian nowcast | Allocate + correct + nowcast | | Genomic-specific | ✓ | ✗ | Partial | ✓ | | Allocation | ✗ | ✗ | ✗ | ✓ (3 objectives) | | Delay correction | ✗ | ✗ | ✓ | ✓ | | Requires Stan | ✗ | ✗ | ✓ | ✗ | | CRAN-friendly | ✓ | ✓ | ✗ | ✓ |
survey::svymean (exact match)vignette("survinger") — Quick startvignette("allocation-optimization") — Resource allocationvignette("delay-correction") — Delay estimation and nowcastingvignette("real-world-ecdc") — ECDC case study@Manual{survinger2026,
title = {survinger: Design-Adjusted Inference for Pathogen Lineage Surveillance},
author = {Cuiwei Gao},
year = {2026},
note = {R package version 0.1.0},
url = {https://github.com/CuiweiG/survinger}
}
MIT © 2026 Cuiwei Gao
Any scripts or data that you put into this service are public.
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.