survinger

Design-adjusted inference for pathogen lineage surveillance under unequal sequencing and reporting delays

The problem

Genomic surveillance systems sequence unevenly. Denmark sequences 12% of cases; Romania sequences 0.3%. If you estimate lineage prevalence by counting sequences, the result is dominated by Denmark — regardless of what is actually circulating across Europe.

On real ECDC data, naive counting produces errors of up to 14 percentage points.

In the accompanying figure, the red shaded area is the bias eliminated by design weighting. survinger corrects this using Horvitz-Thompson / Hajek estimators with Wilson score confidence intervals.
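
To make the weighting idea concrete, here is a minimal base-R sketch that does not call survinger. The sequencing rates are the ones quoted above; the sequence counts are hypothetical and chosen only so the arithmetic is easy to follow. Pairing the Wilson interval with a Kish effective sample size is an assumption made for this illustration and may not match what the package does internally.

```r
# Toy data: counts are hypothetical; only the sequencing rates come from the text.
toy <- data.frame(
  country  = c("Denmark", "Romania"),
  seq_rate = c(0.12, 0.003),  # share of cases that get sequenced
  n_seq    = c(1200, 30),     # sequences obtained (hypothetical)
  n_target = c(120, 15)       # sequences of the target lineage (hypothetical)
)

# Naive estimate: pool all sequences and count.
naive <- sum(toy$n_target) / sum(toy$n_seq)

# Hajek estimate: weight each sequence by 1 / sequencing rate,
# then normalise by the sum of the weights.
w <- 1 / toy$seq_rate
hajek <- sum(w * toy$n_target) / sum(w * toy$n_seq)

# Wilson score interval for a proportion p observed on n trials.
wilson_ci <- function(p, n, conf = 0.95) {
  z <- qnorm(1 - (1 - conf) / 2)
  centre <- (p + z^2 / (2 * n)) / (1 + z^2 / n)
  half <- z * sqrt(p * (1 - p) / n + z^2 / (4 * n^2)) / (1 + z^2 / n)
  c(lower = centre - half, upper = centre + half)
}

# Kish effective sample size (assumption for this sketch): shrinks n to
# reflect the loss of precision caused by unequal weights.
w_i <- rep(w, toy$n_seq)
n_eff <- sum(w_i)^2 / sum(w_i^2)

ci <- wilson_ci(hajek, n_eff)
round(c(naive = naive, hajek = hajek, ci), 3)
```

With these toy numbers the naive estimate is about 0.11, close to Denmark's own 10%, while the Hajek estimate is 0.30, the prevalence implied once each country counts in proportion to its cases.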

The bias is structured, not random

Each country's bias depends on its sequencing rate and its local prevalence, and both change over time. Poland (under-sequenced, high prevalence) is systematically underweighted by naive methods. A single correction factor cannot fix this — you need per-stratum, per-period weights.
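
To see why one correction factor is not enough, consider a small base-R sketch with two hypothetical countries over two weeks. All numbers are invented for illustration, and the code only mimics the idea of per-stratum, per-period weights rather than reproducing the package's implementation.

```r
# Hypothetical sequencing data for two countries over two weeks.
obs <- data.frame(
  country  = c("A", "B", "A", "B"),
  week     = c(1, 1, 2, 2),
  seq_rate = c(0.10, 0.01, 0.10, 0.02),  # B doubles its sequencing rate in week 2
  n_seq    = c(500, 50, 500, 100),
  n_target = c(50, 20, 100, 60)
)

# One weight per stratum and period: the inverse of that cell's sequencing rate.
obs$weight <- 1 / obs$seq_rate

# Naive and weighted prevalence, computed separately for each week.
by_week <- do.call(rbind, lapply(split(obs, obs$week), function(d) {
  data.frame(
    week     = d$week[1],
    naive    = sum(d$n_target) / sum(d$n_seq),
    weighted = sum(d$weight * d$n_target) / sum(d$weight * d$n_seq)
  )
}))

# A single correction factor would need this ratio to be constant across weeks.
by_week$ratio <- by_week$weighted / by_week$naive
by_week
```

In this toy setup the weighted-to-naive ratio moves from roughly 2.0 in week 1 to 1.5 in week 2, because both the sequencing rate and the local prevalence changed. A factor calibrated on one week would miscorrect the other.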

The correction works

In a controlled simulation (50 replicates × 6 inequality levels), the Hajek estimator keeps absolute bias at 0.6–2.5 pp, while the naive estimator reaches 3.2–8.7 pp. The advantage holds across all levels of sequencing inequality.

Installation

# install.packages("remotes")
remotes::install_github("CuiweiG/survinger")

Quick example

library(survinger)

# Simulate surveillance data (or use your own)
sim <- surv_simulate(n_regions = 5, n_weeks = 26, seed = 42)

# Create design from surveillance data
design <- surv_design(
  data = sim$sequences, strata = ~ region,
  sequencing_rate = sim$population[c("region", "seq_rate")],
  population = sim$population
)

# Corrected prevalence (one line)
surv_lineage_prevalence(design, "BA.2.86")

# Or even simpler — single pipe-friendly call:
surv_estimate(
  data = sim$sequences, strata = ~ region,
  sequencing_rate = sim$population[c("region", "seq_rate")],
  population = sim$population, lineage = "BA.2.86"
)

# Full pipeline with delay correction
delay <- surv_estimate_delay(design)
surv_adjusted_prevalence(design, delay, "BA.2.86")

# How should I allocate 500 sequences?
surv_optimize_allocation(design, "min_mse", total_capacity = 500)

# Is my system powerful enough?
surv_detection_probability(design, true_prevalence = 0.01)

# One-page diagnostic
surv_report(design)

Functions

Design & data

| Function | Purpose |
|----------|---------|
| surv_design() | Create design with inverse-probability weights |
| surv_simulate() | Generate synthetic surveillance data |
| surv_filter() | Subset a design by filter criteria |
| surv_update_rates() | Update sequencing rates |
| surv_set_weights() | Override design weights |

Prevalence estimation

| Function | Purpose |
|----------|---------|
| surv_lineage_prevalence() | Hajek / HT / post-stratified prevalence |
| surv_naive_prevalence() | Unweighted baseline prevalence |
| surv_prevalence_by() | Prevalence by subgroup (region, source, etc.) |
| surv_estimate() | Pipe-friendly one-call analysis |

Delay correction & nowcasting

| Function | Purpose |
|----------|---------|
| surv_estimate_delay() | Right-truncation-corrected delay fitting |
| surv_reporting_probability() | Cumulative reporting probability |
| surv_nowcast_lineage() | Delay-adjusted nowcast |
| surv_adjusted_prevalence() | Combined design + delay correction |
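
The idea behind delay adjustment can be sketched in base R without the package. The example below assumes a negative binomial reporting delay with illustrative parameters and hypothetical counts. It skips the delay-fitting step entirely and only shows how a cumulative reporting probability turns partially reported counts into a nowcast, so it should not be read as survinger's actual model.

```r
# Assumed delay distribution: negative binomial with a mean of 10 days.
# Parameters are illustrative, not estimated from data.
report_prob <- function(days_elapsed, mu = 10, size = 2) {
  pnbinom(days_elapsed, size = size, mu = mu)
}

# Hypothetical counts of sequences reported so far, by collection week.
recent <- data.frame(
  weeks_ago = c(4, 3, 2, 1),
  observed  = c(95, 90, 70, 30)
)

# Probability that a sequence collected that long ago has been reported by now.
recent$p_reported <- report_prob(recent$weeks_ago * 7)

# Inflate observed counts by the inverse of the reporting probability.
recent$nowcast <- recent$observed / recent$p_reported
recent
```

The most recent week gets the largest upward adjustment because the smallest share of its sequences has been reported so far.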

Resource allocation

| Function | Purpose |
|----------|---------|
| surv_optimize_allocation() | Neyman allocation (3 objectives) |
| surv_compare_allocations() | Benchmark all allocation strategies |
| surv_required_sequences() | Sample size for target detection power |
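
For readers unfamiliar with Neyman allocation, the textbook version can be written in a few lines of base R. The strata, case counts and prevalence guesses below are hypothetical, and `neyman_allocation()` is a helper defined only for this sketch. The package's three objectives may weight things differently.

```r
# Hypothetical strata: case counts and a prior guess of lineage prevalence.
strata <- data.frame(
  region     = c("North", "South", "East"),
  n_cases    = c(10000, 4000, 1000),
  prevalence = c(0.05, 0.20, 0.50)
)

# Textbook Neyman allocation: sample size proportional to N_h * S_h,
# where S_h is the within-stratum standard deviation of a 0/1 outcome.
neyman_allocation <- function(n_cases, prevalence, total_capacity) {
  s_h <- sqrt(prevalence * (1 - prevalence))
  share <- n_cases * s_h / sum(n_cases * s_h)
  total_capacity * share
}

strata$n_alloc <- neyman_allocation(strata$n_cases, strata$prevalence, 500)
strata
```

The allocations are fractional and are not capped at the number of available cases in a stratum, so a real workflow would still need rounding and a feasibility check.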

Diagnostics & reporting

| Function | Purpose |
|----------|---------|
| surv_detection_probability() | Variant detection power |
| surv_power_curve() | Detection probability across prevalence range |
| surv_compare_estimates() | Weighted vs naive side-by-side plot |
| surv_design_effect() | Design effect over time |
| surv_sensitivity() | Sensitivity analysis across all methods |
| surv_report() | Surveillance system diagnostic |
| surv_quality() | One-row quality metrics |
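
As a rough guide to what detection power means, the sketch below uses the simplest possible model: sequences drawn independently at random, each carrying the variant with probability equal to its true prevalence. Both helper functions are defined here for illustration. They ignore stratification and unequal sequencing rates, so their results may differ from the design-aware figures the package reports.

```r
# Probability of seeing at least one variant sequence among n independent draws.
detection_prob <- function(n_sequences, true_prevalence) {
  1 - (1 - true_prevalence)^n_sequences
}

# Smallest n that reaches the target detection power under the same model.
required_n <- function(true_prevalence, power = 0.95) {
  ceiling(log(1 - power) / log(1 - true_prevalence))
}

detection_prob(500, 0.01)  # chance of detecting a 1% variant with 500 sequences
required_n(0.01)           # sequences needed for 95% power at 1% prevalence
```

Under this simple model, 299 sequences are enough for 95% detection power at 1% prevalence. Uneven sequencing across strata can raise or lower that number.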

Tidyverse integration

| Function | Purpose |
|----------|---------|
| tidy() / glance() | Broom-style tidying for all result objects |
| surv_bind() | Combine multiple prevalence estimates |
| surv_table() | Publication-ready formatted table |
| theme_survinger() | Publication-quality ggplot2 theme |

How it differs from existing tools

| | phylosamp | survey | epinowcast | survinger |
|---|---|---|---|---|
| Question | How many? | General surveys | Bayesian nowcast | Allocate + correct + nowcast |
| Genomic-specific | ✓ | ✗ | Partial | |
| Allocation | ✗ | ✗ | ✗ | ✓ (3 objectives) |
| Delay correction | ✗ | ✗ | ✓ | |
| Requires Stan | ✗ | ✗ | ✓ | |
| CRAN-friendly | ✓ | ✓ | ✗ | |

Validated on real data

Vignettes

Citation

@Manual{survinger2026,
  title = {survinger: Design-Adjusted Inference for Pathogen Lineage Surveillance},
  author = {Cuiwei Gao},
  year = {2026},
  note = {R package version 0.1.0},
  url = {https://github.com/CuiweiG/survinger}
}

License

MIT © 2026 Cuiwei Gao



survinger documentation built on April 27, 2026, 9:10 a.m.