Analyzing real CDC surveillance data

knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>",
  fig.width = 7,
  fig.height = 4
)

Overview

This vignette demonstrates the complete lineagefreq workflow on real surveillance data from the U.S. CDC. The built-in dataset cdc_sarscov2_jn1 contains actual weighted variant proportion estimates from CDC's national genomic surveillance program, covering the JN.1 emergence wave (October 2023 to June 2024).

Load data

library(lineagefreq)

data(cdc_sarscov2_jn1)
str(cdc_sarscov2_jn1)
vd <- lfq_data(cdc_sarscov2_jn1,
               lineage = lineage, date = date, count = count)
vd

Collapse rare lineages

During JN.1's rise, several lineages circulated only at low frequency. We collapse every lineage whose peak frequency stays below 5% into a single "Other" category.

vd_clean <- collapse_lineages(vd, min_freq = 0.05)
attr(vd_clean, "lineages")
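
To make the peak-frequency rule concrete, the following base R sketch applies the same idea to a small toy table. The counts and the lineage_collapsed column are hypothetical, and this is not the implementation of collapse_lineages(); it only illustrates what "below 5% peak frequency" means.

```r
# Toy counts (hypothetical values, not taken from cdc_sarscov2_jn1).
toy <- data.frame(
  date    = rep(as.Date(c("2023-11-04", "2023-11-18")), each = 3),
  lineage = rep(c("JN.1", "XBB.1.5", "BA.2.86"), times = 2),
  count   = c(10, 86, 4, 60, 37, 3)
)

# Per-date frequency of each lineage.
toy$freq <- toy$count / ave(toy$count, toy$date, FUN = sum)

# Peak frequency of each lineage across all dates.
peak <- tapply(toy$freq, toy$lineage, max)

# Lineages whose peak stays below the threshold are relabelled "Other".
rare <- names(peak)[peak < 0.05]
toy$lineage_collapsed <- ifelse(toy$lineage %in% rare, "Other", toy$lineage)

# Re-aggregate counts after relabelling.
collapsed <- aggregate(count ~ date + lineage_collapsed, data = toy, FUN = sum)
collapsed
```

A lineage is judged on its maximum frequency over the whole period, so one that is rare early but dominant later (like JN.1 here) is kept.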

Fit MLR model

fit <- fit_model(vd_clean, engine = "mlr")
fit
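
For readers unfamiliar with multinomial logistic regression (MLR), the sketch below shows the model form usually meant by that name in variant-frequency work: each lineage has a log-odds relative to a reference lineage that is linear in time, and frequencies follow from a softmax. The function mlr_freq() and the parameter values are hypothetical, and the sketch assumes this standard parameterisation; the internals of the "mlr" engine are not shown in this vignette.

```r
# Softmax frequencies from per-lineage intercepts (alpha) and daily growth
# rates (delta). The reference lineage has alpha = 0 and delta = 0.
mlr_freq <- function(alpha, delta, t) {
  logits <- outer(t, delta) +
    matrix(alpha, nrow = length(t), ncol = length(alpha), byrow = TRUE)
  w <- exp(logits - apply(logits, 1, max))  # subtract row max for stability
  out <- w / rowSums(w)
  colnames(out) <- names(delta)
  out
}

# Hypothetical parameters: "new" starts rare but grows 0.08 per day faster.
alpha <- c(ref = 0, new = -4)
delta <- c(ref = 0, new = 0.08)
days  <- c(0, 14, 28, 56)

freq <- mlr_freq(alpha, delta, days)
round(freq, 3)
```

Under this form, log(freq[, "new"] / freq[, "ref"]) equals alpha + delta * t exactly, which is why the fitted delta can be read as a growth-rate difference per day.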

Growth advantages

ga <- growth_advantage(fit, type = "relative_Rt",
                       generation_time = 5)
ga
autoplot(fit, type = "advantage", generation_time = 5)

JN.1 shows a strong growth advantage over previously circulating XBB-derived lineages, consistent with published CDC estimates.
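
The role of generation_time can be illustrated with a common approximation: a growth-rate difference of delta per day, sustained over one generation of fixed length, multiplies the relative reproduction number by exp(delta * generation_time). The helper relative_rt() and the value of delta below are hypothetical, and whether growth_advantage() uses exactly this conversion is an assumption.

```r
# Convert a daily growth-rate difference into a relative Rt, assuming a
# fixed generation time (in days). Hypothetical helper, not a package function.
relative_rt <- function(delta_per_day, generation_time) {
  exp(delta_per_day * generation_time)
}

# Hypothetical growth-rate difference versus the reference lineage.
delta_per_day <- 0.08

# Same generation time as used above, plus a longer one for comparison.
rt_5 <- relative_rt(delta_per_day, generation_time = 5)
rt_7 <- relative_rt(delta_per_day, generation_time = 7)
c(gt_5 = rt_5, gt_7 = rt_7)
```

For the same daily growth-rate difference, a longer assumed generation time gives a larger relative Rt, so reported advantages are only comparable when they share a generation time.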

Frequency trajectories

autoplot(fit, type = "trajectory")

Forecast

fc <- forecast(fit, horizon = 28)
autoplot(fc)

Emergence detection

summarize_emerging(vd_clean)

Sequencing power

How many sequences per week are needed to estimate a variant's frequency to a target precision of 0.05 when it circulates at 1%, 2%, or 5%?

sequencing_power(
  target_precision = 0.05,
  current_freq = c(0.01, 0.02, 0.05)
)
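
As a rough cross-check on the scale of such numbers, the textbook binomial sample-size formula can be evaluated in base R. This sketch treats precision as the absolute half-width of a 95% Wald interval, which is an assumption; how sequencing_power() defines target_precision is not stated here, so its output may differ. The helper n_for_half_width() is hypothetical.

```r
# Sequences needed so that a Wald interval for a proportion p has the given
# absolute half-width: n = z^2 * p * (1 - p) / half_width^2, rounded up.
n_for_half_width <- function(p, half_width, conf_level = 0.95) {
  z <- qnorm(1 - (1 - conf_level) / 2)
  ceiling(z^2 * p * (1 - p) / half_width^2)
}

# Same frequencies as above; half-width chosen to match target_precision.
freqs <- c(0.01, 0.02, 0.05)
n_needed <- n_for_half_width(freqs, half_width = 0.05)
data.frame(current_freq = freqs, n = n_needed)
```

Because the required n scales with 1 / half_width^2, halving the half-width roughly quadruples the number of sequences.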

Session info

sessionInfo()


lineagefreq documentation built on April 3, 2026, 9:09 a.m.