knitr::opts_chunk$set(collapse = TRUE, comment = "#>", fig.width = 7, fig.height = 4, dev = "png")
From sample collection to sequence upload, there is typically a delay of 1 to 4 weeks. This means that when you look at the latest data, the most recent weeks are always incomplete: not because fewer people were infected, but because results have not arrived yet.
If you ignore this and plot raw counts, you see a false decline in the most recent weeks. This is called right-truncation bias.
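The effect is easy to reproduce with simulated data. The sketch below uses base R only and assumes a constant true count per week and a negative binomial delay measured in weeks; every number in it is hypothetical and chosen for illustration, not taken from the package data.

```r
set.seed(1)

n_weeks <- 8
true_per_week <- 200   # hypothetical: the true weekly count never changes

# Collection week for every sequence (week 8 is the current week)
collection_week <- rep(seq_len(n_weeks), each = true_per_week)

# Hypothetical reporting delay in weeks, negative binomial with mean 2
delay_weeks <- rnbinom(length(collection_week), size = 3, mu = 2)

# A sequence is visible today only if it has been reported by the current week
reported <- collection_week + delay_weeks <= n_weeks

observed_counts <- as.integer(table(
  factor(collection_week[reported], levels = seq_len(n_weeks))
))

data.frame(
  week = seq_len(n_weeks),
  true = true_per_week,
  observed = observed_counts
)
```

Although the true count is flat, the observed column drops toward the most recent weeks, because only sequences with short delays have had time to arrive.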
survinger fits a parametric delay distribution accounting for the fact that we can only observe delays shorter than the time elapsed since collection (right-truncation correction).
library(survinger)
data(sarscov2_surveillance)

design <- surv_design(
  data = sarscov2_surveillance$sequences,
  strata = ~ region,
  sequencing_rate = sarscov2_surveillance$population[c("region", "seq_rate")],
  population = sarscov2_surveillance$population
)

delay_fit <- surv_estimate_delay(design, distribution = "negbin")
print(delay_fit)
plot(delay_fit)
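A right-truncation correction of this kind is usually written as a conditional likelihood. The following is a general sketch, not a statement of the exact parameterization survinger uses, which may differ. The symbols are introduced here for illustration: $d_i$ is the observed delay of sequence $i$, $c_i$ its collection date, $T$ the date of analysis, and $f_\theta$ and $F_\theta$ the probability function and cumulative distribution function of the delay model.

```latex
\[
L(\theta) \;=\; \prod_{i=1}^{n} \frac{f_\theta(d_i)}{F_\theta(T - c_i)},
\qquad d_i \le T - c_i .
\]
```

Dividing by $F_\theta(T - c_i)$ conditions each observation on having been reported by time $T$. Without that term, recent samples contribute only their short delays, and the fitted distribution would be biased toward delays that are too short.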
Given the fitted delay, we can ask: what fraction of sequences collected d days ago have been reported by now?
days <- c(7, 14, 21, 28)
probs <- surv_reporting_probability(delay_fit, delta = days)
data.frame(days_ago = days, prob_reported = round(probs, 3))
Sequences collected 7 days ago may only be partially reported, while those from 28 days ago are nearly complete.
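Conceptually, the reporting probability is the cumulative distribution function of the delay, evaluated at the number of days elapsed. The sketch below illustrates this with base R's `pnbinom()`, assuming a negative binomial delay in days. The parameter values are hypothetical, not fitted values, and `surv_reporting_probability()` may compute the quantity differently internally.

```r
# Hypothetical negative binomial delay parameters (in days), not fitted values
mu_days <- 12   # assumed mean delay
size    <- 2    # assumed dispersion; smaller means more overdispersed

days <- c(7, 14, 21, 28)

# P(delay <= d): chance that a sequence collected d days ago is reported by now
prob_reported <- pnbinom(days, size = size, mu = mu_days)

data.frame(days_ago = days, prob_reported = round(prob_reported, 3))
```

Because this is a cumulative probability, it can only increase with the number of days elapsed.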
Nowcasting inflates observed counts by dividing by the reporting probability, giving a better estimate of the true number:
nowcast <- surv_nowcast_lineage(design, delay_fit, "BA.2.86")
plot(nowcast)
The grey bars show what has been observed; the orange line shows the delay-corrected estimate. The gap is largest in the most recent weeks.
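The arithmetic behind this kind of correction can be shown on a toy example. The counts and reporting probabilities below are hypothetical, and the sketch shows only the division step, not the full procedure in `surv_nowcast_lineage()`.

```r
# Hypothetical weekly data, ordered from oldest to most recent week
week        <- 1:4
observed    <- c(40, 38, 30, 12)           # counts reported so far
report_prob <- c(0.95, 0.85, 0.60, 0.25)   # assumed P(reported by now)

# Inflate each observed count by the inverse of its reporting probability
nowcast_est <- observed / report_prob

data.frame(week, observed, report_prob, nowcast = round(nowcast_est, 1))
```

The correction grows as the reporting probability shrinks. Dividing by a small probability also amplifies random noise in the observed count, so the most recent estimates are the least certain.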
The main inference function applies both corrections, the design weighting and the delay adjustment, simultaneously:
adjusted <- surv_adjusted_prevalence(design, delay_fit, "BA.2.86")
print(adjusted)
The mean_report_prob column shows how complete each week's data is.
Low values indicate that the delay correction is doing heavy lifting.
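One simple way to think about combining a design correction with a delay correction is inverse-probability weighting. The sketch below is an illustration under that assumption, not necessarily the estimator `surv_adjusted_prevalence()` implements. The data frame and all of its columns are hypothetical.

```r
# Hypothetical per-sequence data for a single week, not survinger output
seqs <- data.frame(
  region      = c("A", "A", "A", "B", "B"),
  is_target   = c(1, 0, 0, 1, 1),                  # 1 if lineage of interest
  seq_rate    = c(0.10, 0.10, 0.10, 0.02, 0.02),   # fraction of cases sequenced
  report_prob = c(0.8, 0.8, 0.8, 0.5, 0.5)         # assumed P(reported by now)
)

# Each observed sequence stands in for 1 / (seq_rate * report_prob) infections
seqs$weight <- 1 / (seqs$seq_rate * seqs$report_prob)

naive_prev    <- mean(seqs$is_target)
weighted_prev <- sum(seqs$weight * seqs$is_target) / sum(seqs$weight)

c(naive = naive_prev, weighted = weighted_prev)
```

In this toy example, region B sequences fewer cases and reports more slowly, so each of its sequences carries more weight. The weighted prevalence therefore moves toward the lineage share seen in region B.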
negbin (default): Handles overdispersion well. Recommended for most settings.
poisson: Use when delays are very regular (rare).
lognormal: Use when delays have a heavy right tail.
nonparametric: No distributional assumption. Use when you have enough data and suspect the parametric forms do not fit.
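To see why overdispersion matters when choosing between these options, it can help to compare a Poisson and a negative binomial delay with the same mean. The sketch below uses base R only. The mean and dispersion are hypothetical values picked for illustration.

```r
mu   <- 10   # assumed mean delay in days, shared by both models
size <- 2    # assumed negative binomial dispersion parameter

# Variance implied by each model
var_pois   <- mu                  # Poisson: variance equals the mean
var_negbin <- mu + mu^2 / size    # negative binomial: extra mu^2 / size term

# Probability that a delay exceeds 20 days under each model
tail_pois   <- ppois(20, lambda = mu, lower.tail = FALSE)
tail_negbin <- pnbinom(20, size = size, mu = mu, lower.tail = FALSE)

data.frame(
  model    = c("poisson", "negbin"),
  variance = c(var_pois, var_negbin),
  p_over_20_days = signif(c(tail_pois, tail_negbin), 3)
)
```

With the same mean, the Poisson model treats long delays as almost impossible, while the negative binomial leaves room for them. If real delays are variable, a Poisson fit will tend to overstate how complete recent weeks are.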