wzxhzdk:0
# Tracking SARS-CoV-2 In England
Previous examples conditioned on a single observation series (case and
death counts respectively). Simple models were used for describing
ascertainment rates. The flu [example](flu.html) assumed
*full ascertainment* of infections, while the multilevel [example](europe-covid.html) took the IFR to be *constant but unknown*.
Population adjustments were not considered, and both examples modeled
the entire history of the epidemic from the first observed cases.
Here we model the trajectory of SARS-CoV-2 in England using data over a
two month period from late March 2021 through to the end of May. In
doing so, we relax the aforementioned modeling assumptions and
demonstrate further capabilities of the package. Population adjustments
are applied in order to explicitly account for pre-existing immunity in
the population. The model is conditioned on two observation series:
daily case counts, and positivity from seroprevalence surveys. This example is
inspired by this [paper](https://rss.org.uk/RSS/media/File-library/News/2021/MishraScott.pdf), which uses **epidemia** to track SARS-CoV-2 at the local authority level in the UK.
Fitting the model to case counts requires a model for the case
ascertainment rates (IAR). Unlike in previous examples, we do not assume
that this is constant. While the IFR of Sars-CoV-2 may be broadly stable
over several weeks or months, the IAR could vary frequently, not least
due to changes in testing capacity, surge testing, and the general
prevalence of the disease. Therefore we opt instead to *infer the
time-varying IAR from positivity data*, which should provide an "anchor"
for the absolute size of the latent infection process. We allow IAR
dynamics to be inferred by modeling it as a random walk.
Our goal is to infer recent infection dynamics, and also to project
future infection and case counts. These goals do not necessitate a
detailed understanding of the fully history of disease dynamics, and is
primarily why we limit ourselves to a two month period. Other reasons
include that modeling the entire history is computationally demanding
and requires more intricate modeling to explain the full trajectory.
First load required packages.
wzxhzdk:1
## Data
**epidemia** has a data set `EnglandNewCases` that contains daily
counts of confirmed SARS-CoV-2 infections in England from the 1^st^
January 2020 up to and including the 30^th^ May 2021. The data was
downloaded from @PHE on the 4^nd^ of June 2021.
wzxhzdk:2
The model will be fit to the most recent two months (60 days) of case
count data, starting on the 1^st^ April. We take the first modeled date
to be 20 days prior to this initial observation, which is the 12^th^
March. This initial 20 days will be the seeding period.
wzxhzdk:3
## Transmission Model
As in the flu example, we model $R_t$ as a transformed random walk
with daily updates. In later sections, however, we will make projections
that assume $R_t$ remains at its current value. For this reason, we
provide a new column in `data` to control the random walk
increments. This will allow us to "stop" the walk updating at dates
after the 30^th^ May.
wzxhzdk:4
The model for $R_t$ is
$$
R_t = K g^{-1}\left(\beta_0 + w_t\right),
$$
where $K= 7$ and $g$ is the logit link. The term $w_t$ is a random walk
satisfying the recursion $w_t = w_{t-1} + \epsilon_t$ and initial
condition $w_{-1} = 0$. The daily updates $\epsilon_t$ are given a prior
$\epsilon_t \sim \mathcal{N}(0, \sigma)$, and the scale hyperparameter
follows $\sigma \sim \mathcal{N}^{+}(0, 0.05)$. The intercept $\beta_0$
is assigned a normal prior, so that $\beta_0 \sim \mathcal{N}(-1, 1)$.
This prior has been chosen to reflect our belief that the epidemic was
under control at start of March. We have, however, provided a large
scale to reflect uncertainty over the initial value. The full
transmission model is represented as follows.
wzxhzdk:5
## Infection Model
Daily recorded cases are consistently above 1000 over the period
considered. Given the large infection counts, we do not consider
extending the infection model to add variation around the renewal
equation. Such extensions would be useful for modeling the epidemic at a
finer scale, say at the local authority level. Please refer [here](flu.html) for
an example of how this is done.
We account for pre-existing immunity through population adjustments,
which were described in the model description. The first modeled
date is the 12^th^ of March 2021, over a year before SARS-CoV-2 was
first detected in England. It is likely that prior infection and
vaccination will have induced a degree of immunity within the
population. Accounting for this helps to constrain long term forecasts
for the size of the susceptible population.
Mathematically, our model describes infections $i_t$ for $t>0$ by
```{=tex}
\begin{align*}
i'_t & = R_t\sum_{s < t} i_s \pi_{t-s},\\
i_t &= S_t \left( 1 - \exp\left(-\frac{i'_t}{P}\right)\right),
\end{align*}
wzxhzdk:6
The infection model is expressed as follows.
wzxhzdk:7
## Observation Models
Models are defined for our two data series: case counts and positivity
rates derived from a seroprevalence study. These are discussed in turn.
### Case Counts
Letting $Y^{(1)}_{t}$ denote daily case counts, we model
$Y_t^{(1)} \sim \text{QP}(y^{(1)}_t,d)$ and
$$
y_t = \alpha^{(1)}_{t}\left(\frac{1}{7} \sum_{s=1}^{7} i_{t-s-3}\right),
$$
where $\text{QP}$ denotes the quasi-poisson distribution with mean
$y^{(1)}_t$ and variance-to-mean ratio $d$, which is given the prior
$d \sim \mathcal{N}^{+}(10,5)$. The assumption here is that infections
over the last three days are undetected, and those occurring over the
week before are equally likely to be detected.
The IAR $\alpha_t^{(1)}$ is modeled as a random walk with additional
dummy dummy day-of-week variables to account for a clear weekly pattern
in the data (see Figure \@ref(fig:eng-proj-obs)). We take
$$
\alpha^{(1)}_t = g^{-1}\left(\beta^{(1)}_0 + w^{(1)}_t + \sum_{k=1}^{6} \gamma_k D_{k,t}\right),
$$
where $g$ is the logit-link, $\beta_0^{(1)} \sim \mathcal{N}(0,1.5)$ is
an intercept, and $w_t^{(1)}$ is a random walk following the same model
as the walk for $R_t$, however starting on the first observation date
(1^st^ April) rather than immediately after the seeding period. The
prior on $\beta_0^{(1)}$ has a large scale to reflect our prior
uncertainty over the IAR, allowing the posterior to be largely driven by
the positivity data. The terms $D_{k,t}$ are dummy indicators for the
day of the week, and $\gamma_k \sim \mathcal{N}(0, 0.5)$ are associated
weekday effects. These terms allow modeling weekly seasonality. First
add the day-of-week factors to `data`.
wzxhzdk:8
The aforementioned model is defined as follows.
wzxhzdk:9
### Seroprevalence Data
The ONS Infection survey [@ons_infections] provides weekly estimates of
positivity rates among the English population. These numbers are based
on repeated RT-PCR tests on a representative sample of the population.
We leverage this data for the week beginning the 28^th^ March, and for
the subsequent 6 weeks. We arbitrarily use Thursday of each week as the
representative date for the positivity rates.
wzxhzdk:10
Let $Y_t^{(2)}$ be the observed positivity rate at time $t$. This is an
estimate of the proportion of the population who, at that point of time,
would test positive on an RT-PCR test. We model this as
$Y_t^{(2)} \sim \mathcal{N}(y_{t}^{(2)}, \sigma^2)$, where
\begin{equation}
y_t^{(2)} = \frac{\sum_{s=1}^{14} i_{t-s-3}}{P}.
(\#eq:mean-positive)
\end{equation}
Equation \@ref(eq:mean-positive) assumes that those infected in the
previous three days will definitively test negative, and that everyone
infected within the two weeks before this will definitively test
positive. All infections occurring before this will not be detected.
This is, of course, a simplification. A careful model requires an
understanding of the likelihood of testing positive in an RT-PCR test
$k$ days after infection, and also accounting for the
sensitivity/specificity of the tests. We have made the above assumption
in the absence of such information.
Equation \@ref(eq:mean-positive) does not fit within our usual
framework for observations. In particular, there is no explicit
ascertainment rate, and the infection weights $\pi_k$ do not form a
probability distribution. Nonetheless, infections are still weighted
linearly, and this model can be represented as follows.
wzxhzdk:11
Here we have assigned a prior $\sigma \sim \mathcal{N}^+(0.01, 0.0025)$.
This is motivated by the credible intervals in the ONS study.
## Fitting and Analysis
The model is fit with the NUTS sampler. In order to ensure a large
enough effective sample size, we increase the maximum tree depth for the
algorithm and use 4,000 iterations.
wzxhzdk:12
Figure \@ref(fig:eng-rt) plots credible intervals for $R_t$ over the two
month period for which we have data. The model appears confident that
$R_t$ has risen above $1$ over this time, and indeed daily cases appear
to be rising as of the end of May. The credible intervals widen towards
30^th^ May because there is a lag between reproduction numbers and
recording cases.
wzxhzdk:13
The role of the population adjustment in this model is best understood
by looking at long term forecasts. As was shown in the multilevel example, to make forecasts we have to construct a new data
frame. This can be passed as the `newdata` argument to
`epidemia`'s plotting functions.
wzxhzdk:14
For all dates after the modeled period, the `dt` column is equal
the the final modeled day (30^th^ May). In effect, this fixes the random
walks for $R_t$ and $\alpha_t^{(2)}$ at their most recent value. All
forecasts made here are conditional on this assumption.
Figure \@ref(fig:eng-proj-rt) forecasts both $R_t$ and cumulative
infections. The leftmost panel shows the *unadjusted*
reproduction number, which we denote by $R_t^u$. This is the value of
$R_t$ if the entire population were susceptible, and is
the quantity modeled by `epirt()`. The `adjusted` $R_t$ is
then taken to be
$$
R_t = \frac{S_{t-1}}{P} R^{\text{u}}_t.
$$
Figure \@ref(fig:eng-proj-rt) shows that $R_t^u$ remains pathwise
constant over the projected period. $R_t$ on the other hand falls
smoothly as the susceptible population is diminished. Cumulative
infections also taper out as the reduction in $S_t$ causes $R_t$ to fall
below $1$. The population adjustment serves to *constrain*
long-term forecasts for the total size of the population. Without this
adjustment infections could continue to rise exponentially over time.
wzxhzdk:15
The top panel of Figure \@ref(fig:eng-proj-obs) shows the IAR over time.
The IAR exhibits strong weekday effects, and appears to increase during
the first half of April. The pattern repeats after the 30^th^ May
because we have stopped the random walk. The bottom two plots
demonstrate that the model is able to fit the observed data well. These
plots have been generated with `plot_linpred()` and `plot_obs()`.
wzxhzdk:16
The primary purpose of this example is to demonstrate advanced modeling
in **epidemia**. The model itself has many limitations, These include
the quite naive specification of the prior on $S_t$, and not accounting
for vaccinations during the modeled period. In addition the `i2o`
argument used in the observational models is not motivated by data from
previous studies.
# References