wzxhzdk:0 # Tracking SARS-CoV-2 In England
Previous examples conditioned on a single observation series (case and death counts respectively). Simple models were used for describing ascertainment rates. The flu [example](flu.html) assumed *full ascertainment* of infections, while the multilevel [example](europe-covid.html) took the IFR to be *constant but unknown*. Population adjustments were not considered, and both examples modeled the entire history of the epidemic from the first observed cases. Here we model the trajectory of SARS-CoV-2 in England using data over a two month period from late March 2021 through to the end of May. In doing so, we relax the aforementioned modeling assumptions and demonstrate further capabilities of the package. Population adjustments are applied in order to explicitly account for pre-existing immunity in the population. The model is conditioned on two observation series: daily case counts, and positivity from seroprevalence surveys. This example is inspired by this [paper](https://rss.org.uk/RSS/media/File-library/News/2021/MishraScott.pdf), which uses **epidemia** to track SARS-CoV-2 at the local authority level in the UK. Fitting the model to case counts requires a model for the case ascertainment rates (IAR). Unlike in previous examples, we do not assume that this is constant. While the IFR of Sars-CoV-2 may be broadly stable over several weeks or months, the IAR could vary frequently, not least due to changes in testing capacity, surge testing, and the general prevalence of the disease. Therefore we opt instead to *infer the time-varying IAR from positivity data*, which should provide an "anchor" for the absolute size of the latent infection process. We allow IAR dynamics to be inferred by modeling it as a random walk. Our goal is to infer recent infection dynamics, and also to project future infection and case counts. These goals do not necessitate a detailed understanding of the fully history of disease dynamics, and is primarily why we limit ourselves to a two month period. Other reasons include that modeling the entire history is computationally demanding and requires more intricate modeling to explain the full trajectory. First load required packages. wzxhzdk:1 ## Data
**epidemia** has a data set `EnglandNewCases` that contains daily counts of confirmed SARS-CoV-2 infections in England from the 1^st^ January 2020 up to and including the 30^th^ May 2021. The data was downloaded from @PHE on the 4^nd^ of June 2021. wzxhzdk:2 The model will be fit to the most recent two months (60 days) of case count data, starting on the 1^st^ April. We take the first modeled date to be 20 days prior to this initial observation, which is the 12^th^ March. This initial 20 days will be the seeding period. wzxhzdk:3 ## Transmission Model
As in the flu example, we model $R_t$ as a transformed random walk with daily updates. In later sections, however, we will make projections that assume $R_t$ remains at its current value. For this reason, we provide a new column in `data` to control the random walk increments. This will allow us to "stop" the walk updating at dates after the 30^th^ May. wzxhzdk:4 The model for $R_t$ is $$ R_t = K g^{-1}\left(\beta_0 + w_t\right), $$ where $K= 7$ and $g$ is the logit link. The term $w_t$ is a random walk satisfying the recursion $w_t = w_{t-1} + \epsilon_t$ and initial condition $w_{-1} = 0$. The daily updates $\epsilon_t$ are given a prior $\epsilon_t \sim \mathcal{N}(0, \sigma)$, and the scale hyperparameter follows $\sigma \sim \mathcal{N}^{+}(0, 0.05)$. The intercept $\beta_0$ is assigned a normal prior, so that $\beta_0 \sim \mathcal{N}(-1, 1)$. This prior has been chosen to reflect our belief that the epidemic was under control at start of March. We have, however, provided a large scale to reflect uncertainty over the initial value. The full transmission model is represented as follows. wzxhzdk:5 ## Infection Model
Daily recorded cases are consistently above 1000 over the period considered. Given the large infection counts, we do not consider extending the infection model to add variation around the renewal equation. Such extensions would be useful for modeling the epidemic at a finer scale, say at the local authority level. Please refer [here](flu.html) for an example of how this is done. We account for pre-existing immunity through population adjustments, which were described in the model description. The first modeled date is the 12^th^ of March 2021, over a year before SARS-CoV-2 was first detected in England. It is likely that prior infection and vaccination will have induced a degree of immunity within the population. Accounting for this helps to constrain long term forecasts for the size of the susceptible population. Mathematically, our model describes infections $i_t$ for $t>0$ by ```{=tex} \begin{align*} i'_t & = R_t\sum_{s < t} i_s \pi_{t-s},\\ i_t &= S_t \left( 1 - \exp\left(-\frac{i'_t}{P}\right)\right), \end{align*} wzxhzdk:6 The infection model is expressed as follows. wzxhzdk:7 ## Observation Models
Models are defined for our two data series: case counts and positivity rates derived from a seroprevalence study. These are discussed in turn. ### Case Counts Letting $Y^{(1)}_{t}$ denote daily case counts, we model $Y_t^{(1)} \sim \text{QP}(y^{(1)}_t,d)$ and $$ y_t = \alpha^{(1)}_{t}\left(\frac{1}{7} \sum_{s=1}^{7} i_{t-s-3}\right), $$ where $\text{QP}$ denotes the quasi-poisson distribution with mean $y^{(1)}_t$ and variance-to-mean ratio $d$, which is given the prior $d \sim \mathcal{N}^{+}(10,5)$. The assumption here is that infections over the last three days are undetected, and those occurring over the week before are equally likely to be detected. The IAR $\alpha_t^{(1)}$ is modeled as a random walk with additional dummy dummy day-of-week variables to account for a clear weekly pattern in the data (see Figure \@ref(fig:eng-proj-obs)). We take $$ \alpha^{(1)}_t = g^{-1}\left(\beta^{(1)}_0 + w^{(1)}_t + \sum_{k=1}^{6} \gamma_k D_{k,t}\right), $$ where $g$ is the logit-link, $\beta_0^{(1)} \sim \mathcal{N}(0,1.5)$ is an intercept, and $w_t^{(1)}$ is a random walk following the same model as the walk for $R_t$, however starting on the first observation date (1^st^ April) rather than immediately after the seeding period. The prior on $\beta_0^{(1)}$ has a large scale to reflect our prior uncertainty over the IAR, allowing the posterior to be largely driven by the positivity data. The terms $D_{k,t}$ are dummy indicators for the day of the week, and $\gamma_k \sim \mathcal{N}(0, 0.5)$ are associated weekday effects. These terms allow modeling weekly seasonality. First add the day-of-week factors to `data`. wzxhzdk:8 The aforementioned model is defined as follows. wzxhzdk:9 ### Seroprevalence Data The ONS Infection survey [@ons_infections] provides weekly estimates of positivity rates among the English population. These numbers are based on repeated RT-PCR tests on a representative sample of the population. We leverage this data for the week beginning the 28^th^ March, and for the subsequent 6 weeks. We arbitrarily use Thursday of each week as the representative date for the positivity rates. wzxhzdk:10 Let $Y_t^{(2)}$ be the observed positivity rate at time $t$. This is an estimate of the proportion of the population who, at that point of time, would test positive on an RT-PCR test. We model this as $Y_t^{(2)} \sim \mathcal{N}(y_{t}^{(2)}, \sigma^2)$, where \begin{equation} y_t^{(2)} = \frac{\sum_{s=1}^{14} i_{t-s-3}}{P}. (\#eq:mean-positive) \end{equation} Equation \@ref(eq:mean-positive) assumes that those infected in the previous three days will definitively test negative, and that everyone infected within the two weeks before this will definitively test positive. All infections occurring before this will not be detected. This is, of course, a simplification. A careful model requires an understanding of the likelihood of testing positive in an RT-PCR test $k$ days after infection, and also accounting for the sensitivity/specificity of the tests. We have made the above assumption in the absence of such information. Equation \@ref(eq:mean-positive) does not fit within our usual framework for observations. In particular, there is no explicit ascertainment rate, and the infection weights $\pi_k$ do not form a probability distribution. Nonetheless, infections are still weighted linearly, and this model can be represented as follows. wzxhzdk:11 Here we have assigned a prior $\sigma \sim \mathcal{N}^+(0.01, 0.0025)$. This is motivated by the credible intervals in the ONS study. ## Fitting and Analysis
The model is fit with the NUTS sampler. In order to ensure a large enough effective sample size, we increase the maximum tree depth for the algorithm and use 4,000 iterations. wzxhzdk:12 Figure \@ref(fig:eng-rt) plots credible intervals for $R_t$ over the two month period for which we have data. The model appears confident that $R_t$ has risen above $1$ over this time, and indeed daily cases appear to be rising as of the end of May. The credible intervals widen towards 30^th^ May because there is a lag between reproduction numbers and recording cases. wzxhzdk:13 The role of the population adjustment in this model is best understood by looking at long term forecasts. As was shown in the multilevel example, to make forecasts we have to construct a new data frame. This can be passed as the `newdata` argument to `epidemia`'s plotting functions. wzxhzdk:14 For all dates after the modeled period, the `dt` column is equal the the final modeled day (30^th^ May). In effect, this fixes the random walks for $R_t$ and $\alpha_t^{(2)}$ at their most recent value. All forecasts made here are conditional on this assumption. Figure \@ref(fig:eng-proj-rt) forecasts both $R_t$ and cumulative infections. The leftmost panel shows the *unadjusted* reproduction number, which we denote by $R_t^u$. This is the value of $R_t$ if the entire population were susceptible, and is the quantity modeled by `epirt()`. The `adjusted` $R_t$ is then taken to be $$ R_t = \frac{S_{t-1}}{P} R^{\text{u}}_t. $$ Figure \@ref(fig:eng-proj-rt) shows that $R_t^u$ remains pathwise constant over the projected period. $R_t$ on the other hand falls smoothly as the susceptible population is diminished. Cumulative infections also taper out as the reduction in $S_t$ causes $R_t$ to fall below $1$. The population adjustment serves to *constrain* long-term forecasts for the total size of the population. Without this adjustment infections could continue to rise exponentially over time. wzxhzdk:15 The top panel of Figure \@ref(fig:eng-proj-obs) shows the IAR over time. The IAR exhibits strong weekday effects, and appears to increase during the first half of April. The pattern repeats after the 30^th^ May because we have stopped the random walk. The bottom two plots demonstrate that the model is able to fit the observed data well. These plots have been generated with `plot_linpred()` and `plot_obs()`. wzxhzdk:16 The primary purpose of this example is to demonstrate advanced modeling in **epidemia**. The model itself has many limitations, These include the quite naive specification of the prior on $S_t$, and not accounting for vaccinations during the modeled period. In addition the `i2o` argument used in the observational models is not motivated by data from previous studies. # References


ImperialCollegeLondon/epidemia documentation built on June 26, 2021, 7:40 a.m.