This R package implements an ODE-based model of the novel coronavirus (2019-nCoV) outbreak in Wuhan, China. It provides a simulator and a likelihood function that assume Poisson-distributed increments in the number of new cases in Wuhan, in the rest of China (connected via the airline network), and in the rest of the world. The model is described in detail below.
Data required:

* `china_cases`: daily case reports in all Chinese cities (see `data(package='wuhan')`)
* `world_cases`: daily case reports from other countries (see `data(package='wuhan')`)
* `K`: daily numbers of passengers travelling between cities in China via the airline network, available from OAG Traffic Analyzer
* `W`: daily numbers of passengers travelling between Chinese cities and other countries via the airline network, available from OAG Traffic Analyzer
* `china_population`: the population size of each Chinese city (see `data(package='wuhan')`)

Parameters:

* `beta`: the human-human basic transmission rate
* `gamma`: the removal rate (the inverse of the infectious period)
* `I0W`: the number of initial infectives in Wuhan
* `phi`: the case ascertainment rate in Wuhan

To use the package, assume the following workflow in R:
```r
# Load required packages
install.packages('devtools')
devtools::install_git('https://github.com/chrism0dwk/wuhan.git')
library(wuhan)

# Instantiate ODE model, simulate up to day 22.
simulator = NetworkODEModel(N=china_population, K=K, init_loc='Wuhan', alpha=1/4, max_t=22)

# Instantiate LogLikelihood function
llik = LogLikelihood(y=china_cases[,1:22], z=world_cases[,1:22], N=china_population, K=K, W=W, sim_fun=simulator)

# Find MLEs using optimisation
par_init = c(0.4, 0.142857142857143, 1, 0.5)  # Starting point
fit = optim(log(par_init), llik, control=list(fnscale=-1))
p_hat = fit$par
```
Asymptotic assumptions for confidence intervals fail in our case, since the parameter space is highly non-orthogonal. Confidence intervals are therefore calculated using parametric bootstrap. `p_hat` is calculated on the log scale (logit scale for the `phi` parameter), so it needs to be transformed first:
```r
p_hat[1:3] = exp(p_hat[1:3])
p_hat[4] = invlogit(p_hat[4])
```
The samples can then be drawn by bootstrap, for which a computing cluster is highly recommended (thanks Lancaster University HEC facility!).
```r
samples = bootstrap(p_hat, K, W, alpha=1/4, max_t=22, n_samples=1000)
```
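Assuming `bootstrap()` returns a matrix with one row per replicate and columns ordered as in `p_hat` (beta, gamma, I0W, phi) — an assumption about the output layout, not documented here — 95% intervals can then be read off as quantiles:

```r
# Quantile-based 95% confidence intervals from the bootstrap samples
ci <- apply(samples, 2, quantile, probs = c(0.025, 0.975))
colnames(ci) <- c("beta", "gamma", "I0W", "phi")
ci
```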
Since the airline connectivity matrices are not included in this package, samples from the parameters (for 4 different values of the latent period $1/\alpha$) are provided as in-built datasets; see `data(package='wuhan')`.
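For example, the available datasets (whatever the installed version ships) can be enumerated from R; the snippet below uses only base R and makes no assumption about the dataset names:

```r
# List the datasets shipped with the package
d <- data(package = 'wuhan')
d$results[, c("Item", "Title")]
```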
Let $y_{it}$ be the number of new case detections of 2019-nCoV in Chinese city $i=1,\dots,n$ on day $t$, modelled as
$$ y_{it} \sim \mbox{Poisson}(\phi_i x_{it})$$
where $x_{it}$ is the expected number of new case detections in city $i$ on day $t$ and $0 \leq \phi_i \leq 1$ is a city-specific case ascertainment probability. In the case of China, we assume that
$$ \phi_i = \begin{cases} \phi & \mbox{if } i = \mbox{Wuhan} \\ 1 & \mbox{otherwise} \end{cases}$$
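As a plain-R illustration (not the package internals), the observation log-likelihood for the Chinese cities on a single day is just a sum of Poisson log-densities; `y`, `x`, and the vector of city names are assumed inputs:

```r
# Poisson observation log-likelihood for one day's Chinese case counts,
# with phi_i = phi for Wuhan and phi_i = 1 for every other city.
china_obs_loglik <- function(y, x, phi, cities) {
  phi_vec <- ifelse(cities == "Wuhan", phi, 1)
  sum(dpois(y, lambda = phi_vec * x, log = TRUE))
}
```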
Within China, we model an individual's trajectory through infection as susceptible, exposed (and not infectious), infected (and infectious), and finally removed. In continuous time, let $S_{it}$, $E_{it}$, $I_{it}$, and $R_{it}$ be the numbers of susceptible, exposed, infected, and removed individuals respectively in city $i$ at time $t$. We assume that $$ x_{it} = \int_{(t-1)^+}^{t^+} \frac{\mathrm{d}R_{iu}}{\mathrm{d}u} \, \mathrm{d}u $$ where $t^+$ denotes the right limit of day $t$; that is, $x_{it}$ is the number of newly removed individuals occurring on day $t$.
We model $S_{it}$, $E_{it}$, $I_{it}$, and $R_{it}$ using a set of ordinary differential equations
\begin{eqnarray} \frac{\mathrm{d}\vec{S}_t}{\mathrm{d}t} & = & -\vec{S}_t \odot \vec{\lambda}_t \\ \frac{\mathrm{d}\vec{E}_t}{\mathrm{d}t} & = & \vec{S}_t \odot \vec{\lambda}_t - \alpha \vec{E}_t \\ \frac{\mathrm{d}\vec{I}_t}{\mathrm{d}t} & = & \alpha \vec{E}_t - \gamma \vec{I}_t \\ \frac{\mathrm{d}\vec{R}_t}{\mathrm{d}t} & = & \gamma \vec{I}_t \end{eqnarray}
where $\vec{S}_t = (S_{1t},\dots,S_{nt})^T$ and likewise for $\vec{E}_t$, $\vec{I}_t$, and $\vec{R}_t$. Furthermore,
$$ \vec{\lambda}_t = \beta \left(\vec{I}_t/\vec{N} + (K \cdot (\vec{I}_t/\vec{N}))/\vec{N}\right) $$
is the force of infection vector for each city in China. Here $\beta$ is the human-human transmission rate, $K$ is an $n \times n$ matrix whose off-diagonal elements $k_{ij}$, $i \ne j$, give the mean daily number of passengers flying between city $i$ and city $j$, and whose diagonal elements are $k_{ii} = 1$, $i=1,\dots,n$. $\vec{N}$ is a vector of length $n$ containing the population size of each city in China.
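The same system can be written down directly with the `deSolve` package. The sketch below illustrates the equations only, under the assumption of a state vector stacked as $(S, E, I, R)$ by city; it is not the package's `NetworkODEModel` implementation.

```r
library(deSolve)

# Minimal network SEIR right-hand side: the state vector stacks S, E, I, R
# by city; parms is a list containing beta, alpha, gamma, the population
# vector N, and the n x n flight matrix K.
seir_rhs <- function(t, state, parms) {
  n <- length(parms$N)
  S <- state[1:n]
  E <- state[(n + 1):(2 * n)]
  I <- state[(2 * n + 1):(3 * n)]
  R <- state[(3 * n + 1):(4 * n)]
  with(parms, {
    lambda <- beta * (I / N + as.vector(K %*% (I / N)) / N)  # force of infection
    dS <- -S * lambda
    dE <- S * lambda - alpha * E
    dI <- alpha * E - gamma * I
    dR <- gamma * I
    list(c(dS, dE, dI, dR))
  })
}

# Daily expected new removals x_{it} are then day-on-day differences of R:
# out <- ode(y = state0, times = 0:22, func = seir_rhs, parms = parms)
# x   <- apply(out[, (3 * n + 2):(4 * n + 1)], 2, diff)  # one column per city
```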
Let the number of imported cases of 2019-nCoV in the $m$ countries other than China be $z_{jt}$ for $j=1,\dots,m$, where $$z_{jt} \sim \mbox{Poisson}(\nu_{jt})$$ and $$\nu_{jt} = \frac{W_{j\cdot} (\vec{\phi}\odot\vec{I}_t)}{\vec{N}}$$ where $W$ is an $m \times n$ matrix whose $(j,i)$th element $w_{ji}$ is the mean daily number of passengers flying from Chinese city $i$ to non-Chinese city $j$.
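In code this is a single matrix-vector product. A sketch, assuming `W`, `phi_vec`, `I`, and `N` are defined as above and reading the division by $\vec{N}$ elementwise within the product:

```r
# Expected imported cases per non-Chinese country on a given day:
# nu_j = sum_i w_{ji} * phi_i * I_i / N_i
nu <- as.vector(W %*% (phi_vec * I / N))
```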
We perform parameter inference on $\vec{\theta} = (\beta, \phi, \gamma, I_{Wuhan,0})^T$ using Maximum Likelihood Estimation, with log-likelihood function \begin{equation} \label{eq:lik} \ell \left(\vec{y}, \vec{z} ; \vec{\theta}\right) \propto \sum_{t=1}^{T} \left[ \sum_{i=1}^{n} \left\{ y_{it}\log(\phi_{i}x_{it}) -\phi_{i}x_{it} \right\} + \sum_{j=1}^{m} \left\{z_{jt}\log\nu_{jt} - \nu_{jt}\right\} \right]. \end{equation}
In practice, our data present a cumulative case count beginning on 11th January 2020. Since the likelihood in Equation \ref{eq:lik} assumes that the numbers of new cases are available throughout the epidemic, we make the assumption that the number of cases reported on 11th January 2020 is the sum of Poisson-distributed independent draws on each day in the interval [0, 11]. We therefore modify our likelihood function to $$\ell \left(\vec{y}, \vec{z} ; \vec{\theta}\right) \propto \sum_{i=1}^{n} \left\{ y_{i,11}\log\left(\phi_{i}\sum_{t=1}^{11} x_{it}\right) - \phi_{i}\sum_{t=1}^{11} x_{it} \right\} + \sum_{t=12}^{T} \left[ \sum_{i=1}^{n} \left\{ y_{it}\log(\phi_{i}x_{it}) -\phi_{i}x_{it} \right\} + \sum_{j=1}^{m} \left\{z_{jt}\log\nu_{jt} - \nu_{jt}\right\} \right]. $$
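A direct transcription of this modified log-likelihood might look as follows. This is an illustrative sketch, not the package's `LogLikelihood` function: `y`, `z`, `x`, and `nu` are assumed to be matrices of observed and expected counts with one row per city (or country) and one column per day, and `phi_vec` is the ascertainment vector.

```r
# Sketch of the modified log-likelihood: day 11 contributes an aggregated
# Poisson term per Chinese city; days 12..T_max contribute daily terms for
# Chinese cities (y) and for other countries (z).
modified_loglik <- function(y, z, x, nu, phi_vec, T_max) {
  x11   <- rowSums(x[, 1:11])    # expected cases accumulated over days 1..11
  day11 <- sum(y[, 11] * log(phi_vec * x11) - phi_vec * x11)
  china <- sum(y[, 12:T_max] * log(phi_vec * x[, 12:T_max]) - phi_vec * x[, 12:T_max])
  world <- sum(z[, 12:T_max] * log(nu[, 12:T_max]) - nu[, 12:T_max])
  day11 + china + world
}
```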
Since the support of our parameters is bounded ($\beta, \gamma, I_{Wuhan,0} > 0$; $0 < \phi < 1$), inference is made on the log scale (logit scale for $\phi$) using the standard Nelder-Mead optimisation routine implemented in R v3.6.1. 95% confidence intervals are calculated assuming asymptotic Normality on the transformed scale and back-transformed onto the linear scale for interpretation purposes.
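A minimal sketch of this calculation, reusing `par_init` and `llik` from the workflow above and assuming the usual recipe of inverting the negated Hessian of the maximised log-likelihood (this reproduces the idea, not the package's own code):

```r
# Wald-type 95% intervals on the log/logit scale, back-transformed for reporting
fit <- optim(log(par_init), llik, control = list(fnscale = -1), hessian = TRUE)
se  <- sqrt(diag(solve(-fit$hessian)))   # standard errors on the transformed scale
ci  <- cbind(lower = fit$par - 1.96 * se, upper = fit$par + 1.96 * se)
ci[1:3, ] <- exp(ci[1:3, ])              # beta, gamma, I0W back to the natural scale
ci[4, ]   <- invlogit(ci[4, ])           # phi back to (0, 1)
ci
```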