This R package implements an ODE-based model of the novel coronavirus (2019-nCoV) outbreak in Wuhan, China. It provides a simulator and a likelihood function that assume Poisson-distributed increments in the number of new cases in Wuhan, in the rest of China (connected via the airline network), and in the rest of the world. The model is described in detail below.
Data required:

* `china_cases`: daily case reports in all Chinese cities (see `data(package='wuhan')`)
* `world_cases`: daily case reports from other countries (see `data(package='wuhan')`)
* `K`: daily numbers of passengers travelling between cities in China via the airline network, available from OAG Traffic Analyzer
* `W`: daily numbers of passengers travelling between Chinese cities and other countries via the airline network, available from OAG Traffic Analyzer
* `china_population`: the population size of each Chinese city (see `data(package='wuhan')`)

Parameters:

* `beta`: the human-human basic transmission rate
* `gamma`: the removal rate (the inverse of the infectious period)
* `I0W`: the number of initial infectives in Wuhan
* `phi`: the case ascertainment rate in Wuhan

To use the package, assume the following workflow in R:
```r
# Load required packages
install.packages('devtools')
devtools::install_git('https://github.com/chrism0dwk/wuhan.git')
library(wuhan)

# Instantiate ODE model, simulate up to day 22.
simulator = NetworkODEModel(N=china_population, K=K, init_loc='Wuhan', alpha=1/4, max_t=22)

# Instantiate LogLikelihood function
llik = LogLikelihood(y=china_cases[,1:22], z=world_cases[,1:22], N=china_population, K=K, W=W, sim_fun=simulator)

# Find MLEs using optimisation
par_init = c(0.4, 0.142857142857143, 1, 0.5)  # Starting point
fit = optim(log(par_init), llik, control=list(fnscale=-1))
p_hat = fit$par
```
Asymptotic assumptions for confidence intervals fail in our case, since the parameter space is highly non-orthogonal. Confidence intervals are therefore calculated using parametric bootstrap. `p_hat` is calculated on the log scale (logit scale for the `phi` parameter), so it needs to be transformed first:
```r
p_hat[1:3] = exp(p_hat[1:3])
p_hat[4] = invlogit(p_hat[4])
```
The samples can then be drawn by bootstrap, for which a computing cluster is highly recommended (thanks Lancaster University HEC facility!).
```r
samples = bootstrap(p_hat, K, W, alpha=1/4, max_t=22, n_samples=1000)
```
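Assuming `bootstrap()` returns a matrix with one row per replicate and columns ordered as in `p_hat` (beta, gamma, I0W, phi) — an assumption about the output layout, not documented here — 95% intervals can then be read off as quantiles:

```r
# Quantile-based 95% confidence intervals from the bootstrap samples
ci <- apply(samples, 2, quantile, probs = c(0.025, 0.975))
colnames(ci) <- c("beta", "gamma", "I0W", "phi")
ci
```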
Since the airline connectivity matrices are not included in this package, samples from the parameters (for 4 different values of the latent period $1/\alpha$) are provided as in-built datasets; see `data(package='wuhan')`.
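For example, the available datasets (whatever the installed version ships) can be enumerated from R; the snippet below uses only base R and makes no assumption about the dataset names:

```r
# List the datasets shipped with the package
d <- data(package = 'wuhan')
d$results[, c("Item", "Title")]
```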
Let $y_{it}$ be the number of new case detections of 2019-nCoV in Chinese city $i=1,\dots,n$ on day $t$, modelled as
$$ y_{it} \sim \mbox{Poisson}(\phi_i x_{it})$$
where $x_{it}$ is the expected number of new case detections in city $i$ on day $t$ and $0 \leq \phi_i \leq 1$ is a city-specific case ascertainment probability. In the case of China, we assume that
$$ \phi_i = \begin{cases} \phi & \mbox{if } i = \mbox{Wuhan} \\ 1 & \mbox{otherwise} \end{cases}$$
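As a plain-R illustration (not the package internals), the observation log-likelihood for the Chinese cities on a single day is just a sum of Poisson log-densities; `y`, `x`, and the vector of city names are assumed inputs:

```r
# Poisson observation log-likelihood for one day's Chinese case counts,
# with phi_i = phi for Wuhan and phi_i = 1 for every other city.
china_obs_loglik <- function(y, x, phi, cities) {
  phi_vec <- ifelse(cities == "Wuhan", phi, 1)
  sum(dpois(y, lambda = phi_vec * x, log = TRUE))
}
```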
Within China, we model an individual's trajectory through infection as susceptible, exposed (and not infectious), infected (and infectious), and finally removed. In continuous time, let $S_{it}$, $E_{it}$, $I_{it}$, and $R_{it}$ be the numbers of susceptible, exposed, infected, and removed individuals respectively in city $i$ at time $t$. We assume that $$ x_{it} = \int_{(t-1)^+}^{t^+} \frac{\mathrm{d}R_{iu}}{\mathrm{d}u} \, \mathrm{d}u $$ where $t^+$ denotes the right limit of day $t$; that is, $x_{it}$ is the number of newly removed individuals occurring on day $t$.
We model $S_{it}$, $E_{it}$, $I_{it}$, and $R_{it}$ using a set of ordinary differential equations
\begin{eqnarray} \frac{\mathrm{d}\vec{S}_t}{\mathrm{d}t} & = & -\vec{S}_t \odot \vec{\lambda}_t \\ \frac{\mathrm{d}\vec{E}_t}{\mathrm{d}t} & = & \vec{S}_t \odot \vec{\lambda}_t - \alpha \vec{E}_t \\ \frac{\mathrm{d}\vec{I}_t}{\mathrm{d}t} & = & \alpha \vec{E}_t - \gamma \vec{I}_t \\ \frac{\mathrm{d}\vec{R}_t}{\mathrm{d}t} & = & \gamma \vec{I}_t \end{eqnarray}
where $\vec{S}_t = (S_{1t},\dots,S_{nt})^T$ and likewise for $\vec{E}_t$, $\vec{I}_t$, and $\vec{R}_t$. Furthermore,
$$ \vec{\lambda}_t = \beta \left(\vec{I}_t/\vec{N} + (K \cdot (\vec{I}_t/\vec{N}))/\vec{N}\right) $$
is the force of infection vector for each city in China. Here $\beta$ is the human-human transmission rate, $K$ is an $n \times n$ matrix whose off-diagonal elements $k_{ij}$, $i \ne j$, give the mean daily number of passengers flying between city $i$ and city $j$, and whose diagonal elements are $k_{ii} = 1$, $i=1,\dots,n$. $\vec{N}$ is a vector of length $n$ containing the population size of each city in China.
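The same system can be written down directly with the `deSolve` package. The sketch below illustrates the equations only, under the assumption of a state vector stacked as $(S, E, I, R)$ by city; it is not the package's `NetworkODEModel` implementation.

```r
library(deSolve)

# Minimal network SEIR right-hand side: the state vector stacks S, E, I, R
# by city; parms is a list containing beta, alpha, gamma, the population
# vector N, and the n x n flight matrix K.
seir_rhs <- function(t, state, parms) {
  n <- length(parms$N)
  S <- state[1:n]
  E <- state[(n + 1):(2 * n)]
  I <- state[(2 * n + 1):(3 * n)]
  R <- state[(3 * n + 1):(4 * n)]
  with(parms, {
    lambda <- beta * (I / N + as.vector(K %*% (I / N)) / N)  # force of infection
    dS <- -S * lambda
    dE <- S * lambda - alpha * E
    dI <- alpha * E - gamma * I
    dR <- gamma * I
    list(c(dS, dE, dI, dR))
  })
}

# Daily expected new removals x_{it} are then day-on-day differences of R:
# out <- ode(y = state0, times = 0:22, func = seir_rhs, parms = parms)
# x   <- apply(out[, (3 * n + 2):(4 * n + 1)], 2, diff)  # one column per city
```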
Let the number of imported cases of 2019-nCoV in the $m$ countries other than China be $z_{jt}$ for $j=1,\dots,m$, where $$z_{jt} \sim \mbox{Poisson}(\nu_{jt})$$ and $$\nu_{jt} = \frac{W_{j\cdot} (\vec{\phi}\odot\vec{I}_t)}{\vec{N}}$$ where $W$ is an $m \times n$ matrix whose $(j,i)$th element $w_{ji}$ is the mean daily number of passengers flying from Chinese city $i$ to non-Chinese city $j$.
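In code this is a single matrix-vector product. A sketch, assuming `W`, `phi_vec`, `I`, and `N` are defined as above and reading the division by $\vec{N}$ elementwise within the product:

```r
# Expected imported cases per non-Chinese country on a given day:
# nu_j = sum_i w_{ji} * phi_i * I_i / N_i
nu <- as.vector(W %*% (phi_vec * I / N))
```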
We perform parameter inference on $\vec{\theta} = (\beta, \phi, \gamma, I_{Wuhan,0})^T$ using Maximum Likelihood Estimation, with log-likelihood function \begin{equation} \label{eq:lik} \ell \left(\vec{y}, \vec{z} ; \vec{\theta}\right) \propto \sum_{t=1}^{T} \left[ \sum_{i=1}^{n} \left\{ y_{it}\log(\phi_{i}x_{it}) -\phi_{i}x_{it} \right\} + \sum_{j=1}^{m} \left\{z_{jt}\log\nu_{jt} - \nu_{jt}\right\} \right]. \end{equation}
In practice, our data present a cumulative case count beginning on 11th January 2020. Since the likelihood in Equation \ref{eq:lik} assumes that the numbers of new cases are available throughout the epidemic, we make the assumption that the number of cases reported on 11th January 2020 is the sum of Poisson-distributed independent draws on each day in the interval [0, 11]. We therefore modify our likelihood function to $$\ell \left(\vec{y}, \vec{z} ; \vec{\theta}\right) \propto \sum_{i=1}^{n} \left\{ y_{i,11}\log\left(\phi_{i}\sum_{t=1}^{11} x_{it}\right) - \phi_{i}\sum_{t=1}^{11} x_{it} \right\} + \sum_{t=12}^{T} \left[ \sum_{i=1}^{n} \left\{ y_{it}\log(\phi_{i}x_{it}) -\phi_{i}x_{it} \right\} + \sum_{j=1}^{m} \left\{z_{jt}\log\nu_{jt} - \nu_{jt}\right\} \right]. $$
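A direct transcription of this modified log-likelihood might look as follows. This is an illustrative sketch, not the package's `LogLikelihood` function: `y`, `z`, `x`, and `nu` are assumed to be matrices of observed and expected counts with one row per city (or country) and one column per day, and `phi_vec` is the ascertainment vector.

```r
# Sketch of the modified log-likelihood: day 11 contributes an aggregated
# Poisson term per Chinese city; days 12..T_max contribute daily terms for
# Chinese cities (y) and for other countries (z).
modified_loglik <- function(y, z, x, nu, phi_vec, T_max) {
  x11   <- rowSums(x[, 1:11])    # expected cases accumulated over days 1..11
  day11 <- sum(y[, 11] * log(phi_vec * x11) - phi_vec * x11)
  china <- sum(y[, 12:T_max] * log(phi_vec * x[, 12:T_max]) - phi_vec * x[, 12:T_max])
  world <- sum(z[, 12:T_max] * log(nu[, 12:T_max]) - nu[, 12:T_max])
  day11 + china + world
}
```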
Since the support of our parameters is bounded ($\beta, \gamma, I_{Wuhan,0} > 0$; $0 < \phi < 1$), inference is made on the log scale (logit scale for $\phi$) using the standard Nelder-Mead optimisation routine implemented in R v3.6.1. 95% confidence intervals are calculated assuming asymptotic Normality on the transformed scale and back-transformed onto the linear scale for interpretation purposes.
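A minimal sketch of this calculation, reusing `par_init` and `llik` from the workflow above and assuming the usual recipe of inverting the negated Hessian of the maximised log-likelihood (this reproduces the idea, not the package's own code):

```r
# Wald-type 95% intervals on the log/logit scale, back-transformed for reporting
fit <- optim(log(par_init), llik, control = list(fnscale = -1), hessian = TRUE)
se  <- sqrt(diag(solve(-fit$hessian)))   # standard errors on the transformed scale
ci  <- cbind(lower = fit$par - 1.96 * se, upper = fit$par + 1.96 * se)
ci[1:3, ] <- exp(ci[1:3, ])              # beta, gamma, I0W back to the natural scale
ci[4, ]   <- invlogit(ci[4, ])           # phi back to (0, 1)
ci
```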