Milestone Description

Increased complexity of the simple model to include data from Milestone 5. Automated testing and validating procedures implemented where possible. Procedures for multiple models comparison and accounting for uncertainty established.

Summary of Milestone 4

In Milestone 4, we presented a simple transmission model that made us of historical case counts, tranmissibility of the pathogen and geographical characetrisation as a sole means to predict future risk. The number of cases at a location $j$ at time $t$ is given by the equation [ I_{j, t} \sim Pois\left( \sum_{i = 1}^{n} {\left( p_{i \rightarrow j} R_{t, i} \sum_{s = 1}^{t}{I_{i, t - s} w_{s}}\right)} \right), ]

where $R_{t, i}$ is the reproduction number at location $i$ at time $t$ and $p_{i \rightarrow j}$ is the probability of moving from location $i$ to location $j$. The probability of moving between locations is derived from the relative flow of populations between locations. This latter quantity is estimated using a population flow model. In the previous iteration of our work, we had used gravity model. Under this model, the flow of individuals from area $i$ to area $j$, $\phi_{i \rightarrow j}$, is proportional to the product of the populations of the two areas, $N_i$ and $N_j$, and inversely proportional to the distance between them $d_{i, j}$ raised to some power. That is, [ p_{i \rightarrow j} = (1 - p_{stay}^i) r_{i \rightarrow j}^{spread},]

where $p_{stay}^i$ is the probability of staying at location $i$ and $r_{i \rightarrow j}^{spread}$ is the relative risk of spread to a location $j$ from a focal location $i$. $r_{i \rightarrow j}^{spread}$ can be quantified as [ r_{i \rightarrow j}^{spread} = \frac{\phi_{i \rightarrow j}}{\sum_{x}{\phi_{i \rightarrow j}}}. ]

Changes in Milestone 6

Workflow

The pre-processing steps for data obrained from HealthMap/ProMed consist of: 1. Restricting the full data set to the locations of interest; 2. For days with multiple alerts, merge the alerts into a single alert; 3. Discard inconsistent data i.e., that which lead to a decreasing cumulative counts; 4. Interpolating for dates for which data are missing so that the cumulative incidence count is available for every day in the interval of interest and retrieve incidence data from interpolated cumulative incidence data.

In addition to the above, a step was added to remove outliers from the data provided using Chebyshev Inequality with sample mean. Figure shows the HealthMap data for Liberia through each pre-processing step.

Model training and validation using data from WHO

In the current iteration, the model was trained and validated the data on cases officially reported to the WHO during the 2013–2016 Ebola outbreak in Guinea, Liberia and Sierra Leone. This dataset was cleaned and published in [@garske20160308] and it is this cleaned version of the data that were used in this work. This dataset consists of incidence reports at ADM2 level. Thus in using it, we were able to better validate the model. We refer to this dataset as WHO data throughout the rest of this document.

Output for Ebola in West Africa

In this section, we provide results from each step in the workflow described in Section.

Incidence data clean-up

Since the WHO data used was a clean version, pre-processing was not needed. The pre-processing steps are illustrated using data from HealthMap.

Estimating Transmissibility

Predicting Future Cases

Connectivity and Risk of International Spread

Conclusions and Next Steps

References



annecori/mRIIDSprocessData documentation built on May 29, 2019, 1:16 p.m.