This Vignette shows a case study of Ebola data from Sierra Leone during the 2014 outbreak, using the algorithm in LNAPhylodyn package.

library package and dependencies

First, we need to library the package and some dependent packages.

library(LNAPhyloDyn)
library(phylodyn)
library(ape)
library(MASS)

Ebola genealogy

Then we load the dataset from LNAPhylodyn package. The dataset is a genealogy estimated from 1010 sequences in 15 cities in Sierra Leone during the 2014 ebola outbreak. The dataset is a phylo object with a tree structure. To parse the data into sampling time, number of sampled lineages and coalescent times, we need to call function summarize_phylo from phylodyn package by Karcher. et al (2016).

load Ebola genealogy from 2014 outbreak

data(Ebola_Sierra_Leone2014) # load dataset
par(mar = c(4,4,1,1), mgp = c(2,1,0)) # plot the genealogy
a = plot(Ebola_Sierra_Leone2014, show.tip.label=FALSE,x.lim=c(0,1.4),use.edge.length = T)
axisPhylo(1,root.time = 0,backward = F,xaxt = "n",las = 1)
axis(side = 1,at = seq(0,1.5,0.25),
     labels = c('2014-5','2014-8','2014-11','2015-2','2015-5','2015-8','2015-11'))
# parse the phylo oject into sampling time, number of sampled lineages and coalescent times
names(summarize_phylo(Ebola_Sierra_Leone2014))

Inference

Setup total population size

The total population size is $N=7000000$, which is approximately the total population size in Sierra Leone in 2014.

Setup time grid

To run the model, firstly we need to specify the time grid for the ode solver, the LNA grid and the grid for change points. The total length out the time period is 1.36 (year).

times = seq(0,1.36,length.out = 2001)
times2 = times[seq(1,length(times), by = 50)]
chtimes = times2[2 : (length(times2) - 1)]

Setup prior distribution for each parameters

Specify other configurations:

There are other parameters, which you can keep as default.

Running MCMC

res.SIR_ADPk2 = General_MCMC_with_ESlice(summarize_phylo(Ebola_Sierra_Leone2014), times = times, t_correct = 1.36,
                                   N = 7000000,gridsize = 50,niter = 500,burn = 0,thin = 5,
                                   changetime = chtimes, DEMS = c("S","I"),
                                   prior=list(pop_pr=c(1,0.5,1,1), R0_pr=c(0.7,0.5), gamma_pr = c(3,0.1), mu_pr = c(3,0.15), hyper_pr=c(3,0.2)),
                                   proposal = list(pop_prop = 0.2, R0_prop = c(0.05), gamma_prop=0.09, mu_prop = 0.1, hyper_prop=0.2),
                                   control = list(alpha=c(1/7000000), R0 = 2, gamma=30, ch=rep(0.975,length(chtimes)), traj = matrix(rep(0,2*40),ncol=2),hyper = 20),ESS_vec = c(0,1,0,0,1,1,0),
                                   likelihood = "volz", model = "SIR",
                                   Index = c(0,1), nparam=2,method = "admcmc", options=list(joint = T, 
                                   PCOV = NULL,beta=0.95,burn1 = 300, parIdlist = list(a=1,c=3,d = length(chtimes) + 5), 
                                   isLoglist = list(a=1,c=1,d=0),
                                   priorIdlist = list(a=1,c=3,d=5), up = 1000000,warmup =10000000, tune = 0.01,hyper = F),
                                   verbose = F)

reference

  1. Dudas, G., Carvalho, L. M., Bedford, T., Tatem, A. J., Baele, G., Faria, N. R., ... & Bielejec, F. (2017). Virus genomes reveal factors that spread and sustained the Ebola epidemic. Nature, 544(7650), 309.


MingweiWilliamTang/LNAphyloDyn documentation built on Oct. 23, 2019, 11:56 a.m.