Description Usage Arguments Details Value Author(s) References See Also Examples
outbreaker
is a tool for the reconstruction of disease outbreaks
using pathogens genome sequences. It relies on a probabilistic model of
disease transmission which takes the genetic diversity, collection dates,
duration of pathogen colonization and time interval between cases into
account. It is embedded in a Bayesian framework which allows to estimate the
distributions of parameters of interest. It currently allows to estimate:
transmission trees
dates of infection
missing cases in a chain of transmission
mutation rates
imported cases
(indirectly) effective reproduction numbers
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 | outbreaker(dna = NULL, dates, idx.dna = NULL, mut.model = 1,
spa.model = 0, w.dens, f.dens = w.dens, dist.mat = NULL,
init.tree = c("seqTrack", "random", "star"), init.kappa = NULL,
init.mu1 = NULL, init.mu2 = init.mu1, init.spa1 = NULL,
n.iter = 1e+05, sample.every = 500, tune.every = 500, burnin = 20000,
import.method = c("genetic", "full", "none"), find.import.n = 50,
pi.prior1 = 10, pi.prior2 = 1, spa1.prior = 1, move.mut = TRUE,
move.ances = TRUE, move.kappa = TRUE, move.Tinf = TRUE,
move.pi = TRUE, move.spa = TRUE, outlier.threshold = 5,
max.kappa = 10, quiet = TRUE, res.file.name = "chains.txt",
tune.file.name = "tuning.txt", seed = NULL)
outbreaker.parallel(n.runs, parallel = TRUE, n.cores = NULL, dna = NULL,
dates, idx.dna = NULL, mut.model = 1, spa.model = 0, w.dens,
f.dens = w.dens, dist.mat = NULL, init.tree = c("seqTrack", "random",
"star"), init.kappa = NULL, init.mu1 = NULL, init.mu2 = init.mu1,
init.spa1 = NULL, n.iter = 1e+05, sample.every = 500,
tune.every = 500, burnin = 20000, import.method = c("genetic", "full",
"none"), find.import.n = 50, pi.prior1 = 10, pi.prior2 = 1,
spa1.prior = 1, move.mut = TRUE, move.ances = TRUE, move.kappa = TRUE,
move.Tinf = TRUE, move.pi = TRUE, move.spa = TRUE,
outlier.threshold = 5, max.kappa = 10, quiet = TRUE,
res.file.name = "chains.txt", tune.file.name = "tuning.txt",
seed = NULL)
|
dna |
the DNA sequences in |
dates |
a vector indicating the collection dates, provided either as
integer numbers or in a usual date format such as |
idx.dna |
an optional integer vector indicating to which case each dna
sequence in |
mut.model |
an integer indicating the mutational model to be used; 1: one single mutation rate; 2: two rates, transitions (mu1) / transversions (mu2); if 'dna' is a sequence of character strings (not a DNAbin object), only the model 1 is available. |
spa.model |
an integer indicating the spatial model to be used. 0: no spatial model (default). 1: exponential kernel (under development). |
w.dens |
a vector of numeric values indicating the generation time distribution, reflecting the infectious potential of a case t=0, 1, 2, ... time steps after infection. By convention, w.dens[1]=0, meaning that an newly infected patient cannot be instantaneously infectious. If not standardized, this distribution is rescaled to sum to 1. |
f.dens |
similar to |
dist.mat |
a matrix of pairwise spatial distances between the cases. |
init.tree |
the tree used to initialize the MCMC. Can be either a
character string indicating how this tree should be computed, or a vector of
integers corresponding to the tree itself, where the i-th value corresponds
to the index of the ancestor of 'i' (i.e., |
init.kappa |
as |
init.mu1, init.mu2 |
initial values for the mutation rates (mu1: transitions; mu2: transversions). |
init.spa1 |
initial values of the spatial parameter. |
n.iter |
an integer indicating the number of iterations in the MCMC,
including the burnin period; defaults to |
sample.every |
an integer indicating the frequency at which to sample from the MCMC, defaulting to 500 (i.e., output to file every 500 iterations). |
tune.every |
an integer indicating the frequency at which proposal distributions are tuned, defaulting to 500 (i.e., tune proposal distribution every 500 iterations). |
burnin |
an integer indicating the number of iterations for the burnin period, after which the chains are supposed to have mixed; estimated values of parameter are only relevant after the burnin period. Used only when imported cases are automatically detected. |
import.method |
a character string indicating which method to use for detecting imported cases; available choices are 'gen' (based on genetic likelihood), 'full' (based on full likelihood), and 'none' (no imported case detection). |
find.import.n |
an integer indicating how many chains should be used to
determine imported cases; note that this corresponds to chains that are
output after the burnin, so that a total of (burnin +
output.every*find.import.n) chains will be used in the prior run to
determine imported cases. Defaults to |
pi.prior1, pi.prior2 |
two numeric values being the parameters of the Beta distribution used as a prior for π. This prior is Beta(10,1) by default, indicating that a majority of cases are likely to have been observed. Use Beta(1,1) for a flat prior. |
spa1.prior |
parameters of the prior distribution for the spatial
parameters. In the spatial model 1, |
move.mut, move.pi, move.spa |
logicals indicating whether the named items
should be estimated ('moved' in the MCMC), or not, all defaulting to TRUE.
|
move.ances, move.kappa, move.Tinf |
vectors of logicals of length 'n' indicating for which cases different components should be moved during the MCMC. |
outlier.threshold |
a numeric value indicating the threshold for
detecting low likelihood values corresponding to imported cases. Outliers
have a likelihood |
max.kappa |
an integer indicating the maximum number of generations between a case and its most recent sampled ancestor; defaults to 10. |
quiet |
a logical indicating whether messages should be displayed on the screen. |
res.file.name |
a character string indicating the name of the file used to store MCMC outputs. |
tune.file.name |
a character string indicating the name of the file used to store MCMC tuning outputs. |
seed |
an integer used to set the random seed of the C procedures. |
n.runs |
an integer indicating the number of independent chains to run,
either in parallel (if |
parallel |
a logical indicating whether the package |
n.cores |
an integer indicating the number of cores to be used for parallelized computations; if NULL (default value), then up to 6 cores are used, depending on availability. |
The function outbreaker
is the basic implementation of the model.
outbreaker.parallel
allows to run several independent MCMC in
parallel across different cores / processors of the same computer. This
requires the base package parallel
.
The spatial module implemented in outbreaker is currently under development. Please contact the author before using it.
For more resources including tutorials, forums, etc., see: http://sites.google.com/site/therepiproject/r-pac/outbreaker
Both procedures return a list with the following components:
chains: a data.frame containing MCMC outputs (which are also
stored in the file indicated in res.file.name
).
collec.dates: (data) the collection dates.
w: (data) the generation time distribution (argument w.dens
)
f: (data) the distribution of the time to collection (argument
f.dens
)
D: a matrix of genetic distances (in number of mutations) between all pairs of sequences.
idx.dna: (data) the index of the case each dna sequence corresponds to
tune.end: an integer indicating at which iteration the proposal auto-tuning procedures all stopped.
find.import: a logical indicating if imported cases were to be automatically detected.
burnin: an integer indicating the pre-defined burnin, used when detecting imported cases.
find.import.at: an integer indicating at which iteration of the preliminary MCMC imported cases were detected.
n.runs: the number of independent runs used.
call: the matched call.
Thibaut Jombart (t.jombart@imperial.ac.uk)
Jombart T, Cori A, Didelot X, Cauchemez S, Fraser C and Ferguson N (accepted). Bayesian reconstruction of disease outbreaks by combining epidemiologic and genomic data. PLoS Computational Biology.
plotChains to visualize MCMC chains.
transGraph and get.tTree to represent transmission trees.
get.R and get.Rt to get reproduction numbers distributions.
get.incid to get estimates of incidence.
get.mu to get the mutation rate distribution.
simOutbreak to simulate outbreaks.
selectChains to select chains from parallel runs which converged towards different posterior modes.
fakeOutbreak, a toy dataset used to illustrate the method.
For more resources including tutorials, forums, etc., see: http://sites.google.com/site/therepiproject/r-pac/outbreaker
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 | ## EXAMPLE USING TOYOUTBREAK ##
## LOAD DATA, SET RANDOM SEED
data(fakeOutbreak)
attach(fakeOutbreak)
## VISUALIZE DYNAMICS
matplot(dat$dynam, type="o", pch=20, lty=1,
main="Outbreak dynamics", xlim=c(0,28))
legend("topright", legend=c("S","I","R"), lty=1, col=1:3)
## VISUALIZE TRANSMISSION TREE
plot(dat, annot="dist", main="Data - transmission tree")
mtext(side=3, "arrow annotations are numbers of mutations")
## Not run:
## RUN OUTBREAKER - PARALLEL VERSION
## (takes < 1 min))
set.seed(1)
res <- outbreaker.parallel(n.runs=4, dna=dat$dna,
dates=collecDates,w.dens=w, n.iter=5e4)
## End(Not run)
## ASSESS CONVERGENCE OF CHAINS
plotChains(res)
plotChains(res, burnin=2e4)
## REPRESENT POSTERIOR ANCESTRIES
transGraph(res, annot="", main="Posterior ancestries", thres=.01)
## GET CONSENSUS ANCESTRIES
tre <- get.tTree(res)
plot(tre, annot="", main="Consensus ancestries")
## SHOW DISCREPANCIES
col <- rep("lightgrey", 30)
col[which(dat$ances != tre$ances)] <- "pink"
plot(tre, annot="", vertex.color=col, main="Consensus ancestries")
mtext(side=3, text="cases with erroneous ancestries in pink")
## GET EFFECTIVE REPRODUCTION OVER TIME
get.Rt(res)
## GET INDIVIDUAL EFFECTIVE REPRODUCTION
head(get.R(res))
boxplot(get.R(res), col="grey", xlab="Case",
ylab="Effective reproduction number")
## GET MUTATION RATE PER TIME UNIT
## per genome
head(get.mu(res))
## per nucleotide
mu <- get.mu(res, genome.size=1e4)
head(mu)
summary(mu)
hist(mu, border="lightgrey", col="grey", xlab="Mutation per day and nucleotide",
main="Posterior distribution of mutation rate")
detach(fakeOutbreak)
|
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.