outbreaker: Outbreaker: disease outbreak reconstruction using genetic...

Description Usage Arguments Details Value Author(s) References See Also Examples

Description

outbreaker is a tool for the reconstruction of disease outbreaks using pathogens genome sequences. It relies on a probabilistic model of disease transmission which takes the genetic diversity, collection dates, duration of pathogen colonization and time interval between cases into account. It is embedded in a Bayesian framework which allows to estimate the distributions of parameters of interest. It currently allows to estimate:

Usage

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
outbreaker(dna = NULL, dates, idx.dna = NULL, mut.model = 1,
  spa.model = 0, w.dens, f.dens = w.dens, dist.mat = NULL,
  init.tree = c("seqTrack", "random", "star"), init.kappa = NULL,
  init.mu1 = NULL, init.mu2 = init.mu1, init.spa1 = NULL,
  n.iter = 1e+05, sample.every = 500, tune.every = 500, burnin = 20000,
  import.method = c("genetic", "full", "none"), find.import.n = 50,
  pi.prior1 = 10, pi.prior2 = 1, spa1.prior = 1, move.mut = TRUE,
  move.ances = TRUE, move.kappa = TRUE, move.Tinf = TRUE,
  move.pi = TRUE, move.spa = TRUE, outlier.threshold = 5,
  max.kappa = 10, quiet = TRUE, res.file.name = "chains.txt",
  tune.file.name = "tuning.txt", seed = NULL)

outbreaker.parallel(n.runs, parallel = TRUE, n.cores = NULL, dna = NULL,
  dates, idx.dna = NULL, mut.model = 1, spa.model = 0, w.dens,
  f.dens = w.dens, dist.mat = NULL, init.tree = c("seqTrack", "random",
  "star"), init.kappa = NULL, init.mu1 = NULL, init.mu2 = init.mu1,
  init.spa1 = NULL, n.iter = 1e+05, sample.every = 500,
  tune.every = 500, burnin = 20000, import.method = c("genetic", "full",
  "none"), find.import.n = 50, pi.prior1 = 10, pi.prior2 = 1,
  spa1.prior = 1, move.mut = TRUE, move.ances = TRUE, move.kappa = TRUE,
  move.Tinf = TRUE, move.pi = TRUE, move.spa = TRUE,
  outlier.threshold = 5, max.kappa = 10, quiet = TRUE,
  res.file.name = "chains.txt", tune.file.name = "tuning.txt",
  seed = NULL)

Arguments

dna

the DNA sequences in DNAbin format (see read.dna in the ape package); this can be imported from a fasta file (extension .fa, .fas, or .fasta) using adegenet's function fasta2DNAbin; alternatively, a matrix of single characters strings, in which case only the mutation model 1 is available.

dates

a vector indicating the collection dates, provided either as integer numbers or in a usual date format such as Date or POSIXct format. By convention, zero will indicate the oldest date.

idx.dna

an optional integer vector indicating to which case each dna sequence in dna corresponds. Not required if each case has a sequence, and the order of the sequences matches that of the cases.

mut.model

an integer indicating the mutational model to be used; 1: one single mutation rate; 2: two rates, transitions (mu1) / transversions (mu2); if 'dna' is a sequence of character strings (not a DNAbin object), only the model 1 is available.

spa.model

an integer indicating the spatial model to be used. 0: no spatial model (default). 1: exponential kernel (under development).

w.dens

a vector of numeric values indicating the generation time distribution, reflecting the infectious potential of a case t=0, 1, 2, ... time steps after infection. By convention, w.dens[1]=0, meaning that an newly infected patient cannot be instantaneously infectious. If not standardized, this distribution is rescaled to sum to 1.

f.dens

similar to w.dens, except that this is the distribution of the colonization time, i.e. time interval during which the pathogen can be sampled from the patient.

dist.mat

a matrix of pairwise spatial distances between the cases.

init.tree

the tree used to initialize the MCMC. Can be either a character string indicating how this tree should be computed, or a vector of integers corresponding to the tree itself, where the i-th value corresponds to the index of the ancestor of 'i' (i.e., init.tree[i] is the ancestor of case i). Accepted character strings are "seqTrack" (uses seqTrack output as initialize tree), "random" (ancestor randomly selected from preceding cases), and "star" (all cases coalesce to the first case). Note that for SeqTrack, all cases should have been sequenced.

init.kappa

as init.tree, but values indicate the number of generations between each case and its most recent sampled ancestor.

init.mu1, init.mu2

initial values for the mutation rates (mu1: transitions; mu2: transversions).

init.spa1

initial values of the spatial parameter.

n.iter

an integer indicating the number of iterations in the MCMC, including the burnin period; defaults to 100,000.

sample.every

an integer indicating the frequency at which to sample from the MCMC, defaulting to 500 (i.e., output to file every 500 iterations).

tune.every

an integer indicating the frequency at which proposal distributions are tuned, defaulting to 500 (i.e., tune proposal distribution every 500 iterations).

burnin

an integer indicating the number of iterations for the burnin period, after which the chains are supposed to have mixed; estimated values of parameter are only relevant after the burnin period. Used only when imported cases are automatically detected.

import.method

a character string indicating which method to use for detecting imported cases; available choices are 'gen' (based on genetic likelihood), 'full' (based on full likelihood), and 'none' (no imported case detection).

find.import.n

an integer indicating how many chains should be used to determine imported cases; note that this corresponds to chains that are output after the burnin, so that a total of (burnin + output.every*find.import.n) chains will be used in the prior run to determine imported cases. Defaults to 50.

pi.prior1, pi.prior2

two numeric values being the parameters of the Beta distribution used as a prior for π. This prior is Beta(10,1) by default, indicating that a majority of cases are likely to have been observed. Use Beta(1,1) for a flat prior.

spa1.prior

parameters of the prior distribution for the spatial parameters. In the spatial model 1, spa1.prior is the mean of an exponential distribution.

move.mut, move.pi, move.spa

logicals indicating whether the named items should be estimated ('moved' in the MCMC), or not, all defaulting to TRUE. move.mut handles both mutation rates.

move.ances, move.kappa, move.Tinf

vectors of logicals of length 'n' indicating for which cases different components should be moved during the MCMC.

outlier.threshold

a numeric value indicating the threshold for detecting low likelihood values corresponding to imported cases. Outliers have a likelihood outlier.threshold smaller than the average.

max.kappa

an integer indicating the maximum number of generations between a case and its most recent sampled ancestor; defaults to 10.

quiet

a logical indicating whether messages should be displayed on the screen.

res.file.name

a character string indicating the name of the file used to store MCMC outputs.

tune.file.name

a character string indicating the name of the file used to store MCMC tuning outputs.

seed

an integer used to set the random seed of the C procedures.

n.runs

an integer indicating the number of independent chains to run, either in parallel (if parallel is used), or serially (otherwise).

parallel

a logical indicating whether the package parallel should be used to run parallelized computations; by default, it is used if available.

n.cores

an integer indicating the number of cores to be used for parallelized computations; if NULL (default value), then up to 6 cores are used, depending on availability.

Details

The function outbreaker is the basic implementation of the model. outbreaker.parallel allows to run several independent MCMC in parallel across different cores / processors of the same computer. This requires the base package parallel.

The spatial module implemented in outbreaker is currently under development. Please contact the author before using it.

For more resources including tutorials, forums, etc., see: http://sites.google.com/site/therepiproject/r-pac/outbreaker

Value

Both procedures return a list with the following components:

Author(s)

Thibaut Jombart (t.jombart@imperial.ac.uk)

References

Jombart T, Cori A, Didelot X, Cauchemez S, Fraser C and Ferguson N (accepted). Bayesian reconstruction of disease outbreaks by combining epidemiologic and genomic data. PLoS Computational Biology.

See Also

Examples

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
## EXAMPLE USING TOYOUTBREAK ##
## LOAD DATA, SET RANDOM SEED
data(fakeOutbreak)
attach(fakeOutbreak)

## VISUALIZE DYNAMICS
matplot(dat$dynam, type="o", pch=20, lty=1,
   main="Outbreak dynamics", xlim=c(0,28))
legend("topright", legend=c("S","I","R"), lty=1, col=1:3)

## VISUALIZE TRANSMISSION TREE
plot(dat, annot="dist", main="Data - transmission tree")
mtext(side=3, "arrow annotations are numbers of mutations")


## Not run: 
## RUN OUTBREAKER - PARALLEL VERSION
## (takes < 1 min))
set.seed(1)
res <-  outbreaker.parallel(n.runs=4, dna=dat$dna,
   dates=collecDates,w.dens=w, n.iter=5e4)

## End(Not run)


## ASSESS CONVERGENCE OF CHAINS
plotChains(res)
plotChains(res, burnin=2e4)

## REPRESENT POSTERIOR ANCESTRIES
transGraph(res, annot="", main="Posterior ancestries", thres=.01)

## GET CONSENSUS ANCESTRIES
tre <- get.tTree(res)
plot(tre, annot="", main="Consensus ancestries")

## SHOW DISCREPANCIES
col <- rep("lightgrey", 30)
col[which(dat$ances != tre$ances)] <- "pink"
plot(tre, annot="", vertex.color=col, main="Consensus ancestries")
mtext(side=3, text="cases with erroneous ancestries in pink")

## GET EFFECTIVE REPRODUCTION OVER TIME
get.Rt(res)

## GET INDIVIDUAL EFFECTIVE REPRODUCTION
head(get.R(res))
boxplot(get.R(res), col="grey", xlab="Case",
        ylab="Effective reproduction number")

## GET MUTATION RATE PER TIME UNIT
## per genome
head(get.mu(res))

## per nucleotide
mu <- get.mu(res, genome.size=1e4)
head(mu)

summary(mu)
hist(mu, border="lightgrey", col="grey", xlab="Mutation per day and nucleotide",
     main="Posterior distribution of mutation rate")

detach(fakeOutbreak)

thibautjombart/outbreaker documentation built on May 31, 2019, 9:57 a.m.