phybreak: Create a phybreak-object from data and prior distributions.


View source: R/phybreak.R

Description

phybreak takes as data either an 'obkData'-object or a phybreakdata-object with sequences (individuals in rows, nucleotides in columns). Both 'obkData' and phybreakdata contain at least sequences and sampling times, and potentially more. Parameter values are used as initial values in the MCMC-chain or kept fixed. All variables are initialized by random samples from the prior distribution, unless a complete tree is given in the data and should be used (use.tree = TRUE). It is also possible to provide only sequences as data, and sampling times separately.

Usage

phybreak(dataset, times = NULL, mu = NULL, gen.shape = 3, gen.mean = 1,
  sample.shape = 3, sample.mean = 1, wh.model = 3, wh.slope = 1,
  est.gen.mean = TRUE, prior.mean.gen.mean = 1, prior.mean.gen.sd = Inf,
  est.sample.mean = TRUE, prior.mean.sample.mean = 1,
  prior.mean.sample.sd = Inf, est.wh.slope = TRUE, prior.wh.shape = 3,
  prior.wh.mean = 1, use.tree = FALSE)

Arguments

dataset

An object with sequences plus additional data (class 'obkData' or 'phybreakdata'). All nucleotides that are not 'a', 'c', 'g', or 't' will be turned into 'n'.

If the data are provided as an object of class 'obkData', these should contain sequences and sampling times as metadata with these sequences. The object may also contain infector and infection date vectors in the individuals slot, plus (at least) one tree in the 'trees' slot (class 'multiPhylo').

Data provided as an object of class 'phybreakdata' contain sequences and sampling.times, and potentially sim.infection.times, sim.infectors, and sim.tree. Prepare your data in this format by phybreakdata or by simulation with sim.phybreak.

It is also possible to provide only sequences as data (class 'DNAbin', 'phyDat', or a matrix with nucleotides, each row a host, each column a nucleotide), with the corresponding sampling times in the separate times argument.

times

Vector of sampling times, needed if the data consist of only sequences. If the vector is named, these names will be used to identify the hosts.

mu

Initial value for mutation rate (defined per site per unit of time). If NULL (default), then an initial value is calculated by dividing the number of SNPs by the product: 0.75 times 'total sequence length' times 'sum of edge lengths in the initial phylogenetic tree'. NOTE: mutation is defined as assignment of a random nucleotide at a particular site; this could be the nucleotide that was there before the mutation event. Therefore, the actual rate of change of nucleotides is 0.75*mu.
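As a rough illustration of the default initial value described above, the following base-R sketch carries out the same arithmetic on made-up numbers. The variables n_snps, seq_length, and total_edge_length are hypothetical names with invented values; this is not the package's internal code.

```r
# Hypothetical inputs, chosen only for illustration
n_snps            <- 12      # number of SNPs in the alignment
seq_length        <- 10000   # total sequence length (sites)
total_edge_length <- 8       # sum of edge lengths in the initial tree (time units)

# Default initial value as described for the 'mu' argument
mu_init <- n_snps / (0.75 * seq_length * total_edge_length)

# Rate of actual nucleotide change per site per unit of time,
# given that a mutation may reassign the nucleotide already present
change_rate <- 0.75 * mu_init

print(c(mu_init = mu_init, change_rate = change_rate))
```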

gen.shape

Shape parameter of the generation interval distribution (not estimated).

gen.mean

Initial value for the mean generation interval, i.e. the interval between infection of a primary case and infection of a secondary case by that primary case.

sample.shape

Shape parameter of the sampling interval distribution (not estimated), i.e. the interval between infection and sampling of a host.

sample.mean

Initial value for the mean sampling interval.

wh.model

The model for within-host pathogen dynamics (effective pathogen population size = N*gE = actual population size * pathogen generation time), used to simulate coalescence events. Options are:

  1. Effective size = 0, so coalescence occurs 'just before' transmission in the infector

  2. Effective size = Inf, so coalescence occurs 'just after' transmission in the infectee

  3. Effective size at time t after infection = wh.slope * t
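The three options can be pictured as functions of time since infection. The sketch below only evaluates the stated effective sizes; the helper name effective_size is hypothetical and not part of the package, and options 1 and 2 are represented by their limiting values purely for illustration.

```r
# Hypothetical helper: effective size at time t since infection,
# following the three wh.model options listed above
effective_size <- function(t, wh.model = 3, wh.slope = 1) {
  switch(as.character(wh.model),
         "1" = rep(0, length(t)),    # coalescence just before transmission
         "2" = rep(Inf, length(t)),  # coalescence just after transmission
         "3" = wh.slope * t,         # linear growth with slope wh.slope
         stop("wh.model must be 1, 2, or 3"))
}

# Linear model with slope 2, evaluated at three time points
sizes <- effective_size(c(0.5, 1, 2), wh.model = 3, wh.slope = 2)
print(sizes)
```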

wh.slope

Initial value for the within-host slope, used if wh.model = 3.

est.gen.mean

Whether to estimate the mean generation interval or keep it fixed.

prior.mean.gen.mean

Mean of the (gamma) prior distribution of mean generation interval mG (only if est.gen.mean = TRUE).

prior.mean.gen.sd

Standard deviation of the (gamma) prior distribution of mean generation interval mG (only if est.gen.mean = TRUE).

est.sample.mean

Whether to estimate the mean sampling interval or keep it fixed.

prior.mean.sample.mean

Mean of the (gamma) prior distribution of mean sampling interval mS (only if est.sample.mean = TRUE).

prior.mean.sample.sd

Standard deviation of the (gamma) prior distribution of mean sampling interval mS (only if est.sample.mean = TRUE).
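The priors on mG and mS are specified by a mean and a standard deviation. For a gamma distribution these correspond to shape = (mean/sd)^2 and rate = mean/sd^2. The sketch below applies that standard conversion in base R; the helper name gamma_from_mean_sd is hypothetical, and the code is not taken from the package source. With the default sd = Inf, both shape and rate tend to 0, which is not a proper gamma distribution; how the package handles that case is not covered here.

```r
# Hypothetical helper: convert a gamma mean and sd to shape and rate
gamma_from_mean_sd <- function(prior_mean, prior_sd) {
  stopifnot(prior_mean > 0, prior_sd > 0)
  c(shape = (prior_mean / prior_sd)^2,
    rate  = prior_mean / prior_sd^2)
}

# Example: prior mean 1 and prior sd 0.5 (illustrative values)
pars <- gamma_from_mean_sd(prior_mean = 1, prior_sd = 0.5)
print(pars)

# Check by simulation that the draws have roughly the requested mean and sd
set.seed(1)
prior_draws <- rgamma(1e5, shape = pars[["shape"]], rate = pars[["rate"]])
print(c(mean = mean(prior_draws), sd = sd(prior_draws)))
```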

est.wh.slope

Whether to estimate the within-host slope or keep it fixed.

prior.wh.shape

Shape parameter of the (gamma) prior distribution of slope (only if est.wh.slope = TRUE).

prior.wh.mean

Mean of the (gamma) prior distribution of slope (only if est.wh.slope = TRUE).

use.tree

Whether to use the transmission and phylogenetic tree given in data of class 'obkData', to create a phybreak-object with an exact copy of the outbreak. This requires more data in dataset: the slot individuals with vectors infector and date, and the slot trees with at least one phylogenetic tree. Such data can be simulated with sim.phybreak.

Value

An object of class phybreak with the following elements:

d

a list with data, i.e. names, sequences, sampling times, and total number of SNPs.

v

a list with current state of all nodes in the tree: times, hosts in which they reside, parent nodes, node types (sampling, coalescent, or transmission)

p

a list with the parameter values

h

a list with helper information for the MCMC-method: si.mu and si.wh for efficiently proposing mu and slope, matrix dist with weights for infector sampling based on sequence distances, logicals est.mG, est.mS, and est.wh.slope whether to estimate mean generation interval mG, mean sampling interval mS, and within-host slope, and parameters for the priors of mG, mS, and slope.

s

an empty list that will contain vectors and matrices with the posterior samples; in matrices, the rows are nodes in the phylogenetic tree, the columns are the samples

Author(s)

Don Klinkenberg don@xs4all.nl

References

Klinkenberg et al. (2017) Simultaneous inference of phylogenetic and transmission trees in infectious disease outbreaks. PLoS Comput Biol, 13(5): e1005495.

Examples

# Simulate an outbreak and initialize all variables from the prior
simulation <- sim.phybreak(obsize = 10)
MCMCstate <- phybreak(dataset = simulation)

# Initialize from the complete tree contained in the simulated data
simulation <- sim.phybreak(obsize = 10)
MCMCstate <- phybreak(dataset = simulation, use.tree = TRUE)

# Build a phybreakdata object from a nucleotide matrix (5 hosts, 3 sites)
sampletimedata <- c(0,2,2,4,4)
sampleSNPdata <- matrix(c("a","a","a","a","a",
                          "a","c","c","c","c",
                          "t","t","t","g","g"), nrow = 5)
dataset <- phybreakdata(sequences = sampleSNPdata, sample.times = sampletimedata)
MCMCstate <- phybreak(dataset = dataset)

### also possible without 'phybreakdata' as intermediate,
### but not with additional data (future implementation)
MCMCstate <- phybreak(dataset = sampleSNPdata, times = sampletimedata)

phybreak documentation built on May 2, 2019, 3:36 p.m.