phybreak: Create a phybreak-object from data and prior distributions.

Description Usage Arguments Value Author(s) References Examples

View source: R/phybreak.R

Description

phybreak takes as data a phybreakdata-object with sequences (individuals in rows, nucleotides in columns) and sampling times, and potentially more. Parameter values are used as initial values in the MCMC-chain or kept fixed. All variables are initialized by random samples from the prior distribution, unless a complete tree is given in the data and should be used (use.tree = TRUE). It is also possible to provide only sequences as data, and sampling times separately.

Usage

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
phybreak(dataset, times = NULL, mu = NULL, gen.shape = 3,
  gen.mean = 1, sample.shape = 3, sample.mean = 1,
  wh.model = "linear", wh.bottleneck = "auto", wh.slope = 1,
  wh.exponent = 1, wh.level = 0.1, dist.model = "power",
  dist.exponent = 2, dist.scale = 1, dist.mean = 1,
  est.gen.mean = TRUE, prior.gen.mean.mean = 1,
  prior.gen.mean.sd = Inf, est.sample.mean = TRUE,
  prior.sample.mean.mean = 1, prior.sample.mean.sd = Inf,
  est.wh.slope = TRUE, prior.wh.slope.shape = 3,
  prior.wh.slope.mean = 1, est.wh.exponent = TRUE,
  prior.wh.exponent.shape = 1, prior.wh.exponent.mean = 1,
  est.wh.level = TRUE, prior.wh.level.shape = 1,
  prior.wh.level.mean = 0.1, est.dist.exponent = TRUE,
  prior.dist.exponent.shape = 1, prior.dist.exponent.mean = 1,
  est.dist.scale = TRUE, prior.dist.scale.shape = 1,
  prior.dist.scale.mean = 1, est.dist.mean = TRUE,
  prior.dist.mean.shape = 1, prior.dist.mean.mean = 1,
  use.tree = FALSE, ...)

Arguments

dataset

An object with sequences plus additional data (class 'phybreakdata'). Data contain sequences and sampling.times, and potentially sim.infection.times, sim.infectors, and sim.tree. Prepare your data in this format by phybreakdata or by simulation with sim.phybreak.

It is also possible to provide only sequences as data, (class 'DNAbin', 'phyDat', or a matrix with nucleotides, each row a host, each column a nucleotide), and corresponding sampling times in the separate times argument.

times

Vector of sampling times, needed if the data consist of only sequences. If the vector is named, these names will be used to identify the hosts.

mu

Initial value for mutation rate (defined per site per unit of time). If NULL (default), then an initial value is calculated by dividing the number of SNPs by the product: 0.75 times 'total sequence length' times 'sum of edge lengths in the initial phylogenetic tree'. NOTE: mutation is defined as assignment of a random nucleotide at a particular site; this could be the nucleotide that was there before the mutation event. Therefore, the actual rate of change of nucleotides is 0.75*mu.

gen.shape

Shape parameter of the generation interval distribution (not estimated).

gen.mean

Initial value for the mean generation interval, i.e. the interval between infection of a secondary case by a primary case.

sample.shape

Shape parameter of the sampling interval distribution (not estimated), i.e. the interval between infection and sampling of a host.

sample.mean

Initial value for the mean sampling interval.

wh.model

The model for within-host pathogen dynamics (effective pathogen population size = N*gE = actual population size * pathogen generation time), used to simulate coalescence events. Names and numbers are allowed. Options are:

  1. "single": effective size = 0, so coalescence occurs 'just before' transmission in the infector (complete bottleneck)

  2. "infinite": effective size = Inf, with complete bottleneck, so coalescence occurs 'just after' transmission in the infectee

  3. "linear": effective size at time t after infection = wh.level + wh.slope * t (complete or wide bottleneck; if complete, wh.level = 0)

  4. "exponential": effective size at time t after infection = wh.level * exp(wh.exponent * t) (wide bottleneck)

  5. "constant": effective size = wh.level (wide bottleneck)

wh.bottleneck

Whether the bottleneck should be complete or wide, which is only an option if wh.model = "linear" (in that case, "auto" defaults to "complete").

wh.slope

Initial value for the within-host slope, used if wh.model = "linear".

wh.exponent

Initial value for the within-host exponent, used if wh.model = "exponential"

wh.level

Initial value for the within-host effective pathogen size at transmission, used if wh.bottleneck = "wide" (if wh.model = "exponential" or "constant", and optional if wh.model = "linear")

est.gen.mean

Whether to estimate the mean generation interval or keep it fixed.

prior.gen.mean.mean

Mean of the (gamma) prior distribution of mean generation interval mG (only if est.gen.mean = TRUE).

prior.gen.mean.sd

Standard deviation of the (gamma) prior distribution of mean generation interval mG (only if est.gen.mean = TRUE).

est.sample.mean

Whether to estimate the mean sampling interval or keep it fixed.

prior.sample.mean.mean

Mean of the (gamma) prior distribution of mean sampling interval mS (only if est.sample.mean = TRUE).

prior.sample.mean.sd

Standard deviation of the (gamma) prior distribution of mean sampling interval mS (only if est.sample.mean = TRUE).

est.wh.slope

Whether to estimate the within-host slope or keep it fixed.

prior.wh.slope.shape

Shape parameter of the (gamma) prior distribution of wh.slope (only if est.wh.slope = TRUE).

prior.wh.slope.mean

Mean of the (gamma) prior distribution of wh.slope (only if est.wh.slope = TRUE).

est.wh.exponent

Whether to estimate the within-host exponent or keep it fixed.

prior.wh.exponent.shape

Shape parameter of the (gamma) prior distribution of wh.exponent (only if est.wh.exponent = TRUE).

prior.wh.exponent.mean

Mean of the (gamma) prior distribution of wh.exponent (only if est.wh.exponent = TRUE).

est.wh.level

Whether to estimate the within-host level at t = 0 or keep it fixed.

prior.wh.level.shape

Shape parameter of the (gamma) prior distribution of wh.level (only if est.wh.level = TRUE).

prior.wh.level.mean

Mean of the (gamma) prior distribution of wh.level (only if est.wh.level = TRUE).

use.tree

Whether to use the transmission and phylogenetic tree given in data of class 'obkData', to create a phybreak-object with an exact copy of the outbreak. This requires more data in data: the slot individuals with vectors infector and date, and the slot trees with at least one phylogenetic tree. Such data can be simulated with sim.phybreak.

...

If arguments from previous versions of this function are used, they may be interpreted correctly through this argument, but it is better to provide the correct argument names.

Value

An object of class phybreak with the following elements

d

a list with data, i.e. names, sequences, sampling times, and total number of SNPs.

v

a list with current state of all nodes in the tree: times, hosts in which they reside, parent nodes, node types (sampling, coalescent, or transmission)

p

a list with the parameter values

h

a list with helper information for the MCMC-method: si.mu and si.wh for efficiently proposing mu and slope, matrix dist with weights for infector sampling based on sequence distances, logicals est.mG, est.mS, and est.wh.slope whether to estimate mean generation interval mG, mean sampling interval mS, and within-host slope, and parameters for the priors of mG, mS, and slope.

s

an empty list that will contain vector and matrices with the posterior samples; in matrices, the rows are nodes in the phylogenetic tree, the columns are the samples

Author(s)

Don Klinkenberg don@xs4all.nl

References

Klinkenberg et al. (2017) Simultaneous inference of phylogenetic and transmission trees in infectious disease outbreaks. PLoS Comput Biol, 13(5): e1005495.

Examples

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
simulation <- sim_phybreak(obsize = 10)
MCMCstate <- phybreak(dataset = simulation)

simulation <- sim_phybreak(obsize = 10)
MCMCstate <- phybreak(dataset = simulation, use.tree = TRUE)


sampletimedata <- c(0,2,2,4,4)
sampleSNPdata <- matrix(c("a","a","a","a","a",
                          "a","c","c","c","c",
                          "t","t","t","g","g"), nrow = 5)
dataset <- phybreakdata(sequences = sampleSNPdata, sample.times = sampletimedata)
MCMCstate <- phybreak(data = dataset)

### also possible without 'phybreakdata' as intermediate, 
### but only with single samples per host and not with additional data (future implementation)
MCMCstate <- phybreak(data = sampleSNPdata, times = sampletimedata)

donkeyshot/phybreak documentation built on Sept. 17, 2021, 9:32 p.m.