knitr::opts_chunk$set(
  collapse = TRUE,
  fig.width=5, fig.height=3,
  comment = "#>"
)
knitr::opts_knit$set(global.par = TRUE)
library(cophy)
  par(mar = rep(0.1,4))  # setting small margins

Introduction

The purpose of this R package is to simulate the process of parasite codiversification with their associated host species. A number of possible events can take place during such a process, including cospeciation, parasite extinction and host-shifts. The result of a simulation is a cophylogeny: a pair of host and parasite phylogenies in which each parasite branch is associated with a host branch. The cophylogeny can then be plotted or further analysed.

A simple example may illustrate this. Calling

  set.seed(1)  # setting random seed for reproducibility
  coph <- rcophylo(tmax = 5)

generates a cophylogeny over five time units with all parameters set to default values. We can inspect the resulting object by:

  coph

This reveals some details of the host and parasite trees. In fact, the cophylogeny object is simply a list of two phylogenetic trees stored in the phylo format as implemented in the R package ape [@Paradis2004], with additional information about host associations attached to the parasite tree. We can plot the cophylogeny simply by calling

  plot(coph)

This plot shows the host phylogenetic tree in black and, on top of that, the parasite tree in red. Hosts undergo speciation and extinction events whereas parasites always cospeciate with their hosts whenever they speciate and always go extinct when their host goes extinct, and in addition they can go extinct independently or shift to a new host species (which constitutes an independent speciation event for the parasite).

In the following section we will first describe the model underlying the simulations as well as the different parameters and assumptions in more detail. We will then introduce a number of model extensions, give an overview of methods to extract information from cophylogenies and finish with a few more technical details.

The basic model

The mathematical model underlying the simulations is a continuous-time stochastic process in which certain events happen with small probabilities during infinitesimally small time steps. These probabilities are expressed as rates per unit of time. There are events and corresponding rates for the host species, and events and rates for the parasite species. In the basic model, the process of host diversification is not affected by the parasites. As a result, host diversification can be simulated simultaneously with or prior to the process of parasite diversification. It is also possible to use empirically estimated host trees rather than simulating them. In what follows we will describe the host diversification process in more detail and then proceed to the parasite process.

Host diversification

For the host tree only two events can happen: speciation events and extinction events. In the most basic version of the model these two events occur at constant rates lambda and mu, and the process of host diversification is then, mathematically speaking, a regular birth-death process [@Nee2006]. The default parameters for lambda and mu in the rcophylo function are 1 and 0.5, respectively. Changing them has the expected effect, e.g.:

  coph <- rcophylo(tmax = 5, lambda = 0.8, mu = 0)
  plot(coph)

Another parameter that can be set for the host diversification process is a logistic growth carrying capacity, K. The default value for K is Inf (infinity) in which case there is no carrying capacity and the host tree can grow indefinitely. However, when K is set to a finite value the speciation rate lambda is reduced by a factor (1-N/K), where N is the number of species. (When N>K the speciation rate will become zero.) The following example shows that when the K parameter is used, the number of host species will not grow beyond a certain size:

  coph <- rcophylo(tmax = 10, mu = 0.05, K = 50)
  plot(coph)

In many cases, one might be interested not so much in simulating a host tree, but in using an empirically estimated tree and simulating the parasite diversification process on that tree. Similarly, one might be interested in simulating a host tree and then running many simulations for parasite diversification on this same host tree. Both of these goals can be achieved by supplying a host tree to the rcophylo function via the HTree argument. The cophylo package comes with a function called rphylo_H that simulates a host tree, using the same arguments as discussed above. For example, we can call the following to generate a random host tree under the birth-death process:

  set.seed(1)
  htree <- rphylo_H(tmax = 5, lambda = 1, mu = 0.4)

We can plot this tree using the plotting function implemented in ape:

  plot(htree, root.edge = TRUE, show.tip.label = FALSE)

We can now simulate parasite diversification on this host tree and plot the result:

  coph <- rcophylo(HTree = htree)
  plot(coph)

Running another simulation will produce a different outcome on the same host tree:

  coph <- rcophylo(HTree = htree)
  plot(coph)

Random host trees can of course also be generated using other functions such as rphylo or rcoal implemented in ape. Note, however, that by default rphylo excludes extinct species (this can be prevented by setting fossils = TRUE) and rcoal never includes extinct species. Host clades that do not leave any living descents will in general have an important impact on the codiversification process so that using trees without any extinct host species may be problematic.

Parasite diversification

Parasites are strictly associated with one particular host species, always go extinct when their hosts go extinct and always cospeciate whenever their hosts speciate. The dynamics of parasite diversification are determined primarily by two parameters: beta and nu. beta is the rate at which a parasite attempts a switch to any given other host species. However, such host switches are only successful if the new host is not already infected. Host switching is thus random and density-dependent: the more (uninfected) host species there are the higher the host switching rate for any given parasite. nu is the parasite extinction rate. Using the host tree generated in the previous section, the following examples show how beta and nu affect the dynamics:

  coph <- rcophylo(HTree = htree, beta = 0.1, nu = 0.5)
  plot(coph)
  coph <- rcophylo(HTree = htree, beta = 0.1, nu = 0.1)
  plot(coph)
  coph <- rcophylo(HTree = htree, beta = 1.0, nu = 1.5)
  plot(coph)

In all simulations run so far, it was always assumed that the root of the host tree is already infected with a parasite. However, it is also possible to start with an uninfected host tree and introduce the parasite at a later time, using the PStartT argument:

  set.seed(1)
  coph <- rcophylo(HTree = htree, PStartT = 2)
  plot(coph)

Here, a host branch is chosen randomly with equal probability for all host branches for the first parasite to land on. The user can also specify the initial branch number using the iniHBranch argument.

Model extensions

The basic model makes a number of assumptions that may be unrealistic, including the assumptions of random parasite transmission between host species, absence of multiple infections and faithful parasite cospeciation. In this section we will describe arguments of the rcophylo function that can be used to relax these assumptions. Throughout this section we will use the host tree from the previous section and simulate the parasite dynamics on this tree.

Host shift success declining with phylogenetic distance

The probability that a host-shifts is successful may be expected to decline with increasing phylogenetic distance between the donor and recipient host [@Charleston2002, @Engelstaedter2006]. The rcophylo function can incorporate this expectation through the parameter $\gamma$ (argument gamma), which specifies the rate of exponential decline of host-shift success with increasing phylogenetic distance $D$ as $P_{success}=e^{-\gamma D}$. The default value for gamma is zero (no phylogenetic distance effect, as in all previous examples). The following example shows that with a positive value of gamma, host shifts tend to occur between closely related species only:

  set.seed(1)
  coph <- rcophylo(HTree = htree, beta = 3, nu = 1, gamma = 0.4)
  plot(coph)

Parasite loss during host speciation

By default, host speciation events always trigger faithful speciation events in associated parasites. This assumption can be relaxed through the parameter $\delta$ (argument delta, defaulting to zero), which is the probability that one (randomly chosen) daughter host species does not inherit the parasite. Examples of such parasite loss during host speciation can be seen in the following simulation:

  set.seed(1)
  coph <- rcophylo(HTree = htree, beta = 0.2, delta = 0.2)
  plot(coph)

Host-shifting to an already infected host species

Hosts species that are already infected with a parasite may make it harder for new parasites to establish. In the basic model, the strong assumption is made that an established infection in a host completely precludes any successful host shifts to that species. This assumption can be relaxed through specifying the parameter $\sigma$ (argument sigma), which is a factor reducing the probability of successful host shifts. Specifically, if a host is already infected with $k$ parasite species, the probability of host shift success is multiplied by $\sigma^k$. If sigma is set to one, preexisting parasites have no effect on the success of host shifts. (Note that this will lead to an exponential increase in the number of parasites over time.)

In the following example, instances of host shifts to already infected hosts can be seen:

  set.seed(1)
  coph <- rcophylo(HTree = htree, beta = 1, nu = 0.5, sigma = 0.1)
  plot(coph)

It is difficult in this plot to see how many parasites are infecting a given host branch as the plotting function simply plots all parasite branches on top of each other. Currently, the only option to get some impression of the multiplicity of infection is to set the colour of the parasite branches to a semi-transparent colour:

  plot(coph, parasiteCol = rgb(1, 0, 0, alpha = 0.5))

Parasite speciation within hosts

In the basic model, parasite speciation events can occur only through cospeciation or host shifts. The parameter $\kappa$ (argument kappa, defaulting to zero) can be used to introduce within-host parasite speciation events. Such events will lead to coinfection with two parasites (similar to coinfections caused by host shifts with $\sigma>0$, see above), and $\kappa$ is the rate at which such parasite speciation events occur. Within-host speciation events are shown as dots on the plotted cophylogeny. Again, multiple parasite lineages on the same host are difficult to visualise in large cophylogenies but this can be achieved to some extent by using semi-transparent parasite branches:

  set.seed(1)
  coph <- rcophylo(HTree = htree, kappa = 0.5)
  plot(coph, parasiteCol = rgb(1, 0, 0, alpha = 0.5))

Plotting cophylogenies

The previous sections have already provided examples of cophy's plotting function. This function takes a cophylogeny object as its main argument. Currently, the only other arguments are hostCol and parasiteCol which can be used to set colours for the host and parasite branches, respectively.

In some cases it might be useful to plot only the extant species of a cophylogeny. In this case, the host and parasite tree can not be plotted together as the historical host-parsite associations may no longer be correct. Instead, by calling the prune_Cophylo function, the pruned host and parasite trees can be returned long with a data.frame giving the host-parasite tip associations.

  set.seed(1)
  coph <- rcophylo(HTree = htree)
  cophPruned <- prune_cophylo(cophy = coph)
  cophPruned

This output can then be passed to the phytools function cophylo in order to prepare the object for plotting [@Revell2012]. Passing the resulting object to the plot function will plot the pruned host and parasite trees with connector lines expressing the terminal host-parasite associations. An example of this can be seen bellow:

  plot(phytools::cophylo(cophPruned$prunedHtree, cophPruned$prunedPtree, cophPruned$tipAssociations))

Analysing cophylogenies

The package comes with a limited set of functions that extract useful information from cophylogenies. The most basic of these is the function get_infectionStatistics which displays some simple key statistics of the outcome of a codiversification process at the end of the simulation:

  set.seed(1)  # setting random seed for reproducibility
  coph <- rcophylo(tmax = 5)
  plot(coph)
  get_infectionStatistics(coph)

The four numbers returned by get_infectionStatistics are 1) the number of surviving host species, 2) the number of surviving parasite species, 3) the fraction of infected host species and 4) the mean number of parasite species that each host species carries. Note that in the absence of multiple infections the last two numbers will always be identical.

When coinfections occur, more complete information on how many surviving host species are infected with different numbers of parasites can be obtained with the get_infectionFrequencies function:

  coph <- rcophylo(tmax = 5, kappa = 0.5)
  plot(coph)
  get_infectionStatistics(coph)
  get_infectionFrequencies(coph)

In this example, within-host speciation events of the parasites (introduced by the kappa argument) leads to multiple infections within host species. At the end of the simulation, three host species are uninfected, eleven are infected with a single parasite, three with two parasites etc.

In some simulations the parasite may go extinct, and in this case one might be interested in when this happened. This information can be extracted using the function get_PextinctionTime':

  set.seed(5)
  coph <- rcophylo(tmax = 5)
  plot(coph)
  get_PextinctionTime(coph)

Here, time runs from zero (the root of the tree) to five (the present day), and the last parasite becomes extinct at time $t\approx4.2$.

Finally, the function get_PHDistCorrelation can be used to gain some insight into how well the host and parasite phylogeny are aligned. Specifically, this function calculates the correlation coefficient between the phylogenetic distances between all pairs of parasites and the phylogenetic distances between the corresponding pairs of hosts with which the parasites are associated. Consider the following example in which there are no host shifts:

  set.seed(1)
  coph <- rcophylo(tmax = 5, beta = 0, nu = 0.05)
  plot(coph)

Here, the phylogenetic distances are the same for all pairs of parasite species and their associated hosts so that get_PHDistCorrelation returns one:

  get_PHDistCorrelation(coph)

By contrast, with frequent, random host-switching the correlation coefficient will be close to zero:

  set.seed(1)
  coph <- rcophylo(tmax = 5)
  plot(coph)
  get_PHDistCorrelation(coph)

Preferential host-switching to new hosts that are phylogenetically close to the original host using the gamma argument will tend to produce positive values:

  set.seed(2)
  coph <- rcophylo(tmax = 5, beta = 3, nu = 1, gamma = 0.4)
  plot(coph)
  get_PHDistCorrelation(coph)

Some technical details

This section will cover some additional arguments of the rcophylo function that can be used to speed up the simulations.

exportFormat

By default, rcophylo returns an object of class cophylogeny, which essentially is a list of two objects (host and parasite tree) that belong to the phylo class as implemented in the ape package [@Paradis2004]. (The parasite tree contains additional information on host associations that are not normally part of a phylo object.) The advantages of exporting a cophylogeny in this way are clear: the two trees can be separately analysed and plotted using the full arsenal of the functions implemented in ape and many R packages devoted to phylogenetic analyses. Nevertheless, internally the rcophylo function uses a different format for representing the two trees that uses more memory but makes the simulation more efficient. By setting the argument exportFormat to the value "raw" instead of the default "cophylogeny", the cophylogeny is returned in the unconverted, internal format. This may be advantageous for some downstream applications and is also faster because the conversion step is omitted.

Similarly, the rphylo_H function by default returns a tree in phylo format but internally uses a different format. By setting the exportFormat argument to "raw", the host tree is returned in the internal format and can then be passed on to rcophylo in this internal format. (rcophylo automatically recognises which of the two formats the Htree argument is in.) This can save some computation time when running many simulations. For example, compare:

  start <- Sys.time()
  set.seed(1)
  htree <- rphylo_H(tmax = 10)
  coph <- rcophylo(HTree = htree)
  end <- Sys.time()
  plot(coph)
  print(paste0("Time needed for simulation: ", end-start,"s."))
  start <- Sys.time()
  set.seed(1)
  htree <- rphylo_H(tmax = 10, exportFormat = "raw")
  coph <- rcophylo(HTree = htree)
  end <- Sys.time()
  plot(coph)
  print(paste0("Time needed for simulation: ", end-start,"s."))

A third accepted value for exportFormat in the rcophylo function is "PhyloPonly". When this option is selected, only the parasite tree will be returned (in phylo format). This may be useful when running many simulations on the same host tree.

timestep

In the rcophylo and rphylo_H functions, time proceeds in small time steps during which the various events can happen. The duration of these time steps is fixed to a value given by the timestep argument, defaulting to $10^{-3}$. If an event happens during one of these time steps, the actual time at which this event occurs is drawn from a uniform distribution, thus ensuring a quasi-continuous-time process. Setting the timestep argument to a larger value will speed up the simulations but may come at the cost of accuracy: the larger the time step, the higher the likelihood that several events occur within the same time step that should influence each other in the mathematical model but do not in the algorithm. The default value for timestep was found to provide a good compromise between accuracy and speed for time frames and rates of events that roughly fall within the same order of magnitude as the examples used in this vignette, but we have made no attempt to systematically search for optimal values of this argument.

Gdist

When simulating on a known host tree, the user may provide a matrix of host phylogenetic distances to the rcophylo function. Internally, the rcophylo function tracks host phylogenetic distances through time in order to allow for preferential host shifting. When the initial host genetic distances at the time when the parasite is introduced are known, proving these distances through Gdist will speed up the simulations, which may be useful especially when simulating many parasite trees on the same host tree.

References



JanEngelstaedter/cophy documentation built on Nov. 7, 2023, 9:37 a.m.