knitr::opts_chunk$set( collapse = TRUE, fig.width=5, fig.height=3, comment = "#>" ) knitr::opts_knit$set(global.par = TRUE) library(cophy)
par(mar = rep(0.1,4)) # setting small margins
The purpose of this R package is to simulate the process of parasite codiversification with their associated host species. A number of possible events can take place during such a process, including cospeciation, parasite extinction and host-shifts. The result of a simulation is a cophylogeny: a pair of host and parasite phylogenies in which each parasite branch is associated with a host branch. The cophylogeny can then be plotted or further analysed.
A simple example may illustrate this. Calling
set.seed(1) # setting random seed for reproducibility coph <- rcophylo(tmax = 5)
generates a cophylogeny over five time units with all parameters set to default values. We can inspect the resulting object by:
coph
This reveals some details of the host and parasite trees. In fact, the cophylogeny object is simply a list of two phylogenetic trees stored in the phylo
format as implemented in the R package ape
[@Paradis2004], with additional information about host associations attached to the parasite tree. We can plot the cophylogeny simply by calling
plot(coph)
This plot shows the host phylogenetic tree in black and, on top of that, the parasite tree in red. Hosts undergo speciation and extinction events whereas parasites always cospeciate with their hosts whenever they speciate and always go extinct when their host goes extinct, and in addition they can go extinct independently or shift to a new host species (which constitutes an independent speciation event for the parasite).
In the following section we will first describe the model underlying the simulations as well as the different parameters and assumptions in more detail. We will then introduce a number of model extensions, give an overview of methods to extract information from cophylogenies and finish with a few more technical details.
The mathematical model underlying the simulations is a continuous-time stochastic process in which certain events happen with small probabilities during infinitesimally small time steps. These probabilities are expressed as rates per unit of time. There are events and corresponding rates for the host species, and events and rates for the parasite species. In the basic model, the process of host diversification is not affected by the parasites. As a result, host diversification can be simulated simultaneously with or prior to the process of parasite diversification. It is also possible to use empirically estimated host trees rather than simulating them. In what follows we will describe the host diversification process in more detail and then proceed to the parasite process.
For the host tree only two events can happen: speciation events and extinction events. In the most basic version of the model these two events occur at constant rates lambda
and mu
, and the process of host diversification is then, mathematically speaking, a regular birth-death process [@Nee2006]. The default parameters for lambda
and mu
in the rcophylo
function are 1 and 0.5, respectively. Changing them has the expected effect, e.g.:
coph <- rcophylo(tmax = 5, lambda = 0.8, mu = 0) plot(coph)
Another parameter that can be set for the host diversification process is a logistic growth carrying capacity, K
. The default value for K
is Inf
(infinity) in which case there is no carrying capacity and the host tree can grow indefinitely. However, when K
is set to a finite value the speciation rate lambda
is reduced by a factor (1-N/K), where N is the number of species. (When N>K the speciation rate will become zero.) The following example shows that when the K
parameter is used, the number of host species will not grow beyond a certain size:
coph <- rcophylo(tmax = 10, mu = 0.05, K = 50) plot(coph)
In many cases, one might be interested not so much in simulating a host tree, but in using an empirically estimated tree and simulating the parasite diversification process on that tree. Similarly, one might be interested in simulating a host tree and then running many simulations for parasite diversification on this same host tree. Both of these goals can be achieved by supplying a host tree to the rcophylo
function via the HTree
argument. The cophylo
package comes with a function called rphylo_H
that simulates a host tree, using the same arguments as discussed above. For example, we can call the following to generate a random host tree under the birth-death process:
set.seed(1) htree <- rphylo_H(tmax = 5, lambda = 1, mu = 0.4)
We can plot this tree using the plotting function implemented in ape
:
plot(htree, root.edge = TRUE, show.tip.label = FALSE)
We can now simulate parasite diversification on this host tree and plot the result:
coph <- rcophylo(HTree = htree) plot(coph)
Running another simulation will produce a different outcome on the same host tree:
coph <- rcophylo(HTree = htree) plot(coph)
Random host trees can of course also be generated using other functions such as rphylo
or rcoal
implemented in ape. Note, however, that by default rphylo
excludes extinct species (this can be prevented by setting fossils = TRUE
) and rcoal
never includes extinct species. Host clades that do not leave any living descents will in general have an important impact on the codiversification process so that using trees without any extinct host species may be problematic.
Parasites are strictly associated with one particular host species, always go extinct when their hosts go extinct and always cospeciate whenever their hosts speciate. The dynamics of parasite diversification are determined primarily by two parameters: beta
and nu
. beta
is the rate at which a parasite attempts a switch to any given other host species. However, such host switches are only successful if the new host is not already infected. Host switching is thus random and density-dependent: the more (uninfected) host species there are the higher the host switching rate for any given parasite. nu
is the parasite extinction rate. Using the host tree generated in the previous section, the following examples show how beta
and nu
affect the dynamics:
coph <- rcophylo(HTree = htree, beta = 0.1, nu = 0.5) plot(coph)
coph <- rcophylo(HTree = htree, beta = 0.1, nu = 0.1) plot(coph)
coph <- rcophylo(HTree = htree, beta = 1.0, nu = 1.5) plot(coph)
In all simulations run so far, it was always assumed that the root of the host tree is already infected with a parasite. However, it is also possible to start with an uninfected host tree and introduce the parasite at a later time, using the PStartT
argument:
set.seed(1) coph <- rcophylo(HTree = htree, PStartT = 2) plot(coph)
Here, a host branch is chosen randomly with equal probability for all host branches for the first parasite to land on. The user can also specify the initial branch number using the iniHBranch
argument.
The basic model makes a number of assumptions that may be unrealistic, including the assumptions of random parasite transmission between host species, absence of multiple infections and faithful parasite cospeciation. In this section we will describe arguments of the rcophylo
function that can be used to relax these assumptions. Throughout this section we will use the host tree from the previous section and simulate the parasite dynamics on this tree.
The probability that a host-shifts is successful may be expected to decline with increasing phylogenetic distance between the donor and recipient host [@Charleston2002, @Engelstaedter2006]. The rcophylo
function can incorporate this expectation through the parameter $\gamma$ (argument gamma
), which specifies the rate of exponential decline of host-shift success with increasing phylogenetic distance $D$ as $P_{success}=e^{-\gamma D}$. The default value for gamma
is zero (no phylogenetic distance effect, as in all previous examples). The following example shows that with a positive value of gamma
, host shifts tend to occur between closely related species only:
set.seed(1) coph <- rcophylo(HTree = htree, beta = 3, nu = 1, gamma = 0.4) plot(coph)
By default, host speciation events always trigger faithful speciation events in associated parasites. This assumption can be relaxed through the parameter $\delta$ (argument delta
, defaulting to zero), which is the probability that one (randomly chosen) daughter host species does not inherit the parasite. Examples of such parasite loss during host speciation can be seen in the following simulation:
set.seed(1) coph <- rcophylo(HTree = htree, beta = 0.2, delta = 0.2) plot(coph)
Hosts species that are already infected with a parasite may make it harder for new parasites to establish. In the basic model, the strong assumption is made that an established infection in a host completely precludes any successful host shifts to that species. This assumption can be relaxed through specifying the parameter $\sigma$ (argument sigma
), which is a factor reducing the probability of successful host shifts. Specifically, if a host is already infected with $k$ parasite species, the probability of host shift success is multiplied by $\sigma^k$. If sigma
is set to one, preexisting parasites have no effect on the success of host shifts. (Note that this will lead to an exponential increase in the number of parasites over time.)
In the following example, instances of host shifts to already infected hosts can be seen:
set.seed(1) coph <- rcophylo(HTree = htree, beta = 1, nu = 0.5, sigma = 0.1) plot(coph)
It is difficult in this plot to see how many parasites are infecting a given host branch as the plotting function simply plots all parasite branches on top of each other. Currently, the only option to get some impression of the multiplicity of infection is to set the colour of the parasite branches to a semi-transparent colour:
plot(coph, parasiteCol = rgb(1, 0, 0, alpha = 0.5))
In the basic model, parasite speciation events can occur only through cospeciation or host shifts. The parameter $\kappa$ (argument kappa
, defaulting to zero) can be used to introduce within-host parasite speciation events. Such events will lead to coinfection with two parasites (similar to coinfections caused by host shifts with $\sigma>0$, see above), and $\kappa$ is the rate at which such parasite speciation events occur. Within-host speciation events are shown as dots on the plotted cophylogeny. Again, multiple parasite lineages on the same host are difficult to visualise in large cophylogenies but this can be achieved to some extent by using semi-transparent parasite branches:
set.seed(1) coph <- rcophylo(HTree = htree, kappa = 0.5) plot(coph, parasiteCol = rgb(1, 0, 0, alpha = 0.5))
The previous sections have already provided examples of cophy
's plotting function. This function takes a cophylogeny object as its main argument. Currently, the only other arguments are hostCol
and parasiteCol
which can be used to set colours for the host and parasite branches, respectively.
In some cases it might be useful to plot only the extant species of a cophylogeny. In this case, the host and parasite tree can not be plotted together as the historical host-parsite associations may no longer be correct. Instead, by calling the prune_Cophylo
function, the pruned host and parasite trees can be returned long with a data.frame giving the host-parasite tip associations.
set.seed(1) coph <- rcophylo(HTree = htree) cophPruned <- prune_cophylo(cophy = coph) cophPruned
This output can then be passed to the phytools
function cophylo
in order to prepare the object for plotting [@Revell2012]. Passing the resulting object to the plot function will plot the pruned host and parasite trees with connector lines expressing the terminal host-parasite associations. An example of this can be seen bellow:
plot(phytools::cophylo(cophPruned$prunedHtree, cophPruned$prunedPtree, cophPruned$tipAssociations))
The package comes with a limited set of functions that extract useful information from cophylogenies. The most basic of these is the function get_infectionStatistics
which displays some simple key statistics of the outcome of a codiversification process at the end of the simulation:
set.seed(1) # setting random seed for reproducibility coph <- rcophylo(tmax = 5) plot(coph) get_infectionStatistics(coph)
The four numbers returned by get_infectionStatistics
are 1) the number of surviving host species, 2) the number of surviving parasite species, 3) the fraction of infected host species and 4) the mean number of parasite species that each host species carries. Note that in the absence of multiple infections the last two numbers will always be identical.
When coinfections occur, more complete information on how many surviving host species are infected with different numbers of parasites can be obtained with the get_infectionFrequencies
function:
coph <- rcophylo(tmax = 5, kappa = 0.5) plot(coph) get_infectionStatistics(coph) get_infectionFrequencies(coph)
In this example, within-host speciation events of the parasites (introduced by the kappa
argument) leads to multiple infections within host species. At the end of the simulation, three host species are uninfected, eleven are infected with a single parasite, three with two parasites etc.
In some simulations the parasite may go extinct, and in this case one might be interested in when this happened. This information can be extracted using the function get_PextinctionTime
':
set.seed(5) coph <- rcophylo(tmax = 5) plot(coph) get_PextinctionTime(coph)
Here, time runs from zero (the root of the tree) to five (the present day), and the last parasite becomes extinct at time $t\approx4.2$.
Finally, the function get_PHDistCorrelation
can be used to gain some insight into how well the host and parasite phylogeny are aligned. Specifically, this function calculates the correlation coefficient between the phylogenetic distances between all pairs of parasites and the phylogenetic distances between the corresponding pairs of hosts with which the parasites are associated. Consider the following example in which there are no host shifts:
set.seed(1) coph <- rcophylo(tmax = 5, beta = 0, nu = 0.05) plot(coph)
Here, the phylogenetic distances are the same for all pairs of parasite species and their associated hosts so that get_PHDistCorrelation
returns one:
get_PHDistCorrelation(coph)
By contrast, with frequent, random host-switching the correlation coefficient will be close to zero:
set.seed(1) coph <- rcophylo(tmax = 5) plot(coph) get_PHDistCorrelation(coph)
Preferential host-switching to new hosts that are phylogenetically close to the original host using the gamma
argument will tend to produce positive values:
set.seed(2) coph <- rcophylo(tmax = 5, beta = 3, nu = 1, gamma = 0.4) plot(coph) get_PHDistCorrelation(coph)
This section will cover some additional arguments of the rcophylo
function that can be used to speed up the simulations.
exportFormat
By default, rcophylo
returns an object of class cophylogeny
, which essentially is a list of two objects (host and parasite tree) that belong to the phylo
class as implemented in the ape
package [@Paradis2004]. (The parasite tree contains additional information on host associations that are not normally part of a phylo
object.) The advantages of exporting a cophylogeny in this way are clear: the two trees can be separately analysed and plotted using the full arsenal of the functions implemented in ape
and many R packages devoted to phylogenetic analyses. Nevertheless, internally the rcophylo
function uses a different format for representing the two trees that uses more memory but makes the simulation more efficient. By setting the argument exportFormat
to the value "raw" instead of the default "cophylogeny", the cophylogeny is returned in the unconverted, internal format. This may be advantageous for some downstream applications and is also faster because the conversion step is omitted.
Similarly, the rphylo_H
function by default returns a tree in phylo
format but internally uses a different format. By setting the exportFormat
argument to "raw", the host tree is returned in the internal format and can then be passed on to rcophylo
in this internal format. (rcophylo
automatically recognises which of the two formats the Htree
argument is in.) This can save some computation time when running many simulations. For example, compare:
start <- Sys.time() set.seed(1) htree <- rphylo_H(tmax = 10) coph <- rcophylo(HTree = htree) end <- Sys.time() plot(coph) print(paste0("Time needed for simulation: ", end-start,"s."))
start <- Sys.time() set.seed(1) htree <- rphylo_H(tmax = 10, exportFormat = "raw") coph <- rcophylo(HTree = htree) end <- Sys.time() plot(coph) print(paste0("Time needed for simulation: ", end-start,"s."))
A third accepted value for exportFormat
in the rcophylo
function is "PhyloPonly". When this option is selected, only the parasite tree will be returned (in phylo
format). This may be useful when running many simulations on the same host tree.
timestep
In the rcophylo
and rphylo_H
functions, time proceeds in small time steps during which the various events can happen. The duration of these time steps is fixed to a value given by the timestep
argument, defaulting to $10^{-3}$. If an event happens during one of these time steps, the actual time at which this event occurs is drawn from a uniform distribution, thus ensuring a quasi-continuous-time process. Setting the timestep
argument to a larger value will speed up the simulations but may come at the cost of accuracy: the larger the time step, the higher the likelihood that several events occur within the same time step that should influence each other in the mathematical model but do not in the algorithm. The default value for timestep
was found to provide a good compromise between accuracy and speed for time frames and rates of events that roughly fall within the same order of magnitude as the examples used in this vignette, but we have made no attempt to systematically search for optimal values of this argument.
Gdist
When simulating on a known host tree, the user may provide a matrix of host phylogenetic distances to the rcophylo
function. Internally, the rcophylo
function tracks host phylogenetic distances through time in order to allow for preferential host shifting. When the initial host genetic distances at the time when the parasite is introduced are known, proving these distances through Gdist
will speed up the simulations, which may be useful especially when simulating many parasite trees on the same host tree.
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.