$^1$ School of Biosciences, University of Sheffield, Sheffield, S10 2TN, United Kingdom. guillert@tcd.ie
Running head:
treats
: TREes And Traits Simulations
Simulating biological realistic data is an important step to understand and investigate biodiversity. Simulated data can be used to generate null, base line or neutral models. These can be used either in comparison to observed data to estimate the mechanisms that generated the data. Or they can be used to explore, understand and develop theoretical advances by proposing toy models.
In evolutionary biology, simulations often involve the need of an evolutionary process where descent with modification is at the core of how the simulated data is generated. These evolutionary processes can then be nearly infinitely modified to include complex processes that affect the simulations such as traits co-evolution, competition mechanisms or mass extinction events.
Here I present the treats
package, a modular R
package for trees and traits simulations.
This package is based on a simple birth death algorithm from which all steps can easily be modified by users.
treats
also provides a tidy interface through the treats
object, allowing users to easily run reproducible simulations.
It also comes with an extend manual regularly updated following users' questions or suggestions.
Comparing biological patterns is one of the key ways to understand mechanisms in evolutionary biology. This leads to the development of phylogenetic comparative methods as key methodologically driven topic in ecology, evolution and palaeontology [@felsensteinPCM; @pennell2013review]. These methods rely on comparing patterns in a phylogenetic context to understand biological mechanisms or concepts [@harmon2019book]. These comparisons can be done between observed patterns under different conditions or against null, neutral or baseline models (see @bausman2018neutral for distinctions) suggesting different processes or mechanisms. For example different traits distribution for species with different diets [@deepak2023diet] or habitats [@pinto2017geographical]. Or by comparing some observed pattern to one simulated under null or base conditions [@miller2022alternating]. In theory, workers can use the following research pipeline: 1) thinking of a specific mechanism (e.g. mass extinction allowing the surviving species to acquire new morphologies), 2) collecting some data to test this mechanism (e.g. some traits of species across and extinction event) and then 3) comparing these patterns to some simulated under no specific conditions (e.g. a null model where the traits evolve randomly regardless of an extinction event @puttick2020complex). Workers might thus need to simulate a great diversity of evolutionary scenarios to test their specific question. To do so, we need statistical and software solutions to simulate trees and data to generate many specific null models.
In practice, these evolutionary simulations can be done relatively easily on computers using a birth-death process [@feller1939birthdeath; @stadler2010birthdeath; @diversitree].
A birth-death process is a continuous time Markov process that had been routinely implemented in R
[@R] to simulate realistic phylogenies (e.g. [@ape], [@diversitree]).
This general algorithm to generate phylogenetic trees can be coupled with other Markov processes to also generate traits, for example using a Brownian Motion process (BM; @cavalli1967BM) or an Ornstein Uhlenbeck (OU; @lande1976OU; see @cooper2016cautionary for a distinction between both).
In R
, this can be done with several already well used and well documented packages.
For example if you want to simulate diversity through time, you can use TreeSim
[@treesim] to simulate diversity under a set of specific parameters (e.g. speciation and extinction) with some events disrupting the simulations (e.g. mass extinctions).
You can even improve on generating these patterns using FossilSim
[@fossilsim] to take into account fossilization processes.
You can also use paleobuddy
[@paleobuddy] or paleotree
[@paleotree] to generate palaeontology specific data.
On the other hand, if you need to simulate both diversity and traits through time, this can be done with specific parameters in RPANDA
[@rpanda], diversitree
[@diversitree] or PETER
[@puttick2020complex] where the traits are generated stochasticaly through time (given some process) during the birth-death process.
Although the packages mentioned above are excellent and routinely used with fast and reliable algorithms and associated documentation, they are all designed for specific tasks and don't allow much modification beyond the input parameters designed by the authors.
For example, TreeSim
can simulate a birth-death tree with some extinction event but is not designed to simulated one with an extinction event that leads to the birth-death process to be not diversity dependent anymore, simulating a release in selection pressure after the extinction event that leads to a different process dominating speciation.
Or PETER
is not designed to simulate a complex set of traits (say three correlated BM traits and two independent OU ones).
This absence of modularity has hampered the use of complex and question-driven simulations, although I acknowledge this was not the primary aim of the authors of the excellent packages mentioned above.
This has led workers to often develop their own tools to answer specific questions (e.g. @puttick2020complex).
Therefore, I propose treats
a modular R
package to simulate both trees and traits through time.
Note that although treats
is modular and thus allows to be used as go to tool for simulating and trees and traits, it lacks the ready-to-use implemented methods featured in other packages such as fossilisation and sampling [@fossilsim; @treesim; @paleobuddy] or specific macroevolutionary simulations [@rpanda; @puttick2020complex].
treats
is based on the eponymous treats
function that allows to simulate a phylogeny and some trait(s) simultaneously.
The base birth-death algorithm "grows" a phylogenetic tree and generates traits for each node and tips in the following manner:
These three steps are repeated until the tree reaches the desired age or number of species (the algorithm's implementation is heavily inspired and based on @diversitree). If traits are simulated during the process, a fourth step is added:
In treats
, these three or four steps are implemented as modular functions that the user can easily change using an internal class of objects called "modifiers"
or "traits"
(Fig. 1).
The simulation then outputs a tree (of class "phylo"
and a associated table of traits - "matrix"
) that can be visualised using the plot.treats
function.
A third class of object called "events"
can be added to the simulations to modify the entire simulation under certain conditions (e.g. simulating a mass extinction).
Each element in the algorithm can be modified by the user using the implemented functions make.bd.params
to set the birth-death parameters, make.traits
to set the trait(s), make.modifiers
to set the birth-death algorithm and make.events
to eventually add one or more events.
Users can replace any step in the algorithm by their own specific functions suiting their needs or, for less advanced users, by using already implemented functions.
{#figure1}
The following sections provides an overview of the three main functions displayed in Figure 1 (make.traits
, make.modifiers
and make.events
).
Traits are simulated via the make.traits
function given one or more processes, number of dimensions per process and some starting value(s).
Essentially, the generation of new trait values is based on a process (a function
) modifying a trait value (x0
) relative to some branch length (edge.length
).
For example, a simple Brownian Motion process could be generated by the function rnorm
where it modifies the trait value x0
relative to the branch length (edge.length
).
The longer the branch length, the more likely the new trait value will be different from x0
:
## Brownian Motion process my.bm.process <- function(x0, edge.length) { rnorm(n = 1, mean = x0, sd = edge.length) } ## Creating the traits object my.trait <- make.traits(process = my.bm.process)
Of course biological data can be much more complex and often multivariate [@adams2019multivarPCM] requiring the users to develop more complex function to cater their specific needs.
The treats
package contains a list of pre-built processes that are "ready-to-use":
BM.process
and OU.process
: generalised Brownian Motion and Ornstein-Uhlenbeck processes in any number of dimensions (including possible correlation);multi.peak.process
: a generalised Ornstein-Uhlenbeck process (uni- or multi-dimensional) which allows for multiple peaks;repulsion.process
: a unidimensional process generating trait values that don't overlap with previously generated trait values;discrete.process
: generates discrete trait values;no.process
: ignores the branch length (i.e. a time independent process);Traits are always generated (and stored) only for tips or nodes and not along edges.
However, it is possible so generate trait values at specific or regular time intervals by using the option save.steps
in the treats
function.
This will generate singleton nodes (i.e. nodes with only one descendant) with associated trait values.
Note that the treats
package primary aim is to generate both a tree and some traits at the same time.
However, it is possible to also just generate traits with a given topology.
This is done through the function map.traits
that intakes one or more trees and a "traits"
object.
Modifying the birth-death process can be done in several ways.
Most easily it is done through changing the stopping rules through the stop.rule
argument (number of total taxa or living ones, or time of the simulation).
Equally straightforward, one can modify the parameters of the birth-death process through the make.bd.params
function: the speciation ($\lambda$) and the extinction ($\mu$) ones.
These can be either fixed values (for constant speciation and extinction) or values drawn from distributions.
It is also possible to directly modify how the birth-death algorithm works through make.modifiers
by changing the three main components of the birth-death algorithm as described above.
By default, the algorithm uses the following algorithms:
## The default branch length generation rexp(1, rate = number_of_lineages * (speciation + extinction))
## The default lineage selection sample(number_of_lineages, 1)
## The default speciation/extinction decider runif(1) <= (speciation / (speciation + extinction))
The function make.modifiers
allows to specifically change any of these components by providing a different function for each part of the algorithm.
For example, one can modify the three functions above so that the branch length is not dependent on the number of lineages, the sampling is always the first species available and the extinction is drawn from a normal distribution.
Note that these function need a specific syntax that is detailed in the treats
manual.
## Lineage independent waiting: lineage.independent <- function(bd.params, lineage = NULL, trait.values = NULL, modify.fun = NULL) { my_rate <- bd.params$speciation + bd.params$extinction return(rexp(1, rate = my_rate)) } ## Selecting always the first species select.first <- function(bd.params, lineage = NULL, trait.values = NULL, modify.fun = NULL) { return(as.integer(1)) } ## Random normal speciation normal.speciation <- function(bd.params, lineage = NULL, trait.values = NULL, modify.fun = NULL) { my_turnover <- bd.params$speciation/ (bd.params$speciation + bd.params$extinction) return(rnorm(1) <= my_turnover) } ## Creating the modifier object modified.birth.death <- make.modifiers(branch.length = lineage.independent, selection = select.first, speciation = normal.speciation)
The final major argument to be passed to treats
are the "events"
objects generated through make.events
where the following information needs to be specified:
target
designating what the event should be applied to (e.g. "taxa"
for modifying the number of species, "traits"
for modifying the traits, etc.)condition
which is a function returning a logical value of when to trigger the event (e.g. when reaching a certain number of taxa, after some specific time has ellapsed or if some specific trait value is reached, etc.)modification
which is a function that specifically modifies an internal object in the treats
algorithm.For a more exhaustive list of events so you can refer to the treats
manual with many different detailed examples.
Briefly though, this is how the mass extinction events are designed in the example above.
## Creating an extinction that removes species with positive trait values positive_extinction <- make.events( target = "taxa", condition = age.condition(15), modification = trait.extinction(x = 0, condition = `>=`))
For this event, the target is the number of taxa in the simulations.
This is indicated using the target = "taxa"
argument.
Then the event is triggered using the argument age.condition(15)
and modifies the internal lineage
object using the trait.extinction(x = 0, condition = `>=`)
function.
age.condition
and trait.extinction
are both function factories which are not going to be detailed here (see @wickham2019advanced).
Effectively these arguments can be passed directly as standard functions.
For example, to trigger the event when reaching time 15 we can use the following function:
## Returns TRUE when reaching time 15 reaching.time15 <- function(bd.params, lineage, trait.values, time) { return(time > 15) }
This will trigger the modification
which is equivalent to the following function modifying the lineage
internal list:
removing.75.taxa <- function(bd.params, lineage, trait.values) { ## Select a portion of the living species to go extinct extinct <- sample(lineage$n, round(lineage$n * 0.75)) ## Update the lineage object lineage$livings <- lineage$livings[-extinct] lineage$n <- lineage$n - length(extinct) return(lineage) }
Hence, the extinction event described above is equivalent to the following one:
## Creating an extinction that removes species with positive trait values positive_extinction <- make.events( target = "taxa", condition = reaching.time15, modification = removing.75.taxa)
The treats
package also comes with tools to visualise trees and traits together or separately.
This can be done through the generic S3 plot
function (calling plot.treats
) and allows to display up to three traits or two traits and time (in 3D) as displayed in Figures 4 and 5.
These functions can also be used to visualise trees and traits together from non "treats"
objects by using the make.treats
function to transform them into "treats"
objects.
To illustrate this we will look whether it is possible to detect changes in disparity (i.e. diversity of traits) in a subset of the data published from @beck2014ancient implemented in @dispRity. This dataset contains the ordinated traits for 50 mammalian species across the Cretaceous-Palaeogene extinction event (K-Pg, 66 Mya; Fig. 2).
## Loading the package and data library(treats) data(BeckLee_tree) data(BeckLee_mat99) ## Creating the time slices time_slices <- chrono.subsets(BeckLee_mat99, BeckLee_tree, method = "continuous", model = "proximity", time = seq(from = 96, to = 36, by = -10)) ## Calculating disparity on the two first dimensions only observed_disparity <- dispRity(boot.matrix(time_slices), metric = c(sum, variances), dimensions = c(1,2))
## Plotting the tree and the disparity through time par(mfrow = c(1,2), bty = "null") ## Thre tree plot(ladderize(BeckLee_tree), show.tip.label = FALSE) axisPhylo() abline(v = BeckLee_tree$root.time - 66, col = "red") legend("topleft", pch = NULL, legend = "A", bty = "n", cex = 2) ## The disparity plot(observed_disparity, ylab = "Sum of variances") abline(v = 4, col = "red") legend("topleft", pch = NULL, legend = "B", bty = "n", cex = 2)
Using this example dataset, one might be interested in testing whether the K-Pg extinction had an effect on disparity through time. But can such effect be detected in the first place? We can test this by simulating some datasets with similar properties as the observed data and measure changes in disparity in these simulated datasets.
The first and simplest way is to simulate tree topologies that have similar properties than the observed one.
To do so, we need to use some speciation parameter indicating the rate at which lineages speciate (aka "birth" or "$\lambda$") and an extinction parameter indicating the rate at which they go extinct (aka "death" or "$\mu$").
Here we are using two relatively arbitrary (speciation = 0.035 and extinction = 0.02) to get trees roughly matching the observed tree.
Note that you might want to consider more appropriate ways to calculate these rates for research projects (e.g @magallon2001absolute).
We also need a stopping rule for when to stop the simulations (in our case when reaching 140 time units).
This will produce a single random tree using the input parameters (Fig. 3).
The number of time units in treats
is arbitrary and is not equivalent to millions of years.
Using 140 time units here allows to simulate number of tips in a similar order of magnitude as the ones in the observed data.
Note that I will not discuss the options here in great details.
Much more information can be found in the treats
manual.
## Using the birth-death parameters from the observed tree my_bd_params <- make.bd.params(speciation = 0.035, extinction = 0.02) ## Setting the stopping rule (stop after 140 time units) stop_rule <- list(max.time = 140) ## Simulate the tree set.seed(2) sim_tree <- treats(bd.params = my_bd_params, stop.rule = stop_rule, null.error = 100, verbose = FALSE)
## Plotting the resulting tree plot(ladderize(sim_tree), show.tip.label = FALSE) ## Adding the phylogenetic axis axisPhylo()
For our specific question, we will also need to simulate some traits associated with each node and tip.
For simplicity we will simulate a two dimensional Brownian Motion trait.
To do so, we can create a "traits"
object with the function make.traits
.
This results in a 2 dimensional trait space for all the simulated species and their nodes (Fig. 4).
## Creating a trait in 2 dimensions. my_traits <- make.traits(process = BM.process, n = 2) ## Simulate the tree and traits set.seed(123) sim_data <- treats(traits = my_traits, bd.params = my_bd_params, stop.rule = stop_rule, null.error = 100, verbose = FALSE)
par(mfrow = c(1,2)) ## Plotting one trait through time plot(sim_data, ylab = "Trait 1", las = 1, main = "Trait 1 through time") ## Plotting the two dimensions against each other plot(sim_data, trait = c(2,1), ylab = "Trait 1", xlab = "Trait 2", las = 1, main = "Traits 1 and 2")
To answer our question, we also want to simulate an extinction event.
To do so, we can create two different "events"
object with the function make.events
.
The first one will simulate a random extinction after reaching 66 time units and then making three quarter (0.75) of the taxa go extinct.
The second one will simulate a random extinction but based on trait values: after reaching time 66, all the species with positive trait values will go extinct.
Both scenarios illustrate two different types of mass extinctions but they are not equivalent: because of the ancestral trait value starting at 0, we expect the second scenario to remove on average only 50% of the species (i.e. half the species are expected to evolve a trait value above 0).
For more details on simulating the effect of mass extinction and the difficulties to simulate an unambiguous effect of a mass extinction, see @puttick2020complex.
Because of this stochasticity of the simulations, we will repeat them 50 times (using replicates = 50
) to generate a distribution of possible simulated scenarios as opposed to a random single one that could be idiosyncratic (Fig. 5).
## Creating a random mass extinction random_extinction <- make.events( target = "taxa", condition = age.condition(140-66), modification = random.extinction(0.75)) ## Creating an extinction that removes species with positive trait values positive_extinction <- make.events( target = "taxa", condition = age.condition(140-66), modification = trait.extinction(x = 0, condition = `>=`)) set.seed(123) ## Simulate the tree and traits with a random extinction event sim_rand_extinction <- treats( traits = my_traits, bd.params = my_bd_params, stop.rule = stop_rule, events = random_extinction, null.error = 100, replicates = 50) ## Simulate the tree and traits with a selective extinction event sim_trait_extinction <- treats( traits = my_traits, bd.params = my_bd_params, stop.rule = stop_rule, events = positive_extinction, null.error = 100, replicates = 50)
par(mfrow = c(1,2)) ## Visualising the difference between both scenarios ## Random extinction plot(sim_rand_extinction[[42]], main = "Random extinction") abline(v = 66, col = "red") abline(h = 0, col = "grey") legend("topleft", pch = NULL, legend = "A", bty = "n", cex = 2) ## Selective extinction plot(sim_trait_extinction[[50]], main = "Positive extinction") abline(v = 66, col = "red") abline(h = 0, col = "grey") legend("topleft", pch = NULL, legend = "B", bty = "n", cex = 2)
Once we have simulated a distribution of trees and traits with the two extinction scenarios, we can measure disparity as in Fig. 2 for all the simulated data and compare it to the observed disparity.
## Remove single nodes simulated at the extinction sim_rand_extinction <- drop.singles(sim_rand_extinction) sim_trait_extinction <- drop.singles(sim_trait_extinction) ## Calculate the dispRity for all the simulations random_extinction_disparity <- dispRitreats(sim_rand_extinction, method = "continuous", model = "proximity", time = seq(from = 96, to = 36, by = -10), metric = c(sum, variances), scale.trees = FALSE) selective_extinction_disparity <- dispRitreats(sim_trait_extinction, method = "continuous", model = "proximity", time = seq(from = 96, to = 36, by = -10), metric = c(sum, variances), scale.trees = FALSE) ## Scale the disparity results (to compare to the observed ones) random_extinction_disparity <- scale.dispRity(random_extinction_disparity) selective_extinction_disparity <- scale.dispRity(selective_extinction_disparity) observed_disparity <- unlist(get.disparity(scale.dispRity(observed_disparity))) ## Plotting the results with the observed disparity par(mfrow = c(1,2)) plot(random_extinction_disparity, main = "Random extinction", ylab = "Scaled sum of variances", ylim = c(0, 1)) abline(v = 4, col = "red") lines(x = 1:7, y = observed_disparity, lty = 2, lwd = 2) plot(selective_extinction_disparity, main = "Selective extinction", ylab = "", ylim = c(0, 1)) abline(v = 4, col = "red") lines(x = 1:7, y = observed_disparity, lty = 2, lwd = 2)
From these results we can draw some preliminary conclusions: the observed change in disparity is more likely due to a selective mass extinction rather than a random one (Fig. 6). This is of course a very crude way of testing this, a more rigorous approach is needed to answer the question: more and better quality data, and more thorough methods (e.g. using a rank envelope test @murrell2018global).
The treats
package comes with internal documentation (e.g. ?treats
) but also with a thorough and extended vignette in a gitbook format: the treats
manual.
This manual is designed so that it can be regularly updated and enhanced through the lifetime of the package facilitating the interface between methods development and usage [@cooper2016dark].
Furthermore, a library of simulation templates is maintained on the GitHub page.
These templates are written and shared in the form of GitHub issues template and can be submitted and shared by any users.
Either for them to have them stored somewhere curated, or better yet, so that other users can reuse, comment and mofidy them for their own projects.
This paper describes the first version of the treats
package.
However, I intend to continuously develop this package.
For example future planned versions will include abiotic events and a better integration with the dispRity
package.
This will be done while keeping track change (through NEWS
file), continuous integration and unit testing.
This paper is entirely reproducible from an Rmarkdown document available on GitHub.
The data used for the example above [@beck2014ancient] is available from the dispRity
package [@dispRity].
The treats
package modular architecture allows workers to develop their own specific biological simulation scenarios based on their own specific research question.
The pipeline of the package through the different "treats"
objects ("traits"
, "modifiers"
and "events"
) also allows workers to generate publication standard results through plotting but also with easily reproducible and reusable scripts.
The treats
package is available on the CRAN or on GitHub with more associated information.
All the versions of the package are archived on ZENODO with associated DOI:10.5281/zenodo.7970384.
Many thanks to Mark Puttick for inspiring the development the package through previous collaborations. Thanks to Andrew Beckerman, Ian Brennan, Gustavo Burin, Christopher Cooney, Natalie Cooper, Alex Cranston, Jasmine Hardie, Tom Lansley, Clement Prieul, Joe Rees, James Rule, Sophie Ryan and Gavin Thomas, for support and useful comments on late stages of the development of this package. Thanks to William Gearty, one anonymous reviewer and one anonymous associate editor for their very useful suggestions. This work was funded by UKRI-NERC Grant NE/T000139/1 awarded to Gavin Thomas.
I declare no conflicts of interest.
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.