In TGuillerme/dads: Trees and Traits Simulations

$^1$ School of Biosciences, University of Sheffield, Sheffield, S10 2TN, United Kingdom. guillert@tcd.ie

Running head: treats: TREes And Traits Simulations

Abstract

Simulating biological realistic data is an important step to understand and investigate biodiversity. Simulated data can be used to generate null, base line or neutral models. These can be used either in comparison to observed data to estimate the mechanisms that generated the data. Or they can be used to explore, understand and develop theoretical advances by proposing toy models.
In evolutionary biology, simulations often involve the need of an evolutionary process where descent with modification is at the core of how the simulated data is generated. These evolutionary processes can then be nearly infinitely modified to include complex processes that affect the simulations such as traits co-evolution, competition mechanisms or mass extinction events.
Here I present the treats package, a modular R package for trees and traits simulations. This package is based on a simple birth death algorithm from which all steps can easily be modified by users.
treats also provides a tidy interface through the treats object, allowing users to easily run reproducible simulations. It also comes with an extend manual regularly updated following users' questions or suggestions.

Keywords: trees, traits, simulations, birth-death, null-models, ecology, evolution, disparity

Introduction

Comparing biological patterns is one of the key ways to understand mechanisms in evolutionary biology. This leads to the development of phylogenetic comparative methods as key methodologically driven topic in ecology, evolution and palaeontology [@felsensteinPCM; @pennell2013review]. These methods rely on comparing patterns in a phylogenetic context to understand biological mechanisms or concepts [@harmon2019book]. These comparisons can be done between observed patterns under different conditions or against null, neutral or baseline models (see @bausman2018neutral for distinctions) suggesting different processes or mechanisms. For example different traits distribution for species with different diets [@deepak2023diet] or habitats [@pinto2017geographical]. Or by comparing some observed pattern to one simulated under null or base conditions [@miller2022alternating]. In theory, workers can use the following research pipeline: 1) thinking of a specific mechanism (e.g. mass extinction allowing the surviving species to acquire new morphologies), 2) collecting some data to test this mechanism (e.g. some traits of species across and extinction event) and then 3) comparing these patterns to some simulated under no specific conditions (e.g. a null model where the traits evolve randomly regardless of an extinction event @puttick2020complex). Workers might thus need to simulate a great diversity of evolutionary scenarios to test their specific question. To do so, we need statistical and software solutions to simulate trees and data to generate many specific null models.

In practice, these evolutionary simulations can be done relatively easily on computers using a birth-death process [@feller1939birthdeath; @stadler2010birthdeath; @diversitree]. A birth-death process is a continuous time Markov process that had been routinely implemented in R [@R] to simulate realistic phylogenies (e.g. [@ape], [@diversitree]). This general algorithm to generate phylogenetic trees can be coupled with other Markov processes to also generate traits, for example using a Brownian Motion process (BM; @cavalli1967BM) or an Ornstein Uhlenbeck (OU; @lande1976OU; see @cooper2016cautionary for a distinction between both). In R, this can be done with several already well used and well documented packages. For example if you want to simulate diversity through time, you can use TreeSim [@treesim] to simulate diversity under a set of specific parameters (e.g. speciation and extinction) with some events disrupting the simulations (e.g. mass extinctions). You can even improve on generating these patterns using FossilSim [@fossilsim] to take into account fossilization processes. You can also use paleobuddy [@paleobuddy] or paleotree [@paleotree] to generate palaeontology specific data. On the other hand, if you need to simulate both diversity and traits through time, this can be done with specific parameters in RPANDA [@rpanda], diversitree [@diversitree] or PETER [@puttick2020complex] where the traits are generated stochasticaly through time (given some process) during the birth-death process.

Although the packages mentioned above are excellent and routinely used with fast and reliable algorithms and associated documentation, they are all designed for specific tasks and don't allow much modification beyond the input parameters designed by the authors. For example, TreeSim can simulate a birth-death tree with some extinction event but is not designed to simulated one with an extinction event that leads to the birth-death process to be not diversity dependent anymore, simulating a release in selection pressure after the extinction event that leads to a different process dominating speciation. Or PETER is not designed to simulate a complex set of traits (say three correlated BM traits and two independent OU ones). This absence of modularity has hampered the use of complex and question-driven simulations, although I acknowledge this was not the primary aim of the authors of the excellent packages mentioned above. This has led workers to often develop their own tools to answer specific questions (e.g. @puttick2020complex). Therefore, I propose treats a modular R package to simulate both trees and traits through time. Note that although treats is modular and thus allows to be used as go to tool for simulating and trees and traits, it lacks the ready-to-use implemented methods featured in other packages such as fossilisation and sampling [@fossilsim; @treesim; @paleobuddy] or specific macroevolutionary simulations [@rpanda; @puttick2020complex].

Description

treats is based on the eponymous treats function that allows to simulate a phylogeny and some trait(s) simultaneously. The base birth-death algorithm "grows" a phylogenetic tree and generates traits for each node and tips in the following manner:

Generating branch length;
Selecting a lineage among the currently living ones;
Choosing whether that lineage goes extinct (becomes a tip) or speciates (becomes a node).

These three steps are repeated until the tree reaches the desired age or number of species (the algorithm's implementation is heavily inspired and based on @diversitree). If traits are simulated during the process, a fourth step is added:

Generating some trait(s) value(s) for the selected lineage (either a tip or a node - but see the "Simulating traits section" below to generate trait values along edges).

In treats, these three or four steps are implemented as modular functions that the user can easily change using an internal class of objects called "modifiers" or "traits" (Fig. 1). The simulation then outputs a tree (of class "phylo" and a associated table of traits - "matrix") that can be visualised using the plot.treats function. A third class of object called "events" can be added to the simulations to modify the entire simulation under certain conditions (e.g. simulating a mass extinction). Each element in the algorithm can be modified by the user using the implemented functions make.bd.params to set the birth-death parameters, make.traits to set the trait(s), make.modifiers to set the birth-death algorithm and make.events to eventually add one or more events. Users can replace any step in the algorithm by their own specific functions suiting their needs or, for less advanced users, by using already implemented functions.

{#figure1}

Major functionalities

The following sections provides an overview of the three main functions displayed in Figure 1 (make.traits, make.modifiers and make.events).

Simulating traits

Traits are simulated via the make.traits function given one or more processes, number of dimensions per process and some starting value(s). Essentially, the generation of new trait values is based on a process (a function) modifying a trait value (x0) relative to some branch length (edge.length). For example, a simple Brownian Motion process could be generated by the function rnorm where it modifies the trait value x0 relative to the branch length (edge.length). The longer the branch length, the more likely the new trait value will be different from x0:

## Brownian Motion process
my.bm.process <- function(x0, edge.length) {
    rnorm(n = 1, mean = x0, sd = edge.length)
} 
## Creating the traits object
my.trait <- make.traits(process = my.bm.process)

Of course biological data can be much more complex and often multivariate [@adams2019multivarPCM] requiring the users to develop more complex function to cater their specific needs. The treats package contains a list of pre-built processes that are "ready-to-use":

BM.process and OU.process: generalised Brownian Motion and Ornstein-Uhlenbeck processes in any number of dimensions (including possible correlation);
multi.peak.process: a generalised Ornstein-Uhlenbeck process (uni- or multi-dimensional) which allows for multiple peaks;
repulsion.process: a unidimensional process generating trait values that don't overlap with previously generated trait values;
discrete.process: generates discrete trait values;
no.process: ignores the branch length (i.e. a time independent process);

Traits are always generated (and stored) only for tips or nodes and not along edges. However, it is possible so generate trait values at specific or regular time intervals by using the option save.steps in the treats function. This will generate singleton nodes (i.e. nodes with only one descendant) with associated trait values.

Note that the treats package primary aim is to generate both a tree and some traits at the same time. However, it is possible to also just generate traits with a given topology. This is done through the function map.traits that intakes one or more trees and a "traits" object.

Modifying the birth-death process

Modifying the birth-death process can be done in several ways. Most easily it is done through changing the stopping rules through the stop.rule argument (number of total taxa or living ones, or time of the simulation). Equally straightforward, one can modify the parameters of the birth-death process through the make.bd.params function: the speciation ($\lambda$) and the extinction ($\mu$) ones. These can be either fixed values (for constant speciation and extinction) or values drawn from distributions.

It is also possible to directly modify how the birth-death algorithm works through make.modifiers by changing the three main components of the birth-death algorithm as described above. By default, the algorithm uses the following algorithms:

Generating branch length by drawing a value from an exponential distribution with the rate being function of the current number of lineages scaled by the speciation and extinction parameters

## The default branch length generation
rexp(1, rate = number_of_lineages * (speciation + extinction))

Selecting a lineage among the currently living ones by simply sampling across the available (living) lineages:

## The default lineage selection
sample(number_of_lineages, 1)

Choosing whether that lineage goes extinct (becomes a tip) or speciates (becomes a node) by drawing a random number between 0 and 1 and comparing it to the ratio of speciation and turnover (speciation + extinction). If the random number is smaller than the ratio of speciation and turnover, the lineage speciates, else it goes extinct:

## The default speciation/extinction decider
runif(1) <= (speciation / (speciation + extinction))

The function make.modifiers allows to specifically change any of these components by providing a different function for each part of the algorithm. For example, one can modify the three functions above so that the branch length is not dependent on the number of lineages, the sampling is always the first species available and the extinction is drawn from a normal distribution. Note that these function need a specific syntax that is detailed in the treats manual.

## Lineage independent waiting:
lineage.independent <- function(bd.params,
                                lineage = NULL,
                                trait.values = NULL,
                                modify.fun = NULL) {
    my_rate <- bd.params$speciation + bd.params$extinction
    return(rexp(1, rate = my_rate))
}
## Selecting always the first species
select.first <- function(bd.params,
                         lineage = NULL,
                         trait.values = NULL,
                         modify.fun = NULL) {
    return(as.integer(1))
}
## Random normal speciation
normal.speciation <- function(bd.params,
                              lineage = NULL,
                              trait.values = NULL,
                              modify.fun = NULL) {
    my_turnover <- bd.params$speciation/
                  (bd.params$speciation + bd.params$extinction)
    return(rnorm(1) <= my_turnover)
}

## Creating the modifier object
modified.birth.death <- make.modifiers(branch.length = lineage.independent,
                                       selection     = select.first,
                                       speciation    = normal.speciation)

Creating events

The final major argument to be passed to treats are the "events" objects generated through make.events where the following information needs to be specified:

the target designating what the event should be applied to (e.g. "taxa" for modifying the number of species, "traits" for modifying the traits, etc.)
the condition which is a function returning a logical value of when to trigger the event (e.g. when reaching a certain number of taxa, after some specific time has ellapsed or if some specific trait value is reached, etc.)
the modification which is a function that specifically modifies an internal object in the treats algorithm.

For a more exhaustive list of events so you can refer to the treats manual with many different detailed examples. Briefly though, this is how the mass extinction events are designed in the example above.

## Creating an extinction that removes species with positive trait values
positive_extinction <- make.events(
    target       = "taxa",
    condition    = age.condition(15),
    modification = trait.extinction(x = 0, condition = `>=`))

For this event, the target is the number of taxa in the simulations. This is indicated using the target = "taxa" argument. Then the event is triggered using the argument age.condition(15) and modifies the internal lineage object using the trait.extinction(x = 0, condition = `>=`) function. age.condition and trait.extinction are both function factories which are not going to be detailed here (see @wickham2019advanced). Effectively these arguments can be passed directly as standard functions. For example, to trigger the event when reaching time 15 we can use the following function:

## Returns TRUE when reaching time 15
reaching.time15 <- function(bd.params, lineage, trait.values, time) {
    return(time > 15)
}

This will trigger the modification which is equivalent to the following function modifying the lineage internal list:

removing.75.taxa <- function(bd.params, lineage, trait.values) {
    ## Select a portion of the living species to go extinct
    extinct <- sample(lineage$n, round(lineage$n * 0.75))

    ## Update the lineage object
    lineage$livings <- lineage$livings[-extinct]
    lineage$n       <- lineage$n - length(extinct)
    return(lineage)
}

Hence, the extinction event described above is equivalent to the following one:

## Creating an extinction that removes species with positive trait values
positive_extinction <- make.events(
    target       = "taxa",
    condition    = reaching.time15,
    modification = removing.75.taxa)

Visualising results

The treats package also comes with tools to visualise trees and traits together or separately. This can be done through the generic S3 plot function (calling plot.treats) and allows to display up to three traits or two traits and time (in 3D) as displayed in Figures 4 and 5. These functions can also be used to visualise trees and traits together from non "treats" objects by using the make.treats function to transform them into "treats" objects.

Brief applied example

To illustrate this we will look whether it is possible to detect changes in disparity (i.e. diversity of traits) in a subset of the data published from @beck2014ancient implemented in @dispRity. This dataset contains the ordinated traits for 50 mammalian species across the Cretaceous-Palaeogene extinction event (K-Pg, 66 Mya; Fig. 2).

## Loading the package and data
library(treats)
data(BeckLee_tree)
data(BeckLee_mat99)

## Creating the time slices
time_slices <- chrono.subsets(BeckLee_mat99, BeckLee_tree,
                              method = "continuous",
                              model  = "proximity",
                              time   = seq(from = 96, to = 36, by = -10))

## Calculating disparity on the two first dimensions only
observed_disparity <- dispRity(boot.matrix(time_slices),
                               metric = c(sum, variances),
                               dimensions = c(1,2))

## Plotting the tree and the disparity through time
par(mfrow = c(1,2), bty = "null")
## Thre tree
plot(ladderize(BeckLee_tree), show.tip.label = FALSE)
axisPhylo()
abline(v = BeckLee_tree$root.time - 66, col = "red")
legend("topleft", pch = NULL, legend = "A", bty = "n", cex = 2)

## The disparity
plot(observed_disparity, ylab = "Sum of variances")
abline(v = 4, col = "red")
legend("topleft", pch = NULL, legend = "B", bty = "n", cex = 2)

Using this example dataset, one might be interested in testing whether the K-Pg extinction had an effect on disparity through time. But can such effect be detected in the first place? We can test this by simulating some datasets with similar properties as the observed data and measure changes in disparity in these simulated datasets.

Simulating trees

The first and simplest way is to simulate tree topologies that have similar properties than the observed one. To do so, we need to use some speciation parameter indicating the rate at which lineages speciate (aka "birth" or "$\lambda$") and an extinction parameter indicating the rate at which they go extinct (aka "death" or "$\mu$"). Here we are using two relatively arbitrary (speciation = 0.035 and extinction = 0.02) to get trees roughly matching the observed tree. Note that you might want to consider more appropriate ways to calculate these rates for research projects (e.g @magallon2001absolute). We also need a stopping rule for when to stop the simulations (in our case when reaching 140 time units). This will produce a single random tree using the input parameters (Fig. 3). The number of time units in treats is arbitrary and is not equivalent to millions of years. Using 140 time units here allows to simulate number of tips in a similar order of magnitude as the ones in the observed data. Note that I will not discuss the options here in great details. Much more information can be found in the treats manual.

## Using the birth-death parameters from the observed tree
my_bd_params <- make.bd.params(speciation = 0.035, extinction = 0.02)
## Setting the stopping rule (stop after 140 time units)
stop_rule <- list(max.time = 140)

## Simulate the tree
set.seed(2)
sim_tree <- treats(bd.params  = my_bd_params,
                   stop.rule  = stop_rule,
                   null.error = 100,
                   verbose    = FALSE)

## Plotting the resulting tree
plot(ladderize(sim_tree), show.tip.label = FALSE)
## Adding the phylogenetic axis
axisPhylo()

Simulating trees and traits

For our specific question, we will also need to simulate some traits associated with each node and tip. For simplicity we will simulate a two dimensional Brownian Motion trait. To do so, we can create a "traits" object with the function make.traits. This results in a 2 dimensional trait space for all the simulated species and their nodes (Fig. 4).

## Creating a trait in 2 dimensions.
my_traits <- make.traits(process = BM.process, n = 2)

## Simulate the tree and traits
set.seed(123)
sim_data <- treats(traits     = my_traits,
                   bd.params  = my_bd_params,
                   stop.rule  = stop_rule,
                   null.error = 100,
                   verbose    = FALSE)

par(mfrow = c(1,2))
## Plotting one trait through time
plot(sim_data, ylab = "Trait 1", las = 1, main = "Trait 1 through time")

## Plotting the two dimensions against each other
plot(sim_data, trait = c(2,1), ylab = "Trait 1", xlab = "Trait 2", las = 1, main = "Traits 1 and 2")

Simulating a trees and traits with events

To answer our question, we also want to simulate an extinction event. To do so, we can create two different "events" object with the function make.events. The first one will simulate a random extinction after reaching 66 time units and then making three quarter (0.75) of the taxa go extinct. The second one will simulate a random extinction but based on trait values: after reaching time 66, all the species with positive trait values will go extinct. Both scenarios illustrate two different types of mass extinctions but they are not equivalent: because of the ancestral trait value starting at 0, we expect the second scenario to remove on average only 50% of the species (i.e. half the species are expected to evolve a trait value above 0). For more details on simulating the effect of mass extinction and the difficulties to simulate an unambiguous effect of a mass extinction, see @puttick2020complex. Because of this stochasticity of the simulations, we will repeat them 50 times (using replicates = 50) to generate a distribution of possible simulated scenarios as opposed to a random single one that could be idiosyncratic (Fig. 5).

## Creating a random mass extinction
random_extinction <- make.events(
    target       = "taxa",
    condition    = age.condition(140-66),
    modification = random.extinction(0.75))
## Creating an extinction that removes species with positive trait values
positive_extinction <- make.events(
    target = "taxa",
    condition = age.condition(140-66),
    modification = trait.extinction(x = 0, condition = `>=`))

set.seed(123)
## Simulate the tree and traits with a random extinction event
sim_rand_extinction <- treats(
                   traits     = my_traits,
                   bd.params  = my_bd_params,
                   stop.rule  = stop_rule,
                   events     = random_extinction,
                   null.error = 100,
                   replicates = 50)

## Simulate the tree and traits with a selective extinction event
sim_trait_extinction <- treats(
                   traits     = my_traits,
                   bd.params  = my_bd_params,
                   stop.rule  = stop_rule,
                   events     = positive_extinction,
                   null.error = 100,
                   replicates = 50)

par(mfrow = c(1,2))
## Visualising the difference between both scenarios
## Random extinction
plot(sim_rand_extinction[[42]], main = "Random extinction")
abline(v = 66, col = "red")
abline(h = 0, col = "grey")
legend("topleft", pch = NULL, legend = "A", bty = "n", cex = 2)
## Selective extinction
plot(sim_trait_extinction[[50]], main = "Positive extinction")
abline(v = 66, col = "red")
abline(h = 0, col = "grey")
legend("topleft", pch = NULL, legend = "B", bty = "n", cex = 2)

Once we have simulated a distribution of trees and traits with the two extinction scenarios, we can measure disparity as in Fig. 2 for all the simulated data and compare it to the observed disparity.

## Remove single nodes simulated at the extinction
sim_rand_extinction <- drop.singles(sim_rand_extinction)
sim_trait_extinction <- drop.singles(sim_trait_extinction)

## Calculate the dispRity for all the simulations
random_extinction_disparity <- dispRitreats(sim_rand_extinction,
                                            method = "continuous",
                                            model  = "proximity",
                                            time   = seq(from = 96, to = 36, by = -10),
                                            metric = c(sum, variances),
                                            scale.trees = FALSE)
selective_extinction_disparity <- dispRitreats(sim_trait_extinction,
                                            method = "continuous",
                                            model  = "proximity",
                                            time   = seq(from = 96, to = 36, by = -10),
                                            metric = c(sum, variances),
                                            scale.trees = FALSE)

## Scale the disparity results (to compare to the observed ones)
random_extinction_disparity <- scale.dispRity(random_extinction_disparity)
selective_extinction_disparity <- scale.dispRity(selective_extinction_disparity)
observed_disparity <- unlist(get.disparity(scale.dispRity(observed_disparity)))

## Plotting the results with the observed disparity
par(mfrow = c(1,2))
plot(random_extinction_disparity, main = "Random extinction",
     ylab = "Scaled sum of variances", ylim = c(0, 1))
abline(v = 4, col = "red")
lines(x = 1:7, y = observed_disparity, lty = 2, lwd = 2)
plot(selective_extinction_disparity, main = "Selective extinction",
     ylab = "", ylim = c(0, 1))
abline(v = 4, col = "red")
lines(x = 1:7, y = observed_disparity, lty = 2, lwd = 2)

From these results we can draw some preliminary conclusions: the observed change in disparity is more likely due to a selective mass extinction rather than a random one (Fig. 6). This is of course a very crude way of testing this, a more rigorous approach is needed to answer the question: more and better quality data, and more thorough methods (e.g. using a rank envelope test @murrell2018global).

Additional information

Manuals, vignette and templates

The treats package comes with internal documentation (e.g. ?treats) but also with a thorough and extended vignette in a gitbook format: the treats manual. This manual is designed so that it can be regularly updated and enhanced through the lifetime of the package facilitating the interface between methods development and usage [@cooper2016dark]. Furthermore, a library of simulation templates is maintained on the GitHub page. These templates are written and shared in the form of GitHub issues template and can be submitted and shared by any users. Either for them to have them stored somewhere curated, or better yet, so that other users can reuse, comment and mofidy them for their own projects.

Further directions

This paper describes the first version of the treats package. However, I intend to continuously develop this package. For example future planned versions will include abiotic events and a better integration with the dispRity package. This will be done while keeping track change (through NEWS file), continuous integration and unit testing.

Repeatability and reproducibility

This paper is entirely reproducible from an Rmarkdown document available on GitHub. The data used for the example above [@beck2014ancient] is available from the dispRity package [@dispRity].

Conclusion

The treats package modular architecture allows workers to develop their own specific biological simulation scenarios based on their own specific research question. The pipeline of the package through the different "treats" objects ("traits", "modifiers" and "events") also allows workers to generate publication standard results through plotting but also with easily reproducible and reusable scripts.

Package location

The treats package is available on the CRAN or on GitHub with more associated information. All the versions of the package are archived on ZENODO with associated DOI:10.5281/zenodo.7970384.

Acknowledgments

Many thanks to Mark Puttick for inspiring the development the package through previous collaborations. Thanks to Andrew Beckerman, Ian Brennan, Gustavo Burin, Christopher Cooney, Natalie Cooper, Alex Cranston, Jasmine Hardie, Tom Lansley, Clement Prieul, Joe Rees, James Rule, Sophie Ryan and Gavin Thomas, for support and useful comments on late stages of the development of this package. Thanks to William Gearty, one anonymous reviewer and one anonymous associate editor for their very useful suggestions. This work was funded by UKRI-NERC Grant NE/T000139/1 awarded to Gavin Thomas.

Conflict of interest statement

I declare no conflicts of interest.

References

TGuillerme/dads documentation built on July 16, 2025, 9:14 p.m.

rdrr.io home R language documentation Run R code online

CRAN packages Bioconductor packages R-Forge packages GitHub packages

Note that we can't provide technical support on individual packages. You should contact the package authors for that.

TGuillerme/dads
Trees and Traits Simulations

In TGuillerme/dads: Trees and Traits Simulations

Abstract

Keywords: trees, traits, simulations, birth-death, null-models, ecology, evolution, disparity

Introduction

Description

Major functionalities

Simulating traits

Modifying the birth-death process

Creating events

Visualising results

Brief applied example

Simulating trees

Simulating trees and traits

Simulating a trees and traits with events

Additional information

Manuals, vignette and templates

Further directions

Repeatability and reproducibility

Conclusion

Package location

Acknowledgments

Conflict of interest statement

References

R Package Documentation

Browse R Packages

We want your feedback!

TGuillerme/dads Trees and Traits Simulations

In TGuillerme/dads: Trees and Traits Simulations

Abstract

Keywords: trees, traits, simulations, birth-death, null-models, ecology, evolution, disparity

Introduction

Description

Major functionalities

Simulating traits

Modifying the birth-death process

Creating events

Visualising results

Brief applied example

Simulating trees

Simulating trees and traits

Simulating a trees and traits with events

Additional information

Manuals, vignette and templates

Further directions

Repeatability and reproducibility

Conclusion

Package location

Acknowledgments

Conflict of interest statement

References

R Package Documentation

Browse R Packages

We want your feedback!

TGuillerme/dads
Trees and Traits Simulations