README.md

DDvTDtools

Branch|Travis|Codecov ---|---|--- master|Build Status|codecov.io

An R package to compare diversity-dependent (DD) and time-dependent (TD) models of diversification.

How to use DDvTDtools

DDvTDtools is an R package accompanying the manuscript:

Pannetier, T., Martinez, C., Bunnefeld, L., and Etienne, R.S. Branching patterns in phylogenies cannot distinguish diversity-dependent diversification from time-dependent diversification. Submitted

The package is a collection of functions and objects that were used to generate the data and figures presented in the article.

Note that the simulations and maximum likelihood optimisations call upon the integration of the DD master system introduced in Etienne et al. (2012), and this can take an awful lot of time. This is specially true for parameter settings with a long age (larger trees) and with extinction, and the computation is heavier for the time-dependent model. We performed these over 1000 trees each time, so if you only want to try the neat functions of the package (thanks for the interest!) you could run the functions introduced below for just a handful of trees.

Package installation

If you haven't done so already, install the DDvTDtools package from Github and the latest CRAN release of DDD.

``` {.sourceCode .r} install.packages("DDD") devtools::install_github("TheoPannetier/DDvTDtools")


Set up directory structure
--------------------------

The package works from a specific relative path, and most functions
expect to access data located in pre-defined directories. For example,
`DDvTDtools::read_sim()` expects to find the simulation output in
`DDvTD/data/sim`. The command below will set up the required directory
structure in your working directory.


``` {.sourceCode .r}
DDvTDtools::set_DDvTD_dir_struct()

Step 1 - Simulate phylogenetic trees

Diversity-dependent and time-dependent trees can be simulated with DDvTDtools::run_sim(), which is simply a wrapper around DDD::dd_sim() and DDD::td_sim() that standardizes the input and formats the output.

``` {.sourceCode .r} DDvTDtools::run_sim( sim = "DD", para = 1211, nb_trees = 1000 )


The first argument, `sim` refers to the name of the simulating model,
either `"DD"` or `"TD"`. The second argument, `para`, is a four-digit
number that codes the values of the four parameters (crown age, baseline
speciation rate, extinction rate and carrying capacity) used in the
study. `1211` for example means

-   crown age = 5 myr
-   speciation rate = 0.8
-   extinction rate = 0,
-   carrying capacity = 40

(This is the fastest setting to simulate and run maximum likelihood
optimisation on.)

Calling `arg_para()` will print the different values used through the
study.

``` {.sourceCode .r}
DDvTDtools::arg_para()
#> [1] 1211 1241 2211 2241 3211 3241 4211 4241

And what values they code can be found in the documentation

``` {.sourceCode .r} ?DDvTDtools::arg_para()


It is also possible to directly translate the code into proper values:

``` {.sourceCode .r}
DDvTDtools::para_to_pars(1211)
#> crown_age   lambda0       mu0         K 
#>       5.0       0.8       0.0      40.0

Simulated trees are stored in DDvTD/data/sim, and can be loaded with read_sim()

``` {.sourceCode .r} trees <- DDvTDtools::read_sim(sim = "DD", para = 1211) trees[[1]][[1]]

>

> Phylogenetic tree with 32 tips and 31 internal nodes.

>

> Tip labels:

> t1, t29, t28, t14, t5, t12, ...

> Node labels:

> I31, I10, I3, I2, I1, I9, ...

>

> Rooted; includes branch lengths.


Step 2 - Fit the models on simulated trees
------------------------------------------

Once phylogenetic trees have been simulated, each model should be fitted
to each type of trees. This is done by `DDD::dd_ML()` and
`DDD::bd_ML()`, but again, we have used a wrapper to standardise the
input and output.

``` {.sourceCode .r}
DDvTDtools::run_optim(
  sim = "DD", 
  para = 1211, 
  optim = "DD"
)
DDvTDtools::run_optim(
  sim = "DD", 
  para = 1211, 
  optim = "TD"
)

Argument optim is equivalent to argument sim introduced above. run_optim() will fetch the simulated trees from DDvTD/data/sim and save its output in DDvTD/data/optim. You can load these tables with

``` {.sourceCode .r} df <- DDvTDtools::read_optim_results( sim = "DD", para = 1211, optim = "TD", init_k = "true_k" ) head(df)

> sim ntips crown_age true_lambda0 true_mu0 true_K mc optim init_lambda0

> 1 DD 32 5 0.8 0 40 1 TD 0.8

> 2 DD 2 5 0.8 0 40 2 TD 0.8

> 3 DD 25 5 0.8 0 40 3 TD 0.8

> 4 DD 24 5 0.8 0 40 4 TD 0.8

> 5 DD 25 5 0.8 0 40 5 TD 0.8

> 6 DD 13 5 0.8 0 40 6 TD 0.8

> init_mu0 init_K loglik AIC lambda0_ML mu0_ML K_ML

> 1 0 40 -56.88745 119.77490 4.4302438 3.289253e-01 28.698403

> 2 0 40 0.00000 6.00000 3.3358413 1.282290e-19 1.863737

> 3 0 40 -41.37938 88.75875 1.2178454 2.230156e-01 29.258474

> 4 0 40 -39.21540 84.43081 0.7200940 5.308302e-10 53.757649

> 5 0 40 -40.47115 86.94230 1.0693765 1.881130e-09 55.000372

> 6 0 40 -17.13625 40.27250 0.9201438 8.264041e-01 899.988625

> hasConverged numCycles methode optimmethod jobID

> 1 TRUE 1 ode45 subplex 5018656

> 2 TRUE Inf ode45 subplex 4929384

> 3 TRUE Inf ode45 subplex 4929384

> 4 TRUE Inf ode45 subplex 4929384

> 5 TRUE Inf ode45 subplex 4929384

> 6 TRUE 5 ode45 subplex 5073785


The structure of a results data frame with a description of all the
variables can be found in the documentation of
`DDvTDtools::results_optim_struct()`. `init_k` denotes which value of
parameter K was used to initialise the maximum likelihood optimisation.
The default value, `true_k`, means the true value, i.e. 40. The
alternative is `from_n`, where K was instead set to the number of tips
in the tree. This is for some settings which produced large trees, for
which the likelihood optimisation proved particularly tedious. The
values of `init_k` used for each setting can be called with:

``` {.sourceCode .r}
DDvTDtools::get_init_k()
#>     1211     1241     2211     2241     3211     3241     4211     4241 
#> "true_k" "true_k" "true_k" "true_k" "from_n" "true_k" "from_n" "from_n"

Optimisation using init_k = from_n is run with another function, which calls run_optim()

``` {.sourceCode .r} DDvTDtools::run_optim_from_n( sim = "DD", para = 1211, optim = "TD" )


### Step 3 - Plot the results

Fig. 1 was produced from randomly sampled numbers, and I am not
reporting it here.

The plots below draw results from both models fit to both type of trees,
and so require that the four combinations of `sim` and `optim` for a
single value of `para` are present in `DDvTD\data\optim`.

**Fig. 2 - Average lineage-through-time plots**

``` {.sourceCode .r}
DDvTDtools::plot_ltt_nested(para = 1211)
#> Registered S3 method overwritten by 'geiger':
#>   method            from
#>   unique.multiPhylo ape

Fig. 3 - log-likelihood ratio distributions

{.sourceCode .r} DDvTDtools::plot_lr(para = 1211)



TheoPannetier/DDvTDtools documentation built on Oct. 22, 2020, 2:31 p.m.