DAISIErobustness is an R package for testing the robustness of the island biogeography model "DAISIE" (Dynamical Assembly of Islands by Speciation, Immigration and Extinction) to more complex and potentially more realistic evolutionary models. Several error measures, based on the number of species, endemics and non-endemics and on evolutionary trajectories, are used to determine whether the alternative models affect the inference capabilities of the current DAISIE model.
DAISIE is an evolutionary island biogeography model that allows for the estimation of diversification rates and other relevant diversification parameters on islands using phylogenetic data. These are:
* Cladogenesis rate
* Anagenesis rate
* (Clade-specific or island-wide) carrying capacity
* Migration rate
* Extinction rate
The estimation of such parameters is achieved using Maximum-Likelihood optimization, following the DAISIE likelihood functions described in the published literature [1]. The performance of DAISIE inference given the amount and quality of data has also been studied. It was shown to perform well, particularly when phylogenetic data are available [2].
Furthermore, DAISIE includes simulation code for generating data under the above-mentioned parameters.
DAISIErobustness consists of a pipeline designed to measure the error made when the standard DAISIE model is extended with new features. Examples of such additions include the modelling of island ontogeny, as per the General Dynamic Model [3], sea level changes [4], and non-oceanic scenarios. The error measure is obtained by simulating and comparing DAISIE data using simulation code that builds upon the existing DAISIE simulations by including geodynamic processes.
These simulations can be run using `run_robustness()` with the argument `pipeline = novel_sim` in DAISIErobustness, or `DAISIE_sim()` in the DAISIE package.
The already available models can easily be run by calling the main function `run_robustness()`. The parameter spaces and models this function can accept are stored in the data folder, and were generated by running the `generate_param_space.R` script. The available parameter spaces are:
Continental: `continental`
Continental with land bridges: `continental_land_bridge`
Oceanic ontogeny: `oceanic_ontogeny`
Oceanic ontogeny with sea-level changes: `oceanic_ontogeny_sea_level`
Oceanic with sea-level changes: `oceanic_sea_level`
Trait dependency with changes in CES (colonization, extinction and speciation) rates: `trait_CES`
The codes in mono-spaced font serve as values for the `param_space_name` argument of the `run_robustness()` function. The corresponding csv parameter space is then read from the GitHub repository into the function scope, so that the pipeline can begin.
The currently implemented DAISIE parameter sets are stored in the folder mentioned in the previous section. The easiest way to run additional parameter sets using the current geodynamic simulations is to fork this repository and change or upload new files to the `/data` folder. Note that if this is done, the `load_param_space()` function should be changed so that the domain URL reflects the user's fork.
An example of a `load_param_space()` function edited to run from a fork owned by joshwlambert:
    load_param_space <- function(param_space_name) {
      # Raw-content URL of the /data folder in the fork
      file_domain <-
        "https://raw.githubusercontent.com/joshwlambert/DAISIErobustness/master/data/"
      file <- paste0(file_domain, param_space_name, ".csv")
      # Read the parameter space csv straight from GitHub
      param_space <- readr::read_csv2(
        file = file
      )
      return(param_space)
    }
`load_param_space()` should now read the correct files from the folder in the fork under the joshwlambert account.
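To avoid editing the URL by hand for every fork, the URL construction can be factored out. The following is only a minimal sketch and not part of DAISIErobustness: `build_param_space_url()` and its `owner` and `branch` arguments are hypothetical names, and the URL layout is assumed to be the same as in the example above.

```r
# Hypothetical helper (not part of DAISIErobustness): builds the raw GitHub
# URL of a parameter space csv for a given fork owner and branch.
build_param_space_url <- function(param_space_name,
                                  owner = "joshwlambert",
                                  branch = "master") {
  file_domain <- paste0(
    "https://raw.githubusercontent.com/", owner,
    "/DAISIErobustness/", branch, "/data/"
  )
  paste0(file_domain, param_space_name, ".csv")
}

url <- build_param_space_url("oceanic_ontogeny")
print(url)
```

The resulting string could then be passed as the `file` argument of `readr::read_csv2()` inside an edited `load_param_space()`.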
    run_robustness(
      param_space_name = "oceanic_ontogeny",
      param_set = 1,
      replicates = 10,
      save_output = TRUE
    )
This code will start the pipeline for the first parameter set in the oceanic ontogeny parameter space. The first parameter set corresponds to the first line in the matching csv file. Ten oceanic ontogeny replicates will be run.
When `save_output = TRUE`, all the objects generated by the pipeline will be stored in the package's root folder, in `/results/param_space_name`, where `param_space_name` corresponds to the parameter space given when the function is called. At the moment, if saving is desired, these folders must already exist on the system!
If `save_output = FALSE`, then the objects will be returned by the function, allowing them to be saved to an R object and handled in an interactive session.
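Because the results folders have to exist before saving, it can help to create them up front. The sketch below is one possible way to do so; `ensure_results_dir()` and its `root` argument are hypothetical and not part of DAISIErobustness, and the folder layout is assumed to be `results/<param_space_name>` under the package's root folder, as described above.

```r
# Hypothetical helper (not part of DAISIErobustness): creates
# <root>/results/<param_space_name> if it does not exist yet and
# returns the path.
ensure_results_dir <- function(param_space_name, root = getwd()) {
  results_dir <- file.path(root, "results", param_space_name)
  if (!dir.exists(results_dir)) {
    dir.create(results_dir, recursive = TRUE)
  }
  results_dir
}

# Demonstrated in a temporary directory; for real runs, `root` would be
# the package's root folder.
demo_root <- tempfile("daisie_demo_")
results_dir <- ensure_results_dir("oceanic_ontogeny", root = demo_root)
print(dir.exists(results_dir))
```

Calling the helper a second time is harmless, since an existing folder is left untouched.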
The parameter sets currently implemented can be found in the data folder. New parameter sets can be generated using the `generate_param_space.R` script.
The following results are used to determine the error between models:
* The nLTT [5] statistic for endemic species, non-endemic species and all species.
* The difference, at the end of the simulation, in the number of species, endemic species and non-endemic species.
These metrics are then aggregated across all replicates of a given parameter space in the following way:
* Mean and standard deviation of the difference in all nLTTs
* Mean and standard deviation of the number of species, endemics and non-endemics
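The aggregation step can be illustrated with a small base R sketch. The numbers below are made up purely for illustration, and the column names and `summarise_errors()` are hypothetical; the real pipeline output may be structured differently.

```r
# Made-up per-replicate error metrics, for illustration only; real values
# would come from the pipeline output.
replicate_errors <- data.frame(
  delta_nltt = c(0.02, 0.05, 0.03, 0.04),
  delta_n_species = c(2, -1, 0, 3)
)

# Mean and standard deviation of every metric across replicates.
summarise_errors <- function(errors) {
  data.frame(
    metric = names(errors),
    mean = vapply(errors, mean, numeric(1)),
    sd = vapply(errors, sd, numeric(1)),
    row.names = NULL
  )
}

error_summary <- summarise_errors(replicate_errors)
print(error_summary)
```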
Given the stochastic nature of the simulation models, and given that in studies of this kind the properties of the simulated output are not known in advance, some constraints must be placed on the simulated data and likelihood estimates. When the data generated by the simulations of a certain parameter space do not respect the constraints, these data are saved (to the degree they are generated) but not analysed.
To ensure that appropriate data are simulated by the model under study, i.e. that the data have enough phylogenetic information but are not so large as to become unwieldy and unrealistic, simulations are constrained in the total number of species and the total number of colonising lineages. These constraints are computed by `sim_constraints()`.
The currently implemented constraints for the simulations are:
* Proportion of replicates with 15 or more species must be > 95%
* Proportion of replicates with 5 or more colonizations (independent lineages) must be > 95%
* Proportion of replicates with 100 or fewer species must be < 95%
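The three proportions behind these constraints can be computed as in the sketch below. This is a hypothetical stand-in and not the package's `sim_constraints()`: the function name, its inputs (one value per replicate) and the example numbers are assumptions. The sketch only computes the proportions; comparing them against the 95% thresholds is done as listed above.

```r
# Hypothetical stand-in, not the package's sim_constraints().
# n_species:       total number of species in each replicate
# n_colonisations: number of colonising lineages in each replicate
sim_constraint_proportions <- function(n_species, n_colonisations) {
  c(
    prop_15_or_more_species = mean(n_species >= 15),
    prop_5_or_more_colonisations = mean(n_colonisations >= 5),
    prop_100_or_fewer_species = mean(n_species <= 100)
  )
}

# Made-up replicate summaries, for illustration only.
n_species <- c(20, 30, 10, 120, 50)
n_colonisations <- c(5, 6, 4, 10, 3)

proportions <- sim_constraint_proportions(n_species, n_colonisations)
print(proportions)
```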
As the MLE routine may occasionally crash or fail to converge, we also constrain the pipeline on the number of successful parameter estimations. These constraints are checked by `ml_constraints()`. A parameter set will be skipped if any of the MLE runs crashed or failed to converge.
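The skip rule can be sketched in base R as follows. This is not the package's `ml_constraints()`: `safe_fit_succeeded()` and `toy_fit()` are hypothetical, and the fitting function is assumed to return `TRUE` when the optimisation converged and to throw an R error when it crashes.

```r
# Runs `fit_fun` on one replicate and reports success as TRUE/FALSE.
# A crash (an R error) is caught and counted as a failure.
safe_fit_succeeded <- function(fit_fun, replicate_data) {
  tryCatch(
    isTRUE(fit_fun(replicate_data)),
    error = function(e) FALSE
  )
}

# Toy stand-in for a maximum-likelihood fit, for illustration only.
toy_fit <- function(x) {
  if (x < 0) stop("optimiser crashed") # simulated crash
  x > 0                                # simulated convergence flag
}

# Made-up replicates: the third one triggers the simulated crash.
replicates <- c(1, 2, -1)
succeeded <- vapply(
  replicates,
  function(x) safe_fit_succeeded(toy_fit, x),
  logical(1)
)

# The parameter set is skipped if any run crashed or did not converge.
skip_param_set <- !all(succeeded)
print(skip_param_set)
```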
This package contains code that interfaces directly with the Peregrine HPCC available for use by researchers and students at the University of Groningen. To make use of such functionality, it is required to have a University of Groningen account, with access to Peregrine. The interface was developed and implemented by using Giovanni Laudanno's (@Giappo) jap package. As such, this package is required for using this functionality. Please refer to the package's homepage for more documentation, and contact any of DAISIErobustness' authors for help setting this up on your end, assuming you have Peregrine access.
[1] Valente, Luis M., Albert B. Phillimore, and Rampal S. Etienne. "Equilibrium and non‐equilibrium dynamics simultaneously operate in the Galápagos islands." Ecology letters 18.8 (2015): 844-852.
[2] Valente, Luis, Albert B. Phillimore, and Rampal S. Etienne. "Using molecular phylogenies in island biogeography: it’s about time." Ecography 182 (2018): 820.
[3] Whittaker, Robert J., Kostas A. Triantis, and Richard J. Ladle. "A general dynamic theory of oceanic island biogeography." Journal of Biogeography 35.6 (2008): 977-994.
[4] Fernández‐Palacios, José María, et al. "Towards a glacial‐sensitive model of island biogeography." Global Ecology and Biogeography 25.7 (2016): 817-830.
[5] Janzen, Thijs, Sebastian Höhna, and Rampal S. Etienne. "Approximate Bayesian Computation of diversification rates from molecular phylogenies: introducing a new efficient summary statistic, the nLTT." Methods in Ecology and Evolution 6.5 (2015): 566-575.