| prepare_diversification_data | R Documentation |
Run a full BAMM (Bayesian Analysis of Macroevolutionary Mixtures) workflow
to produce a BAMM_object that contains a phylogenetic tree and associated diversification rates
mapped along branches, across selected posterior samples:
Step 1: Set BAMM - Record BAMM settings and generate all input files needed for BAMM.
Step 2: Run BAMM - Run BAMM and move output files in dedicated directory.
Step 3: Evaluate BAMM - Produce evaluation plots and ESS data.
Step 4: Import BAMM outputs - Load BAMM_object in R and subset posterior samples.
Step 5: Clean BAMM files - Remove files generated during the BAMM run.
The BAMM_object output is typically used as input to run deepSTRAPP with run_deepSTRAPP_for_focal_time()
or run_deepSTRAPP_over_time(). Diversification rates and regimes shift can be visualized with plot_BAMM_rates().
BAMM is a model of diversification for time-calibrated phylogenies that explores complex diversification dynamics by allowing multiple regime shifts across clades without a priori hypotheses on the location of such shifts. It uses reversible jump Markov chain Monte Carlo (rjMCMC) to automatically explore a vast range of models with different speciation and extinction rates, and different number and location of regime shits.
This function will work only if you have the BAMM C++ program installed in your machine.
See the BAMM website: http://bamm-project.org/ and the companion R package {BAMMtools}.
prepare_diversification_data(
BAMM_install_directory_path,
phylo,
prefix_for_files = NULL,
seed = NULL,
numberOfGenerations = 10^7,
globalSamplingFraction = 1,
sampleProbsFilename = NULL,
expectedNumberOfShifts = NULL,
eventDataWriteFreq = NULL,
burn_in = 0.25,
nb_posterior_samples = 1000,
additional_BAMM_settings = list(),
BAMM_output_directory_path = NULL,
keep_BAMM_outputs = TRUE,
MAP_odd_ratio_threshold = 5,
skip_evaluations = FALSE,
plot_evaluations = TRUE,
save_evaluations = TRUE
)
BAMM_install_directory_path |
Character string. The path to the directory where BAMM is. Use '/' to separate directory and sub-directories. The path must end with '/'. |
phylo |
Time-calibrated phylogeny. Object of class |
prefix_for_files |
Character string. Prefix to add to all BAMM files stored in the |
seed |
Integer. Set the seed to ensure reproducibility. Default is |
numberOfGenerations |
Integer. Number of steps in the MCMC run. It should be set high enough to reach the equilibrium distribution
and allows posterior samples to be uncorrelated. Check the Effective Sample Size of parameters with coda::effectiveSize() in the Evaluation step.
Default value is |
globalSamplingFraction |
Numerical. Global sampling fraction representing the overall proportion of terminals in the phylogeny compared to
the estimated overall richness in the clade. It acts as a multiplier on the rates needed to achieve such extant diversity.
Default is |
sampleProbsFilename |
Character string. The path to the |
expectedNumberOfShifts |
Integer. Set the expected number of regime shifts. It acts as an hyperparameter controlling the exponential prior distribution
used to modulate reversible jumps across model configurations in the rjMCMC run.
If set to |
eventDataWriteFreq |
Integer. Set the frequency in which to write the event data to the output file = the sampling frequency of posterior samples.
If set to |
burn_in |
Numerical. Proportion of posterior samples removed from the BAMM output to ensure that the remaining samples where drawn once the equilibrium distribution was reached.
This can be evaluated looking at the MCMC trace (see Evaluation step). Default is |
nb_posterior_samples |
Numerical. Number of posterior samples to extract, after removing the burn-in, in the final |
additional_BAMM_settings |
List of named elements. Additional settings options for BAMM provided as a list of named arguments.
Ex: |
BAMM_output_directory_path |
Character string. The path to the directory used to store input/output files generated. Use '/' to separate directory and subdirectories. It must end with '/'. |
keep_BAMM_outputs |
Logical. Whether the |
MAP_odd_ratio_threshold |
Numerical. Controls the definition of 'core-shifts' used to distinguish across configurations when fetching the MAP samples.
Shifts that have an odd-ratio of marginal posterior probability / prior lower than |
skip_evaluations |
Logical. Whether to skip the Evaluation step including MCMC trace, ESS, and prior/posterior comparisons for expected number of shifts. Default = |
plot_evaluations |
Logical. Whether to display the plots generated during the Evaluation step: MCMC trace, and prior/posterior comparisons for expected number of shifts. Default = |
save_evaluations |
Logical. Whether to save the outputs of evaluations in a table (ESS), and PDFs (MCMC trace, and prior/posterior comparisons for expected number of shifts)
in the |
This function runs a full BAMM (Bayesian Analysis of Macroevolutionary Mixtures) workflow
to produce a BAMM_object that contains a phylogenetic tree and associated diversification rates
mapped along branches, across selected posterior samples.
Step 1: Set BAMM
Produces a tree file for the phylogeny. Default file: 'phylogeny.tree'.
Save configuration settings used for the BAMM run. Default file: 'config_file.txt'.
Save default priors generated by BAMMtools::setBAMMpriors based on the phylogeny. Default file: 'priors.txt'.
Step 2: Run BAMM
Run BAMM using the system console
Move output files in dedicated BAMM_output_directory. Default directory is ./BAMM_outputs/.
'run_info.txt' containing a summary of your parameters/settings.
'mcmc_log.txt' containing raw MCMC information useful in diagnosing convergence.
'event_data.txt' containing all evolutionary rate parameters and their topological mappings.
'chain_swap.txt' containing data about each chain swap proposal (when a proposal occurred, which chains might be swapped, and whether the swap was accepted).
'acceptance_info.txt' containing the history of acceptance/proposal of MCMC steps (If additional setting outputAcceptanceInfo is set to 1).
Step 3: Evaluate BAMM
Plot the MCMC trace = evolution of logLik across MCMC generations. Output file = 'MCMC_trace_logLik.pdf'.
Compute the Effective Sample Size (ESS) across posterior samples (after removing burn-in) using coda::effectiveSize().
This is a way to evaluate if your MCMC runs has enough generations to produce robust estimates. Ideally, ESS should be higher than 200.
Output file = 'ESS_df.csv'.
Plot the comparison of prior and posterior distributions of the number of regime shifts with BAMMtools::plotPrior.
Output file = 'PP_nb_shifts_plot.pdf'.
A good value for expectedNumberOfShifts is one with high similarities between the distributions
hinting that the information in the data coincides with your expectations for the number of regime shifts.
The best practice consists in trying several values to control if it affects or not the final output.
Step 4: Import BAMM outputs
Load BAMM outputs with BAMMtools::getEventData.
Subset posterior samples to the requested nb_posterior_samples with BAMMtools::subsetEventData.
Record the $expectedNumberOfShifts used to set the prior. This is useful for downstream analyses involving comparison of prior vs. posterior probabilities
(See BAMMtools::distinctShiftConfigurations()).
Record the marginal posterior probability of regime shift along branches based on the proportion of samples harboring a regime shift along each branch.
(See BAMMtools::marginalShiftProbsTree()). Result is stored in $MSP_tree as phylogenetic tree with $edge.length scaled to the marginal posterior probability.
Extract the Maximum A Posteriori probability (MAP) configuration = the configuration of regime shift location found the most frequently among the posterior samples.
(See BAMMtools::getBestShiftConfiguration()). This ignores shifts that have an odd-ratio of marginal posterior probability / prior lower than MAP_odd_ratio_threshold
to avoid noise from non-core shifts. MAP sample indices are stored in $MAP_indices. Diversification rates and shift locations on branches are then averaged across all MAP samples and
recorded as an object of class "bammdata" in $MAP_BAMM_object with a single $eventData table used to plot regime shifts on the phylogeny with plot_BAMM_rates().
Extract the Maximum Shift Credibility (MSC) configuration = the configuration of regime shift location with the highest product of marginal probabilities across branches.
(See BAMMtools::maximumShiftCredibility()). MSC sample indices are stored in $MSC_indices. Diversification rates and shift locations on branches are then averaged across all MSC samples and
recorded as an object of class "bammdata" in $MSC_BAMM_object with a single $eventData table used to plot regime shifts on the phylogeny with plot_BAMM_rates().
Step 5: Clean BAMM files
Remove files generated in Steps 1 & 2 if keep_BAMM_outputs = FALSE.
Delete the BAMM_output_directory if empty after cleaning files.
The BAMM_object output:
is typically used as input to run deepSTRAPP with run_deepSTRAPP_for_focal_time() or run_deepSTRAPP_over_time().
can be used to extract rates and regimes for any focal_time in the past with update_rates_and_regimes_for_focal_time().
can be used to map diversification rates and regime shifts on the phylogeny with plot_BAMM_rates().
The function returns a BAMM_object of class "bammdata" which is a list with at least 22 elements.
Phylogeny-related elements used to plot a phylogeny with ape::plot.phylo():
$edge Matrix of integers. Defines the tree topology by providing rootward and tipward node ID of each edge.
$Nnode Integer. Number of internal nodes.
$tip.label Vector of character strings. Labels of all tips.
$edge.length Vector of numerical. Length of edges/branches.
$node.label Vector of character strings. Labels of all internal nodes. (Present only if present in the initial BAMM_object)
BAMM internal elements used for tree exploration:
$begin Vector of numerical. Absolute time since root of edge/branch start (rootward).
$end Vector of numerical. Absolute time since root of edge/branch end (tipward).
$downseq Vector of integers. Order of node visits when using a pre-order tree traversal.
$lastvisit ID of the last node visited when starting from the node in the corresponding position in $downseq.
BAMM elements summarizing diversification data:
$numberEvents Vector of integer. Number of events/macroevolutionary regimes (k+1) recorded in each posterior configuration. k = number of shifts.
$eventData List of data.frames. One per posterior sample. Records shift events and macroevolutionary regimes parameters. 1st line = Background root regime.
$eventVectors List of integer vectors. One per posterior sample. Record regime ID per branches.
$tipStates List of named integer vectors. One per posterior sample. Record regime ID per tips.
$tipLambda List of named numerical vectors. One per posterior sample. Record speciation rates per tips.
$tipMu List of named numerical vectors. One per posterior sample. Record extinction rates per tips.
$eventBranchSegs List of matrix of numerical. One per posterior sample. Record regime ID per segments of branches.
$meanTipLambda Vector of named numerical. Mean tip speciation rates across all posterior configurations of tips.
$meanTipMu Vector of named numerical. Mean tip extinction rates across all posterior configurations of tips.
$type Character string. Set the type of data modeled with BAMM. Should be "diversification".
Additional elements providing key information for downstream analyses:
$expectedNumberOfShifts Integer. The expected number of regime shifts used to set the prior in BAMM.
$MSP_tree Object of class phylo. List of 4 elements duplicating information from the Phylogeny-related elements above,
except $MSP_tree$edge.length is recording the Marginal Shift Probability of each branch (i.e., the probability of a regime shift to occur along each branch)
$MAP_indices Vector of integers. The indices of the Maximum A Posteriori probability (MAP) configurations among the posterior samples.
$MAP_BAMM_object. List of 18 elements of class '"bammdata" recording the mean rates and regime shift locations found across
the Maximum A Posteriori probability (MAP) configurations. All BAMM elements summarizing diversification data holds a single entry describing
this mean diversification history.
$MSC_indices Vector of integers. The indices of the Maximum Shift Credibility (MSC) configurations among the posterior samples.
$MSC_BAMM_object List of 18 elements of class '"bammdata" recording the mean rates and regime shift locations found across
the Maximum Shift Credibility (MSC) configurations. All BAMM elements summarizing diversification data holds a single entry describing
this mean diversification history.
The function also produces files listed in the Details section and stored in the the BAMM_output_directory.
This function relies on BAMM to provide a reliable solution to map diversification rates and regime shifts on a time-calibrated phylogeny
and obtain the BAMM_object object needed to run the deepSTRAPP workflow (run_deepSTRAPP_for_focal_time, run_deepSTRAPP_over_time).
However, it is one option among others for modeling diversification on phylogenies.
You may wish to explore alternatives models such as LSBDS model in RevBayes (Höhna et al., 2016), the MTBD model (Barido-Sottani et al., 2020),
or the ClaDS2 model (Maliet et al., 2019) for your own data.
However, you will need Bayesian models that infer regime shifts to be able to perform STRAPP tests (Rabosky & Huang, 2016).
Additionally, you need to format the model output such as in BAMM_object, so it can be used in a deepSTRAPP workflow.
This function perform a single BAMM run to infer diversification rates and regime shifts. Due to the stochastic nature of the exploration of the parameter space with MCMC process, best practice recommend to ran multiple runs and check for convergence of the MCMC traces, ensuring that the region of high probability has been reached by your MCMC runs.
Maël Doré
For BAMM: Rabosky, D. L. (2014). Automatic detection of key innovations, rate shifts, and diversity-dependence on phylogenetic trees. PloS one, 9(2), e89543. \Sexpr[results=rd]{tools:::Rd_expr_doi("10.1371/journal.pone.0089543")}. Website: http://bamm-project.org/.
For {BAMMtools}: Rabosky, D. L., Grundler, M., Anderson, C., Title, P., Shi, J. J., Brown, J. W., ... & Larson, J. G. (2014).
BAMM tools: an R package for the analysis of evolutionary dynamics on phylogenetic trees. Methods in Ecology and Evolution, 5(7), 701-707.
\Sexpr[results=rd]{tools:::Rd_expr_doi("10.1111/2041-210X.12199")}
run_deepSTRAPP_for_focal_time() run_deepSTRAPP_over_time() update_rates_and_regimes_for_focal_time() prepare_trait_data() plot_BAMM_rates()
For a guided tutorial, see this vignette: vignette("model_diversification_dynamics", package = "deepSTRAPP")
# ----- Example 1: Whale phylogeny ----- #
library(phytools)
data(whale.tree)
## Not run:
## You need to install the BAMM C++ software locally prior to run this function
# Visit the official BAMM website (\url{http://bamm-project.org/}) for information.
# Run BAMM workflow with deepSTRAPP
whale_BAMM_object <- prepare_diversification_data(
BAMM_install_directory_path = "./software/bamm-2.5.0/",
phylo = whale.tree,
prefix_for_files = "whale",
numberOfGenerations = 100000, # Set low for the example
BAMM_output_directory_path = tempdir(), # Can be adjusted such as "./BAMM_outputs/"
keep_BAMM_outputs = FALSE, # Adjust if needed
)
## End(Not run)
# Load directly the result
data(whale_BAMM_object)
# Explore output
str(whale_BAMM_object, 1)
# Plot mean net diversification rates and regime shifts on the phylogeny
plot_BAMM_rates(whale_BAMM_object, cex = 0.5,
labels = TRUE, legend = TRUE)
# ----- Example 2: Ponerinae phylogeny ----- #
# Load phylogeny
data("Ponerinae_tree", package = "deepSTRAPP")
plot(Ponerinae_tree, show.tip.label = FALSE)
## Not run:
## You need to install the BAMM C++ software locally prior to run this function
# Visit the official BAMM website (http://bamm-project.org/) for information.
# Run BAMM workflow with deepSTRAPP
Ponerinae_BAMM_object <- prepare_diversification_data(
BAMM_install_directory_path = "./software/bamm-2.5.0/",
phylo = Ponerinae_tree,
prefix_for_files = "Ponerinae",
numberOfGenerations = 10^7, # Set high for optimal run, but will take ages
BAMM_output_directory_path = tempdir(), # Can be adjusted such as "./BAMM_outputs/"
keep_BAMM_outputs = FALSE, # Adjust if needed
)
## End(Not run)
if (deepSTRAPP::is_dev_version())
{
# Load directly the result
data(Ponerinae_BAMM_object)
## This dataset is only available in development versions installed from GitHub.
# It is not available in CRAN versions.
# Use remotes::install_github(repo = "MaelDore/deepSTRAPP") to get the latest development version.
# Explore output
str(Ponerinae_BAMM_object, 1)
# Plot mean net diversification rates and regime shifts on the phylogeny
plot_BAMM_rates(Ponerinae_BAMM_object,
labels = FALSE, legend = TRUE)
}
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.