obtainDatedPosteriorTreesMrB: Get the Sample of Posterior Trees from a Dated Phylogenetic...

View source: R/obtainDatedPosteriorTreesMrB.R

obtainDatedPosteriorTreesMrBR Documentation

Get the Sample of Posterior Trees from a Dated Phylogenetic Analysis with MrBayes (Or a Summary Tree, such as the MCCT)

Description

MrBayes is not great for getting samples of dated posterior phylogenies, or for obtaining certain summary trees from the posterior (specifically the MCCT and MAP, which are specific trees in the posterior). This is because the tree samples as returned are scaled relative to rate parameters in a separate file. This function attempts to automate the handling of multiple files (both .t tree files and .p parameter files), as well as multiple files associated with separate runs, to obtain samples of posterior trees, or summary trees such as the MCCT or MAP. These resulting trees are now scaled to units of time, but not be placed correctly on an absolute time-scale if all tips are extinct. See details of output below.

Usage

obtainDatedPosteriorTreesMrB(
  runFile,
  nRuns = 2,
  burnin = 0.5,
  outputTrees,
  labelPostProb = FALSE,
  getFixedTimes = FALSE,
  getRootAges = FALSE,
  originalNexusFile = NULL,
  file = NULL
)

Arguments

runFile

A filename in the current directory, or a path to a file that is either a .p or .t file from a MrBayes analysis. This filename and path will be used for finding additional .t and .p files, via the nRuns settings and assuming that files are in the same directory and these files are named under typical MrBayes file naming conventions. (In other words, if you have renamed your .p or .t files, this function probably won't be able to find them.)

nRuns

The number of runs in your analysis. This variable is used for figuring out what filenames will be searched for: if you specify that you have less runs than you actually ran in reality, then some runs won't be examined in this function. Conversely, specify too many, and this function will throw an error when it cannot find files it expects but do not exist. The default for this argument (two runs) is based on the default number of runs in MrBayes.

burnin

The fraction of trees sampled in the posterior discarded and not returned by this function directly, nor included in calculation of summary trees. Must be a numeric value greater than 0 and less than 1.

outputTrees

Determines the output trees produced; for format of output, see section on returned Value below. Must be of length one, and either "all", which means all trees from the post-burn-in posterior will returned, a number greater than zero, which will be the number of trees randomly sampled from across the post-burn-in posterior and returned, or a label for a type of summary tree selected from the posterior based on various properties. The two most commonly seen such point-estimate-summaries are the MCCT tree, which stands for the 'maximum clade compatibility tree', and the MAP tree, which stands for the 'maximum a posteriori tree'. The MCCT is the single tree from the post-burn-in posterior which contains the set of bifurcations (clades) with the highest product of posterior probabilities (i.e. are found on the most trees in the post-burn-in posterior). The MCCT tree is returned if the argument outputTrees = "MCCT" is used. The MAP is the single tree from the post-burn-in posterior with the highest posterior probabilty associated with it. Unfortunately, versions of paleotree prior to version 3.2.1 did not use the posterior probability to select the supposed 'MAP' tree. MrBayes provides two values for each sampled tree and corresponding parameters: LnPr, the log prior probability of the current parameter proposals under the specified prior distributions, and LnL, the log likelihood of the cold chain, i.e. the log-likelihood of the sampled parameter values, given the observed data and specified models. Neither of these are the posterior probability. The true posterior probability (as given by Bayes Theorem) is the product of the likelihood and the prior probability, divided by the likelihood of the model, the latter of which is very rarely known. More commonly, the calculable portion of the posterior probability is the product of the likelihood and the prior probability; or, here, easily calculated as the log posterior probability, as the sum of the log likelihood and log prior probability. Given confusion over application of 'MAP' trees in previous version of paleotree, three options are available: "MAPosteriori" for the Maximum A Posteriori tree (the MAP tree, or the single tree in the posterior with the highest posterior probability, as given by LnPr + LnL), "MAPriori" for the Maximum A Priori tree (the tree in the posterior sample with the highest prior probability, indep of the data), and "MaxLikelihood", the tree with the highest model likelihood given the data, ignoring the prior probability. The former option of outputTrees = "MAP" is deprecated, as its previous implementation only examine LnPr and thus returned the tree now referred to here as the "MAPriori" tree. Interestingly, this bug had no effect when tip-dating methods is applied to datasets with no character matrix is provided (an empty matrix of '?' missing values is used) in order to find dated phylogenies that maximize the fit to the dated tree prior, as the log likelihood for a tree with an empty matrix is always zero, and thus the posterior probability is always exactly identical to the prior probability. Overall, the Maximum A Posteriori tree is the "best" tree based on the metric most directly considered by Bayesian analysis for proposal acceptance, but the MCCT may be the best tree for summarizing topological support. In either case, point estimates of topology are often problematic summaries for phylogenetic analyses.

labelPostProb

Logical. If TRUE, then nodes of the output tree will be labeled with their respective posterior probabilities, as calculated based on the frequency of a clade occurring across the post-burn-in posterior tree sample. If FALSE, this is skipped.

getFixedTimes

If TRUE, this function will also look for, scan, and parse an associated NEXUS file. Ignoring any commented lines (i.e., anything between a set of rectangular brackets [] ), commands for fixing taxa will be identified, parsed and returned to the user, either as a message printed to the R console if output is written to a file, or as a attribute named 'fixed ages' if output as an R object (formatted as a two-column table of OTU names and their respective fixed ages).

Please note: the code for getFixedTimes = TRUE contains a while() loop in it for removing nested series of square brackets (i.e. treated as comments in NEXUS files). Thus files with ridiculously nested series of brackets may cause this code to take a while to complete, or may even cause it to hang.

getRootAges

FALSE by default. If TRUE, and getFixedTimes = TRUE as well as file = NULL (such that trees will be assigned within the R memory rather than saved to an external file), the functions setRootAge and its wrapper function setRootAges will be applied to the output so that all output trees have root.time elements for use with other functions in paleotree as well as other packages.

originalNexusFile

Filename (and possibly path too) to the original NEXUS file for this analysis. Only tried if getFixedTimes = TRUE. If NULL (the default), then this function will instead try to find a NEXUS file with the same name as implied by the filename used in other inputs. If this file cannot be found, the function will fail.

file

Filename (possibly with path) as a character string leading to a file which will be overwritten with the output trees (or summary tree), as a NEXUS file. If NULL (the default), the output will instead be directly returned by this function.

Details

This function is most useful for dealing with dating analyses in MrBayes, particularly when tip-dating a tree with fossil taxa, as the half-compatibility and all-compatibility summary trees offered by the 'sumt' command in MrBayes can have issues properly portraying summary trees from such datasets.

Value

Depending on argument file, the output tree or trees is either returned directly, or instead written out in NEXUS format via ape's write.NEXUS function to an external file. The output will consist either of multiple trees sampled from the post-burn-in posterior, or will consist of a single phylogeny (a summary tree, either the MCCT or the MAP - see the details for the argument outputTrees).

If the argument setRootAges = TRUE is not used, users are warned that the resulting dated trees will not have $root.time elements necessary for comparison against an absolute time-scale. Wile the trees may be scaled to units of absolute time now, rather than with branch lengths expressed in the rate of character change, the dates estimated by some phylogenetics functions in R may give inaccurate estimates of when events occur on the absolute time-scale if all tips are extinct. This is because most functions for phylogenetics in R (and elsewhere) will instead presume that the latest tip will be at time 0 (the modern), which may be wrong if you are using paleotree for analyzing paleontological datasets consisting of entirely extinct taxa. This can be solved by using argument getFixedTimes = TRUE to obtain fixed tip ages, and then scaling the resulting output to absolute time using the argument setRootAges = TRUE, which obtains a $root.time element for each tree using the functions setRootAge and setRootAges (for single and multiple phylogenies).

Author(s)

David Bapst, with rescaling of raw output trees via code originally written by Nicholas Crouch.

See Also

When the arguments getFixedTimes = TRUE and setRootAges = TRUE are used, the resulting output will be scaled to absolute time with the available fixed ages using functions setRootAge and setRootAges (for single and multiple phylogenies). This is only done if fixed ages are available and if the tree is not being saved to an external file.

Maximum Clade Credibility trees are estimated using the function maxCladeCred in package phangorn.

See function link{tipDatingCompatabilitySummaryMrB} for additional ways of solely evaluating the topoligical information in trees taken from MrBayes posterior samples.

Examples

## Not run: 

MCCT <- obtainDatedPosteriorTreesMrB(
 	runFile = "C:\\myTipDatingAnalysis\\MrB_run_fossil_05-10-17.nex.run1.t",
 	nRuns = 2, 
 	burnin = 0.5,
 	outputTrees = "MCCT", 
 	file = NULL)

MAP <- obtainDatedPosteriorTreesMrB(
 	runFile = "C:\\myTipDatingAnalysis\\MrB_run_fossil_05-10-17.nex.run1.t",
 	nRuns = 2, 
 	burnin = 0.5, 
 	getFixedTimes = TRUE,
 	outputTrees = "MAPosteriori", 
 	file = NULL)

# get a root age from the fixed ages for tips
setRootAge(tree = MAP)

#pull a hundred trees randomly from the posterior
hundredRandomlySelectedTrees <- obtainDatedPosteriorTreesMrB(
 	runFile = "C:\\myTipDatingAnalysis\\MrB_run_fossil_05-10-17.nex.run1.t",
 	nRuns = 2, 
 	burnin = 0.5, 
 	getFixedTimes = TRUE,
 	getRootAges = TRUE,
 	outputTrees = 100, 
 	file = NULL)



## End(Not run)


paleotree documentation built on Aug. 22, 2022, 9:09 a.m.