title: "BTprocessR" output: github_document: toc: true toc_depth: 2
NOTE This package is still under construction. Some functions may be absent, or non-functional in their current state.
BTprocessR is an R package that provides a set of tools to help with the analysis of the output of the various MCMC models in BayesTraits. The package includes functions for visualising the posterior distribtuion of the estimated parameters, summarising and plotting posterior distributions of phylogenies (resulting from rate-variable and/or reversible jump local transformation (RJLT)) models, summarising inferred rate scalars for each node and branch in a tree, and identifying and plotting rate shifts. Currently the package only deals with the output of analyses of continuous traits, with functions for the various discrete trait analyses coming soon.
The package includes the output of three analyses of marsupuial body size - a brownian motion analysis, a variable rates analysis and an RJLT analysis using the delta transformation. In way of a disclaimer please note that in order to keep the package size small each of these MCMC chains collected only 500 samples, which may be insufficient to ensure proper convergence. The tree and data that these analyses used are also included should one wish to repeat these analyses with a longer chain or higher sampling rate, access with data(marsupial_data)
.
A log file containing an MCMC posterior (by default BayesTraits appends these files with .Log.txt) can be parsed with the function loadPosterior. This function returns a tibble with each sample taken during the MCMC chain, one per row. Printing this object returns some simple summary statistics for each parameter.
library(BTprocessR)
post <- loadPosterior(system.file("extdata", "marsupials_brownian.txt.Log.txt", package = "BTprocessR"))
print(post)
## Posterior of 500 samples
##
## Parameter Median Mean Mode SD
## 1 Lh -140.112 -139.772 -139.377 1.018
## 2 Alpha.1 2.13 2.127 2.071 0.388
## 3 Sigma.2.1 0.011 0.011 0.011 0.001
##
## # A tibble: 500 x 5
## Iteration Lh Tree.No Alpha.1 Sigma.2.1
## <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 1020000 -139. 1 1.84 0.0111
## 2 1040000 -139. 1 2.38 0.0112
## 3 1060000 -140. 1 2.18 0.00989
## 4 1080000 -139. 1 2.19 0.0110
## 5 1100000 -139. 1 2.10 0.0113
## 6 1120000 -140. 1 2.60 0.0104
## 7 1140000 -141. 1 1.60 0.0122
## 8 1160000 -140. 1 2.60 0.0110
## 9 1180000 -139. 1 1.89 0.0103
## 10 1200000 -140. 1 2.56 0.00985
## # … with 490 more rows
# This information can also be assigned to an object for one or more parameters.
post_info <- getPostStats(post, parameter = "Lh")
post_info
## # A tibble: 1 x 5
## parameter mean median mode sd
## <chr> <dbl> <dbl> <dbl> <dbl>
## 1 Lh -140. -140. -139. 1.02
post_info <- getPostStats(post, parameter = c("Lh", "Sigma.2.1"))
post_info
## # A tibble: 2 x 5
## parameter mean median mode sd
## <chr> <dbl> <dbl> <dbl> <dbl>
## 1 Lh -140. -140. -139. 1.02
## 2 Sigma.2.1 0.0111 0.011 0.011 0.001
It is also possible to plot histograms of each of the parameters present in the posterior, and to plot some simple plots to aid in the visual diagnosis of convergence for either a specific parameter, or for specific parameter(s). The resulting plots are a histogram of the parameter samples, a trace plot, an autocorrelation plot, and a plot of the sliding mean of the samples with a window size of ten and a regression line (standard linear model).
plot(post)
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
mcmcPlots(post)
mcmcPlots(post, parameters = "Lh")
mcmcPlots(post, parameters = c("Lh", "Sigma.2.1"))
BTprocessR contains functions that allow the quick comparison of like parameters over more than one posterior for the visual comparison of models. The legend on the resulting plot numbers the logfiles in the order that they were given to the function.
post_brownian <- loadPosterior(system.file("extdata", "marsupials_brownian.txt.Log.txt", package = "BTprocessR"))
post_vrates <- loadPosterior(system.file("extdata", "marsupials_vrates.txt.Log.txt", package = "BTprocessR"))
compPosts(logs = list(post_brownian, post_vrates), parameter = "Lh")
# Can take any number of posteriors.
post_vdelta <- loadPosterior(system.file("extdata", "marsupials_variabledelta.txt.Log.txt", package = "BTprocessR"))
compPosts(logs = list(post_brownian, post_vrates, post_vdelta), parameter = "Lh")
BayesTraits can calculate marginal likelhoods for a model by use of the stepping stone sampler. This is the recommended way to calculate marginal likelihoods for RJ models, and returns a file appended with .Stones.txt. BTprocessR can retrieve the marginal likelihoods from these files and return a table of marginal likelihoods as estimated by the stepping stone sampler. Plotting the resulting object will create a bayes factor plot (row vs column). It is worth providing labels for each model, especially if plotting or using a long filepath to the stones files otherwise the plot and/or table will be unwieldy.
stones <- c(system.file("extdata", "marsupials_brownian.txt.Stones.txt", package = "BTprocessR"),
system.file("extdata", "marsupials_vrates.txt.Stones.txt", package = "BTprocessR"),
system.file("extdata", "marsupials_variabledelta.txt.Stones.txt", package = "BTprocessR"))
marginal_likelihoods <- getStones(stones, labels = c("Brownian motion", "Var. rates", "RJ delta"))
marginal_likelihoods
## # A tibble: 3 x 2
## logfile marginalLh
## <chr> <dbl>
## 1 Brownian motion -155.
## 2 Var. rates -120.
## 3 RJ delta -120.
plot(marginal_likelihoods)
One of the outputs of the various reversible jump (RJ) models in BayesTraits is a collection of transformed trees sampled during the MCMC process. By default BayesTraits appendeds these files with ".Output.trees". BTprocessR can read these posteriors in, summarise them, and plot the original tree alongisde the mean, median and modal trees from the posterior to visualise the difference.
The original tree (generally the time tree) is provided to enable a comparison between the posterior of trees and the original branch lengths.
The function returns a list of two elements - the first is a multiPhylo object containing the original tree, and trees transformed with the mean, median and mode branch lengths and the second element is a tibble containing the per-branch information.
data(marsupial_trees)
post_trees <- summariseTrees(reftree = marsupial_trees$timetree,
trees = marsupial_trees$vrates_trees)
## [1] "Ladderizing posterior trees:"
names(post_trees)
## [1] "tree_summaries" "branchlength_info"
print(post_trees)
## $tree_summaries
## 4 phylogenetic trees
## tree 1 : original_tree 246 tips
## tree 2 : mean_tree 246 tips
## tree 3 : median_tree 246 tips
## tree 4 : mode_tree 246 tips
##
## $branchlength_info
## # A tibble: 490 x 6
## original_bl mean_bl median_bl mode_bl range_bl sd_bl
## <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 8.9 11.3 8.9 8.95 196. 13.5
## 2 7.8 25.4 7.8 7.91 1159. 63.6
## 3 2.1 6.82 2.1 2.27 232. 18.6
## 4 0.1 0.338 0.1 0.116 13.4 0.984
## 5 9.5 106. 46.8 17.0 1241. 152.
## 6 8.3 19.7 8.3 8.23 295. 32.7
## 7 10.9 24.0 10.9 10.7 651. 42.3
## 8 10.1 25.4 10.1 9.74 819. 56.0
## 9 0 0 0 0 0 0
## 10 0 0 0 0 0 0
## # … with 480 more rows
##
## attr(,"class")
## [1] "rj_trees" "list"
plot(post_trees$tree_summaries)
# Compare specific tree to time tree.
plot(post_trees$tree_summaries, tree = "median_tree")
As well as the posterior sample of trees BayesTraits also returns a lot of information about the location and magnitude of the scalars and transformations applied to the trees during the MCMC process. This information can be summarised and manipulated to identify shifts that appear frequently in the posterior, as well as to identify regions of the tree that are frequently scaled and transformed by other rootward processes. The first step is to read and process this raw information, which will typically be appended the suffix ".VarRates.txt". This process also requires the posterior sample of transformed trees, and the original time tree to use as a reference. The example below is for a variable rates analysis. Note that the output of rjpp will include the tree summary as produced by summariseTrees, so it may not be necessary to use the function in isolation (as above).
# the rjpp function takes the filename/filepath to the .VarRates.txt file.
# for variable rates...
rj_info <- system.file("extdata", "marsupials_vrates.txt.VarRates.txt", package = "BTprocessR")
vrates_summary <- rjpp(rjlog = rj_info, rjtrees = marsupial_trees$vrates_trees, tree = marsupial_trees$timetree)
## [1] "Loading log file."
## [1] "Loading posterior trees."
## [1] "Calculating mean branch lengths."
## [1] "Ladderizing posterior trees:"
## [1] "Finding taxa."
## [1] "Calculating MRCAs."
## [1] "Searching for scalars..."
# or for variable delta...
rj_info <- system.file("extdata", "marsupials_variabledelta.txt.VarRates.txt", package = "BTprocessR")
vdelta_summary <- rjpp(rjlog = rj_info, rjtrees = marsupial_trees$vdelta_trees, tree = marsupial_trees$timetree)
## [1] "Loading log file."
## [1] "Loading posterior trees."
## [1] "Calculating mean branch lengths."
## [1] "Ladderizing posterior trees:"
## [1] "Finding taxa."
## [1] "Calculating MRCAs."
## [1] "Searching for scalars..."
# rjpp returns a list with six elements, and a lot of information.
class(vrates_summary)
## [1] "rjpp" "list"
vrates_summary
##
## BayesTraits reversible-jump output testing the following transformations:
##
## variable rates
##
## A posterior consisting of 500 samples,
## for a tree of 246 species and 490 edges.
names(vrates_summary)
## [1] "scalars" "tree_summary" "species_key" "rates"
## [5] "origins" "niter"
The rjpp function returns a list of six elements. These elements are detailed here...
Most of the important information is returned in the first element of the output - a tibble called "scalars". The columns of this tibble break down into three main categories. These are general branch/node information, total scalar information and then (depending on which transformations were permitted in the model) linear rate scalars, delta transformations, kappa transformations and lambda transformations. Below is more detail on what each column contains.
mid: The midpoint of the branch (time before present).
Total scalar information
nOrgnScalar: The number of scalars that originate at this particular branch. This differs from nScalar due to linear rate scalars that operate on entire nodes rather than individual branches i.e. a rate scalar operating on a node will rate scale all descendent branches. In this case all descendent branches will receive +1 in nScalar, but only the node the scalar originates from will receive +1 in nOrgnScalar.
Linear rate scalar information i.e. when varrates is turned on.
sdRate: The standard deviation of rate scalars effecting this branch (including the tipward effects of node scalars).
Transformations Depending on the specific model this section will contain one or more of the following transformations: delta, kappa and/or lambda. Since the output for all transformations is identical for the purposes of this explanation the name of the transformation will be denoted with Trans. Due to the nature of transformations, especially nested transformations, there is no information on the tipward effects of transformations. Note that transformations are only ever applied to nodes, and not to individual branches.
This is the output of the tree_summary function (used above). It has two elements - the first containing four phylogenetic trees (the original tree, the mean tree, the median tree and the modal tree, all summarised from the posterior) and the second is a tibble detailing, for each branch, the original branch lengths, mean, median and modal branch lengths, the range of branch lengths across the posterior and the standard deviation of branch lengths in the posterior.
This is a list of n elements, where n is the number of branches (edges) in the phylogeny. Each element describes a branch, and contains the ancestral and descendent node numbers (according to the standard node numbering of ape) and the species names of all species that descend from that node. The names of these elements correspond the the node_id column of the scalars table.
This table contains the linear rate scalar that is applied to each branch in each sample in the posterior (the effects of both node scalars and branch scalars). This is a n * (m + 2) tibble, where n is the number of branches in the tree + 1 (to account for the root) and m is the number of samples in the posterior (+2 to account for the two identification columns). The first two columns are the node_id (which corresponds to the scalars table and the species_key) and the branch number (corresponding to ape node numbering). Each subsequent column is one sample of the posterior. This table includes the tipward effects of node scalars.
This is of tables named according to the transformations specified in the original model. If varrates was allowed it will contain two tables - nodes and branches - and then a table for any other transformation permitted in the model. As for the rates object, the first two columns in each are node_id and branch, and each column is a sample from the posterior. These are the specific values of the scalars applied to each branch (in the case of varrates) and node (in the case of varrates and other transforamtions). The difference between the branches and nodes tables here and the previous rates table is that these are simply the scalars that originate on a node or branch, and do not include the tipward effects.
The number of samples in the posterior (used by other functions - not that useful...)
# Some of this should be clearer with an example where two transformations are permitted.
rj_info <- system.file("extdata", "marsupials_variabledeltarate.txt.VarRates.txt", package = "BTprocessR")
vdeltarate_summary <- rjpp(rjlog = rj_info, rjtrees = marsupial_trees$vdeltarate, tree = marsupial_trees$timetree)
## [1] "Loading log file."
## [1] "Loading posterior trees."
## [1] "Calculating mean branch lengths."
## [1] "Ladderizing posterior trees:"
## [1] "Finding taxa."
## [1] "Calculating MRCAs."
## [1] "Searching for scalars..."
vdeltarate_summary
##
## BayesTraits reversible-jump output testing the following transformations:
##
## variable rates
## delta
##
## A posterior consisting of 500 samples,
## for a tree of 246 species and 490 edges.
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.