RWTY produces a number of different plots and statistics for evaluating the performance of MCMC chains. These plots can help to illustrate problems with convergence in idividual chains as well as explore differences between chains built from different sequence data. Here we're going to explore some of these plots in detail, comparing well-behaved to poorly-behaved chains when appropriate.

Running RWTY

We'll start by running RWTY on the two included demo data sets. These are called "salamander" (Williams et al. 2013) and "fungus" (Hibbett et al. 1997). The salamander data contains two chains each from three different sequences (for a total of six chains), while the fungus data contains four chains from a single set of sequence data.

library(rwty)
data(salamanders)
data(fungus)
rwty.processors <- 1

For starters we won't eliminate any burnin, just load in the data and go. You could build all of the plots below just by running analyze.rwty, like this:

salamanders.rwty <- analyze.rwty(salamanders)
fungus.rwty <- analyze.rwty(fungus)

But instead we're going to build them one by one and talk about them as we go.




Parameter plots

Let's start off by plotting the parameters in our salamander chains' parameter tables. This will give us some idea of where to cut our burnin off. First we'll figure out our column names, then we'll make a plot of the column representing likelihood.

colnames(fungus$Fungus.Run1$ptable)
makeplot.param(fungus, burnin = 0, "LnL")

Notice that we get two plots here. The trace plot shows us the likelihood of the trees being explored in each generation of the chain. The density plots show the same information in histogram form. The density plots look a little weird because we've still got burnin included - they're trying to plot histograms with a couple of extreme outliers. Let's fix that by eliminating 50 trees from the start of the chain and trying again.

makeplot.param(fungus, burnin = 50, "LnL")

Those histograms look a lot more reasonable, but the top likelihood trace plot has a pretty massive jump about seven million generations in. That indicates that that chain probably jumped from one local optimum to another. Notice also that the density plots have a lot of information in them. First off, the coloring represents data values that fall inside and outside the central 95% confidence interval. Notice that the top plot (Run 1) is bimodal. That's another hint that your chain has jumped between optima somewhere along the line. Even more interesting is the fact that runs 1 and 2 seem to be exploring trees with likelihoods that are quite a bit different from chains 3 and 4 - about 100 LnL difference, in fact. Given that those likelihoods are presented as logs, this is quite a substantial difference. Even though we've run four separate chains for ten million generations, we are nowhere near getting a reliable answer from this data.

There's another indicator here that you should make sure to look at: the ESS figures in the top bar of each plot. These represent the effective samples size for the plotted parameter in a given chain. You can see here that the ESS values for runs 1 and 4 are extremely low; normally you aim for an ESS of over 200 as a rule of thumb.

Let's compare the salamander plots for makeplot.param. We'll just eliminate 50 trees as burnin here as well, based on previous experience mucking around with this data.

makeplot.param(salamanders, burnin = 50, "LnL")

Okay let's pause here to remind ourselves of a key difference between the salamander and fungus data: the fungus data is four separate runs built using the same sequence data. The salamander data, on the other hand, represents three separate bits of sequence data, with two chains for each. As such we should expect all of the fungus chains to look pretty much the same, but the salamander chains should consist of three pairs of very similar chains, assuming all of those chains have reached stationarity. That is in fact what we see here, but the chains are sufficiently different that we're compressing a lot of variation down just so we can visualize everything at once. Let's concentrate on just those first two chains, which should be very similar because they are based on the same sequence data. We can do this just by picking the first and second elements out of the salamanders object, which is itself just a regular R list.

salamanders.amotl <- list(salamanders[[1]], salamanders[[2]])
makeplot.param(salamanders.amotl, burnin = 50, "LnL")

Very nice. We can now see the variation in the trace plot, and the histograms look quite comparable. At this zoomed-in scale we're not necessarily looking for the trace plot to be perfectly flat; we mostly just want to see that there is no long term directional trend.

We can build similar plots with any of the other model parameters in the log table. Here we'll explore the parameter pi.A.

makeplot.param(fungus, burnin = 50, "pi.A.")

It's worth looking at the difference between the parameter plot here and the likelihood plot above for the same data. Note that the pi.A plots don't show anywhere near as much of a long term trend as the likelihood plots, and that the ESS for pi.A is higher than for likelihood. Also note how similar the histograms for pi.A are between chains, compared to the level of similarity seen between the likelihood plots. This argues strongly for not relying too heavily on model parameters for estimating chain convergence; the model of molecular evolution is quite likely to reach stationarity long before our topological estimates. However, even those likelihood plots can be far too generous when it comes to diagnosing convergence, as we'll see below.




Topology trace plots and histograms

We can also make plots representing tree topology that are very similar to our parameter plots. These plots depict the distance of the chain from a focal tree as a function of generation in the chain. Although these plots are necessarily discarding some variation by dint of compressing an N dimensional tree space to a single axis, they can highlight severe problems with some chains that might not be visible from likelihood plots.

makeplot.topology(salamanders.amotl, burnin = 50)

What we're looking for here is any long-term directional trend in tree distance, or disagreement between the histograms. A long-term trend or sudden jump in tree distance would indicate a jump to a new area of treespace, while a disagreement between histograms for two chains would indicate that they were exploring different regions of treespace. For example, with the fugus data we see both.

makeplot.topology(fungus, burnin = 50)

The top chain shows a substantial jump in tree distances at about seven million generations, and the histograms for all four chains look substantially different from each other.




Split frequency plots

One of the issues with using likelihood plots to diagnose convergence is that they're not necessarily very good at telling you when your chains are exploring multiple optima that have similar likelihoods. While it's clear that the first chain is jumping between optima from the plots above, the other three look relatively homogenous. However, that is entirely misleading. Look at what happens when you explore the behavior of posterior probability estimates as a function of chain length.

makeplot.splitfreqs.cumulative(fungus, burnin = 50)

Just to explain what you're looking at here: along the X axis, you have the generations in your chain. The Y axis has the posterior probability estimate for a given clade. These plots are cumulative, meaning that at each point in the chain we are calculating the posterior probability using all generations prior to that point. You can interpret this as saying "if I stopped my chain at generation X, my posterior probability estimate for the clade represented by a given line would be Y".

These plots by default show the top 20 most variable clades in the chain in terms of changing posterior probability estimate along the chain. In other words, these are the clades that tend to change in posterior probability most as we continue to explore tree space. Ideally what you'd like to see is every line in these plots flattening out fairly early in the chain and remaining flat until the end. Failure to do so indicates clades that will, if you run your analysis longer, change susbtantially in their support values.

Here we see that all four of these chains contain clades that have clear trends in posterior probability estimates even towards the very end of the chain. Although run 3 is probably the smoothest, there are still clear directional trends all the way out to the very end of the chain.

Let's contrast that with the salamander data.

makeplot.splitfreqs.cumulative(salamanders, burnin = 50)

This certainly looks a lot better than the fungus data. Maybe a bit of wobble, but it doesn't seem like running these for longer is going to change those posterior probability estimates by more than a few points.

You can also make sliding window split frequency plots.

makeplot.splitfreqs.sliding(salamanders, burnin = 50)

These plots look similar to the above, but their meaning is quite different. These plots are looking at the posterior probability estimates made over short intervals along the chain. A chain that is exploring the space of possible topologies well is not necessarily expected to produce a flat sliding window plot, particularly if you have multiple local topological optima. You would still not like to see a long term trend in these plots, but the overall level of variation is not as much of a concern. Looking at these plots, I still might want to run those chains for another 10 or 20 million generations, because there are a few clades that seem to show some net directionality in their posterior probability estimates. However, contrast the above plots with the plots from the fungus data:

makeplot.splitfreqs.sliding(fungus, burnin = 50)

Here again we see that the chains are behaving quite badly. Although run 3 wouldn't necessarily be too much cause for concern, each of the others show substantial jumps late in the chain that are not reversed at any point. While it's possible that this indicates that we got stuck on a local optimum temporarily but have found the global optimum towards the end of the chain, it would be unwise to assume so. This is particularly the case when you compare these plots to the likelihood traces above for the fungus data. Let's just focus on the run 4, for example. The sliding window and cumulative split frequency plots indicate that the chain has jumped to a new area of tree space at about 7 million generations, and yet the likeliood trace for run 4 does not show a substantial jump at that point. This suggests that the chain has moved from one local optimum to a second local optimum of approximately equal likelihood. In order to feel sure that we had explored those optima sufficiently to have reached stationarity in our posterior probability estimates we'd really want to see that chain jump back and forth between those optima many, many times. The fact that we only get one jump that is not reversed, with no associated change in likelihoods, indicates that the consensus tree and posterior probability estimates we would calculate from this chain are completely unreliable.




Pairs plots

One of the new features of RWTY is that it allows us to look at correlations between tree topology and continuous model parameters via a "pairs plot". These plots treat tree topology approximately like a continuous parameter, by calculating the distance of each tree from a focal tree.

makeplot.pairs(salamanders[[1]], burnin = 50, params = c("LnL", "pi.A.", "pi.C."))

There's a lot to unpack in these plots, so here's a quick rundown. Below the diagonal, we've got plots that show the movement of our chain in various combinations of tree and parameter space. For each plot, darker colors indicate generations earlier in the chain, and lighter colors indicate generations that are later. We can get a lot of information out of these plots. First off, they can tell us whether our overall estimate of the parameters were changing along the length of the chain (i.e., if we see darker points clustered together and lighter points clustered together). They can also tell us whether our estimates for a pair of parameters are correlated, which would be indicated by some overall slope to the pairwise plot.

On the diagonal, we have histograms indicataing the 95% CI for each parameter, and above the diagonal we have density plots for the relationship between two parameters. These are simply a replotting of the data below the diagonal, in a form that may be easier to process if you have a lot of generations in your chain.

Now let's check out the fungus data.

makeplot.pairs(fungus[[1]], burnin = 50, params = c("LnL", "pi.A.", "pi.C."))

Here we start to see just how informative these plots can be. Look at the bottom left corner, and you'll see a plot of likelihood vs. topological distance from a focal tree. There's a lot of information just in this plot about what our chain was doing. We can see that early on our chain was exploring one set of trees (darker points are separated from lighter points), but at some point it jumped to a different optimum that was farther from the focal tree (higher on the Y axis) and had a higher likelihood score (farther to the right on the X axis). Looking at the rest of the plots below the diagonal in the bottom row and in the left column shows that this jump to new topologies was not associated with any huge changes in parameter values for the model of molecular evolution; although the early and late points are separate along the topological distance and likelihood axes, the marginal distributions along the pi.A and pi.C axes are fairly similar. There might be a slight shift higher in pi.A with the shift to the new optimum and a slight shift lower in pi.C, but it's not massive.




Average Change in Split Frequency (ACSF) plots

We can also look at the behavior of chains by looking at the rate of change in split frequency estimates. This is sort of a condensation of what's going on above, except instead of plotting the raw value of split frequencies, we're comparing each split frequency to the frequency of that split in the previous window (for sliding window plots) or across the chain up to that point (for cumulative plots). For example:

makeplot.acsf.cumulative(salamanders, burnin = 50)

This is what well-behaved chains look like in a cumulative split frequency plot. If we're largely exploring the same regions(s) of treespace, our estimates for the frequency of each split should be settling down to an approximation of their final value. That means that the acsf should be going down fairly reliably, although there's often a bit of noise. The plot here contains several colored bands. The central tendency (mean) of the change across all splits in the chain is given as a dotted line. The increasingly lighter ribbons show the 75%, 95%, and 100% confidence intervals, respectively. This doesn't imply that the chains necessarily agree with each other (in fact they don't, as we'll see in the treespace plots), but it means that each chain is at least stabilizing on its own answer. Let's compare that to the fungus data.

makeplot.acsf.cumulative(fungus, burnin = 50)

Here again we see substantial changes happening quite late in the chain for several of the chains, most notably chains 1 and 4.

Now let's loook at a sliding window plot for the salamander data.

makeplot.acsf.sliding(salamanders, burnin = 50)

With these plots we're mostly looking for relative uniformity, particularly in the central tendency and the narrower confidence intervals. That's not to say that a bit of variation is necessarily problematic, but again what we're most concerned about is big jumps late in the chain, as we see with the fungus data below.

makeplot.acsf.sliding(fungus, burnin = 50)

Again chains 1 and 4 clearly leap to a new area of topology space near the end of their respective runs; we see a sudden increase in the difference between one window and the next. This is essentially the same information we're getting from the cumulative plots above, but sometimes it's easier to see these late-chain jumps in the sliding window plots becacuse the sudden change in split frequency isn't being diluted by averaging over the entire length of the chain.




Treespace plots

We can also visualize chains' exploration of treespace using the treespace plotting functions in RWTY.

makeplot.treespace(salamanders, burnin =50, fill.color = "LnL")

Remember that we've got three separate bits of sequence data here, and two chains for each. All of these heatmaps and dot plots are based on the same NMDS-scaled tree space, meaning that they are directly comparable. There's something really interesting that we can start to see in this plot: although each pair of chains for the same sequence is exploring similar topologies (i.e., the top left panel compared to the top center panel, the top right panel compared to the bottom left panel, etc.), the chains for different data sources are exploring very different areas of tree space. This is why our dot coloring in the dot plot isn't very informative; dot colors represent likelihoods, and the likelihoods of the trees based on different data sources aren't commensurate. Let's just home in on those two AMOTL chains then.

makeplot.treespace(salamanders.amotl, burnin =50, fill.color = "LnL")

Okay, now we can see a lot more detail on the similarities between these two chains, and the coloring of the dot plots is more informative. From these plots it certainly looks like the chains are exploring the same area of tree space, albeit not in exactly the same frequency (i.e., chain 1 seems to be more heavily balanced towards the left side of our tree space and chain 2 the right side). This might argue for running these chains longer.

Notice that, although we've colored our circles in our dot plot by likelihood, it's not clear that there's any particular relationship between the likelihood (dot color) and the position in topology space. There are a couple of reasons this might be the case; first, these NMDS plots necessarily include a huge reduction in the dimensionality of the space being explored, and may put things close together in 2D space that are in fact fairly distinct along some axis that doesn't weight heavily on those first two NMDS axes. Second, we've eliminated burnin here and are exploring one relatively contiguous optimum. That means that we will largely be exploring trees and parameter values that produce relatively similar likelihood estimates, so maybe we shouldn't expect a whole lot of structure in whatever residual variation there is left.

Okay, let's check out our fungus data.

makeplot.treespace(fungus, burnin =50, fill.color = "LnL")

What's going on with these heatmaps?

One thing to keep in mind here is that the density scaling is continuous across all plots. Apparently run 3 here has spent almost all of its time on a single tree, so it's got one pixel of extremely high frequency. That squishes the scale of the other tree plots down to a point where you can't really see what they're doing at all. The dot plots are much more informative in this case, and they clearly show that all four of these chains are doing entirely different things. Given that they're based on the same sequence data, this suggests that none of them should be trusted even if the individual chain plots (LnL trace, cumulative split frequency, etc.) look fine.

Since RWTY plots are ggplot2 objects, they have all the data required for plotting them stored within the plot itself. This is super handy if, say, you want to plot all of these chains on a single plot. Check it out:

my.treespace <- makeplot.treespace(fungus, burnin =50, fill.color = "LnL")
qplot(x, y, data = my.treespace$treespace.points.plot$data, color = chain) +theme_bw()




Autocorrelation plots

RWTY also produces plots that allow you to examine the autocorrelation between trees in your chain. Leaving heated chains aside for the moment, MCMC chains choose their next step in tree or model space based on where the chain is currently. That means there is necessarily some level of autocorrelation in the chain - subsequent trees are not necessarily random pulls from the stationary distribution of topologies. For this reason, it is typical to rarify the MCMC chain to reduce autocorrelation between trees. In a chain that has reached stationarity and that has been subsampled at appropriate intervals, subsequent trees would be no more correlated with each other than would trees randomly drawn from the chain. In other words, we would expect that correlation between trees would be uncorrelated with the distance between them. Autocorrelation plots allow us to examine these relationships.

makeplot.autocorr(salamanders, burnin = 0)

These chains all look quite nice. Tree distance between pairs of trees is largely unaffected by the distance between them along the chain, indicating that the chains have potentially reached stationarity and that the sampling interval between trees is probably satisfactory. If chains had reached stationarity but the sampling interval were too small, we would see lower values for the mean path distance between trees at smaller sampling intervals, which would then increase toward a fixed value as sampling interval became large enough to overcome autocorrelation.

makeplot.autocorr(fungus, burnin = 0)

In chains that are far from stationarity, these plots will often show a general trend towards increasing path distance with increasing sampling interval that does not plateau, indicating that there is some overal trend in the chain towards directional movement in topology space. The absence of such a trend does not necessarily imply that convergence has been reached, but the presence of such a trend should always be cause for concern.




Pseudo-ESS plots

Another useful way of estimating the effects of autocorrelation on your MCMC chains is by calculating effective sample size. The math for doing this for continuous parameters is well-established, but it's not possible to apply that approach directly to tree topologies. To deal with this, Lanfear et al. (2016) developed an approach based on looking at the ESS of distance from a focal tree. The pseudo-ESS plots produced by RWTY simply give confidence intervals for these estimates.

makeplot.pseudo.ess(salamanders, burnin = 50)

As a general rule of thumb, it's a good idea to have an ESS or pseudo-ESS of at least 200. As such we can be pretty happy about our salamander data, but not so much when it comes to the fungus data:

makeplot.pseudo.ess(fungus, burnin = 50)




Split frequency matrix plots

Split frequency matrix plots are useful for comparing the support values for clades across different chains. We get two separate plots out of these analyses.

makeplot.splitfreq.matrix(salamanders, burnin = 50)

The first plot is a matrix of pairwise comparisons of split frequencies across chains. Below the diagonal, we have scatter plots with support values for each clade from one chain given on the X axis, with support values from the other chain given on the Y axis. Pearson correlation coefficients are given above the diagonal, as are the average standard deviation of split frequencies across that pair of chains.

Here we see something interesting that was suggested by some of the treespace plots above: the chains for the same sequence data are very tightly correlated (e.g., AMOTL2.run1 and AMOTL2.run2, or LHX2.run1 vs. LHX2.run2). Almost all of the points for those comparisons fall very close to the 1:1 line, and the R values are very close to 1.00, indicating that the posterior probability estimates for all clades are very nearly the same for those pairs of chains. However, we can see that there is substantial disagreement between chains built using different sequence data; AMOTL2.run1 and LHX2.run1 each show high support values for some clades that the other chain shows very low support for, and vice versa. This is easily seen by the presence of off-diagonal points, particularly those in the lower right and upper left corner of their pairwise plot (left column, third row from the top). Points in the lower right and upper left represent clades that have near-perfect support in one chain but near-zero support in the other.

The dendrogram produced here represents similarity between chains as calcualted by their ASDSF value. Chains that produce similar support values will be closer together in the tree, while those that produce very different support values will be farther apart. These plots will be most useful in multi-locus data sets such as this, where they can help illustrate groups of sequences that are producing very similar topological estimates and support values.

makeplot.splitfreq.matrix(fungus, burnin = 50)

When we plot the fungus data this way, we can see that in fact no pair of chains produces a satisfactory (near 1) r value, indicating that these chains are far from agreeing on topology or support values. Given that these runs are all based on the same data, this is definitely an issue with convergence rather than an accurate reflection of (for instance) different gene trees as one might argue for the patterns seen in the salamander data above.




ASDSF plots

Finally, we can visualize agreement between chains by visualizing the average standard deviation of split frequencies across chains as a function of chain length. Chains with different starting conditions are expected to initially explore different regions of tree space, resulting in differences in the frequency of a given split across chains. This is reflected in a high ASDSF early in the chain. A set of well-behaved chains that is exploring the same region of treespace will be characterized by a decreasing ASDSF as the chains proceed.

makeplot.asdsf(salamanders.amotl, burnin = 50)

Here we see exactly what we'd like to see: decreasing ASDSF as the chains go on. This means the chains are converging in their estimate of topology and support values. The various colored areas represent the 75%, 95%, and 100% CIs, with the central dotted line representing the mean. When we look across all of the salamander chains, we get slightly different results:

makeplot.asdsf(salamanders, burnin = 50)

Here the overall trend is still downward, which is good. However you'll notice that the 95% and 100% CIs are staying quite high even at the end of the chain. This reflects the fact that there are some clades (as seen above) for which the stationary support values are high based on one data set, but low based on others. Thus, even if each chain has reached stationarity, there will still be a set of clades over which they continue to disagree. These clades will as a result always have a high sdsf value, and as such the wider CIs of this plot will not narrow past a certain point.

makeplot.asdsf(fungus, burnin = 50)

The fungus data again shows a large amount of disagreement between chains, which seems to have only declined slightly over the course of ten million generations. This again is evidence of a lack of convergence in these chains - being built on the same data, what we'd like to see here is something similar to what we saw for the salamanders.amotl data set above.




Citations

Hibbett DS, Pine EM, Langer E, Langer G, Donoghue MJ. 1997. Evolution of gilled mushrooms and puffballs inferred from ribosomal DNA sequences. Proceedings of the national academy of sciences 94:12002-12006.

Williams JS, Niedzwiecki JH, Weisrock DW. 2013. Species tree reconstruction of a poorly resolved clade of salamanders (Ambystomatidae) using multiple nuclear loci. Molecular phylogenetics and evolution 68:671-682.



danlwarren/RWTY documentation built on Sept. 5, 2021, 8:35 p.m.