output.evorates: Process the output of an Evolving Rates model

View source: R/FIT_main.R

output.evoratesR Documentation

Process the output of an Evolving Rates model

Description

This function processes raw Stan-based Hamiltonian Monte Carlo (HMC) samples and associated information and returns all this information in a (relatively) user-friendly format.

Usage

output.evorates(
  run.evorates.obj,
  stanfit = NULL,
  call = NULL,
  trans.const = NULL,
  dat = NULL,
  include.warmup = FALSE,
  report.quantiles = c(0.025, 0.5, 0.975),
  report.means = TRUE,
  report.devs = TRUE,
  remove.trend = TRUE
)

Arguments

run.evorates.obj

Either the output of run.evorates() or a string specifying files to load. These files always consist of some base name followed by a variable suffix: either "_<X>.csv", where <X> is a chain number, or "_info". These files contain Stan-based HMC samples/information and auxiliary information related to data input/transformation constants, respectively. In general, the string should consist of just the base name, not the variable suffix, though the function does attempt to remove suffixes if detected.

stanfit, call, trans.const, dat

Generally, you should not mess with these unless you know what you're doing! These are the basic components the function uses to form its output. stanfit is the Stan-based HMC samples/information, while the rest make up the auxiliary information regarding input and transformation constants. You can use these arguments to overwrite certain parts of run.evorates.obj for debugging purposes.

include.warmup

TRUE or FALSE: should warmup be included in posterior samples? Warmup is always included for parameters used to tune the HMC chain, but warmup may be included or excluded for actual estimated parameters. Defaults to FALSE.

report.quantiles

A vector of posterior distribution quantiles to return (should be between 0 and 1). Set to NULL to not return any quantiles. Defaults to 2.5%, 50%, and 97.5% quantiles.

report.means

TRUE or FALSE: should posterior distribution means be returned? Defaults to TRUE.

report.devs

TRUE or FALSE: should the difference between branchwise rates and the overall "background rate" on the natural log scale ("rate deviations") be returned? The background rate is simply the mean branchwise rate, weighted by each branch's respective length. If TRUE, also calculates the posterior probability rate deviations are greater than 0. These additional parameters help give a sense of which branches in tree exhibit anomalous trait evolution rates. Defaults to TRUE, but is automatically switched to FALSE when fitting a Brownian Motion model (constrain.Rsig2 = FALSE & trend = FALSE).

remove.trend

TRUE or FALSE: should the rate deviation calculations remove (i.e., "account for") the effect of trends in rates over time? This may be helpful since strong trends in rates can mask otherwise anomalously high or low trait evolution rates in certain parts of the tree. Defaults to TRUE, but has no effect if no trend was fitted or if report.devs is FALSE. If TRUE, note that report.devs will be switched to FALSE if fitting early/late burst model (constrain.Rsig2 = FALSE & trend = TRUE).

Details

Parameter definitions:

  • Rate variance (R_sig2, also denoted \sigma^2_{\sigma^2}): Determines how much random variation accumulates in rates over time. Specifically, independent lineages evolving for a length of time t would exhibit log-normally distributed rates with standard deviation sqrt(t * R_sig2). Another way to think about this is that the 95% credible interval for fold-changes in rates over 1 unit of time is given by exp(c(-1,1) * 1.96 * sqrt(R_sig2)), assuming R_mu = 0.

  • Trend (R_mu, also denoted \mu_{\sigma^2}): Determines whether median rates tend to decrease (if negative) or increase (if positive) over time. Specifically, the median fold-change in rate for a given lineage is exp(t * R_mu) over a length of time t. This is distinct from changes in average rates, which depends on both R_mu and R_sig2.

  • Note on combining rate variance and trend parameters: The 95% credible interval for fold-changes in rates over 1 unit of time, given a trend, is simply exp(R_mu) * exp(c(-1,1) * 1.96 * sqrt(R_sig2)) or equivalently exp(R_mu + c(-1,1) * 1.96 * sqrt(R_sig2)). This distribution of rate change is right-skewed due to the exp() function. Because of this, median changes in rates over time will always be lower than average changes. The "average change" parameter, denoted R_del or \delta_{\sigma^2}, is given by R_mu + R_sig2 / 2. Accordingly, the average fold-change in rate for a given lineage is exp(t * R_del) over a length of time t.

  • Tip error variance (Y_sig2, also denoted \sigma^2_y): Determines the among of "error" around observed trait values for tips without fixed standard errors. Specifically, raw observations for a given tip are sampled from a normal distribution centered at that tip's "true trait value" with standard deviation sqrt(Y_sig2).

  • Rate at the root (R_0, also denoted \sigma^2_0): The natural log of the rate at the root of the entire phylogeny.

  • Branchwise rates (R_i, also denoted ln \bar{\sigma^2_i}, where i is the index of an edge in tree): The natural log of the average rate along branches of the phylogeny. Note that rates are always shifting over time under this model, and these quantities are thus averages. The true rate value at any particular time point along a branch is a related, but separate, quantity. These will be NA for branches of length 0.

  • Background rate (bg_rate): The natural log of the average trait evolution rate for the entire phylogeny, given by the average of exp(R_i), with each entry weighted by its respective branch length. NAs are ignored in this calculation since they correspond to branches of length 0.

  • Branchwise rate deviations (Rdev_i, also denoted ln \bar{\sigma^2_{dev,i}}): Determines whether a branches exhibit relatively "fast" (if positive) or "slow" rates (if negative). Generally, these are the differences between the branchwise rates and geometric background rate on the natural log scale, though this depends on remove.trend). Here, the geometric background rate is defined as the weighted average of R_i, with weights corresponding to branch lengths. This helps prevent some issues with comparing highly right-skewed distributions. If remove.trend = TRUE, then branchwise rates are first "detrended" prior to calculating background rates and deviations (R_i - (-log(abs(R_mu)) - log(l_i) + log(abs(exp(R_mu * t1_i) - exp(R_mu * t0_i)))), where l is a vector of branch lengths and t0/t1 are vectors of start and end times of each branch). This basically make branchwise rate deviations determine whether branches exhibit slow/fast rates given the overall trend in rates through time. Otherwise, branchwise rate deviations will simmply indicate slow/fast branches occur at the root/tips of a tree in the presence of a strong trend.

Value

An object of class "evorates_fit" if return.as.obj = TRUE. Otherwise, the directory results were saved to (see out.file for details). An evorates_fit object is a list of at least 5 components:

  • call, which contains information on the final tree, trait values, trait standard errors, and prior parameters passed to Stan's HMC sampling algorithm (on untransformed scale for better interpretability, see sampling.scale).

  • sampler.control, which contains various information on the HMC run, including the number of chains, iterations, warmup, thinning rate, etc.

  • sampler.params, an array of class "param_block", containing parameters/diagnostics that were used to tune the behavior of the HMC while Stan ran it, as well as the (log) prior (prior__), likelihood (lik__), and posterior probability (post__) of each iteration in the HMC. See Stan manual for more information on what the parameters mean. The likelihood is not raised to lik.power here, but the log posterior probability is calculated while accounting for lik.power. This always includes warmup iterations, though these can be discarded using exclude.warmup() or combine.chains().

  • param.diags, a param_block array, containing diagnostics for each parameter estimated during the fit, including the initial value of the HMC chain (init), the bulk effective sample size (bulk_ess), the tail effective sample size (tail_ess), and the Rhat (Rhat). See ?rstan::Rhat for more details on what these diagnostics mean. Generally, you want effective sample sizes to be greater than 400ish and Rhat to be less than 1.01ish. The functions check.ess() and check.mix() will check these thresholds for you automatically.

  • chains, another param_block array containing sampled parameter values for each parameter estimated during the fit. See details for further information on what each parameter means.

  • The object optionally contains more param_block arrays of posterior distribution quantiles (quantiles) and means (means), and posterior probabilities (post.prob, see report.devs and remove.trend).

All param_block arrays' dimensions go in the order of iterations/diagnostics/quantiles, then parameters, then chains.

See Also

fit.evorates is a convenient wrapper for input.evorates, run.evorates, and output.evorates

Examples

#get whale/dolphin tree/trait data
data("cet_fit")
tree <- cet_fit$call$tree
trait.data <- cet_fit$call$trait.data

#prepare data for evorates model
input <- input.evorates(tree = tree, trait.data = trait.data)
#run the evorates model sampler (takes a couple minutes)
run <- run.evorates(input.evorates.obj = input, out.file = "test_fit", chains = 1)
#process the output from the object
fit <- output.evorates(run.evorates.obj = run)
#process the output from the files
fit <- output.evorates(run.evorates.obj = "test_fit")

#see fit.evorates() documentation for further examples



bstaggmartin/evorates documentation built on May 31, 2024, 5:56 a.m.