fit.evorates | R Documentation |
This function processes tree and trait data, runs a Stan-based Hamiltonian Monte Carlo (HMC) sampler to fit these data to an evorates model, and returns the output of this sampler in a (relatively) user-friendly format.
fit.evorates(
tree,
trait.data,
trait.se = NULL,
constrain.Rsig2 = FALSE,
trend = FALSE,
lik.power = 1,
sampling.scale = FALSE,
return.as.obj = TRUE,
out.file = NULL,
check.overwrite = TRUE,
include.warmup = FALSE,
report.quantiles = c(0.025, 0.5, 0.975),
report.means = TRUE,
report.devs = TRUE,
remove.trend = TRUE,
...
)
tree |
An object of class " |
trait.data |
Three options:
Both multiple observations for a single tip and missing observations are allowed.
In all cases, the associated names must generally match the tip labels found in |
trait.se |
A vector, matrix, or data.frame of trait value standard errors which must be unambiguously
labeled (see
Any conflicting/impossible standard error specifications are corrected and return warnings. |
constrain.Rsig2 |
|
trend |
|
lik.power |
A single number between 0 and 1 specifying what power to raise the likelihood function to.
This is useful for sampling from "power posteriors" that shift the posterior to look more like the prior (indeed,
you can set this to 0 to sample from the prior itself). Useful for model diagnostics and calculating things
like Bayes Factors. Technically, you can set |
sampling.scale |
|
return.as.obj |
|
out.file |
A directory to save results to. If unspecified, an automatic directory is
generated based on the current date and time and R's working directory. The function will generate csv files
for each chain of the HMC sampler using Stan's built-in functionality (note that these will thus be on the
transformed scale), as well as a separate RDS file giving additional information about the data. Defaults to
|
check.overwrite |
|
include.warmup |
|
report.quantiles |
A vector of posterior distribution quantiles to return (should be between 0 and 1). Set to
|
report.means |
|
report.devs |
|
remove.trend |
|
... |
Other optional arguments:
|
Parameter definitions:
Rate variance (R_sig2
, also denoted \sigma^2_{\sigma^2}
): Determines how much random variation
accumulates in rates over time. Specifically, independent lineages evolving for a length of time t
would exhibit log-normally
distributed rates with standard deviation sqrt(t * R_sig2)
. Another way to think about this is that the 95% credible interval
for fold-changes in rates over 1 unit of time is given by exp(c(-1,1) * 1.96 * sqrt(R_sig2))
, assuming R_mu = 0.
Trend (R_mu
, also denoted \mu_{\sigma^2}
): Determines whether median rates tend to decrease (if
negative) or increase (if positive) over time. Specifically, the median fold-change in rate for a given lineage is
exp(t * R_mu)
over a length of time t
. This is distinct from changes in average rates, which depends on both
R_mu
and R_sig2
.
Note on combining rate variance and trend parameters: The 95% credible interval
for fold-changes in rates over 1 unit of time, given a trend, is simply
exp(R_mu) * exp(c(-1,1) * 1.96 * sqrt(R_sig2))
or equivalently exp(R_mu + c(-1,1) * 1.96 * sqrt(R_sig2))
.
This distribution of rate change is right-skewed due to the exp()
function. Because of this, median changes in rates
over time will always be lower than average changes. The "average change" parameter, denoted R_del
or
\delta_{\sigma^2}
, is given by R_mu + R_sig2 / 2
. Accordingly, the average fold-change in rate
for a given lineage is exp(t * R_del)
over a length of time t
.
Tip error variance (Y_sig2
, also denoted \sigma^2_y
): Determines the among of "error" around observed
trait values for tips without fixed standard errors. Specifically, raw observations for a given tip are sampled from a normal
distribution centered at that tip's "true trait value" with standard deviation sqrt(Y_sig2)
.
Rate at the root (R_0
, also denoted \sigma^2_0
): The natural log of the rate at the root of the entire phylogeny.
Branchwise rates (R_i
, also denoted ln \bar{\sigma^2_i}
, where i is the index of an edge in tree
):
The natural log of the average rate along branches of the phylogeny. Note that rates are always shifting over time under this model,
and these quantities are thus averages. The true rate value at any particular time point along a branch is a related, but separate,
quantity. These will be NA for branches of length 0.
Background rate (bg_rate
): The natural log of the average trait evolution rate for the entire phylogeny, given by the average of
exp(R_i)
, with each entry weighted by its respective branch length. NAs are ignored in this calculation since they correspond
to branches of length 0.
Branchwise rate deviations (Rdev_i
, also denoted ln \bar{\sigma^2_{dev,i}}
): Determines whether a
branches exhibit relatively "fast" (if positive) or "slow" rates (if negative). Generally, these are the differences
between the branchwise rates and geometric background rate on the natural log scale,
though this depends on remove.trend)
. Here, the geometric background rate is defined as the weighted average of R_i
,
with weights corresponding to branch lengths. This helps prevent some issues with comparing highly right-skewed distributions.
If remove.trend = TRUE
, then branchwise rates are first "detrended" prior to calculating background rates and deviations
(R_i - (-log(abs(R_mu)) - log(l_i) + log(abs(exp(R_mu * t1_i) - exp(R_mu * t0_i))))
, where l
is a vector of
branch lengths and t0
/t1
are vectors of start and end times of each branch). This basically make branchwise rate deviations
determine whether branches exhibit slow/fast rates given the overall trend in rates through time. Otherwise, branchwise rate
deviations will simmply indicate slow/fast branches occur at the root/tips of a tree in the presence of a strong trend.
An object of class "evorates_fit
" if return.as.obj = TRUE
. Otherwise, the directory
results were saved to (see out.file
for details). An evorates_fit
object is a list of at least
5 components:
call
, which contains information on the final tree, trait values, trait standard errors, and prior
parameters passed to Stan's HMC sampling algorithm (on untransformed scale for better interpretability,
see sampling.scale
).
sampler.control
, which contains various information on the HMC run, including the number of chains,
iterations, warmup, thinning rate, etc.
sampler.params
, an array of class "param_block
", containing parameters/diagnostics that were used to tune
the behavior of the HMC while Stan ran it, as well as the (log) prior (prior__
), likelihood (lik__
),
and posterior probability (post__
) of each iteration in the HMC. See Stan manual for more information on
what the parameters mean. The likelihood is not raised to lik.power
here, but the log posterior probability
is calculated while accounting for lik.power
. This always includes warmup iterations, though these can be
discarded using exclude.warmup()
or combine.chains()
.
param.diags
, a param_block
array, containing diagnostics for each parameter
estimated during the fit, including the
initial value of the HMC chain (init
), the bulk effective sample size (bulk_ess
), the tail
effective sample size (tail_ess
), and the Rhat (Rhat
). See Rhat for more details
on what these diagnostics mean. Generally, you want effective sample sizes to be greater than 100 times the number of
chains and Rhat to be less than 1.01. The functions check.ess()
and check.mix()
will check these thresholds for you
automatically.
chains
, another param_block
array containing sampled parameter values for each parameter estimated during the fit. See
details for further information on what each parameter means.
The object optionally contains more param_block
arrays of posterior distribution quantiles (quantiles
)
and means (means
), and posterior probabilities (post.prob
, see report.devs
and remove.trend
).
All param_block
arrays' dimensions go in the order of iterations/diagnostics/quantiles, then parameters, then chains.
fit.evorates is a convenient wrapper for input.evorates, run.evorates, and output.evorates
#get whale/dolphin tree/trait data
data("cet_fit")
tree <- cet_fit$call$tree
trait.data <- cet_fit$call$trait.data
#fit data to evorates model (takes a couple minutes)
fit <- fit.evorates(tree = tree, trait.data = trait.data, chains = 1)
#specifying parameter constraints
#add trend parameter
trend.fit <- fit.evorates(tree = tree, trait.data = trait.data, chains = 1,
trend = TRUE)
#only trend parameter (early burst model)
EB.fit <- fit.evorates(tree = tree, trait.data = trait.data, chains = 1,
trend = TRUE, constrain.Rsig2 = TRUE)
#no Rsig2 or trend (Brownian Motion model)
BM.fit <- fit.evorates(tree = tree, trait.data = trait.data, chains = 1,
constrain.Rsig2 = TRUE)
#specifying trait standard error
#by default, estimates standard error, but you can set all tips to have no standard error
noSE.fit <- fit.evorates(tree = tree, trait.data = trait.data, chains = 1,
trait.se = 0)
#or even set it for specific tips
SE.fit <- fit.evorates(tree = tree, trait.data = trait.data, chains = 1
trait.se = setNames(c(1, 2), c("Orcinus_orca", "Balaenoptera_musculus")))
#specifying priors
#make Rsig2 prior tighter (normally it's 5)
Risg2.fit <- fit.evorates(tree = tree, trait.data = trait.data, chains = 1,
R_sig2_prior_sd = 1, sampling.scale = TRUE)
#maybe you REALLY believe rates should decrease by ~1% every million years?
Rmu.fit <- fit.evorates(tree = tree, trait.data = trait.data, chains = 1,
trend = TRUE,
R_mu_prior_mean = log(1 - 0.01), R_mu_prior_sd = log(1 + 0.005))
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.