get.bg.rate: Extract background rate parameters from a fitted evorates...
In bstaggmartin/evorates: Evolving Rate Models of Continuous Trait Evolution

get.bg.rate

R Documentation

Extract background rate parameters from a fitted evorates model

Description

This function extracts background rate parameters from an evorates_fit object. It even works when the object has no branchwise rate parameters (i.e., a Brownian Motion model).

Usage

get.bg.rate(
  fit,
  node.groups = list(),
  edge.groups = list(),
  type = c("chains", "quantiles", "means", "diagnostics"),
  extra.select = NULL,
  simplify = TRUE,
  remove.trend = FALSE,
  geometric = FALSE,
  log = TRUE,
  partial.match = TRUE,
  keep.R = FALSE
)

Arguments

`fit`	An object of class "`evorates_fit`".
`node.groups`	A list of numeric or character vectors specifying clades for which to calculate background rates. Basically finds monophyletic clades including all nodes in a given vector by passing the vector to `get.clade.edges()`. Notably, using negative numeric vectors to exclude nodes is not allowed. If any list elements have names, these names will be "passed on" to name the resulting background rate parameters (see Value for details).
`edge.groups`	A list of numeric vectors specifying edge indices in `fit$call$tree` for which to calculate background rates. Can be negative to exclude edges as well. If a vector is `NULL` (the default if no node/edge groups are specified), all edge indices are used to calculate the global background rate. If any list elements have names, these names will be "passed on" to name the resulting background rate parameters (see Value for details).
`type`	A string specifying whether to extract posterior samples ("`chains`"), quantiles ("`quantiles`"), means ("`means`"), or diagnostics ("`diagnostics`"). Can be any unambiguous abbreviation of these strings as well. Defaults to extracting posterior samples.
`extra.select`	A numeric, integer, or character vector specifying the specific samples/ quantiles/diagnostics to extract, depending on `type`. Defaults to `NULL`, which generally extracts all available samples/quantiles/diagnostics. See documentation on param_block operators for details (grapes-chains-grapes, grapes-quantiles-grapes, grapes-means-grapes, or grapes-diagnostics-grapes).
`simplify`	`TRUE` or `FALSE`: should the resulting `param_block` array(s) be simplified? If `TRUE` (the default), dimensions of length 1 in the result are automatically collapsed, with corresponding information stored as attributes (this is the default behavior of param_block operators).
`remove.trend`	`TRUE` or `FALSE`: should rates be detrended prior to calculating background rates? Detrending makes it more difficult to interpret background rates and is usually unhelpful for summarizing rates of trait evolution for some focal clade (thus the default is `FALSE`). However, strong trends in rates over time will inevitably cause clades of different ages to exhibit different average rates, and detrending can thus help facilitate "fair" comparisons among different clades. This all is notably inconsequential when no trend parameter is estimated in the first place.
`geometric`	`TRUE` or `FALSE`: should rates be geometrically averaged on the natural log scale? Just as with `remove.trend`, geometric averaging makes it more difficult to interpret background rates and is usually unhelpful for summarizing rates of trait evolution for some focal clade (thus the default is `FALSE`). However, abnormally high branchwise rates tend to greatly influence conventional (arithmetic) averages. Because of this, comparing individual branchwise rates to geometric, rather than arithmetic, averages, tends to yield "fairer" comparisons.
`log`	`TRUE` or `FALSE`: should rates be returned on the natural log scale? Defaults to `TRUE` for consistency, since branchwise rates are generally on the natural log scale.
`partial.match`	`TRUE` or `FALSE`: are partial matches between any strings in `node.group` and labels in `fit$call$tree` allowed? Defaults to `TRUE`.
`keep.R`	`TRUE` or `FALSE`: should the branchwise rates used to calculate background rates also be returned? This is helpful for comparing individual branchwise rates to background rates. For example, detrended branchwise rate deviations may calculated using this function by setting `remove.trend`, `geometric`, `log`, and `keep.R` to `TRUE` and subtracting the resulting background rate from the individual branchwise rates. Defaults to `FALSE`.

Details

The tip and node indices of fit$call$tree can be viewed by running: plot(fit$call$tree); tiplabels(); nodelabels(). The edge indices of fit$call$tree can be similarly viewed by running: plot(fit$call$tree); edgelabels().

In the case of a fit with constrained rate variance (R_sig2) and no trend (R_mu) parameter, the background rate for any clade will simply be an appropriately named/formatted (see Value for details) copy of the root rate (R_0). Even if there is a trend parameter but remove.trend is TRUE, this will still result detrended background rates equivalent to the root rate (R_0).

Value

Typically an array of class "param_block" with a param_type set to whatever type is. The dimension of these arrays will generally go in the order of iterations/quantiles/diagnostics, then parameters, then chains. Any dimensions of length 1 are collapsed and stored as attributes if simplify is TRUE. Here, the parameters will be background rates for edge groups defined by node.groups, then background rates for edge groups defined by edge.groups.

The parameter names will differ according to function arguments: in general, each node/edge group is assigned a name (either taken from original list input or an abbreviated code indicating which edges are included in the group). The resulting parameter names will then generally be of the form "bg_rate(<group_name>)", denoting the weighted average of branchwise rates for a particular group, with weights given by proportions of branch lengths for each edge within the group. Additionally, names may contain "exp()" or "log()" per user specifications of geometric and log. Lastly, if remove.trend is TRUE, names may refer to "uncent_Rdev_i" (i.e., uncentered rate deviations) rather than "R_i", where i denotes edge indices.

If keep.R is TRUE, the output will instead be a list containing param_block arrays. If only one node/edge group is specified, this will be a list of two param_blocks: one containing branchwise rate parameters named "R" and one containing the background rate named "bg_rate". If multiple node/edge groups are specified, this will be a list with a named element for each group. Each element will, in turn, contain a corresponding "R" and "bg_rate" param_block.

Examples

#get whale/dolphin evorates fit
data("cet_fit")

#get posterior samples of global background rate
bg.rate <- get.bg.rate(cet_fit)
#let's say we want to get the background rate for the genus Mesoplodon
bg.rate <- get.bg.rate(cet_fit, node.groups = "Mesoplodon")
#now let's say we want to get the background rate for some edges towards the root
plot(cet_fit$call$tree); edgelabels()
bg.rate <- get.bg.rate(cet_fit, edge.groups = c(1, 2, 9, 28, 29, 30))
#in the case of edge groups specifically, we can also do this to exclude edges
#does NOT work for node groups!
bg.rate <- get.bg.rate(cet_fit, edge.groups = -c(1, 2, 9, 28, 29, 30))
#we can also select specific samples
bg.rate <- get.bg.rate(cet_fit, edge.groups = c(1, 2, 9, 28, 29, 30), extra.select = c(1, 23, 47))
#here's a way get better names for the resulting parameters
#note that we can also combine multiple node/edge groups
bg.rate <- get.bg.rate(cet_fit,
                       node.groups = list("Mesoplodon" = "Mesoplodon"),
                       edge.groups = list("root_edges" = c(1, 2, 9, 28, 29, 30)))

#now let's look at a more practical example
#maybe we want to some rate differences among key clades?
#define clades containing Mesoplodon, orca, globicephalinae, and blue whale
node.groups <- list("Mesoplodon" = "Mesoplodon",
                    "Orcinus_orca" = "Orcinus",
                    "globicephalniae" = c("Pseudorca", "Feresa"),
                    "Balaenoptera_musculus" = "musculus")
#define edge group that includes all delphinidae EXCEPT for orca and globicephalinae
edge.groups <- exclude.clade(cet_fit$call$tree,
                             node1 = c("Feresa", "Orcinus"),
                             node2 = "Orcinus")
edge.groups <- exclude.clade(cet_fit$call$tree,
                             edge.group1 = edge.groups,
                            node2 = c("Pseudorca", "Feresa"))
edge.groups <- list("other_delphinidae" = edge.groups)
#now we get our nicely formatted parameters!
bg.rate <- get.bg.rate(cet_fit,
                       node.groups = node.groups,
                       edge.groups = edge.groups)

#could also look at quantiles, means, or diagnostics
med.bg.rate <- get.bg.rate(cet_fit,
                           node.groups = node.groups,
                           edge.groups = edge.groups,
                           type = "quantiles",
                           extra.select = 0.5)
mean.bg.rate <- get.bg.rate(cet_fit,
                            node.groups = node.groups,
                            edge.groups = edge.groups,
                            type = "means")
init.bg.rate <- get.bg.rate(cet_fit,
                            node.groups = node.groups,
                            edge.groups = edge.groups,
                            type = "diagnostics",
                            type = c("inits", "ess"))
                 
#here's an example of what happens when you don't simplify the result
med.bg.rate <- get.bg.rate(cet_fit,
                           node.groups = node.groups,
                           edge.groups = edge.groups,
                           type = "quantiles",
                           extra.select = 0.5,
                           simplify = FALSE)
                           
#note that, depending on your goals, it may be more efficient to do things like this instead
#just to avoid some redundant calculations
bg.rate <- get.bg.rate(cet_fit,
                       node.groups = node.groups,
                       edge.groups = edge.groups)
med.bg.rate <- bg.rate %quantiles% list(".", 0.5)
mean.bg.rate <- bg.rate %means% "."

#in some cases, it may be interesting to compare averaged rate deviations among groups
bg.rate <- get.bg.rate(cet_fit,
                       node.groups = node.groups,
                       edge.groups = edge.groups,
                       remove.trend = TRUE,
                       geometric = TRUE)
whole.bg <- get.bg.rate(cet_fit,
                        remove.trend = TRUE,
                        geometric = TRUE)
bg.rdev <- bg.rate - whole.bg
#we can even use this to get a posterior probability that, as a whole, a given clade is evolving anomalously fast
#given the trend in rates over time, at least, though we could shift this by changing the remove.trend and geometric arguments
post.probs <- (bg.rdev > 0) %means% "."
#or equivalently
post.probs <- (bg.rate > whole.bg) %means% "."

#as a demonstration, we can easily use these functions to recalculate branchwise rate deviations in general
tmp <- get.bg.rate(cet_fit,
                   remove.trend = TRUE,
                   geometric = TRUE,
                   keep.R = TRUE)
#note the change in format
rdevs <- tmp$R - tmp$bg_rate
c(rdevs %select% 1, cet_fit %chains% "^Rdev_1$") #exactly the same

#I don't necessarily recommend this, but you could even calculate rate deviations for subclades
#just a good demonstration of what happens when keep.R is TRUE
#though it doesn't make much sense for single lineages...
tmp <- get.bg.rate(cet_fit,
                   node.groups = node.groups,
                   edge.groups = edge.groups,
                   remove.trend = TRUE,
                   geometric = TRUE,
                   keep.R = TRUE)
#note the further change in format
sub.rdevs <- do.call(c, lapply(tmp, function(ii) ii$R - ii$bg_rate))

bstaggmartin/evorates documentation built on May 31, 2024, 5:56 a.m.