fit_bm_model | R Documentation |
Given a rooted phylogenetic tree and states of one or more continuous (numeric) traits on the tree's tips, fit a multivariate Brownian motion model of correlated co-evolution of these traits. This estimates a single diffusivity matrix, which describes the variance-covariance structure of each trait's random walk. The model assumes a fixed diffusivity matrix on the entire tree. Optionally, multiple trees can be used as input, under the assumption that the trait evolved on each tree according to the same BM model.
fit_bm_model( trees,
tip_states,
isotropic = FALSE,
Nbootstraps = 0,
check_input = TRUE)
trees |
Either a single rooted tree or a list of rooted trees, of class "phylo". The root of each tree is assumed to be the unique node with no incoming edge. Edge lengths are assumed to represent time intervals or a similarly interpretable phylogenetic distance. |
tip_states |
Numeric state of each trait at each tip in each tree. If |
isotropic |
Logical, specifying whether diffusion should be assumed to be isotropic (i.e., independent of the direction). Hence, if |
Nbootstraps |
Integer, specifying the number of parametric bootstraps to perform for calculating the confidence intervals. If <=0, no bootstrapping is performed. |
check_input |
Logical, specifying whether to perform some basic checks on the validity of the input data. If you are certain that your input data are valid, you can set this to |
The BM model is defined by the stochastic differential equation
dX = \sigma \cdot dW
where W
is a multidimensional Wiener process with Ndegrees independent components and \sigma
is a matrix of size Ntraits x Ndegrees, sometimes known as "volatility" or "instantaneous variance". Alternatively, the same model can be defined as a Fokker-Planck equation for the probability density \rho
:
\frac{\partial \rho}{\partial t} = \sum_{i,j}D_{ij}\frac{\partial^2\rho}{\partial x_i\partial x_j}.
The matrix D
is referred to as the diffusivity matrix (or diffusion tensor), and 2D=\sigma\cdot\sigma^T
. Note that in the multidimensional case \sigma
can be obtained from D
by means of a Cholesky decomposition; in the scalar case we have simply \sigma=\sqrt{2D}
.
The function uses phylogenetic independent contrasts (Felsenstein, 1985) to retrieve independent increments of the multivariate random walk. The diffusivity matrix D
is then fitted using maximum-likelihood on the intrinsic geometry of positive-definite matrices. If multiple trees are provided as input, then independent contrasts are extracted from all trees and combined into a single set of independent contrasts (i.e., as if they had been extracted from a single tree).
If tree$edge.length
is missing, each edge in the tree is assumed to have length 1. The tree may include multifurcations (i.e. nodes with more than 2 children) as well as monofurcations (i.e. nodes with only one child). Note that multifurcations are internally expanded to bifurcations, prior to model fitting.
A list with the following elements:
success |
Logical, indicating whether the fitting was successful. If |
diffusivity |
Either a single non-negative number (if |
loglikelihood |
The log-likelihood of the fitted model, given the provided tip states data. |
AIC |
The AIC (Akaike Information Criterion) of the fitted model. |
BIC |
The BIC (Bayesian Information Criterion) of the fitted model. |
Ncontrasts |
Integer, number of independent contrasts used to estimate the diffusivity. This corresponds to the number of independent data points used. |
standard_errors |
Either a single numeric or a 2D numeric matrix of size Ntraits x Ntraits, listing the estimated standard errors of the estimated diffusivity, based on parametric bootstrapping. Only returned if |
CI50lower |
Either a single numeric or a 2D numeric matrix of size Ntraits x Ntraits, listing the lower bounds of the 50% confidence interval for the estimated diffusivity (25-75% percentile), based on parametric bootstrapping. Only returned if |
CI50upper |
Either a single numeric or a 2D numeric matrix of size Ntraits x Ntraits, listing the upper bound of the 50% confidence interval for the estimated diffusivity, based on parametric bootstrapping. Only returned if |
CI95lower |
Either a single numeric or a 2D numeric matrix of size Ntraits x Ntraits, listing the lower bound of the 95% confidence interval for the estimated diffusivity (2.5-97.5% percentile), based on parametric bootstrapping. Only returned if |
CI95upper |
Either a single numeric or a 2D numeric matrix of size Ntraits x Ntraits, listing the upper bound of the 95% confidence interval for the estimated diffusivity, based on parametric bootstrapping. Only returned if |
consistency |
Numeric between 0 and 1, estimated consistency of the data with the fitted model. If |
Stilianos Louca
J. Felsenstein (1985). Phylogenies and the Comparative Method. The American Naturalist. 125:1-15.
A. Lindholm, D. Zachariah, P. Stoica, T. B. Schoen (2019). Data consistency approach to model validation. IEEE Access. 7:59788-59796.
simulate_bm_model
,
get_independent_contrasts
# generate a random tree
tree = generate_random_tree(list(birth_rate_intercept=1), 10000)$tree
# Example 1: Scalar case
# - - - - - - - - - - - - - -
# simulate scalar continuous trait on the tree
D = 1
tip_states = simulate_bm_model(tree, diffusivity=D)$tip_states
# estimate original diffusivity from the generated data
fit = fit_bm_model(tree, tip_states)
cat(sprintf("True D=%g, fitted D=%g\n",D,fit$diffusivity))
# Example 2: Multivariate case
# - - - - - - - - - - - - - - -
# simulate vector-valued continuous trait on the tree
D = get_random_diffusivity_matrix(Ntraits=5)
tip_states = simulate_bm_model(tree, diffusivity=D)$tip_states
# estimate original diffusivity matrix from the generated data
fit = fit_bm_model(tree, tip_states)
# compare true and fitted diffusivity matrices
cat("True D:\n"); print(D)
cat("Fitted D:\n"); print(fit$diffusivity)
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.