slouch.fit: Function to fit Ornstein-Uhlenbeck models of trait evolution

View source: R/interface.R

slouch.fit    R Documentation

Function to fit Ornstein-Uhlenbeck models of trait evolution

Description

Function to fit Ornstein-Uhlenbeck models of trait evolution

Usage

slouch.fit(
  phy,
  species = NULL,
  hl_values = NULL,
  a_values = NULL,
  vy_values = NULL,
  sigma2_y_values = NULL,
  response,
  mv.response = NULL,
  fixed.fact = NULL,
  direct.cov = NULL,
  mv.direct.cov = NULL,
  mcov.direct.cov = NULL,
  random.cov = NULL,
  mv.random.cov = NULL,
  mcov.random.cov = NULL,
  ace = NULL,
  anc_maps = "regimes",
  estimate.Ya = FALSE,
  estimate.bXa = FALSE,
  interactions = FALSE,
  hessian = FALSE,
  support = 2,
  convergence = 1e-06,
  nCores = 1,
  hillclimb = TRUE,
  lower = c(1e-08, 1e-08),
  upper = Inf,
  verbose = FALSE
)
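
A minimal sketch of a grid-search call, using only arguments from the signature above. The tree and trait are simulated, the names phy, dat, hl_grid and vy_grid are hypothetical, and the grid ranges are arbitrary. The call is skipped unless both slouch and ape are installed.

```r
# Candidate values for the grid search (arbitrary, for illustration only).
hl_grid <- seq(0.01, 2, length.out = 5)   # phylogenetic half-lives
vy_grid <- seq(0.1, 2, length.out = 5)    # stationary variances

if (requireNamespace("slouch", quietly = TRUE) &&
    requireNamespace("ape", quietly = TRUE)) {
  set.seed(1)
  phy <- ape::rcoal(20)                        # random rooted tree with 20 tips
  dat <- data.frame(species = phy$tip.label,   # same order as phy$tip.label
                    y = rnorm(20))             # simulated response trait

  # Intercept-only model, grid search only (no hillclimber).
  fit <- slouch::slouch.fit(
    phy = phy,
    species = dat$species,
    response = dat$y,
    hl_values = hl_grid,
    vy_values = vy_grid,
    hillclimb = FALSE
  )
  print(class(fit))
}
```

Setting hillclimb = TRUE instead would, per the argument description below, start the optimizer from the best grid point.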

Arguments

phy

an object of class 'phylo', must be rooted.

species

a character vector of species tip labels, typically the "species" column in a data frame. This vector must match phy$tip.label exactly and be in the same order.
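
One way to satisfy the ordering requirement is to reorder the data frame before calling slouch.fit. The sketch below uses base R only. The vector tip_labels is a hypothetical stand-in for phy$tip.label, and dat is a made-up data frame.

```r
# Hypothetical stand-ins: 'tip_labels' plays the role of phy$tip.label,
# 'dat' is a data frame whose rows are in a different order.
tip_labels <- c("sp_a", "sp_b", "sp_c", "sp_d")
dat <- data.frame(
  species = c("sp_c", "sp_a", "sp_d", "sp_b"),
  trait   = c(3.1, 1.2, 4.0, 2.5),
  stringsAsFactors = FALSE
)

# Every tip must be present in the data before reordering.
stopifnot(all(tip_labels %in% dat$species))

# Reorder the rows so that row i corresponds to tip i.
dat <- dat[match(tip_labels, dat$species), ]

# The condition that the 'species' argument is documented to require.
stopifnot(identical(dat$species, tip_labels))
```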

hl_values

a vector of candidate phylogenetic half-life values to be evaluated in grid search. Optional.

a_values

a vector of candidate rate of adaptation values to be evaluated in grid search. Optional.

vy_values

a vector of candidate stationary variances for the response trait, to be evaluated in grid search. Optional.

sigma2_y_values

alternative to vy_values, if the stationary variance is reparameterized as the variance parameter for the Brownian motion.
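
The alternative parameterizations can be related as follows, assuming the usual Ornstein-Uhlenbeck conventions: half-life = log(2) / a, and stationary variance = sigma2_y / (2 * a). That slouch.fit uses exactly these conventions internally is an assumption here, and all numbers are arbitrary.

```r
# Candidate half-lives and the corresponding rates of adaptation.
hl_values <- c(0.5, 1, 2)
a_values  <- log(2) / hl_values          # shorter half-life = faster adaptation

# For one fixed rate of adaptation, convert Brownian-motion variance
# parameters to stationary variances.
a               <- a_values[2]           # the rate matching half-life 1
sigma2_y_values <- c(0.1, 0.2, 0.4)
vy_values       <- sigma2_y_values / (2 * a)

# Converting back recovers the original half-lives.
hl_back <- log(2) / a_values
```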

response

a numeric vector of a trait to be treated as response variable

mv.response

numeric vector of the observational variances of the response trait. E.g., if response is a mean trait value, mv.response is the within-species squared standard error of the mean.
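
As an illustration, species means and their squared standard errors can be computed from replicate measurements in base R. The data frame raw and its values are hypothetical.

```r
# Hypothetical replicate measurements per species.
raw <- data.frame(
  species = c("sp_a", "sp_a", "sp_a", "sp_b", "sp_b",
              "sp_c", "sp_c", "sp_c", "sp_c"),
  value   = c(1.0, 1.2, 1.1, 2.0, 2.4, 3.0, 3.1, 2.9, 3.0)
)

means <- tapply(raw$value, raw$species, mean)    # species means
vars  <- tapply(raw$value, raw$species, var)     # within-species sample variances
n     <- tapply(raw$value, raw$species, length)  # sample sizes

response    <- as.numeric(means)
mv.response <- as.numeric(vars / n)              # squared standard error of the mean

# Note: tapply() orders species alphabetically, so both vectors must still be
# reordered to match phy$tip.label before use.
```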

fixed.fact

factor of regimes on the terminal edges of the tree, in same order as species. If this is used, phy$node.label needs to be filled with the corresponding internal node regimes, in the order of node indices (root: n+1),(n+2),(n+3), ...

direct.cov

Direct effect independent variables

mv.direct.cov

Estimation variances for direct effect independent variables. Must be the same shape as direct.cov

mcov.direct.cov

Estimation covariances between the response variable and direct effect independent variables. Must be the same shape as direct.cov

random.cov

Independent variables each modeled as a Brownian motion

mv.random.cov

Estimation variances for the Brownian covariates. Must be the same shape as random.cov

mcov.random.cov

Estimation covariances between the response variable and random effect independent variables. Must be the same shape as random.cov

ace

An ape::ace object, with estimated ancestral character states. Optional

anc_maps

One of "regimes", "ace" or "simmap". "regimes" tells slouch to use 'phy$node.label' to assign internal regimes. "ace" tells slouch to use ancestral posterior probabilities for ancestral regimes. "simmap" tells slouch to use the simmap mappings associated with 'phy'

estimate.Ya

a logical value indicating whether "Ya" should be estimated. If true, the intercept K = 1 is expanded to Ya = exp(-a*t) and b0 = 1-exp(-a*t). If models with categorical covariates are used, this will instead estimate a separate primary optimum for the root niche, "Ya". This only makes sense for non-ultrametric trees. If the tree is ultrametric, the model matrix becomes singular.
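
The singularity on ultrametric trees can be seen from the two expanded columns alone. The sketch below builds them in base R for made-up root-to-tip times t and an arbitrary rate a. The helper expand_intercept is hypothetical and is not part of slouch.

```r
a <- 0.5   # arbitrary rate of adaptation

# Hypothetical helper: the two columns that replace the intercept K = 1.
expand_intercept <- function(t) {
  cbind(Ya = exp(-a * t), b0 = 1 - exp(-a * t))
}

t_ultra    <- rep(1, 4)            # ultrametric: all tips equally far from the root
t_nonultra <- c(1, 0.8, 0.5, 1)    # non-ultrametric: tip heights differ

# With equal tip heights both columns are constant, so they are collinear.
rank_ultra    <- qr(expand_intercept(t_ultra))$rank
rank_nonultra <- qr(expand_intercept(t_nonultra))$rank

# In both cases the two columns sum to the original intercept of 1.
row_sums <- rowSums(expand_intercept(t_nonultra))
```

Here rank_ultra is 1 while rank_nonultra is 2, so the two coefficients are only separately identifiable when tip heights vary.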

estimate.bXa

a logical value indicating whether "bXa" should be estimated. If true, bXa = 1-exp(-a*t) - (1-(1-exp(-a*t))/(a*t)) is added to the model matrix, estimating b*Xa. Same requirements as for estimating Ya.

interactions

a logical value. Whether to model interactions between (all) direct-effect continuous covariates and categorical regimes (experimental). Defaults to FALSE

hessian

a logical value. Whether to use the approximate Hessian matrix at the likelihood peak, as found by the hillclimber, to compute standard errors for the parameters that enter the parameter search.

support

a scalar indicating the size of the support set, defaults to 2 units of log-likelihood.
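
To illustrate the idea of a support set, the sketch below keeps the grid points whose log-likelihood lies within support units of the maximum. The log-likelihood values are invented, and the exact comparison used inside slouch is an assumption.

```r
# Hypothetical grid of half-lives and their log-likelihoods.
hl_grid <- c(0.1, 0.5, 1, 2, 5)
loglik  <- c(-14.2, -11.0, -10.1, -11.8, -13.5)

support <- 2   # size of the support set in log-likelihood units

# Grid points no more than 'support' units below the best log-likelihood.
in_support   <- loglik >= max(loglik) - support
supported_hl <- range(hl_grid[in_support])
```

With these numbers the supported half-lives span 0.5 to 2. A finer grid would give a sharper estimate of the interval's edges.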

convergence

threshold of iterative GLS estimation for when beta is considered to be converged.

nCores

number of CPU cores used in the grid search. If 2 or more cores are used, all print statements are silenced during the grid search. If performance is critical, it is recommended to compile and link R to a multithreaded BLAS, since most of the heavy computations are common matrix operations. With a single-threaded BLAS, using several cores may or may not improve performance, and performance may vary with OS.

hillclimb

logical, whether to use the hillclimb parameter estimation routine or not. This routine (L-BFGS-B from optim()) may be combined with the grid search, in which case it will by default start from the sigma and half-life of the local ML found by the grid search.

lower

lower bounds for the optimization routine, defaults to c(1e-08, 1e-08). First entry in vector is half-life, second is stationary variance. When running direct effect models without observational error, it may be useful to specify a positive lower bound for the stationary variance, e.g., c(0, 0.001), since the residual variance-covariance matrix is degenerate when sigma = 0.
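
How box constraints behave in optim() with method "L-BFGS-B" can be seen on a toy problem. The objective toy_objective is hypothetical and is not the slouch likelihood. Its unconstrained minimum has a negative second parameter, so the optimizer stops at the lower bound instead.

```r
# Toy objective with unconstrained minimum at (1, -0.5).
toy_objective <- function(par) (par[1] - 1)^2 + (par[2] + 0.5)^2

fit <- optim(
  par    = c(0.5, 0.5),        # starting values
  fn     = toy_objective,
  method = "L-BFGS-B",
  lower  = c(1e-08, 0.001),    # one lower bound per parameter
  upper  = Inf                 # a single value is recycled to all parameters
)

fit$par   # second parameter ends at its lower bound, 0.001
```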

upper

upper bounds for the optimization routine, defaults to Inf, i.e. no upper bound.

verbose

a logical value indicating whether to print a summary in each iteration of parameter search. May be useful when diagnosing unexpected behaviour or crashes.

Value

An object of class 'slouch', essentially a list with the following fields:

parameter_space

a list of the entire parameter space traversed by the grid search and the hillclimber as applicable.

tree

a list of parameters concerning the tree:

  • phy - an object of class 'phylo'

  • T.term - a numeric vector including the time from the root of the tree to the tip, for all taxa 1,2,3... n.

  • ta - for all pairs of species, the time from their most recent common ancestor (mrca) to the root of the tree.

  • tia - for all pairs of species, the time from their mrca to the tip of species i.

  • tja - the transpose of tia.

  • tij - for all pairs of species, the time from species i to their mrca, plus the time from their mrca to species j. In other words, tia + transpose(tia).

  • times - for all nodes (1,2,3... n, root, root+1, ...) in the tree, the time from the root to said node.

  • lineages - for all species (1,2,3... n), a list of their branch times and regimes as painted on the tree.

  • regimes - for all nodes (1,2,3... n, root, root+1, ...) in the tree, the respective regime as specified by "phy$node.label" and "fixed.fact".
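
The relationships between T.term, ta, tia, tja and tij can be checked by hand on a tiny example. The sketch below uses a hypothetical non-ultrametric three-species tree, ((A:1,B:0.5):1,C:2), and assumes that the mrca of a species with itself is its own tip. It only mirrors the definitions above and is not the code slouch uses.

```r
sp <- c("A", "B", "C")

# Time from the root to each tip.
T.term <- c(2, 1.5, 2)

# Time from the root to the mrca of each pair (diagonal: the tip itself).
ta <- matrix(c(2, 1,   0,
               1, 1.5, 0,
               0, 0,   2),
             nrow = 3, byrow = TRUE, dimnames = list(sp, sp))

# Time from the mrca of (i, j) to the tip of species i: T.term[i] - ta[i, j].
tia <- matrix(T.term, nrow = 3, ncol = 3, dimnames = list(sp, sp)) - ta

tja <- t(tia)        # time from the mrca to the tip of species j
tij <- tia + tja     # total path length between tips i and j
```

For example, A and B are separated by 1 + 0.5 = 1.5 time units, and A and C by 2 + 2 = 4.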

modfit

a list of statistics to characterize model fit

supportplot

a list or matrix used to plot the grid search

supported_range

a matrix indicating the interval of grid search that is within the support region. If the grid search values are carefully selected, this may be used to estimate the true support region.

V

the residual variance-covariance matrix for the maximum likelihood model as found by parameter search.

evolpar

maximum likelihood estimates of parameters under the chosen model.

beta_primary

regression coefficients and associated objects. Whether the regression coefficients are to be interpreted as optima or not depends on the type of model and model estimates.

beta_evolutionary

under a random effect model, "beta_evolutionary" contains the evolutionary regression coefficients and associated objects.

n.par

number of free parameters with which the likelihood criteria are penalized.

brownian_predictors

under a random effect model, a matrix of means and standard errors for the independent Brownian motion variable(s). Not to be confused with the regression coefficients when the residuals are under a "bm" model.

climblog_df

a matrix of the path trajectory of the hillclimber routine.

fixed.fact

the respective regimes for all species (1,2,3... n).

control

internal parameters for control flow.


kopperud/slouch documentation built on Feb. 17, 2024, 10:31 a.m.