OU_phylreg: Performs a phylogenetic regression under a given OU model of...

View source: R/OUphylregression.R

OU_phylregR Documentation

Performs a phylogenetic regression under a given OU model of evolution


The OU_phylreg function does a phylogenetic regression for given response and design matrices under a multivariate OU model evolving on the phylogeny. The user is recommended to install the suggested package PCMBaseCpp which significantly speeds up the calculations (see Details).


OU_phylreg(mY, mD, phyltree, modelParams, regimes = NULL, kY = NULL, M.error = NULL,
signif_level = 0.05, regimes.times = NULL, root.regime = NULL, b_GLSB = FALSE, 
b_GLSX0 = FALSE, signsB = NULL, signsvX0 = NULL, estimate.root.state = FALSE)



A matrix with the rows corresponding to the tip species while the columns correspond to the traits. The rows should be named by species
(field phyltree$tip.label), if not, then a warning is thrown and the order of the species is assumed to be the same as the order in which the species are in the phylogeny (i.e. correspond to the node indices 1:n, where n is the number of tips). The columns should be named by traits, otherwise a warning is thrown and generic names are generated.


A design matrix with the rows corresponding to the traits in the tips species while the columns correspond to the unknown regression variables. The number or rows have to correspond to the number of elements in mY, the data are assumed to be stacked by species. If NA it is assumed to be the design matrix to estimate regression parameters under the given model of evolution, see Details. If it is "phylaverage", then a phylogenetically weighted average is calculated, see Details.


The phylogeny in phylo format. The tree can be obtained from e.g. a nexus file by the read.nexus() function from the ape package. The "standard" ape node indexing is assumed: for a tree with n tips, the tips should have indices 1:n and the root index n+1. The root.edge field is ignored.


List of model parameters of the BM/OUOU/OUBM model as ParamsInModel part of output of BrownianMotionModel/ouchModel/mvslouchModel. Some of them can be NA in order to be estimated by the regression procedure, see Details.


A vector or list of regimes. If vector then each entry corresponds to each of the branches of phyltree, i.e. to each row of phyltree$edge. If list then each list entry corresponds to a tip node and is a vector for regimes on that lineage. If NULL, then a constant regime is assumed on the whole tree.


Number of "Y" (response) variables if the considered model is an OUBM one. The first kY columns of mY are the "OU" ones, while the rest the "BM" ones. In more detail this value determines the number of columns of the mY matrix to treat as response variables ("OU" ones). For example, a value of 1 means that only the first column is treated as a response variable, while a value of 3 means the first three columns are treated as response variables. Any predictor variables ("BM" ones) the user is interested in setting for a particular model should therefore be placed in the final columns of the mY matrix, allowing for selecting select kY columns before this as response variables ("OU" ones).


An optional measurement error covariance structure. The measurement errors between species are assumed independent. The program tries to recognize the structure of the passed matrix and accepts the following possibilities :

  • a single number that is a common measurement error for all tips and species,

  • a m element vector with each value corresponding to a variable, measurement errors are independent between variables and each species is assumed to have the same measurement errors,

  • a m x m ((number of variables) x (number of variables)) matrix, all species will have the same measurement error,

  • a list of length n (number of species), each list element is the covariance structure for the appropriate (numbering according to tree) species, either a single number (each variable has same variance), vector (of length m for each variable), or m x m matrix, the order of the list has to correspond to the order of the nodes in the phyltree object,

  • NULL no measurement error.

From version 2.0.0 of mvSLOUCH it is impossible to pass a single joint measurement error matrix for all the species and traits.


The significance level to be taken when calculating regression confidence intervals, i.e. (1-signif_level)\cdot 100 percent confidence intervals are returned.


A list of vectors for each tree node, it starts with 0 and ends with the current time of the species. In between are the times where the regimes (niches) changed. If NULL then each branch is considered to be a regime.


The regime at the root of the tree. If not given, then it is taken as the regime that is present on the daughter lineages stemming from the root and is the most frequent one in the regimes vector. If more than one regime has the same maximum frequency, then alphabetically first one of the maximum ones is taken.


If the evolutionary model is an OUBM one (and mD is NA), then should the B matrix be also estimated. If B is completely provided in modelParams, then that value is taken as an initial guess for the regression estimation procedure (as the between-species-between-traits variance-covariance matrix depends on B).


If the evolutionary model is an OUBM or BM one (and mD is NA), then should the X_{0} ancestral vector also be estimated. If X_{0} is completely provided in modelParams, then that value is taken as an initial guess for the regression estimation procedure (as in the OUBM the design matrix depends on vX0, in the BM case this value is ignored).


A matrix of constraints on the estimation of B, with the same dimensions as B, if b_GLBS is TRUE. Inside this matrix the possible values are "+" if the given entry is to be positive, "-" if the given entry is to be negative, x, where x is a number, if the entry is to be set to specified value or NA if the entry is to be freely estimated. This option is still in an experimental stage so some setups might not work (please report).


A matrix of constraints on the estimation of vX0, with the same dimensions as vX0, if b_GLBX0 is TRUE. Inside this matrix the possible values are "+" if the given entry is to be positive, "-" if the given entry is to be negative, x, where x is a number, if the entry is to be set to specified value or NA if the entry is to be freely estimated. This option is still in an experimental stage so some setups might not work (please report).


Should the root state be estimate TRUE (not recommended) or set at the optimum FALSE (recommended). Root state estimation is usually unreliable hence if fossil measurements are available prediction based on them and the estimated model will probably be more accurate. If there is only one regime, then estimation of the root state separately is impossible and will not be allowed.


The matrix algebra calculations are done using the likelihood function offerred by the PCMBase package. However, there is a C++ backend, PCMBaseCpp. If it is not available, then the likelihood is calculated slower using pure R. However, with the calculations in C++ up to a 100-fold increase in speed is possible (more realistically 10-20 times). The PCMBaseCpp package is available from https://github.com/venelin/PCMBaseCpp.

For a given input data matrix, mY, the function considers the stacking of it by rows (i.e. stacking species by species). Let Y = vec(mY), i.e. Y<-c(t(mY)), V be the between-species-between-traits variance-covariance matrix (under the parameters passed in modelParams). The function calculates the value of the generalized least squares estimator (not directly, but as a transformation of the likelihood provided by PCMBase)


The user can provide the design matrix directly or if mD is NA, then the design matrix induced by the evolutionary model in modelParams is assumed. The following parameters can be estimated: vX0 (if b_GLSX0 is TRUE, BM model); mPsi, vY0 (if estimate.root.state is TRUE, otherwise set at optimum) for OUOU model; vX0 (if b_GLSX0 is TRUE) mPsi, vY0 (if estimate.root.state is TRUE, otherwise set at optimum), B (if b_GLSB is TRUE). One can constrain (some of) the elements of the matrices to be estimated to be postive, negative or equal to some value. For B and vX0 this was described in the description of the arguments of signsB and signsvX0. For mPsi and vY0 one does this in the respective entries of modelParams. There matrix entries can be set to "+", "-", NA or some specific value. In the OUBM case the model specfic design matrix is not derived from the conditional expectation of all of the responses on all of the predictors, but from the conditional expectations of each tip species independently (as if V were block diagonal). This is as the joint condtional expecation design matrix cannot be calculated at the moment in an efficient manner and would cause a serious computational bottleneck. However this only makes a difference if B is to be estimated inside the GLS.

Special support is given if one wants to compute a phylogenetically weighted mean. If mD is set to "phylaverage", then it is calculated as

D_p=1_{n}\otimes Id_{k},

where 1_{n} is a column vector of n ones and Id_{k} is the identity matrix with rows and columns equalling the number or columns of mY.


A list with the following entries


The regression estimates


The covariance matrix between regression estimates.


The confidence intervals for each estimated parameter.


The model parameters updated if anything was estimated from them in the procedure.


The used or calculated design matrix.


The residual sum of squares.


R2, where the alternative model is the sample average.


R2, where the alternative model is the phylogenetically weighted sample average, i.e. the design matrix is D_p .


The RSS with respect to the sample average.


The RSS with respect to the phylogenetically weighted sample average.


The phylogeny used, returned as in the estimation procedure some additional fields are calculated. This could help in a speed up if the OU_phylreg is used in some iterative procedure.


Krzysztof Bartoszek


Bartoszek, K. and Pienaar, J. and Mostad. P. and Andersson, S. and Hansen, T. F. (2012) A phylogenetic comparative method for studying multivariate adaptation. Journal of Theoretical Biology 314:204-215.

Hansen, T.F. and Bartoszek, K. (2012) Interpreting the evolutionary regression: the interplay between observational and biological errors in phylogenetic comparative studies. Systematic Biology 61(3):413-425.


set.seed(12345, kind = "Mersenne-Twister", normal.kind = "Inversion")
### We will first simulate a small phylogenetic tree using functions from ape.
### For simulating the tree one could also use alternative functions, e.g. sim.bd.taxa 
### from the TreeSim package

## The line below is not necessary but advisable for speed

### Define a vector of regimes.

### Define SDE parameters to be able to simulate data under the OUOU model.
## 3D model
## OUOUparameters<-list(vY0=matrix(c(1,-1,0.5),nrow=3,ncol=1),
## A=rbind(c(9,0,0),c(0,5,0),c(0,0,1)),mPsi=cbind("small"=c(1,-1,0.5),"large"=c(-1,1,0.5)),
## Syy=rbind(c(1,0.25,0.3),c(0,1,0.2),c(0,0,1)))
## 2D model used to reduce running time on CRAN

### Now simulate the data.

## estimate parameters under OUOU model
OU_phylreg(OUOUdata, NA, phyltree, OUOUparameters_reg, regimes=regimes, 
kY=NULL, M.error=NULL)

mvSLOUCH documentation built on Nov. 21, 2023, 1:08 a.m.