mvslouchModel | R Documentation |
The mvslouchModel
function uses maximum likelihood to fit parameters of a multivariate
OUBM model evolving on the phylogeny. The user is recommended to install the suggested package
PCMBaseCpp which significantly speeds up the calculations (see Details).
mvslouchModel(phyltree, mData, kY, regimes = NULL, regimes.times = NULL,
root.regime = NULL, predictors = NULL, M.error = NULL, Atype = "Invertible",
Syytype = "UpperTri", diagA = "Positive", estimate.root.state=FALSE,
parameter_signs=NULL, start_point_for_optim = NULL, parscale = NULL,
min_bl = 0.0003, maxiter = c(10,50,100), estimateBmethod="ML")
phyltree |
The phylogeny in |
mData |
A matrix with the rows corresponding to the tip species while the columns correspond to the traits.
The rows should be named by species |
kY |
Number of "Y" (response) variables.
The first |
regimes |
A vector or list of regimes. If vector then each entry corresponds to each of |
regimes.times |
A list of vectors for each tree node, it starts with 0 and ends with the current time
of the species. In between are the times where the regimes (niches) changed. If |
root.regime |
The regime at the root of the tree. If not given, then it is taken as the regime that is present
on the root's daughter lineages and is the most frequent one in the |
predictors |
A vector giving the numbers of the columns from |
M.error |
An optional measurement error covariance structure. The measurement errors between species are assumed independent. The program tries to recognize the structure of the passed matrix and accepts the following possibilities :
From version |
Atype |
What class does the A matrix in the multivariate OUBM model belong to, possible values :
|
Syytype |
What class does the Syy matrix in the multivariate OUBM model belong to, possible values :
|
diagA |
Whether the values on |
estimate.root.state |
Should the root state be estimate |
parameter_signs |
WARNING: ONLY use this option if you understand what you are doing! This option
is still in an experimental stage so some setups might not work (please report).
A list allowing the user to control whether specific entries for each model parameter
should be positive, negative, zero or set to a specific (other) value. The entries
of the list have to be named, the admissible names are |
start_point_for_optim |
A name list with starting parameters for of the parameters for be optimized by start_point_for_optim=list(A=rbind(c(2,0),(0,4)), Syy=rbind(c(1,0.5),c(0,2))). |
parscale |
A vector to calculate the |
min_bl |
Value to which PCMBase's |
maxiter |
The maximum number of iterations for different components of the estimation
algorithm. A vector of three integers. The first is the number of iterations for phylogenetic
GLS evaluations, i.e. conditional on the other parameters, the regime optima, |
estimateBmethod |
Should |
The likelihood calculations are done by the PCMBase package. However, there is a C++ backend, PCMBaseCpp. If it is not available, then the likelihood is calculated slower using pure R. However, with the calculations in C++ up to a 100-fold increase in speed is possible (more realistically 10-20 times). The PCMBaseCpp package is available from https://github.com/venelin/PCMBaseCpp.
This function estimates the parameters of the following multivariate SDE,
\begin{array}{rclccl}
dY(t) & = & -A(Y(t)-(\Psi(t)- A^{-1}BX(t)))dt + \Sigma_{yy} dB(t) & Y(0) & = & Y_{0},
\\ dX(t) & = & \Sigma_{xx} dB(t) & X(0) & = & X_{0}
\end{array}
on a phylogenetic tree. It uses a numerical optimization over A
(parametrized by its
eigenvalues and eigenvectors or its QR decomposition) and S
(parametrized by its values)
and conditional on A
and S
estimates the values of Psi
corresponding
to the different regimes by a GLS estimate. Y(0)
is assumed to be equal to
- solve(A)BX(0)
plus the root value of Psi
. This assumes that A
is invertible.
If not, then Y(0)
will be set at the root value of Psi
. This is unless
estimate.root.state=TRUE
, in such a case Y(0)
will be estimated by least squares.
The setting Atype="Any"
means that one assumes the matrix A
is eigendecomposable.
If the estimation algorithm hits a defective A
, then it sets the log-likelihood at
the minimum value and will try to get out of this dip.
The function parameter parameter_signs
is special in the sense that it can give
the user great control over the estimation procedure but can also make the output
very inconsistent with what the user provides. If we have two response traits (OU ones)
and two predictor traits (BM ones), then an EXAMPLE setting of this can be:
parameter_signs=list(signsA=rbind(c("+","-"),c(0,"+")),
signsSyy=rbind(c(NA,0),c(0,NA)), signsB=rbind(c(NA,0),c(0,NA)))
.
This means that A
is upper triangular with positive
values on the diagonal and a negative value on the off-diagonal, Syy
is diagonal
and B
is also diagonal. It is advisable to set now Atype="Any"
and
Syytype="Any"
(see further description).
If the given model parameter is to be estimated
by a generalized least squares (currently B
, mPsi
and vY0
), then the
sign specifications are ignored. However, it is possible to set specific values.
Furthermore, the package does not check (for A
and Syy
) if the specifications here agree with the Atype
, Syytype
and diagA
. The settings in signsA
and signsSyy
will override
the other settings. Hence, it is up to the user to make sure that the settings of
signsA
and signsSyy
are consistent with Atype
, Syytype
and diagA
. It is advisable to use signsA
with "+"
on the diagonal and have diagA=NULL
. The diagonal of Syy
is forced to
be positive (unless "-"
is used on the diagonal of signsSyy
but this is strongly discouraged) so it is advisable to keep NA
on the diagonal of signsSyy
and not put there "+"
there.
Hence, in particular using the signs mechanism result in a wrong class of the matrix
(e.g. Atype="SymmetricPositiveDefinite"
, but after corrections for the provided entries in
signsA
one obtains a non-symmetric A
with complex, negative-real-part eigenvalues).
Lastly, using signsA
and signsSyy
can result in
a wrong amount of dof
and in turn incorrect AICc
and BIC
values.
What the code does is subtracts the amount of fixed values in signsA
and signsSyy
from the amount of free parameters used to estimate A
and Syy
. For example
if one sets
Atype="SingleValueDiagonal"
(estimated by one free parameter)
but specified two off-diagonal values, then the amount of dofs from A
will be -1
!!
The ONLY fail-safe way to use this is to set Atype="Any"
(if signsA
used) and
Syytype="Any"
(if signsSyy
used). If using Syytype="Any"
and signsSyy
the it is strongly advisable to set the entries either below or above Syy
's diagonal to 0
.
The reason is that \Sigma_{yy}\Sigma_{yy}^{T}
enters the likelihood and not the
given value of \Sigma_{yy}
. Hence, having values below (or respectively above) the diagonal
results in an overparameterized model. The package has the option of mixing different matrix types
with specifying values in it but this is only for advanced users who need to dig into the code
to see what the dof
's should be and if it is possible to find a correspondence between the
parametrization and settings. If entries of mPsi
, vY0
and B
are pre-specified,
then the dof
are correctly adjusted for this. The estimation procedures currently ignore any
pre-specified values for vX0
and Sxx
!
The found point is described by a list containing four fields.
The first field
HeuristicSearchPointFinalFind
is the parametrization of the model
parameters at the considered
point with the value of the log–likelihood.
The field ParamsInModel
is the point estimate of the parameters of the SDE.
The field ParamSummary
are different composite (evaluated at the tree's height) and summary statistics,
The field phylhalflife
are the eigenvalues, eigenvectors and phylogenetic half lives
associated with the A
matrix of, expmtA
is exp(-A*(tree height))
,
optimal regression is the A^{-1}B
matrix (if A
is invertible, otherwise this will not exist),
mPsi.rotated
is each of the regime effects multiplied by 1-\exp(-A*(tree height))
,
cov.matrix
is the trait vector covariance matrix at the tree's height,
corr.matrix
is the trait vector correlation matrix at the tree's height,
conditional.cov.matrix
is the conditional covariance matrix of the OU type variables
on the Brownian motion type at the tree's height, i.e. Cov[Y|X](tree height),
conditional.corr.matrix
is the conditional correlation matrix of the OU type variables
on the Brownian motion type at the tree's height, i.e. Corr[Y|X](tree height),
stationary.cov.matrix
is the limit of the conditional.cov.matrix
,
stationary.corr.matrix
is the limit of the conditional.corr.matrix
,
optima.cov.matrix
is the covariance matrix of the optimal process at the tree's height
equalling
(tree height)* A^{-1}B\Sigma_{xx}\Sigma_{xx}^{T}B^{T}A^{-T}
,
optima.corr.matrix
is the correlation matrix of the optimal process at time the tree's height,
cov.with.optima
is the covariance matrix between the optimal process and the Y type variables process,
corr.with.optima
is the correlation matrix between the optimal process and the Y type variables process,
evolutionary.regression
is the regression coefficient of E[Y|X](tree height).
Everything concerning the optimal process assumes A has positive real-part eigenvalues (in particular
it is invertible). Otherwise these will not exist.
StS
is the infinitesimal covariance matrix,
LogLik
the log–likelihood, dof the degrees of freedom, m2loglik
is -2
log–likelihood,
aic
is the Akaike information criterion, aic.c
is the Akaike information criterion corrected for small
sample size, sic is the Schwarz information criterion, bic
is the Bayesian information criterion
(which is the same as the Schwarz information criterion) and RSS
is the residual sum of squares.
The field RSS_non_phylogenetic
is a residual sum of squares calculated without correcting
for the phylogeny–induced between species correlations, while the extension
conditional_on_predictors indicates that we consider the RSS for the variables
labelled as responses conditioned on the remaining variables. The R2_phylaverage
field is
R2, where the alternative model is the phylogenetically weighted sample average
(see OU_phylreg
).
The last field LogLik
is the log–likelihood at the point.
From version 2.0.0
of mvSLOUCH the data has to be passed as a matrix.
To underline this the data parameter's name has been changed to mData
.
From version 2.0.0
of mvSLOUCH the parameter calcCI
has been removed.
The package now offers the possibility of bootstrap confidence intervals, see
function parametric.bootstrap
.
FinalFound |
The point where the search procedure stopped. See Details for the description. |
MaxLikFound |
The point with the highest likelihood found by the search procedure, if it is the same as the final point then this field equals "Same as final found". |
The estimation can take a long time and should be repeated a couple of times so that it is run from different starting positions. The function can produce (a lot of) warnings and errors during the search procedure, this is nothing to worry about.
The slouch package is a recommended alternative if one has only a single response (Y) trait.
Krzysztof Bartoszek
Bartoszek, K. and Pienaar, J. and Mostad. P. and Andersson, S. and Hansen, T. F. (2012) A phylogenetic comparative method for studying multivariate adaptation. Journal of Theoretical Biology 314:204-215.
Butler, M.A. and A.A. King (2004) Phylogenetic comparative analysis: a modeling approach for adaptive evolution. American Naturalist 164:683-695.
Hansen, T.F. (1997) Stabilizing selection and the comparative analysis of adaptation. Evolution 51:1341-1351.
Hansen, T.F. and Bartoszek, K. (2012) Interpreting the evolutionary regression: the interplay between observational and biological errors in phylogenetic comparative studies. Systematic Biology 61(3):413-425.
Hansen, T.F. and Pienaar, J. and Orzack, S.H. (2008) A comparative method for studying adaptation to randomly evolving environment. Evolution 62:1965-1977.
Labra, A., Pienaar, J. & Hansen, T.F. (2009) Evolution of thermophysiology in Liolaemus lizards: adaptation, phylogenetic inertia and niche tracking. The American Naturalist 174:204-220.
Mitov, V. and Bartoszek, K. and Asimomitis, G. and Stadler, T. (2020) Fast likelihood calculation for multivariate Gaussian phylogenetic models with shifts Theoretical Population Biology 131:66-78.
Pienaar et al (in prep) An overview of comparative methods for testing adaptation to external environments.
PCMLik
, slouch::model.fit
, SummarizeMVSLOUCH
,
simulMVSLOUCHProcPhylTree
,
parametric.bootstrap
,
optim
RNGversion(min(as.character(getRversion()),"3.6.1"))
set.seed(12345, kind = "Mersenne-Twister", normal.kind = "Inversion")
### We will first simulate a small phylogenetic tree using functions from ape.
### For simulating the tree one could also use alternative functions, e.g. sim.bd.taxa
### from the TreeSim package
phyltree<-ape::rtree(3)
## The line below is not necessary but advisable for speed
phyltree<-phyltree_paths(phyltree)
## 2 regimes
### Define a vector of regimes.
## regimes<-c("small","small","large","small")
## OUBMparameters<-list(vY0=matrix(1,ncol=1,nrow=1),A=matrix(0.5,ncol=1,nrow=1),
## B=matrix(2,ncol=1,nrow=1),mPsi=cbind("small"=1,"large"=-1),
## Syy=matrix(2,ncol=1,nrow=1),vX0=matrix(0,ncol=1,nrow=1),Sxx=diag(2,1,1),
## Syx=matrix(0,ncol=1,nrow=1),Sxy=matrix(0,ncol=1,nrow=1))
## single regime for speed on CRAN
regimes<-c("small","small","small","small")
OUBMparameters<-list(vY0=matrix(1,ncol=1,nrow=1),A=matrix(0.5,ncol=1,nrow=1),
B=matrix(2,ncol=1,nrow=1),mPsi=cbind("small"=1),
Syy=matrix(2,ncol=1,nrow=1),vX0=matrix(0,ncol=1,nrow=1),Sxx=diag(2,1,1),
Syx=matrix(0,ncol=1,nrow=1),Sxy=matrix(0,ncol=1,nrow=1))
### Now simulate the data.
OUBMdata<-simulMVSLOUCHProcPhylTree(phyltree,OUBMparameters,regimes,NULL)
OUBMdata<-OUBMdata[phyltree$tip.label,,drop=FALSE]
### Try to recover the parameters of the mvOUBM model.
### maxiter here set to minimal working possibility, in reality it should be larger
### e.g. default of c(10,50,100)
### Also the Atype and Syytype variables should be changed, here set as simplest
### for speed of evaluation, e.g. Atype="DecomposablePositive", Syytype="UpperTri"
OUBMestim<-mvslouchModel(phyltree,OUBMdata,1,regimes,Atype="SingleValueDiagonal",
Syytype="SingleValueDiagonal",diagA="Positive",maxiter=c(1,2,1))
RNGversion(as.character(getRversion()))
## Not run: ##It takes too long to run this
## take a less trivial setup
phyltree<-ape::rtree(5)
## The line below is not necessary but advisable for speed
phyltree<-phyltree_paths(phyltree)
### Define a vector of regimes.
regimes<-c("small","small","large","small","small","large","large","large")
### Define SDE parameters to be able to simulate data under the mvOUBM model.
OUBMparameters<-list(vY0=matrix(c(1,-1),ncol=1,nrow=2),A=rbind(c(9,0),c(0,5)),
B=matrix(c(2,-2),ncol=1,nrow=2),mPsi=cbind("small"=c(1,-1),"large"=c(-1,1)),
Syy=rbind(c(1,0.25),c(0,1)),vX0=matrix(0,1,1),Sxx=matrix(1,1,1),
Syx=matrix(0,ncol=1,nrow=2),Sxy=matrix(0,ncol=2,nrow=1))
### Now simulate the data.
OUBMdata<-simulMVSLOUCHProcPhylTree(phyltree,OUBMparameters,regimes,NULL)
OUBMdata<-OUBMdata[phyltree$tip.label,,drop=FALSE]
### Try to recover the parameters of the mvOUBM model.
OUBMestim<-mvslouchModel(phyltree,OUBMdata,2,regimes,Atype="DecomposablePositive",
Syytype="UpperTri",diagA="Positive",maxiter=c(10,50,100))
### And finally bootstrap with particular interest in the evolutionary and optimal
### regressions
OUBMbootstrap<-parametric.bootstrap(estimated.model=OUBMestim,phyltree=phyltree,
values.to.bootstrap=c("evolutionary.regression","optimal.regression"),
regimes=regimes,root.regime="small",M.error=NULL,predictors=c(3),kY=2,
numboot=5,Atype="DecomposablePositive",Syytype="UpperTri",diagA="Positive")
## End(Not run)
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.