estimate.evolutionary.model | R Documentation |
The estimate.evolutionary.model
function calls the
BrownianMotionModel
, ouchModel
and mvslouchModel
functions with different classes of evolutionary model parameters.
It then compares the resulting estimates by the AICc (or BIC if AICc fails) and
returns the best overall model. The user is recommended to install the suggested package
PCMBaseCpp which significantly speeds up the calculations (see Details).
estimate.evolutionary.model(phyltree, mData, regimes = NULL,
root.regime = NULL, M.error = NULL, repeats = 5, model.setups = NULL,
predictors = NULL, kY = NULL, doPrint = FALSE, pESS=NULL,
estimate.root.state=FALSE, min_bl = 0.0003, maxiter=c(10,50,100))
phyltree |
The phylogeny in |
mData |
A matrix with the rows corresponding to the tip species while the columns correspond to the traits.
The rows should be named by species |
regimes |
A vector or list of regimes. If vector then each entry corresponds to each of |
root.regime |
The regime at the root of the tree. If not given, then it is taken as the regime that is present
on the root's daughter lineages and is the most frequent one in the |
M.error |
An optional measurement error covariance structure. The measurement errors between species are assumed independent. The program tries to recognize the structure of the passed matrix and accepts the following possibilities :
From version |
repeats |
How many starting points for the numerical maximum likelihood procedure should be tried for
each model setup. On the first repeat for OUOU and OUBM modes the functions takes as the starting
point (for |
model.setups |
What models to try when searching for the best evolutionary model.
This field may remain
Alternatively the user is also free to provide their own list of models in this variable. Each element of the list is a list with fields.
A minimum example list is |
predictors |
A vector giving the numbers of the columns from
|
kY |
Number of "Y" (response) variables, for the OUBM models.
The first |
doPrint |
Should the function print out information on what it is doing (TRUE) or keep silent (default FALSE). |
pESS |
Should the function also find the best model taking into account the phylogenetic effective sample size
and it so what method. If |
estimate.root.state |
Should the root state be estimate |
min_bl |
Value to which PCMBase's |
maxiter |
The maximum number of iterations for different components of the estimation
algorithm. A vector of three integers. The first is the number of iterations for phylogenetic
GLS evaluations, i.e. conditional on the other parameters, the regime optima, perhaps |
The likelihood calculations are done by the PCMBase package. However, there is a C++ backend, PCMBaseCpp. If it is not available, then the likelihood is calculated slower using pure R. However, with the calculations in C++ up to a 100-fold increase in speed is possible (more realistically 10-20 times). The PCMBaseCpp package is available from https://github.com/venelin/PCMBaseCpp.
The setting Atype="Any"
means that one assumes the matrix A
is eigendecomposable.
If the estimation algorithm hits a defective A
, then it sets the log-likelihood at
the minimum value and will try to get out of this dip.
If model.setups
is left at the default value the
function will take a long time to run, as it performs estimation for each model
(generate.model.setups
generates 90 setups) times the value in repeats.
Therefore if the user has particular hypotheses in mind then it is advisable to
prepare their own list.
If the Syy
matrix is assumed to be upper-triangular and the starting conditions
based on Bartoszek \&
Sagitov (2015)'s results are used then the factorization
of \Sigma=\Sigma_{yy} \Sigma_{yy}^{T}
into \Sigma_{yy}
is
done using the procedure described in
https://math.stackexchange.com/questions/2039477/cholesky-decompostion-upper-triangular-or-lower-triangular.
From version 2.0.0
of mvSLOUCH the data has to be passed as a matrix.
To underline this the data parameter's name has been changed to mData
.
If AICc fails, then the function will use BIC to select between models. This is extremely unlikely essentially only when AICc is infinite, i.e. the model is saturated (number of observations equals number of data points).
A list is returned that describes the results of the search. See the help for
BrownianMotionModel
, ouchModel
and
mvslouchModel
for the description of the lower level entries.
The elements of this list are the following
BestModel |
The resulting best model found. Included are the model parameters, a "first-glance" qualitative description of the model, the most important parameters of the process (half-lives and regressions in the case of OU models) and what to call to obtain standard errors. It takes a long time to obtain them so calculating them is not part of the standard procedure. |
BestModelESS |
Only if |
testedModels |
A list of results for each tried model. |
model.setups |
A list of models tried. |
repeats |
How many starting points were tried per model. |
The engine behind the likelihood calculations is called from PCMBase.
The slouch
package is a recommended alternative if one has a OUBM models and
only a single response (Y) trait.
The mvMORPH
, ouch
and Rphylpars
packages consider multivariate OU models
and looking at them could be helpful.
Krzysztof Bartoszek
Ane, C. (2008) Analysis of comparative data with hierarchical autocorrelation. Annals of Applied Statistics 2:1078-1102.
Bartoszek, K. (2016) Phylogenetic effective sample size. Journal of Theoretical Biology 407:371-386.
Bartoszek, K. and Fuentes-Gonzalez, J. and Mitov, V. and Pienaar, J. and Piwczynski, M. and Puchalka, R. and Spalik, K. and Voje, K. L. (2023) Model Selection Performance in Phylogenetic Comparative Methods Under Multivariate Ornstein-Uhlenbeck Models of Trait Evolution, Systematic Biology 72(2):275-293.
Bartoszek, K. and Pienaar, J. and Mostad. P. and Andersson, S. and Hansen, T. F. (2012) A phylogenetic comparative method for studying multivariate adaptation. Journal of Theoretical Biology 314:204-215.
Bartoszek, K. and Sagitov, S. (2015) Phylogenetic confidence intervals for the optimal trait value. Journal of Applied Probability 52(4):1115-1132.
Butler, M.A. and A.A. King (2004) Phylogenetic comparative analysis: a modeling approach for adaptive evolution. American Naturalist 164:683-695.
Hansen, T.F. and Pienaar, J. and Orzack, S.H. (2008) A comparative method for studying adaptation to randomly evolving environment. Evolution 62:1965-1977.
Mitov, V. and Bartoszek, K. and Asimomitis, G. and Stadler, T. (2020) Fast likelihood calculation for multivariate Gaussian phylogenetic models with shifts Theoretical Population Biology 131:66-78.
Xiao, H and Bartoszek, K. and Lio P. (2018) Multi–omic analysis of signalling factors in inflammatory comorbidities. BMC Bioinformatics, Proceedings from the 12th International BBCC conference 19:439.
brown
, mvBM
BrownianMotionModel
, SummarizeBM
,
simulBMProcPhylTree
, hansen
, mvOU
,
ouchModel
,
SummarizeOUCH
, simulOUCHProcPhylTree
,
slouch::model.fit
, PCMLik
,
mvslouchModel
,
SummarizeMVSLOUCH
, simulMVSLOUCHProcPhylTree
,
parametric.bootstrap
,
optim
RNGversion(min(as.character(getRversion()),"3.6.1"))
set.seed(12345, kind = "Mersenne-Twister", normal.kind = "Inversion")
### We will first simulate a small phylogenetic tree using functions from ape.
### For simulating the tree one could also use alternative functions, e.g. sim.bd.taxa
### from the TreeSim package
phyltree<-ape::rtree(4)
## The line below is not necessary but advisable for speed
phyltree<-phyltree_paths(phyltree)
### Define a vector of regimes.
regimes<-c("small","small","large","small","large","small")
### Define SDE parameters to be able to simulate data under the OUOU model.
OUOUparameters<-list(vY0=matrix(c(1,-1,0.5),nrow=3,ncol=1),
A=rbind(c(9,0,0),c(0,5,0),c(0,0,1)),mPsi=cbind("small"=c(1,-1,0.5),"large"=c(-1,1,0.5)),
Syy=rbind(c(1,0.25,0.3),c(0,1,0.2),c(0,0,1)))
### Now simulate the data.
OUOUdata<-simulOUCHProcPhylTree(phyltree,OUOUparameters,regimes,NULL)
OUOUdata<-OUOUdata[phyltree$tip.label,,drop=FALSE]
## set up for a trivial, single model setup case (for running time)
## in a real analysis you should carefully choose between what models
## you want to do model selection
model_setups<-list(list(evolmodel="bm"))
### Try to recover the parameters of the OUOU model.
### maxiter here set to minimal working possibility, in reality it should be larger
### e.g. default of c(10,50,100)
estimResults<-estimate.evolutionary.model(phyltree,OUOUdata,regimes=regimes,
root.regime="small",M.error=NULL,repeats=1,model.setups=model_setups,predictors=c(3),
kY=2,doPrint=TRUE,pESS=NULL,maxiter=c(1,1,1))
### After this step you can look at the best estimated model and use the
### parametric.bootstrap() function to obtain bootstrap confidence intervals
RNGversion(as.character(getRversion()))
## Not run: ##It takes too long to run this
## take a less trivial setup
phyltree<-ape::rtree(5)
## The line below is not necessary but advisable for speed
phyltree<-phyltree_paths(phyltree)
### Define a vector of regimes.
regimes<-c("small","small","large","small","small","large","large","large")
### Define SDE parameters to be able to simulate data under the OUOU model.
OUOUparameters<-list(vY0=matrix(c(1,-1,0.5),nrow=3,ncol=1),
A=rbind(c(9,0,0),c(0,5,0),c(0,0,1)),mPsi=cbind("small"=c(1,-1,0.5),"large"=c(-1,1,0.5)),
Syy=rbind(c(1,0.25,0.3),c(0,1,0.2),c(0,0,1)))
### Now simulate the data.
OUOUdata<-simulOUCHProcPhylTree(phyltree,OUOUparameters,regimes,NULL)
OUOUdata<-OUOUdata[phyltree$tip.label,,drop=FALSE]
## set up for two very simple (for example usage) models to compare between
## in a real analysis you should carefully choose between what models
## you want to do model selection, the default
## model_setups<-NULL provides a wide selection of models
model_setups<-list(list(evolmodel="bm"),list(evolmodel="ouch",
"Atype"="SingleValueDiagonal","Syytype"="SingleValueDiagonal","diagA"="Positive"))
### Try to recover the parameters of the OUOU model.
estimResults<-estimate.evolutionary.model(phyltree,OUOUdata,regimes=regimes,
root.regime="small",M.error=NULL,repeats=3,model.setups=model_setups,predictors=c(3),
kY=2,doPrint=TRUE,pESS=NULL,maxiter=c(10,50,100))
## End(Not run)
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.