ace: Ancestral Character Estimation

Description Usage Arguments Details Value Note Author(s) References See Also Examples

View source: R/ace.R

Description

This function estimates ancestral character states, and the associated uncertainty, for continuous and discrete characters.

logLik, deviance, and AIC are generic functions used to extract the log-likelihood, the deviance, or the Akaike information criterion of a fitted object. If no such values are available, NULL is returned.

anova is another generic function which is used to compare nested models: the significance of the additional parameter(s) is tested with likelihood ratio tests. You must ensure that the models are effectively nested (if they are not, the results will be meaningless). It is better to list the models from the smallest to the largest.

Usage

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
ace(x, phy, type = "continuous", method = if (type == "continuous")
   "REML" else "ML", CI = TRUE,
    model = if (type == "continuous") "BM" else "ER",
    scaled = TRUE, kappa = 1, corStruct = NULL, ip = 0.1,
    use.expm = FALSE, use.eigen = TRUE, marginal = FALSE)
## S3 method for class 'ace'
print(x, digits = 4, ...)
## S3 method for class 'ace'
logLik(object, ...)
## S3 method for class 'ace'
deviance(object, ...)
## S3 method for class 'ace'
AIC(object, ..., k = 2)
## S3 method for class 'ace'
anova(object, ...)

Arguments

x

a vector or a factor; an object of class "ace" in the case of print.

phy

an object of class "phylo".

type

the variable type; either "continuous" or "discrete" (or an abbreviation of these).

method

a character specifying the method used for estimation. Four choices are possible: "ML", "REML", "pic", or "GLS".

CI

a logical specifying whether to return the 95% confidence intervals of the ancestral state estimates (for continuous characters) or the likelihood of the different states (for discrete ones).

model

a character specifying the model (ignored if method = "GLS"), or a numeric matrix if type = "discrete" (see details).

scaled

a logical specifying whether to scale the contrast estimate (used only if method = "pic").

kappa

a positive value giving the exponent transformation of the branch lengths (see details).

corStruct

if method = "GLS", specifies the correlation structure to be used (this also gives the assumed model).

ip

the initial value(s) used for the ML estimation procedure when type == "discrete" (possibly recycled).

use.expm

a logical specifying whether to use the package expm to compute the matrix exponential (relevant only if type = "d"). If FALSE, the function matexpo from ape is used (see details). This option is ignored if use.eigen = TRUE (see next).

use.eigen

a logical (relevant if type = "d"); if TRUE then the probability matrix is computed with an eigen decomposition instead of a matrix exponential (see details).

marginal

a logical (relevant if type = "d"). By default, the joint reconstruction of the ancestral states are done. Set this option to TRUE if you want the marginal reconstruction (see details.)

digits

the number of digits to be printed.

object

an object of class "ace".

k

a numeric value giving the penalty per estimated parameter; the default is k = 2 which is the classical Akaike information criterion.

...

further arguments passed to or from other methods.

Details

If type = "continuous", the default model is Brownian motion where characters evolve randomly following a random walk. This model can be fitted by residual maximum likelihood (the default), maximum likelihood (Felsenstein 1973, Schluter et al. 1997), least squares (method = "pic", Felsenstein 1985), or generalized least squares (method = "GLS", Martins and Hansen 1997, Cunningham et al. 1998). In the last case, the specification of phy and model are actually ignored: it is instead given through a correlation structure with the option corStruct.

In the setting method = "ML" and model = "BM" (this used to be the default until ape 3.0-7) the maximum likelihood estimation is done simultaneously on the ancestral values and the variance of the Brownian motion process; these estimates are then used to compute the confidence intervals in the standard way. The REML method first estimates the ancestral value at the root (aka, the phylogenetic mean), then the variance of the Brownian motion process is estimated by optimizing the residual log-likelihood. The ancestral values are finally inferred from the likelihood function giving these two parameters. If method = "pic" or "GLS", the confidence intervals are computed using the expected variances under the model, so they depend only on the tree.

It could be shown that, with a continous character, REML results in unbiased estimates of the variance of the Brownian motion process while ML gives a downward bias. Therefore the former is recommanded.

For discrete characters (type = "discrete"), only maximum likelihood estimation is available (Pagel 1994) (see MPR for an alternative method). The model is specified through a numeric matrix with integer values taken as indices of the parameters. The numbers of rows and of columns of this matrix must be equal, and are taken to give the number of states of the character. For instance, matrix(c(0, 1, 1, 0), 2) will represent a model with two character states and equal rates of transition, matrix(c(0, 1, 2, 0), 2) a model with unequal rates, matrix(c(0, 1, 1, 1, 0, 1, 1, 1, 0), 3) a model with three states and equal rates of transition (the diagonal is always ignored). There are short-cuts to specify these models: "ER" is an equal-rates model (e.g., the first and third examples above), "ARD" is an all-rates-different model (the second example), and "SYM" is a symmetrical model (e.g., matrix(c(0, 1, 2, 1, 0, 3, 2, 3, 0), 3)). If a short-cut is used, the number of states is determined from the data.

By default, the likelihood of the different ancestral states of discrete characters are computed with a joint estimation procedure using a procedure similar to the one described in Pupko et al. (2000). If marginal = TRUE, a marginal estimation procedure is used (this was the only choice until ape 3.1-1). With this method, the likelihood values at a given node are computed using only the information from the tips (and branches) descending from this node. With the joint estimation, all information is used for each node. The difference between these two methods is further explained in Felsenstein (2004, pp. 259-260) and in Yang (2006, pp. 121-126). The present implementation of the joint estimation uses a “two-pass” algorithm which is much faster than stochastic mapping while the estimates of both methods are very close.

With discrete characters it is necessary to compute the exponential of the rate matrix. The only possibility until ape 3.0-7 was the function matexpo in ape. If use.expm = TRUE and use.eigen = FALSE, the function expm, in the package of the same name, is used. matexpo is faster but quite inaccurate for large and/or asymmetric matrices. In case of doubt, use the latter. Since ape 3.0-10, it is possible to use an eigen decomposition avoiding the need to compute the matrix exponential; see details in Lebl (2013, sect. 3.8.3). This is much faster and is now the default.

Since version 5.2 of ape, ace can take state uncertainty for discrete characters into account: this should be coded with R's NA only. More details:

https://www.mail-archive.com/[email protected]/msg05286.html

Value

an object of class "ace" with the following elements:

ace

if type = "continuous", the estimates of the ancestral character values.

CI95

if type = "continuous", the estimated 95% confidence intervals.

sigma2

if type = "continuous", model = "BM", and method = "ML", the maximum likelihood estimate of the Brownian parameter.

rates

if type = "discrete", the maximum likelihood estimates of the transition rates.

se

if type = "discrete", the standard-errors of estimated rates.

index.matrix

if type = "discrete", gives the indices of the rates in the rate matrix.

loglik

if method = "ML", the maximum log-likelihood.

lik.anc

if type = "discrete", the scaled likelihoods of each ancestral state.

call

the function call.

Note

Liam Revell points out that for discrete characters the ancestral likelihood values returned with marginal = FALSE are actually the marginal estimates, while setting marginal = TRUE returns the conditional (scaled) likelihoods of the subtree:

http://blog.phytools.org/2015/05/about-how-acemarginaltrue-does-not.html

Author(s)

Emmanuel Paradis, Ben Bolker

References

Cunningham, C. W., Omland, K. E. and Oakley, T. H. (1998) Reconstructing ancestral character states: a critical reappraisal. Trends in Ecology & Evolution, 13, 361–366.

Felsenstein, J. (1973) Maximum likelihood estimation of evolutionary trees from continuous characters. American Journal of Human Genetics, 25, 471–492.

Felsenstein, J. (1985) Phylogenies and the comparative method. American Naturalist, 125, 1–15.

Felsenstein, J. (2004) Inferring Phylogenies. Sunderland: Sinauer Associates.

Lebl, J. (2013) Notes on Diffy Qs: Differential Equations for Engineers. http://www.jirka.org/diffyqs/.

Martins, E. P. and Hansen, T. F. (1997) Phylogenies and the comparative method: a general approach to incorporating phylogenetic information into the analysis of interspecific data. American Naturalist, 149, 646–667.

Pagel, M. (1994) Detecting correlated evolution on phylogenies: a general method for the comparative analysis of discrete characters. Proceedings of the Royal Society of London. Series B. Biological Sciences, 255, 37–45.

Pupko, T., Pe'er, I, Shamir, R., and Graur, D. (2000) A fast algorithm for joint reconstruction of ancestral amino acid sequences. Molecular Biology and Evolution, 17, 890–896.

Schluter, D., Price, T., Mooers, A. O. and Ludwig, D. (1997) Likelihood of ancestor states in adaptive radiation. Evolution, 51, 1699–1711.

Yang, Z. (2006) Computational Molecular Evolution. Oxford: Oxford University Press.

See Also

MPR, corBrownian, compar.ou, anova

Reconstruction of ancestral sequences can be done with the package phangorn (see function ?ancestral.pml).

Examples

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
### Some random data...
data(bird.orders)
x <- rnorm(23)
### Compare the three methods for continuous characters:
ace(x, bird.orders)
ace(x, bird.orders, method = "pic")
ace(x, bird.orders, method = "GLS",
    corStruct = corBrownian(1, bird.orders))
### For discrete characters:
x <- factor(c(rep(0, 5), rep(1, 18)))
ans <- ace(x, bird.orders, type = "d")
#### Showing the likelihoods on each node:
plot(bird.orders, type = "c", FALSE, label.offset = 1)
co <- c("blue", "yellow")
tiplabels(pch = 22, bg = co[as.numeric(x)], cex = 2, adj = 1)
nodelabels(thermo = ans$lik.anc, piecol = co, cex = 0.75)

Example output

    Ancestral Character Estimation

Call: ace(x = x, phy = bird.orders)

    Residual log-likelihood: 57.98664 

$ace
         24          25          26          27          28          29 
-0.29509908 -0.36167835 -0.59410736 -0.28672290 -0.32983186 -0.26339385 
         30          31          32          33          34          35 
-0.24818008 -0.22303711 -0.21047798 -0.15113842 -0.03759480 -0.13076733 
         36          37          38          39          40          41 
-0.22383141 -0.24550706 -0.25921611 -0.20771191 -0.17649159 -0.19287315 
         42          43          44          45 
-0.05748375 -0.17728473 -0.21090687 -0.21943288 

$sigma2
[1] 0.04249020 0.01258875

$CI95
         [,1]      [,2]
24 -1.0257313 0.4355331
25 -1.1289796 0.4056229
26 -1.5189612 0.3307465
27 -1.1276830 0.5542372
28 -1.2104351 0.5507714
29 -0.9443657 0.4175780
30 -0.8967247 0.4003646
31 -0.8019389 0.3558647
32 -0.8211081 0.4001522
33 -0.8113954 0.5091185
34 -0.8494552 0.7742656
35 -0.8801392 0.6186045
36 -0.8048211 0.3571583
37 -0.8268424 0.3358283
38 -0.8378037 0.3193714
39 -0.7831588 0.3677350
40 -0.7853027 0.4323195
41 -0.8531535 0.4674072
42 -0.7790674 0.6640999
43 -0.8019430 0.4473735
44 -0.8790579 0.4572442
45 -0.9334170 0.4945513


    Ancestral Character Estimation

Call: ace(x = x, phy = bird.orders, method = "pic")

$ace
         24          25          26          27          28          29 
-0.29507447 -0.56199321 -1.21202277 -0.08890788 -0.68795732 -0.15324271 
         30          31          32          33          34          35 
-0.17673248 -0.17552412 -0.09243456  0.22469175  0.41657962  0.04239828 
         36          37          38          39          40          41 
-0.22944339 -0.33623174 -0.33460632  0.07137587  0.12195813 -0.48363366 
         42          43          44          45 
 0.75177359  0.01806120 -0.50874908 -0.34183375 

$CI95
         [,1]      [,2]
24  -7.332313  6.742164
25 -10.539896  9.415909
26 -14.153723 11.729677
27 -11.684211 11.506395
28 -13.570155 12.194241
29 -11.063398 10.756912
30 -10.921769 10.568304
31  -6.461818  6.110769
32 -11.133372 10.948503
33  -9.643488 10.092872
34 -12.224808 13.057967
35 -12.988046 13.072843
36 -10.720182 10.261296
37 -10.633171  9.960707
38 -10.509624  9.840411
39  -6.998857  7.141609
40  -9.267476  9.511393
41 -13.276059 12.308792
42 -11.767472 13.271020
43 -10.624952 10.661074
44 -11.517747 10.500248
45 -12.768686 12.085018


    Ancestral Character Estimation

Call: ace(x = x, phy = bird.orders, method = "GLS", corStruct = corBrownian(1, 
    bird.orders))

$ace
         24          25          26          27          28          29 
-0.29507447 -0.36165837 -0.59409131 -0.28670534 -0.32981506 -0.26336785 
         30          31          32          33          34          35 
-0.24815345 -0.22301013 -0.21045208 -0.15111429 -0.03757551 -0.13074454 
         36          37          38          39          40          41 
-0.22380467 -0.24548033 -0.25918956 -0.20768563 -0.17646589 -0.19284817 
         42          43          44          45 
-0.05746083 -0.17725935 -0.21088256 -0.21940962 

$CI95
         [,1]       [,2]
24 -0.2950745 -0.2950745
25 -2.8222891  2.0989724
26 -4.4205961  3.2324135
27 -3.6840056  3.1105949
28 -4.0247050  3.3650749
29 -1.9904166  1.4636809
30 -2.3082184  1.8119115
31 -2.4767201  2.0306998
32 -2.7069321  2.2860279
33 -2.9735563  2.6713278
34 -3.6553910  3.5802400
35 -3.4240073  3.1625183
36 -2.5871896  2.1395803
37 -2.7193643  2.2284036
38 -2.7756309  2.2572518
39 -2.7490815  2.3337103
40 -2.8930514  2.5401197
41 -3.1585849  2.7728886
42 -3.3181841  3.2032625
43 -2.9748194  2.6203007
44 -3.2242948  2.8025297
45 -3.4521301  3.0133109

Warning messages:
1: In Initialize.corPhyl(corStruct, data.frame(x)) :
  Rownames in data frame do not match tree tip names; data taken to be in the same order as in tree
2: In Initialize.corPhyl(X[[i]], ...) :
  Rownames in data frame do not match tree tip names; data taken to be in the same order as in tree

ape documentation built on Sept. 24, 2018, 9:03 a.m.