phyloglm: Phylogenetic Generalized Linear Model

Description Usage Arguments Details Value Note Author(s) References See Also Examples

View source: R/phyloglm.R

Description

Fits the phylogenetic logistic regression described in Ives and Garland (2010) and the Poisson regression described in Paradis and Claude (2002). The computation uses an algorithm that is linear in the number of tips in the tree.

Usage

1
2
3
4
phyloglm(formula, data, phy, method = c("logistic_MPLE","logistic_IG10",
         "poisson_GEE"), btol = 10, log.alpha.bound = 4,
         start.beta=NULL, start.alpha=NULL,
         boot = 0, full.matrix = TRUE)

Arguments

formula

a model formula.

data

a data frame containing variables in the model. If not found in data, the variables are taken from the current environment.

phy

a phylogenetic tree of type phylo with branch lengths.

method

The "logistic_IG10" method optimizes a GEE approximation to the penalized likelihood of the logistic regression. "logistic_MPLE" maximizes the penalized likelihood of the logistic regression. In both cases, the penalty is Firth's correction. The "poisson_GEE" method solves the generalized estimating equations (GEE) for Poisson regression.

btol

(logistic regression only) bound on the linear predictor to bound the searching space.

log.alpha.bound

(logistic regression only) bound for the log of the parameter alpha.

start.beta

starting values for beta coefficients.

start.alpha

(logistic regression only) starting values for alpha (phylogenetic correlation).

boot

number of independent bootstrap replicates, 0 means no bootstrap.

full.matrix

if TRUE, the full matrix of bootstrap estimates (coefficients and alpha) will be returned.

Details

This function uses an algorithm that is linear in the number of tips in the tree.

Bootstrapping can be parallelized using the future package on any future compatible back-end. For example, run library(future); plan(multiprocess)), after which bootstrapping will automatically occur in parallel. See plan for options.

Value

coefficients

the named vector of coefficients.

alpha

(logistic regression only) the phylogenetic correlation parameter.

scale

(Poisson regression only) the scale parameter which estimates the overdispersion.

sd

standard deviation for the regression coefficients.

vcov

covariance matrix for the regression coefficients.

logLik

(logistic regression only) log likelihood.

aic

(logistic regression only) AIC.

penlogLik

(logistic regression only) penalized log likelihood, using Firth's penalty for coefficients.

y

response.

n

number of observations (tips in the tree).

d

number of dependent variables.

formula

the model formula.

call

the original call to the function.

method

the estimation method.

convergence

An integer code with '0' for successful optimization. With logistic_MPLE, this is the convergence code from the optim routine.

alphaWarn

(logistic regression only) An interger code with '0' for the estimate of alpha is not near the lower and upper bounds, code with '1' for the estimate of alpha near the lower bound, code with '2' for the estimate of alpha near the upper bound.

X

a design matrix with one row for each tip in the phylogenetic tree.

bootmean

(boot > 0 only) bootstrap means of the parameters estimated.

bootsd

(boot > 0 only) bootstrap standard deviations of the estimated parameters.

bootconfint95

(boot > 0 only) bootstrap 95% confidence interval.

bootmeanAlog

(boot > 0 only) bootstrap mean of the logs of the estimated alphas.

bootsdAlog

(boot > 0 only) bootstrap standard deviation of the logs of the estimated alphas.

bootnumFailed

(boot > 0 only) number of independent bootstrap replicates for which phyloglm failed. These failures may be due to the bootstrap data having too few 0's or too few 1's.

bootstrap

(boot > 0 and full.matrix = TRUE only) matrix of all bootstrap estimates.

Note

The tip labels in the tree are matched to the data names (row names in the data frame). If no data frame is provided through the argument data, taxon labels in the tree are matched to taxon labels in the response variable based on the row names of the response vector, and the taxa are assumed to come in the same order for all variables in the model.

The logistic regression method of Ives and Garland (2010) uses alpha to estimate the level of phylogenetic correlation. The GEE method for Poisson regression does not estimate the level of phylogenetic correlation but takes it from the existing branch lengths in the tree.

The standard deviation and the covariance matrix for the coefficients of logistic regression are conditional on the estimated value of the phylogenetic correlation parameter alpha.

The default choice btol=10 constrains the fitted values, i.e. the probability of "1" predicted by the model, to lie within 1/(1+exp(+10)) = 0.000045 and 1/(1+exp(-10)) = 0.999955.

The log of alpha is bounded in the interval (-log(T) - log.alpha.bound, -log(T) + log.alpha.bound) where T is the mean of the distances from the root to tips. In other words, alpha*T is constrained to lie within exp(+/- log.alpha.bound).

Author(s)

Lam Si Tung Ho, Robert Lachlan, Rachel Feldman and Cecile Ane

References

Ho, L. S. T. and Ane, C. 2014. "A linear-time algorithm for Gaussian and non-Gaussian trait evolution models". Systematic Biology 63(3):397-408.

Ives, A. R. and T. Garland, Jr. 2010. "Phylogenetic logistic regression for binary dependent variables". Systematic Biology 59:9-26.

Paradis E. and Claude J. 2002. "Analysis of Comparative Data Using Generalized Estimating Equations". Journal of Theoretical Biology 218:175-185.

See Also

compar.gee.

Examples

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
set.seed(123456)
tre = rtree(50)
x = rTrait(n=1,phy=tre)
X = cbind(rep(1,50),x)
y = rbinTrait(n=1,phy=tre, beta=c(-1,0.5), alpha=1 ,X=X)
dat = data.frame(trait01 = y, predictor = x)
fit = phyloglm(trait01~predictor,phy=tre,data=dat,boot=100)
summary(fit)
coef(fit)
vcov(fit)

Example output

Loading required package: ape

Call:
phyloglm(formula = trait01 ~ predictor, data = dat, phy = tre, 
    boot = 100)
       AIC     logLik Pen.logLik 
     48.54     -21.27     -19.61 

Method: logistic_MPLE
Mean tip height: 3.596271
Parameter estimate(s):
alpha: 1.241056 
      bootstrap mean: 2.09384 (on log scale, then back transformed)
      so possible upward bias.
      bootstrap 95% CI: (0.1415566,15.06819)

Coefficients:
            Estimate   StdErr  z.value lowerbootCI upperbootCI  p.value   
(Intercept) -0.50567  0.41742 -1.21140    -1.50615      0.4158 0.225740   
predictor    1.41266  0.45578  3.09946     0.56413      2.3718 0.001939 **
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Note: Wald-type p-values for coefficients, conditional on alpha=1.241056
      Parametric bootstrap results based on 100 fitted replicates

(Intercept)   predictor 
 -0.5056675   1.4126605 
             (Intercept)    predictor
(Intercept)  0.174241457 -0.002343877
predictor   -0.002343877  0.207731666

phylolm documentation built on July 2, 2020, 3:44 a.m.