Description Usage Arguments Details Value Author(s) References See Also Examples
View source: R/pglmm_compare.R
pglmm_compare
performs linear regression for Gaussian, binomial and Poisson
phylogenetic data, estimating regression coefficients with approximate standard
errors. It simultaneously estimates the strength of phylogenetic signal in the
residuals and gives an approximate conditional likelihood ratio test for the
hypothesis that there is no signal. Therefore, when applied without
predictor (independent) variables, it gives a test for phylogenetic signal.
pglmm_compare
is a wrapper for pglmm
tailored for comparative data in
which each value of the response (dependent) variable corresponds to a single tip
on a phylogenetic tree. If there are multiple measures for each species, pglmm
will be helpful.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 | pglmm_compare(
formula,
family = "gaussian",
data = list(),
phy,
REML = TRUE,
optimizer = c("nelder-mead-nlopt", "bobyqa", "Nelder-Mead", "subplex"),
add.obs.re = TRUE,
verbose = FALSE,
cpp = TRUE,
bayes = FALSE,
reltol = 10^-6,
maxit = 500,
tol.pql = 10^-6,
maxit.pql = 200,
marginal.summ = "mean",
calc.DIC = FALSE,
prior = "inla.default",
prior_alpha = 0.1,
prior_mu = 1,
ML.init = FALSE,
s2.init = 1,
B.init = NULL
)
|
formula |
A two-sided linear formula object describing the
fixed-effects of the model; for example, Y ~ X. Binomial data can be either
presence/absence, or a two-column array of 'successes' and 'failures'.
For both binomial and Poisson data, we add an observation-level
random term by default via |
family |
Either "gaussian" for a Linear Mixed Model, or
"binomial" or "poisson" for Generalized Linear Mixed Models.
|
data |
A data frame containing the variables named in formula. It must has the tip labels of the phylogeny as row names; if they are not in the same order, the data frame will be arranged so that row names match the order of tip labels. |
phy |
A phylogenetic tree as an object of class "phylo". |
REML |
Whether REML or ML is used for model fitting the random effects. Ignored if
|
optimizer |
nelder-mead-nlopt (default), bobyqa, Nelder-Mead, or subplex.
Nelder-Mead is from the stats package and the other optimizers are from the nloptr package.
Ignored if |
add.obs.re |
Whether to add observation-level random term for binomial and Poisson
families. Normally it would be a good idea to add this to account for overdispersion,
so |
verbose |
If |
cpp |
Whether to use C++ function for optim. Default is TRUE. Ignored if |
bayes |
Whether to fit a Bayesian version of the PGLMM using |
reltol |
A control parameter dictating the relative tolerance
for convergence in the optimization; see |
maxit |
A control parameter dictating the maximum number of
iterations in the optimization; see |
tol.pql |
A control parameter dictating the tolerance for
convergence in the PQL estimates of the mean components of the
GLMM. Ignored if |
maxit.pql |
A control parameter dictating the maximum number
of iterations in the PQL estimates of the mean components of the
GLMM. Ignored if |
marginal.summ |
Summary statistic to use for the estimate of coefficients when
doing a Bayesian PGLMM (when |
calc.DIC |
Should the Deviance Information Criterion be calculated and returned,
when doing a Bayesian PGLMM? Ignored if |
prior |
Which type of default prior should be used by |
prior_alpha |
Only used if |
prior_mu |
Only used if |
ML.init |
Only relevant if |
s2.init |
An array of initial estimates of s2. If s2.init is not provided for
|
B.init |
Initial estimates of B, a matrix containing
regression coefficients in the model for the fixed effects. This
matrix must have |
pglmm_compare
in the package phyr
is similar to binaryPGLMM
in
the package ape
, although it has much broader functionality, including
accepting more than just binary data, implementing Bayesian analyses, etc.
For non-Gaussian data, the function estimates parameters for the model
Pr(Y = 1) = θ
θ = inverse.link(b0 + b1 * x1 + b2 * x2 + … + ε)
ε ~ Gaussian(0, s2 * V)
where V is a covariance matrix derived from a phylogeny
(typically under the assumption of Brownian motion evolution). Although
mathematically there is no requirement for V to be ultrametric,
forcing V into ultrametric form can aide in the interpretation of the
model. This is especially true for binary data, because in regression for
binary dependent variables, only the off-diagonal elements (i.e., covariances)
of matrix V are biologically meaningful (see Ives & Garland 2014).
The function converts a phylo tree object into a covariance matrix,
and further standardizes this matrix to have determinant = 1. This in effect
standardizes the interpretation of the scalar s2
. Although mathematically
not required, it is a very good idea to standardize the predictor
(independent) variables to have mean 0 and variance 1. This will make the
function more robust and improve the interpretation of the regression
coefficients.
For Gaussian data, the function estimates parameters for the model
Y = b0 + b1 * x1 + b2 * x2 + … + ε)
ε ~ Gaussian(0, s2 * V + s2resid * I)
where s2resid * I gives the non-phylogenetic residual variance. Note that this is equivalent to a model with Pagel's lambda transformation.
An object (list) of class pglmm_compare
with the following elements:
formula |
the formula for fixed effects |
formula_original |
the formula for both fixed effects and random effects |
data |
the dataset |
family |
either |
B |
estimates of the regression coefficients |
B.se |
approximate standard errors of the fixed effects regression coefficients.
This is set to NULL if |
B.ci |
approximate bayesian credible interval of the fixed effects regression coefficients.
This is set to NULL if |
B.cov |
approximate covariance matrix for the fixed effects regression coefficients |
B.zscore |
approximate Z scores for the fixed effects regression coefficients. This is set to NULL if |
B.pvalue |
approximate tests for the fixed effects regression coefficients being different from zero. This is set to NULL if |
ss |
random effects' standard deviations for the covariance matrix sigma^2 V for each random effect in order. For the linear mixed model, the residual variance is listed last |
s2r |
random effects variances for non-nested random effects |
s2n |
random effects variances for nested random effects |
s2resid |
for linear mixed models, the residual variance |
s2r.ci |
Bayesian credible interval for random effects variances for non-nested random effects.
This is set to NULL if |
s2n.ci |
Bayesian credible interval for random effects variances for nested random effects.
This is set to NULL if |
s2resid.ci |
Bayesian credible interval for linear mixed models, the residual variance.
This is set to NULL if |
logLik |
for linear mixed models, the log-likelihood for either the restricted likelihood ( |
AIC |
for linear mixed models, the AIC for either the restricted likelihood ( |
BIC |
for linear mixed models, the BIC for either the restricted likelihood ( |
DIC |
for bayesian PGLMM, this is the Deviance Information Criterion metric of model fit. This is set to NULL if |
REML |
whether or not REML is used ( |
bayes |
whether or not a Bayesian model was fit. |
marginal.summ |
The specified summary statistic used to summarise the Bayesian marginal distributions.
Only present if |
s2.init |
the user-provided initial estimates of |
B.init |
the user-provided initial estimates of |
Y |
the response (dependent) variable returned in matrix form |
X |
the predictor (independent) variables returned in matrix form (including 1s in the first column) |
H |
the residuals. For linear mixed models, this does not account for random terms,
To get residuals after accounting for both fixed and random terms, use |
iV |
the inverse of the covariance matrix. This is NULL if |
mu |
predicted mean values for the generalized linear mixed model (i.e. similar to |
Zt |
the design matrix for random effects. This is set to NULL if |
St |
diagonal matrix that maps the random effects variances onto the design matrix |
convcode |
the convergence code provided by |
niter |
number of iterations performed by |
inla.model |
Model object fit by underlying |
Anthony R. Ives
Ives, A. R. and Helmus, M. R. (2011) Generalized linear mixed models for phylogenetic analyses of community structure. Ecological Monographs, 81, 511–525.
Ives, A. R. and Garland, T., Jr. (2014) Phylogenetic regression for binary dependent variables. Pages 231–261 in L. Z. Garamszegi, editor. Modern Phylogenetic Comparative Methods and Their Application in Evolutionary Biology. Springer-Verlag, Berlin Heidelberg.
pglmm
; package ape and its function binaryPGLMM
;
package phylolm and its function phyloglm
; package MCMCglmm
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 | ## Illustration of `pglmm_compare` with simulated data
# Generate random phylogeny
library(ape)
n <- 100
phy <- compute.brlen(rtree(n=n), method = "Grafen", power = 1)
# Generate random data and standardize to have mean 0 and variance 1
X1 <- rTraitCont(phy, model = "BM", sigma = 1)
X1 <- (X1 - mean(X1))/var(X1)
# Simulate binary Y
sim.dat <- data.frame(Y = array(0, dim = n), X1 = X1, row.names = phy$tip.label)
sim.dat$Y <- ape::binaryPGLMM.sim(Y ~ X1, phy = phy, data=sim.dat, s2 = 1,
B = matrix(c(0, .25), nrow = 2, ncol = 1),
nrep = 1)$Y
# Fit model
pglmm_compare(Y ~ X1, family = "binomial", phy = phy, data = sim.dat)
# Compare with `binaryPGLMM`
ape::binaryPGLMM(Y ~ X1, phy = phy, data = sim.dat)
# Compare with `phyloglm`
summary(phylolm::phyloglm(Y ~ X1, phy = phy, data = sim.dat))
# Compare with `glm` that does not account for phylogeny
summary(glm(Y ~ X1, data = sim.dat, family = "binomial"))
# Compare with logistf() that does not account
# for phylogeny but is less biased than glm()
logistf::logistf(Y ~ X1, data = sim.dat)
## Fit model with bayes = TRUE
# pglmm_compare(Y ~ X1, family = "binomial", phy = phy, data = sim.dat,
# bayes = TRUE, calc.DIC = TRUE)
# Compare with `MCMCglmm`
V <- vcv(phy)
V <- V/max(V)
detV <- exp(determinant(V)$modulus[1])
V <- V/detV^(1/n)
invV <- Matrix::Matrix(solve(V),sparse = TRUE)
sim.dat$species <- phy$tip.label
rownames(invV) <- sim.dat$species
nitt <- 43000
thin <- 10
burnin <- 3000
prior <- list(R=list(V=1, fix=1), G=list(G1=list(V=1, nu=1000, alpha.mu=0, alpha.V=1)))
# commented out to save time
# summary(MCMCglmm::MCMCglmm(Y ~ X1, random = ~species, ginvers = list(species = invV),
# data = sim.dat, slice = TRUE, nitt = nitt, thin = thin, burnin = burnin,
# family = "categorical", prior = prior, verbose = FALSE))
|
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.