Phylogenetic Generalised Linear Mixed Model for Community Data
Description
This function performs Generalized Linear Mixed Models for binary
and continuous phylogenetic data, estimating regression
coefficients with approximate standard errors. It is modeled after
lmer
but is more general by allowing
correlation structure within random effects; these correlations can
be phylogenetic among species, or any other correlation structure,
such as geographical correlations among sites. It is, however, much
more specific than lmer
in that it can
only analyze a subset of1 the types of model designed handled by
lmer
. It is also much slower than
lmer
and requires users to specify
correlation structures as covariance
matrices. communityPGLMM
can analyze models in Ives and
Helmus (2011). It can also analyze bipartite phylogenetic data,
such as that analyzed in Rafferty and Ives (2011), by giving sites
phylogenetic correlations.
communityPGLMM.binary
calculates the statistical
significance of the random effects in the generalized linear mixed
model from the marginal profile likelihood.
communityPGLMM.binary.LRT
tests statistical significance of
the phylogenetic random effect on species slopes using a likelihood
ratio test
communityPGLMM.matrix.structure
produces the entire
covariance matrix structure (V) when you specify random effects.
communityPGLMM.predicted.values
calculates the predicted
values of Y; for the generalized linear mixed model (family =
"binomial"), these values are in the logit1 transformed space.
Usage
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29  communityPGLMM(formula, data = list(), family = "gaussian", sp = NULL,
site = NULL, random.effects = list(), REML = TRUE, s2.init = NULL,
B.init = NULL, reltol = 10^6, maxit = 500, tol.pql = 10^6,
maxit.pql = 200, verbose = FALSE)
communityPGLMM.gaussian(formula, data = list(), family = "gaussian",
sp = NULL, site = NULL, random.effects = list(), REML = TRUE,
s2.init = NULL, B.init = NULL, reltol = 10^8, maxit = 500,
verbose = FALSE)
communityPGLMM.binary(formula, data = list(), family = "binomial",
sp = NULL, site = NULL, random.effects = list(), REML = TRUE,
s2.init = 0.25, B.init = NULL, reltol = 10^5, maxit = 40,
tol.pql = 10^6, maxit.pql = 200, verbose = FALSE)
communityPGLMM.binary.LRT(x, re.number = 0, ...)
communityPGLMM.matrix.structure(formula, data = list(), family = "binomial",
sp = NULL, site = NULL, random.effects = list(), ss = 1)
## S3 method for class 'communityPGLMM'
summary(object, digits = max(3, getOption("digits") 
3), ...)
## S3 method for class 'communityPGLMM'
plot(x, digits = max(3, getOption("digits")  3),
...)
communityPGLMM.predicted.values(x, show.plot = TRUE, ...)

Arguments
formula 
a twosided linear formula object describing the
fixedeffects of the model; for example, 
data 
a 
family 
either 
sp 
a 
site 
a 
random.effects 
a 
REML 
whether REML or ML is used for model fitting. For the generalized linear mixed model for binary data, these don't have standard interpretations, and there is no log likelihood function that can be used in likelihood ratio tests. 
s2.init 
an array of initial estimates of s2 for each random
effect that scales the variance. If s2.init is not provided for

B.init 
initial estimates of B, a matrix containing
regression coefficients in the model for the fixed effects. This
matrix must have 
reltol 
a control parameter dictating the relative tolerance
for convergence in the optimization; see 
maxit 
a control parameter dictating the maximum number of
iterations in the optimization; see 
tol.pql 
a control parameter dictating the tolerance for convergence in the PQL estimates of the mean components of the binomial GLMM. 
maxit.pql 
a control parameter dictating the maximum number of iterations in the PQL estimates of the mean components of the binomial GLMM. 
verbose 
if 
x 

re.number 
which 
... 
additional arguments to summary and plotting functions (currently ignored) 
ss 
which of the 
object 
communityPGLMM object to be summarised 
digits 
minimal number of significant digits for printing, as
in 
show.plot 
if 
Details
The vignette 'pezpglmmoverview' gives a gentle
introduction to using PGLMMS. For linear mixed models (family
= 'gaussian'
), the function estimates parameters for the model of
the form, for example,
y = beta_0 + beta_1x + b_0 + b_1x
b_0 ~ Gaussian(0, sigma_0^2I_(sp))
b_0 ~ Gaussian(0, sigma_0^2V_(sp))
e ~ Gaussian(0,sigma^2)
where beta_0 and beta_1 are fixed effects, and V_(sp) is a variancecovariance matrix derived from a phylogeny (typically under the assumption of Brownian motion evolution). Here, the variation in the mean (intercept) for each species is given by the random effect b_0 that is assumed to be independent among species. Variation in species' responses to predictor variable x is given by a random effect b_0 that is assumed to depend on the phylogenetic relatedness among species given by V_(sp_; if species are closely related, their specific responses to x will be similar. This particular model would be specified as
re.1 < list(1, sp = dat$sp, covar = diag(nspp))
re.2 < list(dat$X, sp = dat$sp, covar = Vsp)
z < communityPGLMM(Y ~ X, data = data, family = "gaussian", random.effects = list(re.1, re.2))
The covariance matrix covar is standardized to have its determinant equal to 1. This in effect standardizes the interpretation of the scalar sigma^2. Although mathematically this is not required, it is a very good idea to standardize the predictor (independent) variables to have mean 0 and variance 1. This will make the function more robust and improve the interpretation of the regression coefficients. For categorical (factor) predictor variables, you will need to construct 01 dummy variables, and these should not be standardized (for obvious reasons).
For binary generalized linear mixed models (family =
'binomial'
), the function estimates parameters for the model of
the form, for example,
y = beta_0 + beta_1x + b_0 + b_1x
Y = logit^(1)(y)
b_0 ~ Gaussian(0, sigma_0^2I_(sp))
b_0 ~ Gaussian(0, sigma_0^2V_(sp))
where beta_0 and beta_1 are fixed effects, and V_(sp) is a variancecovariance matrix derived from a phylogeny (typically under the assumption of Brownian motion evolution).
z < communityPGLMM(Y ~ X, data = data, family =
'binomial', random.effects = list(re.1, re.2))
As with the linear mixed model, it is a very good idea to standardize the predictor (independent) variables to have mean 0 and variance 1. This will make the function more robust and improve the interpretation of the regression coefficients. For categorical (factor) predictor variables, you will need to construct 01 dummy variables, and these should not be standardized (for obvious reasons).
Value
an object of class communityPGLMM
formula 
the formula for fixed effects 
data 
the dataset 
family 
either 
random.effects 
the list of random effects 
B 
estimates of the regression coefficients 
B.se 
approximate standard errors of the fixed effects regression coefficients 
B.cov 
approximate covariance matrix for the fixed effects regression coefficients 
B.zscore 
approximate Z scores for the fixed effects regression coefficients 
B.pvalue 
approximate tests for the fixed effects regression coefficients being different from zero 
ss 
random effects' standard deviations for the covariance matrix sigma^2 V for each random effect in order. For the linear mixed model, the residual variance is listed last 
s2r 
random effects variances for nonnested random effects 
s2n 
random effects variances for nested random effects 
s2resid 
for linear mixed models, the residual vairance 
logLIK 
for linear mixed models, the loglikelihood for either the restricted likelihood ( 
AIC 
for linear mixed models, the AIC for either the restricted likelihood ( 
BIC 
for linear mixed models, the BIC for either the restricted likelihood ( 
REML 
whether or not REML is used ( 
s2.init 
the userprovided initial estimates of 
B.init 
the userprovided initial estimates of 
Y 
the response (dependent) variable returned in matrix form 
X 
the predictor (independent) variables returned in matrix form (including 1s in the first column) 
H 
the residuals. For the generalized linear mixed model, these are the predicted residuals in the logit 1 space 
iV 
the inverse of the covariance matrix for the entire system (of dimension (nsp*nsite) by (nsp*nsite)) 
mu 
predicted mean values for the generalized linear mixed model. Set to NULL for linear mixed models 
sp, sp 
matrices used to construct the nested design matrix 
Zt 
the design matrix for random effects 
St 
diagonal matrix that maps the random effects variances onto the design matrix 
convcode 
the convergence code provided by 
niter 
number of iterations performed by 
Note
These function do not use a
comparative.comm
object, but you can use
as.data.frame.comparative.comm
to
create a data.frame
for use with these functions. The power
of this method comes from deciding your own parameters parameters
to be determined (the data for regression, the random effects,
etc.), and it is our hope that this interface gives you more
flexibility in model selection/fitting.
Author(s)
Anthony R. Ives, cosmetic changes by Will Pearse
References
Ives, A. R. and M. R. Helmus. 2011. Generalized linear mixed models for phylogenetic analyses of community structure. Ecological Monographs 81:511525.
Rafferty, N. E., and A. R. Ives. 2013. Phylogenetic traitbased analyses of ecological networks. Ecology 94:23212333.
Examples
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178 179 180 181 182 183 184 185 186 187 188 189 190 191 192 193 194 195 196 197 198 199 200 201 202 203 204 205 206 207 208 209 210 211 212 213 214 215 216 217 218 219 220 221 222 223 224 225 226 227 228 229 230 231 232  ## Structure of examples:
# First, a (brief) description of model types, and how they are specified
#  these are *not* to be run 'asis'; they show how models should be organised
# Second, a runthrough of how to simulate, and then analyse, data
#  these *are* to be run 'asis'; they show how to format and work with data
## Not run:
#########################################################
#First section; brief summary of models and their use####
#########################################################
## Model structures from Ives & Helmus (2011)
# dat = data set for regression (note: *not* an comparative.comm object)
# nspp = number of species
# nsite = number of sites
# Vphy = phylogenetic covariance matrix for species
# Vrepul = inverse of Vphy representing phylogenetic repulsion
# Model 1 (Eq. 1)
re.site < list(1, site = dat$site, covar = diag(nsite))
re.sp.site < list(1, sp = dat$sp, covar = Vphy, site = dat$site) # note: nested
z < communityPGLMM(freq ~ sp, data = dat, family = "binomial", sp
= dat$sp, site = dat$site, random.effects = list(re.site,
re.sp.site), REML = TRUE, verbose = TRUE, s2.init=.1)
# Model 2 (Eq. 2)
re.site < list(1, site = dat$site, covar = diag(nsite))
re.slope < list(X, sp = dat$sp, covar = diag(nspp))
re.slopephy < list(X, sp = dat$sp, covar = Vphy)
z < communityPGLMM(freq ~ sp + X, data = dat, family = "binomial",
sp = dat$sp, site = dat$site, random.effects = list(re.site,
re.slope, re.slopephy), REML = TRUE, verbose = TRUE, s2.init=.1)
# Model 3 (Eq. 3)
re.site < list(1, site = dat$site, covar = diag(nsite))
re.sp.site < list(1, sp = dat$sp, covar = Vrepul, site = dat$site) # note: nested
z < communityPGLMM(freq ~ sp*X, data = dat, family = "binomial",
sp = dat$sp, site = dat$site, random.effects = list(re.site,
re.sp.site), REML = TRUE, verbose = TRUE, s2.init=.1)
## Model structure from Rafferty & Ives (2013) (Eq. 3)
# dat = data set
# npp = number of pollinators (sp)
# nsite = number of plants (site)
# VphyPol = phylogenetic covariance matrix for pollinators
# VphyPlt = phylogenetic covariance matrix for plants
re.a < list(1, sp = dat$sp, covar = diag(nspp))
re.b < list(1, sp = dat$sp, covar = VphyPol)
re.c < list(1, sp = dat$sp, covar = VphyPol, dat$site)
re.d < list(1, site = dat$site, covar = diag(nsite))
re.f < list(1, site = dat$site, covar = VphyPlt)
re.g < list(1, site = dat$site, covar = VphyPlt, dat$sp)
#term h isn't possible in this implementation, but can be done with
available matlab code
z < communityPGLMM(freq ~ sp*X, data = dat, family = "binomial",
sp = dat$sp, site = dat$site, random.effects = list(re.a, re.b,
re.c, re.d, re.f, re.g), REML = TRUE, verbose = TRUE, s2.init=.1)
## End(Not run)
#########################################################
#Second section; detailed simulation and analysis #######
#NOTE: this section is explained and annotated in #######
# detail in the vignette 'pezpglmmoverview'#######
# run 'vignette('pezpglmmoverview') to read#######
#########################################################
# Generate simulated data for nspp species and nsite sites
nspp < 15
nsite < 10
# residual variance (set to zero for binary data)
sd.resid < 0
# fixed effects
beta0 < 0
beta1 < 0
# magnitude of random effects
sd.B0 < 1
sd.B1 < 1
# whether or not to include phylogenetic signal in B0 and B1
signal.B0 < TRUE
signal.B1 < TRUE
# simulate a phylogenetic tree
phy < rtree(n = nspp)
phy < compute.brlen(phy, method = "Grafen", power = 0.5)
# standardize the phylogenetic covariance matrix to have determinant 1
Vphy < vcv(phy)
Vphy < Vphy/(det(Vphy)^(1/nspp))
# Generate environmental site variable
X < matrix(1:nsite, nrow = 1, ncol = nsite)
X < (X  mean(X))/sd(X)
# Perform a Cholesky decomposition of Vphy. This is used to
# generate phylogenetic signal: a vector of independent normal random
# variables, when multiplied by the transpose of the Cholesky
# deposition of Vphy will have covariance matrix equal to Vphy.
iD < t(chol(Vphy))
# Set up speciesspecific regression coefficients as random effects
if (signal.B0 == TRUE) {
b0 < beta0 + iD %*% rnorm(nspp, sd = sd.B0)
} else {
b0 < beta0 + rnorm(nspp, sd = sd.B0)
}
if (signal.B1 == TRUE) {
b1 < beta1 + iD %*% rnorm(nspp, sd = sd.B1)
} else {
b1 < beta1 + rnorm(nspp, sd = sd.B1)
}
# Simulate species abundances among sites to give matrix Y that
# contains species in rows and sites in columns
y < rep(b0, each=nsite)
y < y + rep(b1, each=nsite) * rep(X, nspp)
y < y + rnorm(nspp*nsite) #add some random 'error'
Y < rbinom(length(y), size=1, prob=exp(y)/(1+exp(y)))
y < matrix(outer(b0, array(1, dim = c(1, nsite))), nrow = nspp,
ncol = nsite) + matrix(outer(b1, X), nrow = nspp, ncol = nsite)
e < rnorm(nspp * nsite, sd = sd.resid)
y < y + matrix(e, nrow = nspp, ncol = nsite)
y < matrix(y, nrow = nspp * nsite, ncol = 1)
Y < rbinom(n = length(y), size = 1, prob = exp(y)/(1 + exp(y)))
Y < matrix(Y, nrow = nspp, ncol = nsite)
# name the simulated species 1:nspp and sites 1:nsites
rownames(Y) < 1:nspp
colnames(Y) < 1:nsite
par(mfrow = c(3, 1), las = 1, mar = c(2, 4, 2, 2)  0.1)
matplot(t(X), type = "l", ylab = "X", main = "X among sites")
hist(b0, xlab = "b0", main = "b0 among species")
hist(b1, xlab = "b1", main = "b1 among species")
#Plot out; you get essentially this from plot(your.pglmm.model)
image(t(Y), ylab = "species", xlab = "sites", main = "abundance",
col=c("black","white"))
# Transform data matrices into "long" form, and generate a data frame
YY < matrix(Y, nrow = nspp * nsite, ncol = 1)
XX < matrix(kronecker(X, matrix(1, nrow = nspp, ncol = 1)), nrow =
nspp * nsite, ncol = 1)
site < matrix(kronecker(1:nsite, matrix(1, nrow = nspp, ncol =
1)), nrow = nspp * nsite, ncol = 1)
sp < matrix(kronecker(matrix(1, nrow = nsite, ncol = 1), 1:nspp),
nrow = nspp * nsite, ncol = 1)
dat < data.frame(Y = YY, X = XX, site = as.factor(site), sp = as.factor(sp))
# Format input and perform communityPGLMM()
# set up random effects
# random intercept with species independent
re.1 < list(1, sp = dat$sp, covar = diag(nspp))
# random intercept with species showing phylogenetic covariances
re.2 < list(1, sp = dat$sp, covar = Vphy)
# random slope with species independent
re.3 < list(dat$X, sp = dat$sp, covar = diag(nspp))
# random slope with species showing phylogenetic covariances
re.4 < list(dat$X, sp = dat$sp, covar = Vphy)
# random effect for site
re.site < list(1, site = dat$site, covar = diag(nsite))
simple < communityPGLMM(Y ~ X, data = dat, family = "binomial", sp
= dat$sp, site = dat$site, random.effects = list(re.site),
REML=TRUE, verbose=FALSE)
# The rest of these tests are not run to save CRAN server time;
#  please take a look at them because they're *very* useful!
## Not run:
z.binary < communityPGLMM(Y ~ X, data = dat, family = "binomial",
sp = dat$sp, site = dat$site, random.effects = list(re.1, re.2,
re.3, re.4), REML = TRUE, verbose = FALSE)
# output results
z.binary
plot(z.binary)
# test statistical significance of the phylogenetic random effect
# on species slopes using a likelihood ratio test
communityPGLMM.binary.LRT(z.binary, re.number = 4)$Pr
# extract the predicted values of Y
communityPGLMM.predicted.values(z.binary, show.plot = TRUE)
# examine the structure of the overall covariance matrix
communityPGLMM.matrix.structure(Y ~ X, data = dat, family =
"binomial", sp = dat$sp, site = dat$site, random.effects =
list(re.1, re.2, re.3, re.4))
# look at the structure of re.1
communityPGLMM.matrix.structure(Y ~ X, data = dat, family =
"binomial", sp = dat$sp, site = dat$site, random.effects =
list(re.1))
# compare results to glmer() when the model contains no
# phylogenetic covariance among species; the results should be
# similar.
communityPGLMM(Y ~ X, data = dat, family = "binomial", sp = dat$sp,
site = dat$site, random.effects = list(re.1, re.3), REML = FALSE,
verbose = FALSE)
# lmer
if(require(lme4)){
summary(glmer(Y ~ X + (1  sp) + (0 + X  sp), data=dat, family =
"binomial"))
# compare results to lmer() when the model contains no phylogenetic
# covariance among species; the results should be similar.
communityPGLMM(Y ~ X, data = dat, family = "gaussian", sp = dat$sp,
site = dat$site, random.effects = list(re.1, re.3), REML = FALSE,
verbose = FALSE)
# lmer
summary(lmer(Y ~ X + (1  sp) + (0 + X  sp), data=dat, REML = FALSE))
}
## End(Not run)
