Phylogenetic Generalised Linear Mixed Model for Community Data

Description

This function performs Generalized Linear Mixed Models for binary and continuous phylogenetic data, estimating regression coefficients with approximate standard errors. It is modeled after lmer but is more general by allowing correlation structure within random effects; these correlations can be phylogenetic among species, or any other correlation structure, such as geographical correlations among sites. It is, however, much more specific than lmer in that it can only analyze a subset of1 the types of model designed handled by lmer. It is also much slower than lmer and requires users to specify correlation structures as covariance matrices. communityPGLMM can analyze models in Ives and Helmus (2011). It can also analyze bipartite phylogenetic data, such as that analyzed in Rafferty and Ives (2011), by giving sites phylogenetic correlations.

communityPGLMM.binary calculates the statistical significance of the random effects in the generalized linear mixed model from the marginal profile likelihood.

communityPGLMM.binary.LRT tests statistical significance of the phylogenetic random effect on species slopes using a likelihood ratio test

communityPGLMM.matrix.structure produces the entire covariance matrix structure (V) when you specify random effects.

communityPGLMM.predicted.values calculates the predicted values of Y; for the generalized linear mixed model (family = "binomial"), these values are in the logit-1 transformed space.

Usage

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
communityPGLMM(formula, data = list(), family = "gaussian", sp = NULL,
  site = NULL, random.effects = list(), REML = TRUE, s2.init = NULL,
  B.init = NULL, reltol = 10^-6, maxit = 500, tol.pql = 10^-6,
  maxit.pql = 200, verbose = FALSE)

communityPGLMM.gaussian(formula, data = list(), family = "gaussian",
  sp = NULL, site = NULL, random.effects = list(), REML = TRUE,
  s2.init = NULL, B.init = NULL, reltol = 10^-8, maxit = 500,
  verbose = FALSE)

communityPGLMM.binary(formula, data = list(), family = "binomial",
  sp = NULL, site = NULL, random.effects = list(), REML = TRUE,
  s2.init = 0.25, B.init = NULL, reltol = 10^-5, maxit = 40,
  tol.pql = 10^-6, maxit.pql = 200, verbose = FALSE)

communityPGLMM.binary.LRT(x, re.number = 0, ...)

communityPGLMM.matrix.structure(formula, data = list(), family = "binomial",
  sp = NULL, site = NULL, random.effects = list(), ss = 1)

## S3 method for class 'communityPGLMM'
summary(object, digits = max(3, getOption("digits") -
  3), ...)

## S3 method for class 'communityPGLMM'
plot(x, digits = max(3, getOption("digits") - 3),
  ...)

communityPGLMM.predicted.values(x, show.plot = TRUE, ...)

Arguments

formula

a two-sided linear formula object describing the fixed-effects of the model; for example, Y ~ X.

data

a data.frame containing the variables named in formula. The data frame should have long format with factors specifying species and sites. communityPGLMM will reorder rows of the data frame so that species are nested within sites. Please note that calling as.data.frame.comparative.comm will return your comparative.comm object into this format for you.

family

either gaussian for a Linear Mixed Model, or binomial for binary dependent data.

sp

a factor variable that identifies species

site

a factor variable that identifies sites

random.effects

a list that contains, for non-nested random effects, lists of triplets of the form list(X, group = group, covar = V). This is modeled after the lmer formula syntax (X | group) where X is a variable and group is a grouping factor. Note that group should be either your sp or site variable specified in sp and site. The additional term V is a covariance matrix of rank equal to the number of levels of group that specifies the covariances among groups in the random effect X. For nested variable random effects, random.effects contains lists of quadruplets of the form list(X, group1 = group1, covar = V, group2 = group2) where group1 is nested within group2.

REML

whether REML or ML is used for model fitting. For the generalized linear mixed model for binary data, these don't have standard interpretations, and there is no log likelihood function that can be used in likelihood ratio tests.

s2.init

an array of initial estimates of s2 for each random effect that scales the variance. If s2.init is not provided for family="gaussian", these are estimated using in a clunky way using lm assuming no phylogenetic signal. A better approach is to run link[lme4:lmer]{lmer} and use the output random effects for s2.init. If s2.init is not provided for family="binomial", these are set to 0.25.

B.init

initial estimates of B, a matrix containing regression coefficients in the model for the fixed effects. This matrix must have dim(B.init)=c(p+1,1), where p is the number of predictor (independent) variables; the first element of B corresponds to the intercept, and the remaining elements correspond in order to the predictor (independent) variables in the formula. If B.init is not provided, these are estimated using in a clunky way using lm or glm assuming no phylogenetic signal. A better approach is to run lmer and use the output fixed effects for B.init.

reltol

a control parameter dictating the relative tolerance for convergence in the optimization; see optim.

maxit

a control parameter dictating the maximum number of iterations in the optimization; see optim.

tol.pql

a control parameter dictating the tolerance for convergence in the PQL estimates of the mean components of the binomial GLMM.

maxit.pql

a control parameter dictating the maximum number of iterations in the PQL estimates of the mean components of the binomial GLMM.

verbose

if TRUE, the model deviance and running estimates of s2 and B are plotted each iteration during optimization.

x

communityPGLMM object

re.number

which random.effect in x to be tested

...

additional arguments to summary and plotting functions (currently ignored)

ss

which of the random.effects to produce

object

communityPGLMM object to be summarised

digits

minimal number of significant digits for printing, as in print.default

show.plot

if TRUE (default), display plot

Details

The vignette 'pez-pglmm-overview' gives a gentle introduction to using PGLMMS. For linear mixed models (family = 'gaussian'), the function estimates parameters for the model of the form, for example,

y = beta_0 + beta_1x + b_0 + b_1x

b_0 ~ Gaussian(0, sigma_0^2I_(sp))

b_0 ~ Gaussian(0, sigma_0^2V_(sp))

e ~ Gaussian(0,sigma^2)

where beta_0 and beta_1 are fixed effects, and V_(sp) is a variance-covariance matrix derived from a phylogeny (typically under the assumption of Brownian motion evolution). Here, the variation in the mean (intercept) for each species is given by the random effect b_0 that is assumed to be independent among species. Variation in species' responses to predictor variable x is given by a random effect b_0 that is assumed to depend on the phylogenetic relatedness among species given by V_(sp_; if species are closely related, their specific responses to x will be similar. This particular model would be specified as

re.1 <- list(1, sp = dat$sp, covar = diag(nspp)) re.2 <- list(dat$X, sp = dat$sp, covar = Vsp) z <- communityPGLMM(Y ~ X, data = data, family = "gaussian", random.effects = list(re.1, re.2))

The covariance matrix covar is standardized to have its determinant equal to 1. This in effect standardizes the interpretation of the scalar sigma^2. Although mathematically this is not required, it is a very good idea to standardize the predictor (independent) variables to have mean 0 and variance 1. This will make the function more robust and improve the interpretation of the regression coefficients. For categorical (factor) predictor variables, you will need to construct 0-1 dummy variables, and these should not be standardized (for obvious reasons).

For binary generalized linear mixed models (family = 'binomial'), the function estimates parameters for the model of the form, for example,

y = beta_0 + beta_1x + b_0 + b_1x

Y = logit^(-1)(y)

b_0 ~ Gaussian(0, sigma_0^2I_(sp))

b_0 ~ Gaussian(0, sigma_0^2V_(sp))

where beta_0 and beta_1 are fixed effects, and V_(sp) is a variance-covariance matrix derived from a phylogeny (typically under the assumption of Brownian motion evolution).

z <- communityPGLMM(Y ~ X, data = data, family = 'binomial', random.effects = list(re.1, re.2))

As with the linear mixed model, it is a very good idea to standardize the predictor (independent) variables to have mean 0 and variance 1. This will make the function more robust and improve the interpretation of the regression coefficients. For categorical (factor) predictor variables, you will need to construct 0-1 dummy variables, and these should not be standardized (for obvious reasons).

Value

an object of class communityPGLMM

formula

the formula for fixed effects

data

the dataset

family

either gaussian or binomial depending on the model fit

random.effects

the list of random effects

B

estimates of the regression coefficients

B.se

approximate standard errors of the fixed effects regression coefficients

B.cov

approximate covariance matrix for the fixed effects regression coefficients

B.zscore

approximate Z scores for the fixed effects regression coefficients

B.pvalue

approximate tests for the fixed effects regression coefficients being different from zero

ss

random effects' standard deviations for the covariance matrix sigma^2 V for each random effect in order. For the linear mixed model, the residual variance is listed last

s2r

random effects variances for non-nested random effects

s2n

random effects variances for nested random effects

s2resid

for linear mixed models, the residual vairance

logLIK

for linear mixed models, the log-likelihood for either the restricted likelihood (REML=TRUE) or the overall likelihood (REML=FALSE). This is set to NULL for generalised linear mixed models

AIC

for linear mixed models, the AIC for either the restricted likelihood (REML=TRUE) or the overall likelihood (REML=FALSE). This is set to NULL for generalised linear mixed models

BIC

for linear mixed models, the BIC for either the restricted likelihood (REML=TRUE) or the overall likelihood (REML=FALSE). This is set to NULL for generalised linear mixed models

REML

whether or not REML is used (TRUE or FALSE)

s2.init

the user-provided initial estimates of s2

B.init

the user-provided initial estimates of B

Y

the response (dependent) variable returned in matrix form

X

the predictor (independent) variables returned in matrix form (including 1s in the first column)

H

the residuals. For the generalized linear mixed model, these are the predicted residuals in the logit -1 space

iV

the inverse of the covariance matrix for the entire system (of dimension (nsp*nsite) by (nsp*nsite))

mu

predicted mean values for the generalized linear mixed model. Set to NULL for linear mixed models

sp, sp

matrices used to construct the nested design matrix

Zt

the design matrix for random effects

St

diagonal matrix that maps the random effects variances onto the design matrix

convcode

the convergence code provided by optim

niter

number of iterations performed by optim

Note

These function do not use a comparative.comm object, but you can use as.data.frame.comparative.comm to create a data.frame for use with these functions. The power of this method comes from deciding your own parameters parameters to be determined (the data for regression, the random effects, etc.), and it is our hope that this interface gives you more flexibility in model selection/fitting.

Author(s)

Anthony R. Ives, cosmetic changes by Will Pearse

References

Ives, A. R. and M. R. Helmus. 2011. Generalized linear mixed models for phylogenetic analyses of community structure. Ecological Monographs 81:511-525.

Rafferty, N. E., and A. R. Ives. 2013. Phylogenetic trait-based analyses of ecological networks. Ecology 94:2321-2333.

Examples

  1
  2
  3
  4
  5
  6
  7
  8
  9
 10
 11
 12
 13
 14
 15
 16
 17
 18
 19
 20
 21
 22
 23
 24
 25
 26
 27
 28
 29
 30
 31
 32
 33
 34
 35
 36
 37
 38
 39
 40
 41
 42
 43
 44
 45
 46
 47
 48
 49
 50
 51
 52
 53
 54
 55
 56
 57
 58
 59
 60
 61
 62
 63
 64
 65
 66
 67
 68
 69
 70
 71
 72
 73
 74
 75
 76
 77
 78
 79
 80
 81
 82
 83
 84
 85
 86
 87
 88
 89
 90
 91
 92
 93
 94
 95
 96
 97
 98
 99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
## Structure of examples:
# First, a (brief) description of model types, and how they are specified
# - these are *not* to be run 'as-is'; they show how models should be organised
# Second, a run-through of how to simulate, and then analyse, data
# - these *are* to be run 'as-is'; they show how to format and work with data

## Not run: 
#########################################################
#First section; brief summary of models and their use####
#########################################################
## Model structures from Ives & Helmus (2011)
# dat = data set for regression (note: *not* an comparative.comm object)
# nspp = number of species
# nsite = number of sites
# Vphy = phylogenetic covariance matrix for species
# Vrepul = inverse of Vphy representing phylogenetic repulsion

# Model 1 (Eq. 1)
re.site <- list(1, site = dat$site, covar = diag(nsite))
re.sp.site <- list(1, sp = dat$sp, covar = Vphy, site = dat$site) # note: nested
z <- communityPGLMM(freq ~ sp, data = dat, family = "binomial", sp
= dat$sp, site = dat$site, random.effects = list(re.site,
re.sp.site), REML = TRUE, verbose = TRUE, s2.init=.1)


# Model 2 (Eq. 2)
re.site <- list(1, site = dat$site, covar = diag(nsite))
re.slope <- list(X, sp = dat$sp, covar = diag(nspp))
re.slopephy <- list(X, sp = dat$sp, covar = Vphy)
z <- communityPGLMM(freq ~ sp + X, data = dat, family = "binomial",
sp = dat$sp, site = dat$site, random.effects = list(re.site,
re.slope, re.slopephy), REML = TRUE, verbose = TRUE, s2.init=.1)

# Model 3 (Eq. 3)
re.site <- list(1, site = dat$site, covar = diag(nsite))
re.sp.site <- list(1, sp = dat$sp, covar = Vrepul, site = dat$site) # note: nested
z <- communityPGLMM(freq ~ sp*X, data = dat, family = "binomial",
sp = dat$sp, site = dat$site, random.effects = list(re.site,
re.sp.site), REML = TRUE, verbose = TRUE, s2.init=.1)

## Model structure from Rafferty & Ives (2013) (Eq. 3)
# dat = data set
# npp = number of pollinators (sp)
# nsite = number of plants (site)
# VphyPol = phylogenetic covariance matrix for pollinators
# VphyPlt = phylogenetic covariance matrix for plants

re.a <- list(1, sp = dat$sp, covar = diag(nspp))
re.b <- list(1, sp = dat$sp, covar = VphyPol)
re.c <- list(1, sp = dat$sp, covar = VphyPol, dat$site)
re.d <- list(1, site = dat$site, covar = diag(nsite))
re.f <- list(1, site = dat$site, covar = VphyPlt)
re.g <- list(1, site = dat$site, covar = VphyPlt, dat$sp)
#term h isn't possible in this implementation, but can be done with
available matlab code

z <- communityPGLMM(freq ~ sp*X, data = dat, family = "binomial",
sp = dat$sp, site = dat$site, random.effects = list(re.a, re.b,
re.c, re.d, re.f, re.g), REML = TRUE, verbose = TRUE, s2.init=.1)

## End(Not run)

#########################################################
#Second section; detailed simulation and analysis #######
#NOTE: this section is explained and annotated in #######
#      detail in the vignette 'pez-pglmm-overview'#######
#      run 'vignette('pez-pglmm-overview') to read#######
#########################################################
# Generate simulated data for nspp species and nsite sites
nspp <- 15
nsite <- 10

# residual variance (set to zero for binary data)
sd.resid <- 0

# fixed effects
beta0 <- 0
beta1 <- 0

# magnitude of random effects
sd.B0 <- 1
sd.B1 <- 1

# whether or not to include phylogenetic signal in B0 and B1
signal.B0 <- TRUE
signal.B1 <- TRUE

# simulate a phylogenetic tree
phy <- rtree(n = nspp)
phy <- compute.brlen(phy, method = "Grafen", power = 0.5)

# standardize the phylogenetic covariance matrix to have determinant 1
Vphy <- vcv(phy)
Vphy <- Vphy/(det(Vphy)^(1/nspp))

# Generate environmental site variable
X <- matrix(1:nsite, nrow = 1, ncol = nsite)
X <- (X - mean(X))/sd(X)

# Perform a Cholesky decomposition of Vphy. This is used to
# generate phylogenetic signal: a vector of independent normal random
# variables, when multiplied by the transpose of the Cholesky
# deposition of Vphy will have covariance matrix equal to Vphy.

iD <- t(chol(Vphy))

# Set up species-specific regression coefficients as random effects
if (signal.B0 == TRUE) {
		b0 <- beta0 + iD %*% rnorm(nspp, sd = sd.B0)
} else {
		b0 <- beta0 + rnorm(nspp, sd = sd.B0)
}
if (signal.B1 == TRUE) {
		b1 <- beta1 + iD %*% rnorm(nspp, sd = sd.B1)
} else {
		b1 <- beta1 + rnorm(nspp, sd = sd.B1)
}

# Simulate species abundances among sites to give matrix Y that
# contains species in rows and sites in columns
y <- rep(b0, each=nsite)
y <- y + rep(b1, each=nsite) * rep(X, nspp)
y <- y + rnorm(nspp*nsite) #add some random 'error'
Y <- rbinom(length(y), size=1, prob=exp(y)/(1+exp(y)))
y <- matrix(outer(b0, array(1, dim = c(1, nsite))), nrow = nspp,
ncol = nsite) + matrix(outer(b1, X), nrow = nspp, ncol = nsite)
e <- rnorm(nspp * nsite, sd = sd.resid)
y <- y + matrix(e, nrow = nspp, ncol = nsite)
y <- matrix(y, nrow = nspp * nsite, ncol = 1)

Y <- rbinom(n = length(y), size = 1, prob = exp(y)/(1 + exp(y)))
Y <- matrix(Y, nrow = nspp, ncol = nsite)

# name the simulated species 1:nspp and sites 1:nsites
rownames(Y) <- 1:nspp
colnames(Y) <- 1:nsite

par(mfrow = c(3, 1), las = 1, mar = c(2, 4, 2, 2) - 0.1)
matplot(t(X), type = "l", ylab = "X", main = "X among sites")
hist(b0, xlab = "b0", main = "b0 among species")
hist(b1, xlab = "b1", main = "b1 among species")

#Plot out; you get essentially this from plot(your.pglmm.model)
image(t(Y), ylab = "species", xlab = "sites", main = "abundance",
col=c("black","white"))

# Transform data matrices into "long" form, and generate a data frame
YY <- matrix(Y, nrow = nspp * nsite, ncol = 1)

XX <- matrix(kronecker(X, matrix(1, nrow = nspp, ncol = 1)), nrow =
nspp * nsite, ncol = 1)

site <- matrix(kronecker(1:nsite, matrix(1, nrow = nspp, ncol =
1)), nrow = nspp * nsite, ncol = 1)
sp <- matrix(kronecker(matrix(1, nrow = nsite, ncol = 1), 1:nspp),
nrow = nspp * nsite, ncol = 1)

dat <- data.frame(Y = YY, X = XX, site = as.factor(site), sp = as.factor(sp))

# Format input and perform communityPGLMM()
# set up random effects

# random intercept with species independent
re.1 <- list(1, sp = dat$sp, covar = diag(nspp))

# random intercept with species showing phylogenetic covariances
re.2 <- list(1, sp = dat$sp, covar = Vphy)

# random slope with species independent
re.3 <- list(dat$X, sp = dat$sp, covar = diag(nspp))

# random slope with species showing phylogenetic covariances
re.4 <- list(dat$X, sp = dat$sp, covar = Vphy)

# random effect for site
re.site <- list(1, site = dat$site, covar = diag(nsite))

simple <- communityPGLMM(Y ~ X, data = dat, family = "binomial", sp
= dat$sp, site = dat$site, random.effects = list(re.site),
REML=TRUE, verbose=FALSE)

# The rest of these tests are not run to save CRAN server time;
# - please take a look at them because they're *very* useful!
## Not run: 
z.binary <- communityPGLMM(Y ~ X, data = dat, family = "binomial",
sp = dat$sp, site = dat$site, random.effects = list(re.1, re.2,
re.3, re.4), REML = TRUE, verbose = FALSE)

# output results
z.binary
plot(z.binary)

# test statistical significance of the phylogenetic random effect
# on species slopes using a likelihood ratio test
communityPGLMM.binary.LRT(z.binary, re.number = 4)$Pr

# extract the predicted values of Y
communityPGLMM.predicted.values(z.binary, show.plot = TRUE)

# examine the structure of the overall covariance matrix
communityPGLMM.matrix.structure(Y ~ X, data = dat, family =
"binomial", sp = dat$sp, site = dat$site, random.effects =
list(re.1, re.2, re.3, re.4))

# look at the structure of re.1
communityPGLMM.matrix.structure(Y ~ X, data = dat, family =
"binomial", sp = dat$sp, site = dat$site, random.effects =
list(re.1))

# compare results to glmer() when the model contains no
# phylogenetic covariance among species; the results should be
# similar.
communityPGLMM(Y ~ X, data = dat, family = "binomial", sp = dat$sp,
site = dat$site, random.effects = list(re.1, re.3), REML = FALSE,
verbose = FALSE)

# lmer
if(require(lme4)){
summary(glmer(Y ~ X + (1 | sp) + (0 + X | sp), data=dat, family =
"binomial"))

# compare results to lmer() when the model contains no phylogenetic
# covariance among species; the results should be similar.
communityPGLMM(Y ~ X, data = dat, family = "gaussian", sp = dat$sp,
site = dat$site, random.effects = list(re.1, re.3), REML = FALSE,
verbose = FALSE)

# lmer
summary(lmer(Y ~ X + (1 | sp) + (0 + X | sp), data=dat, REML = FALSE))
}

## End(Not run)

Want to suggest features or report bugs for rdrr.io? Use the GitHub issue tracker.