manyglm  R Documentation 
manyglm
is used to fit generalized linear models to highdimensional data, such as multivariate abundance data in ecology. This is the base modelfitting function  see plot.manyglm
for assumption checking, and anova.manyglm
or summary.manyglm
for significance testing.
manyglm(formula, family="negative.binomial", composition=FALSE, data=NULL, subset=NULL, na.action=options("na.action"), K=1, theta.method = "PHI", model = FALSE, x = TRUE, y = TRUE, qr = TRUE, cor.type= "I", shrink.param=NULL, tol=sqrt(.Machine$double.eps), maxiter=25, maxiter2=10, show.coef=FALSE, show.fitted=FALSE, show.residuals=FALSE, show.warning=FALSE, offset, ... )
formula 
an object of class 
family 
a description of the distribution function to be used in the
in the model. The default is negative binomial regression (using a log
link, with unknown overdispersion parameter), the following family
functions are also accepted: binomial(), binomial(link="cloglog"),
poisson(), Gamma(link="log"), which can also be specified using character strings as
'binomial', 'cloglog' and 'poisson', 'gamma' respectively. In future we hope
to include other family functions as described in 
data 
an optional data frame, list or environment (or object
coercible by 
composition 

subset 
an optional vector specifying a subset of observations to be used in the fitting process. 
na.action 
a function which indicates what should happen
when the data contain 
K 
number of trials in binomial regression. By default, K=1 for presenceabsence data using logistic regression. 
theta.method 
the method used for the estimation of the overdisperson
parameter theta, such that the meanvariance relationship is V=m+m^2/theta for
the negative binomial family. Here offers three options 
model, x, y, qr 
logicals. If 
cor.type 
the structure imposed on the estimated correlation
matrix under the fitted model. Can be "I"(default), "shrink", or "R".
See Details. This parameter is merely stored in 
shrink.param 
shrinkage parameter to be used if 
tol 
the tolerance used in estimation. 
maxiter 
maximum allowed iterations in the weighted least square estimation of beta. The default value is 25. 
maxiter2 
maximum allowed iterations in the internal ML estimations of negative binomial regression. The default value is 10. 
show.coef, show.fitted, show.residuals, show.warning 
logical. Whether to show model coefficients, fitted values, standardized pearson residuals, or operation warnings. 
offset 
this can be used to specify an _a priori_ known component to be included in the linear predictor during fitting. This should be 'NULL' or a numeric vector of length equal to NROW (i.e. number of observations) or a matrix of NROW times p (i.e. number of species). 
... 
further arguments passed to or from other methods. 
manyglm
is used to calculate the parameter estimates of generalised linear models fitted to each of many variables simultaneously as in Warton et. al. (2012) and Wang et.al.(2012). Models for manyglm
are specified symbolically. For details on how to specify a formula see the details section of lm
and formula
.
Generalised linear models are designed for nonnormal data for which a distribution can be specified that offers a reasonable model for data, as specified using the argument family
. The manyglm
function currently handles count and binary data, and accepts either a character argument or a family argument for common choices of family. For binary (presence/absence) data, family=binomial()
can be used for logistic regression (logit link, "logistic regression"), or the complementary loglog link can be used family=binomial("cloglog")
, arguably a better choice for presenceabsence data. Poisson regression family=poisson()
can be used for counts that are not "overdispersed" (that is, if the variance is not larger than the mean), although for multivariate abundance data it has been shown that the negative binomial distribution (family="negative.binomial"
) is usually a better choice (Warton 2005). In both cases, a loglink is used. If another link function or family is desired, this can be specified using the manyany
function, which accepts regular family
arguments.
composition=TRUE
allows relative abundance to be modelled rather than absolute abundance, which is useful for partitioning effects of environmental variables on alpha vs beta diversity, and is needed if there is variation in sampling intensity due to variables that haven't been measured. Data are manipulated into "long format", with column factor called cols
and row variable called rows
, then a model is fitted using main effects for predictors as in the provided formula, plus rows
, cols
and the interaction of predictors with cols
. Inclusion of rows
in the model accounts for variation in sampling intensity across rows, main effects for environmental variables capture their effects on total abundance/richness (alpha diversity), and their interaction with cols
captures changes in relative abundance/turnover (beta diversity). Unfortunately, data are not efficiently stored in long format, so models are slower to fit using composition=TRUE
.
In negative binomial regression, the overdispersion parameter (theta
) is estimated separately for each variable from the data, as controlled by theta.method
for negative binomial distributions. We iterate between updates of theta
and generalised linear model updates for regression parameters, as many as maxiter2
times.
cor.type
is the structure imposed on the estimated correlation
matrix under the fitted model. Possible values are:
"I"
(default) = independence is assumed (correlation matrix is the identity)
"shrink"
= sample correlation matrix is shrunk towards I to improve its stability.
"R"
= unstructured correlation matrix is used. (Only available when N>p.)
If cor.type=="shrink"
, a numerical value will be assigned to shrink.param
either through the argument or by internal estimation. The working horse function for the internal estimation is ridgeParamEst
, which is based on crossvalidation (Warton 2008). The validation groups are chosen by random assignment, so some slight variation in the estimated values may be observed in repeat analyses. See ridgeParamEst
for more details. The shrinkage parameter can be any value between 0 and 1 (0="I" and 1="R", values closer towards 0 indicate more shrinkage towards "I").
manyglm
returns an object inheriting from "manyglm"
,
"manylm"
and "mglm".
The function summary
(i.e. summary.manyglm
) can be used to obtain or print a summary of the results and the function
anova
(i.e. anova.manyglm
) to produce an
analysis of variance table, although note that these functions use resampling so they can take a while to fit.
The generic accessor functions coefficients
,
fitted.values
and residuals
can be used to
extract various useful features of the value returned by manyglm
.
An object of class "manyglm"
is a list containing at least the
following components:
coefficients 
a named matrix of coefficients. 
var.coefficients 
the estimated variances of each coefficient. 
fitted.values 
the matrix of fitted mean values, obtained by transforming the linear predictors by the inverse of the link function. 
linear.predictor 
the linear fit on the scale of the linear predictor. 
residuals 
the matrix of working residuals, that is the Pearson residuals standardized by the leverage adjustment h obtained from the diagonal elements of the hat matrix H. 
PIT.residuals 
probability integral transform (PIT) residuals  for a model that fits well these should be approximately standard uniform values evenly scattered between 0 and 1. Their calculation involves some randomisation, so different fits will return slightly different values for PIT residuals. 
sqrt.1_Hii 
the matrix of scale terms used to standardize the Pearson reidusals. 
var.estimator 
the estimated variance of each observation, computed using the corresponding family function. 
sqrt.weight 
the matrix of square root of working weights, estimated for the corresponding family function. 
theta 
the estimated nuisance parameters accounting for overdispersion 
two.loglike 
two times the log likelihood. 
deviance 
up to a constant, minus twice the maximized loglikelihood. Where sensible, the constant is chosen so that a saturated model has deviance zero. 
iter 
the number of iterations of IWLS used. 
data 
a data frame storing the input data. 
stderr.coefficients 
the estimated standard error of each coefficient. 
phi 
the inverse of theta 
tol 
the tolerance used in estimations. 
maxiter,maxiter2,family,theta.method,cor.type,formula 
arguments supplied in the 
aic 
a vector returning Akaike's An Information Criterion for each response variable  minus twice the maximized loglikelihood plus twice the number of coefficients. 
AICsum 
the sum of the AIC's over all variables. 
shrink.param 
the shrink parameter to be used in subsequent inference. 
call 
the matched call. 
terms 
the 
rank 
the numeric rank of the fitted linear model. 
xlevels 
(where relevant) a record of the levels of the factors used in fitting. 
df.residual 
the residual degrees of freedom. 
x 
if the argument 
y 
if the argument 
model 
if the argument 
qr 
if the argument 
show.coef,show.fitted,show.residuals 
arguments supplied in the 
offset 
the 
Yi Wang, Ulrike Naumann and David Warton <David.Warton@unsw.edu.au>.
Lawless, J. F. (1987) Negative binomial and mixed Poisson regression, Canadian Journal of Statistics 15, 209225.
Hilbe, J. M. (2008) Negative Binomial Regression, Cambridge University Press, Cambridge.
Warton D.I. (2005) Many zeros does not mean zero inflation: comparing the goodness offit of parametric models to multivariate abundance data, Environmetrics 16(3), 275289.
Warton D.I. (2008). Penalized normal likelihood and ridge regularization of correlation and covariance matrices. Journal of the American Statistical Association 103, 340349.
Warton D.I. (2011). Regularized sandwich estimators for analysis of high dimensional data using generalized estimating equations. Biometrics, 67(1), 116123.
Warton D. I., Wright S., and Wang, Y. (2012). Distancebased multivariate analyses confound location and dispersion effects. Methods in Ecology and Evolution, 3(1), 89101.
Wang Y., Neuman U., Wright S. and Warton D. I. (2012). mvabund: an R package for modelbased analysis of multivariate abundance data. Methods in Ecology and Evolution. online 21 Feb 2012.
anova.manyglm
, summary.manyglm
, residuals.manyglm
, plot.manyglm
data(spider) spiddat < mvabund(spider$abund) X < as.matrix(spider$x) #To fit a loglinear model assuming counts are poisson: glm.spid < manyglm(spiddat~X, family="poisson") glm.spid summary(glm.spid, resamp="residual") #To fit a binomial regression model to presence/absence data: pres.abs < spiddat pres.abs[pres.abs>0] = 1 Xdf < data.frame(spider$x) #turn into a data frame to refer to variables in formula glm.spid.bin < manyglm(pres.abs~soil.dry+bare.sand+moss, data=Xdf, family="binomial") glm.spid.bin drop1(glm.spid.bin) #AICs for oneterm deletions, suggests dropping bare.sand glm2.spid.bin < manyglm(pres.abs~soil.dry+moss, data=Xdf, family="binomial") drop1(glm2.spid.bin) #backward elimination suggests settling on this model.
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.