Summarizing Multivariate Generalized Linear Model Fits for Abundance Data
Description
summary method for class "manyglm".
Usage
## S3 method for class 'manyglm'
summary(object, resamp="pit.trap", test="wald",
p.uni="none", nBoot=1000, cor.type=object$cor.type,
show.cor = FALSE, show.est=FALSE, show.residuals=FALSE,
symbolic.cor = FALSE, show.time=FALSE, show.warning=FALSE,...)
## S3 method for class 'summary.manyglm'
print(x, ...)

Arguments
object 
an object of class "manyglm".
resamp 
the method of resampling used. Can be one of "case", "perm.resid", "montecarlo" or "pit.trap" (default). See Details. 
test 
the test to be used: one of "wald" (default), "score" or "LR". "LR" is only available when cor.type="I". See Details.
p.uni 
whether to calculate univariate test statistics and their P-values, and if so, what type. This can be one of "none" (default), "unadjusted" or "adjusted". See Details.
nBoot 
the number of bootstrap iterations; the default is 1000.
cor.type 
structure imposed on the estimated correlation matrix under the fitted model. Can be "I", "shrink", or "R"; the default is object$cor.type, which is "I" unless another structure was specified when fitting the model. See Details.
show.cor, show.est, show.residuals 
logical, if 
symbolic.cor 
logical.
If 
show.time 
Whether to display timing information for the resampling procedure: "none" shows none, "all" shows all timing information and "total" shows only the overall time taken for the tests. 
show.warning 
logical. Whether to display warnings raised during the computation.
... 
for 
x 
an object of class "summary.manyglm", usually the result of a call to summary.manyglm.

Details
The summary.manyglm function returns a table summarising the statistical significance of each multivariate term specified in the fitted manyglm model (Warton 2011). For each model term, it returns a test statistic as determined by the argument test, and a P-value calculated by resampling rows of the data using a method determined by the argument resamp. Of the four possible resampling methods, three (case resampling, residual permutation and the parametric bootstrap) are described in more detail in Davison and Hinkley (1997, chapter 6), but the default (PIT-trap) is a new method (in review) which bootstraps probability integral transform residuals, and which we have found to give the most reliable Type I error rates. All methods involve resampling under the alternative hypothesis. These methods ensure approximately valid inference even when the mean-variance relationship or the correlation between variables has been misspecified. Standardized Pearson residuals (see manyglm) are currently used in residual permutation, and where necessary, resampled response values are truncated so that they fall in the required range (e.g. counts cannot be negative). However, this can introduce bias, especially for family=binomial, so we advise extreme caution when using perm.resid for presence/absence data.
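The PIT-trap idea can be illustrated with a short base R sketch. It is not mvabund's implementation: it assumes a Poisson model whose fitted means mu are treated as known, and the helper names pit_resid_pois and pit_trap_sample are hypothetical. For a discrete response, a randomized probability integral transform (PIT) residual is drawn uniformly between F(y - 1) and F(y), where F is the fitted distribution function; whole rows of these residuals are then resampled, which preserves correlation between variables, and mapped back to counts with the quantile function.

```r
set.seed(1)

# Hypothetical helper: randomized PIT residuals for Poisson counts y with means mu.
# Each residual is uniform on [F(y - 1), F(y)], so it is approximately
# Uniform(0, 1) when the fitted model is correct.
pit_resid_pois <- function(y, mu) {
  lower <- ppois(y - 1, mu)  # F(y - 1), which is 0 when y = 0
  upper <- ppois(y, mu)      # F(y)
  runif(length(y), lower, upper)
}

# Hypothetical helper: one PIT-trap resample of an N x p count matrix.
pit_trap_sample <- function(y, mu) {
  n <- nrow(y); p <- ncol(y)
  u <- matrix(pit_resid_pois(as.vector(y), as.vector(mu)), n, p)
  rows <- sample(n, replace = TRUE)   # bootstrap whole rows (sites)
  u_star <- u[rows, , drop = FALSE]   # row resampling keeps cross-variable correlation
  # map resampled residuals back to counts using the fitted mean of each cell
  matrix(qpois(as.vector(u_star), as.vector(mu)), n, p)
}

# Usage on simulated data: 20 sites, 3 variables with means 1, 4 and 10
N <- 20; p <- 3
mu <- matrix(rep(c(1, 4, 10), each = N), N, p)
y <- matrix(rpois(N * p, as.vector(mu)), N, p)
y_star <- pit_trap_sample(y, mu)
```

In a full procedure the model would be refitted to each resampled data set and the test statistic recomputed; the sketch stops at generating one resampled response matrix.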
If resamp="none", P-values cannot be calculated; however, the test statistics are returned.
If you have a specific hypothesis of primary interest that you want to test, then you should use the anova.manyglm function, which can resample rows of the data under the null hypothesis and so usually achieves a better approximation to the true significance level.
For information on the different types of data that can be modelled using manyglm, see manyglm. To check model assumptions, use plot.manyglm.
Multivariate test statistics are constructed using one of three methods: a log-likelihood ratio statistic (test="LR"), for example as in Warton et al. (2012), a Wald statistic (test="wald") or a score statistic (test="score"). "LR" has good properties, but is only available when cor.type="I".
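When variables are treated as independent, a likelihood ratio statistic can be formed by summing univariate likelihood ratio statistics, one per response variable, in the spirit of the sum-of-LR statistic of Warton et al. (2012). The base R sketch below assumes a Poisson GLM for each column and a single two-level predictor; the data are simulated and the helper name sum_lr is hypothetical.

```r
set.seed(2)

# Simulated data: 30 sites, 4 count variables, two treatment groups;
# only the first two variables respond to treatment.
N <- 30; p <- 4
x <- gl(2, N / 2, labels = c("control", "treatment"))
effect <- c(1, 1, 0, 0)
mu <- exp(log(3) + outer(as.numeric(x == "treatment"), effect))
Y <- matrix(rpois(N * p, as.vector(mu)), N, p)

# Hypothetical helper: sum of univariate likelihood ratio statistics,
# i.e. a multivariate statistic that ignores correlation between variables.
sum_lr <- function(Y, x) {
  lr <- apply(Y, 2, function(y) {
    fit1 <- glm(y ~ x, family = poisson)  # alternative model
    fit0 <- glm(y ~ 1, family = poisson)  # null model
    # for a Poisson GLM the drop in deviance equals 2 * (logLik1 - logLik0)
    deviance(fit0) - deviance(fit1)
  })
  list(univariate = lr, multivariate = sum(lr))
}

res <- sum_lr(Y, x)
res$multivariate
```

Treating variables as independent affects only how the statistic is built; its P-value would still come from resampling, as described above, rather than from a chi-squared reference distribution.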
The default Wald test statistic makes use of a generalised estimating equations (GEE) approach, estimating the covariance matrix of parameter estimates using a sandwich-type estimator that assumes the mean-variance relationship in the data is correctly specified and that the correlation between variables is unknown but the same for all observations. Such assumptions allow the test statistic to account for correlation between variables, but to do so more efficiently than traditional GEE sandwich estimators (Warton 2008a). The common correlation matrix is estimated from standardized Pearson residuals, and the method specified by cor.type is used to adjust for high dimensionality.
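Estimating a common correlation matrix from standardized Pearson residuals can be sketched with base R's rstandard() applied to per-variable GLM fits, using type = "pearson". This assumes separate Poisson GLMs for each column as a stand-in for the manyglm fit and uses simulated data; the helper name pearson_cor is hypothetical and this is not the package's internal code.

```r
set.seed(3)

# Simulated data: 40 sites, 3 count variables sharing site-level noise,
# which induces positive correlation between variables.
N <- 40; p <- 3
x <- rnorm(N)
site <- rnorm(N, sd = 0.5)
Y <- sapply(seq_len(p), function(j) rpois(N, exp(1 + 0.5 * x + site)))

# Hypothetical helper: common correlation matrix estimated from
# standardized Pearson residuals of separate Poisson GLMs.
pearson_cor <- function(Y, x) {
  R <- sapply(seq_len(ncol(Y)), function(j) {
    fit <- glm(Y[, j] ~ x, family = poisson)
    rstandard(fit, type = "pearson")  # Pearson residuals standardized for leverage
  })
  cor(R)
}

Rhat <- pearson_cor(Y, x)
Rhat
```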
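The Wald statistic has problems for count data and presence/absence data when some parameter estimates correspond to zero means (for example, when a variable is absent from every observation in one group), so it is not recommended for multi-sample tests, where such situations are common.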
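The problem can be reproduced for a single variable with base R's glm(); this illustrates a well-known behaviour of Wald tests in GLMs, not mvabund's multivariate statistic. When one group contains only zeros, its estimated mean is driven towards zero, the coefficient and its standard error both become very large, and the Wald P-value ends up close to 1 even though a likelihood ratio test detects the obvious difference. The counts are made up for illustration.

```r
# Made-up counts: group A is all zeros, group B is clearly non-zero
y <- c(0, 0, 0, 0, 0, 3, 5, 4, 6, 2)
group <- factor(rep(c("A", "B"), each = 5))

fit1 <- glm(y ~ group, family = poisson)  # alternative model
fit0 <- glm(y ~ 1, family = poisson)      # null model

# Wald test for the group effect, from the coefficient table
wald_p <- summary(fit1)$coefficients["groupB", "Pr(>|z|)"]

# Likelihood ratio test: drop in deviance compared with chi-squared on 1 df
lr <- deviance(fit0) - deviance(fit1)
lr_p <- pchisq(lr, df = 1, lower.tail = FALSE)

c(wald_p = wald_p, lr_p = lr_p)
```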
The summary.manyglm function is designed specifically for high-dimensional data (that is, when the number of variables p is not small compared to the number of observations N). In such instances a correlation matrix is computationally intensive to estimate and numerically unstable, so by default the test statistic is calculated assuming independence of variables (cor.type="I"). Note, however, that the resampling scheme used ensures that the P-values are approximately correct even when the independence assumption is not satisfied. If it is computationally feasible for your dataset, it is recommended that you use cor.type="shrink" to account for correlation between variables, or cor.type="R" when p is small. The cor.type="R" option uses the unstructured correlation matrix (only possible when N > p), such that the standard classical multivariate test statistics are obtained. Note, however, that such statistics are typically numerically unstable and have low power when p is not small compared to N.
The cor.type="shrink" option applies ridge regularisation (Warton 2008b), shrinking the sample correlation matrix towards the identity, which improves its stability when p is not small compared to N. This provides a compromise between "R" and "I", allowing us to account for correlation between variables while using a numerically stable test statistic that has good properties.
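Shrinking towards the identity can be sketched as a convex combination alpha * R + (1 - alpha) * I of the sample correlation matrix R and the identity I. This form is an assumption made for illustration and the helper name shrink_cor is hypothetical. Under it, every eigenvalue becomes alpha * lambda + (1 - alpha), so any alpha < 1 gives an invertible matrix even when p > N. The value of alpha below is arbitrary; in practice it is estimated.

```r
set.seed(4)
N <- 6; p <- 10
Z <- matrix(rnorm(N * p), N, p)
R <- cor(Z)  # singular, because p > N

# Hypothetical helper: shrink a correlation matrix towards the identity.
shrink_cor <- function(R, alpha) {
  stopifnot(alpha >= 0, alpha <= 1)
  alpha * R + (1 - alpha) * diag(nrow(R))
}

R_shrunk <- shrink_cor(R, alpha = 0.5)  # alpha chosen arbitrarily here
ev <- eigen(R_shrunk, symmetric = TRUE, only.values = TRUE)$values
min(ev)                   # bounded below by 1 - alpha = 0.5, up to rounding
R_inv <- solve(R_shrunk)  # well conditioned, so the inverse exists
```

Setting alpha = 1 recovers the unstructured estimate (as with "R") and alpha = 0 gives the identity (as with "I"), which is why the shrinkage estimate sits between the two.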
The shrinkage parameter is an attribute of the manyglm object. For a Wald test, the sample correlation matrix of the alternative model is used to calculate the test statistics, so object$shrink.param is used. For a score test, the sample correlation matrix of the null model is used, so the shrink.param of the null model is used instead. If cor.type=="shrink" but object$shrink.param is not available (for example, because object$cor.type!="shrink"), then the shrinkage parameter is estimated in the summary test by cross-validation using the multivariate normal likelihood function (see ridgeParamEst and Warton (2008b)).
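The cross-validation idea can be sketched as follows. For each candidate alpha on a grid, build the shrunken correlation matrix from the training folds, score the held-out rows with the multivariate normal log-likelihood, and keep the alpha with the highest total. This is a simplified illustration under stated assumptions (data standardized once up front, a fixed grid excluding alpha = 1, K = 5 folds); it is not the algorithm used by ridgeParamEst, and the function names mvn_loglik and cv_alpha are hypothetical.

```r
# Hypothetical helper: total multivariate normal log-likelihood of the rows
# of X (assumed mean zero) under covariance matrix S.
mvn_loglik <- function(X, S) {
  L <- chol(S)                               # S = t(L) %*% L, L upper triangular
  z <- backsolve(L, t(X), transpose = TRUE)  # solves t(L) %*% z = t(X)
  p <- ncol(X)
  logdet <- 2 * sum(log(diag(L)))
  sum(-0.5 * (p * log(2 * pi) + logdet + colSums(z^2)))
}

# Hypothetical helper: choose the shrinkage parameter alpha by K-fold
# cross-validation of the normal likelihood of held-out rows.
cv_alpha <- function(Y, alphas = seq(0.05, 0.95, by = 0.05), K = 5) {
  X <- scale(Y)  # standardize each variable once (a simplification)
  n <- nrow(X); p <- ncol(X)
  folds <- sample(rep(seq_len(K), length.out = n))
  score <- sapply(alphas, function(a) {
    total <- 0
    for (k in seq_len(K)) {
      train <- X[folds != k, , drop = FALSE]
      test  <- X[folds == k, , drop = FALSE]
      S <- a * cor(train) + (1 - a) * diag(p)  # shrunken correlation matrix
      total <- total + mvn_loglik(test, S)
    }
    total
  })
  alphas[which.max(score)]
}

set.seed(5)
# Simulated data: 30 rows, 8 variables sharing a common factor
N <- 30; p <- 8
f <- rnorm(N)
Y <- sapply(seq_len(p), function(j) f + rnorm(N))
alpha_hat <- cv_alpha(Y)
alpha_hat
```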
Rather than stopping after testing for multivariate effects, it is often of interest to find out which response variables express significant effects. Univariate statistics are required to answer this question, and these are reported if requested. Setting p.uni="unadjusted" returns resampling-based univariate P-values for all effects as well as the multivariate P-values, whereas p.uni="adjusted" returns P-values that have been adjusted for multiple testing, calculated using a step-down resampling algorithm as in Westfall & Young (1993, Algorithm 2.8). This method provides strong control of family-wise error rates, and makes use of resampling (using the method controlled by resamp) to ensure that inferences take into account correlation between variables.
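The adjustment can be sketched as a step-down maxT procedure in the spirit of Westfall & Young (1993). This is not mvabund's code. It assumes that larger statistics are more significant, that the observed univariate statistics and a matrix of resampled statistics are already available (one row per resample, generated jointly for all variables so that correlation between them is preserved), and that P-values use the (count + 1)/(nBoot + 1) convention. The function name stepdown_maxT is hypothetical and the resampled statistics are simulated.

```r
# Hypothetical helper: step-down maxT adjusted P-values.
# t_obs:  observed univariate statistics (length p; larger = more extreme)
# t_boot: nBoot x p matrix of statistics from resamples generated jointly
#         across variables, so that their correlation is preserved
stepdown_maxT <- function(t_obs, t_boot) {
  p <- length(t_obs); B <- nrow(t_boot)
  ord <- order(t_obs, decreasing = TRUE)  # most significant first
  u <- t_boot[, ord, drop = FALSE]
  # successive maxima: u[b, j] = max over the j-th and all less significant variables
  if (p > 1) for (j in (p - 1):1) u[, j] <- pmax(u[, j], u[, j + 1])
  exceed <- u >= matrix(t_obs[ord], B, p, byrow = TRUE)
  p_step <- (colSums(exceed) + 1) / (B + 1)  # (count + 1) / (nBoot + 1)
  p_adj <- numeric(p)
  p_adj[ord] <- cummax(p_step)               # enforce monotonicity
  p_adj
}

set.seed(6)
B <- 999; p <- 5
t_boot <- matrix(rchisq(B * p, df = 1), B, p)  # simulated null statistics
t_obs <- c(12, 6, 3, 1, 0.2)
p_adj <- stepdown_maxT(t_obs, t_boot)
p_adj
```

Because each variable is compared with the maximum over itself and all less significant variables, the adjusted P-values are never smaller than the corresponding unadjusted resampling P-values.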
Value
summary.manyglm returns an object of class "summary.manyglm", a list with components
call 
the component from 
terms 
the terms object used. 
family 
the component from 
deviance 
the component from 
aic 
Akaike's An Information Criterion, minus twice the maximized log-likelihood plus twice the number of coefficients (except for the negative binomial and quasipoisson families, assuming that the dispersion is known).
df.residual 
the component from 
null.deviance 
the component from 
df.null 
the component from 
devll 
minus twice the maximized log-likelihood
iter 
the number of iterations that were used in

p.uni 
the supplied argument. 
nBoot 
the supplied argument. 
resample 
the supplied argument. 
na.action 
the na.action used in the 
show.residuals 
the supplied argument. 
show.est 
the supplied argument. 
compositional 
logical. Whether a test for compositional effects was performed. 
test 
the supplied argument. 
cor.type 
the supplied argument. 
method 
the method used in 
theta.method 
the method used for the estimation of the nuisance parameter theta. 
manyglm.args 
a list of control parameters from 
rankX 
the rank of the design matrix. 
covstat 
the supplied argument. 
deviance.resid 
the deviance residuals. 
est 
the estimated model coefficients 
s.err 
the Scaled Variance 
shrink.param 
the shrinkage parameter. Either the value of the argument with the same name or if this was not supplied the estimated shrinkage parameter. 
n.bootsdone 
the number of bootstrapping iterations that were done, i.e. had no error. 
coefficients 
the matrix of coefficients, standard errors, z-values and P-values. Aliased coefficients are omitted.
stat.iter 
if the argument 
statj.iter 
if the argument 
aliased 
named logical vector showing if the original coefficients are aliased. 
dispersion 
either the supplied argument or the inferred/estimated
dispersion if the latter is 
df 
a 3-vector of the rank of the model and the number of residual degrees of freedom, plus number of non-aliased coefficients.
overall.n.bootsdone 
the number of bootstrap iterations without errors that were done in the overall test 
statistic 
a table containing test statistics, p values and degrees of freedom for the overall test 
overall.stat.iter 
if the argument 
overall.statj.iter 
if the argument 
cov.unscaled 
the unscaled ( 
cov.scaled 
ditto, scaled by 
correlation 
(only if the argument 
symbolic.cor 
(only if 
Author(s)
Yi Wang, David Warton <David.Warton@unsw.edu.au> and Ulrike Naumann.
References
Warton D.I. (2011). Regularized sandwich estimators for analysis of high dimensional data using generalized estimating equations. Biometrics, 67(1), 116-123.
Warton D.I. (2008a). Which Wald statistic? Choosing a parameterisation of the Wald statistic to maximise power in k-sample generalised estimating equations. Journal of Statistical Planning and Inference, 138, 3269-3282.
Warton D.I. (2008b). Penalized normal likelihood and ridge regularization of correlation and covariance matrices. Journal of the American Statistical Association, 103, 340-349.
Warton D.I., Wright S. and Wang Y. (2012). Distance-based multivariate analyses confound location and dispersion effects. Methods in Ecology and Evolution, 3(1), 89-101.
Davison A.C. and Hinkley D.V. (1997). Bootstrap Methods and their Application. Cambridge University Press, Cambridge.
Westfall P.H. and Young S.S. (1993). Resampling-based Multiple Testing. John Wiley & Sons, New York.
Wu C.F.J. (1986). Jackknife, Bootstrap and Other Resampling Methods in Regression Analysis. The Annals of Statistics, 14(4), 1261-1295.
See Also
manyglm, anova.manyglm.
Examples
data(spider)
spiddat <- mvabund(spider$abund)
X <- spider$x
## Estimate the coefficients of a multivariate glm
glm.spid <- manyglm(spiddat[,1:3]~X, family="negative.binomial")
## Estimate the statistical significance of different multivariate terms in
## the model, using the default settings (Wald test, 1000 PIT-trap resamples)
summary(glm.spid, show.time=TRUE)
## Repeat with the parametric bootstrap and wald statistics
summary(glm.spid, resamp="monte.carlo", test="wald", nBoot=300)
