View source: R/anova.manyany.R
anova.manyany | R Documentation |
Compute an analysis of deviance table for many univariate model fits. Slowly!
## S3 method for class 'manyany' anova(object, ..., nBoot=99, p.uni="none", block=object1$block, nCores=1, bootID=NULL, replace=TRUE) ## S3 method for class 'anova.manyany' print(x, ...)
object |
of class |
... |
other generic |
nBoot |
the number of Bootstrap iterations, default is |
p.uni |
whether to calculate univariate test statistics and their P-values. |
block |
a factor specifying the sampling level to be resampled. Default is resampling rows (if composition=TRUE in the manyany command, this means resampling rows of data as originally sent to manyany). |
nCores |
Number of cores to use for computations (for parallel computing). |
bootID |
A user-entered matrix of indices for which observations to use in which resample. Bootstrap
resamples in rows, observations in columns. When specified, overwrites |
replace |
whether to sample with or without replacement, as in the |
x |
|
The anova.manyany
function returns a table summarising the statistical significance
of a fitted manyany model under the alternative hypothesis (object2
) as compared
to a fit under the null hypothesis (object
). Typically the alternative model
is nested in the null although it doesn't need to be (but consider seriously if what
you are doing makes sense if they are not nested).
This function is quite computationally intensive, and a little fussy - it is an early version we hope to improve on. Feedback welcome!
This function behaves a lot like anova.manyglm
, the most conspicuous differences
being in flexibility and computation time. Since this function is based on manyany
,
it offers much greater flexibility in terms of types of models that can be fitted (most
fixed effects model with predict
and family
arguments could be accommodated).
For information on the different types of data that can be modelled using manyany, see
manyany
.
However this flexibility comes at considerable cost in terms of computation time, and the
default nBoot
has been set to 99 to reflect this (although rerunning at 999 is
recommended). Other more cosmetic differences from anova.manyglm
are that
two and only two models can be supplied as input here; adjusted univariate P-values
are not yet implemented; and the range of test statistics and resampling algorithms is
more limited. All test statistics constructed here are sum-of-likelihood ratio statistics
as in Warton et al (2012), and the resampling method used here is the PIT-trap (short
for 'probability integral transform residual bootstrap', Warton et al 2017).
To check model assumptions, use plot.manyany
.
The block
argument allows for block resampling, such that valid inferences can
be made across independent blocks of correlated sets of observations.
For example, if data have multiple rows of records for each site, e.g. multi-species
data with entries for different species on different rows, you can use your site ID
variable as the block argument to resample sites, for valid cross-site inferences despite
within-site species correlation. Well, valid assuming sites are independent. You could
do similarly for a repeated measures design to make inferences robust to temporal autocorrelation.
Note that block
needs to be balanced, e.g. equal number of species entries for
each site (i.e. include rows for zero abundances too).
The anova.manyany
function is designed specifically for high-dimensional data
(that, is when the number of variables p is not small compared to the number of observations
N). In such instances a correlation matrix is computationally intensive to estimate
and is numerically unstable, so by default the test statistic is calculated assuming
independence of variables. Note however that the resampling scheme used ensures that
the P-values are approximately correct even when the independence assumption is not
satisfied.
Rather than stopping after testing for multivariate effects, it is often of interest
to find out which response variables express significant effects. Univariate statistics
are required to answer this question, and these are reported if requested. Setting p.uni="unadjusted"
returns resampling-based univariate P-values for all effects as well as the multivariate
P-values, if composition=FALSE
. There are currently no univariate P-value options
when composition=TRUE
(it's not entirely clear how such P-values should be obtained)
and if univariate P's are of interest why not rerun the model with composition=FALSE
.
A current limitation of the function is that composition
needs to be set to
the same value in each manyany object being compared - it is not currently possible
to compare models with and without a compositional term in them.
stat |
the observed value of the test statistic. |
p |
the P-value as estimated from |
stat.i |
the values of the test statistic in each of the |
p.i |
the P-value in each of the |
p.uni |
the |
If p.uni="unadjusted"
the output list also contains
uni.test |
a table showing the test statistics of the univariate tests. |
uni.p |
a table showing the p-values of the univariate tests. |
statj.i |
a matrix of values of the univariate test statistics in each of the |
The comparison between two or more models by anova.manyglm
will only be valid if they are fitted to the same dataset. This may be a problem if there are missing values and R's default of na.action = na.omit
is used.
David Warton <David.Warton@unsw.edu.au>.
Warton D. I., Wright S., and Wang, Y. (2012). Distance-based multivariate analyses confound location and dispersion effects. Methods in Ecology and Evolution, 3(1), 89-101.
Warton D. I., Thibaut L., Wang Y. A. (2017). The PIT-trap - A "model-free" bootstrap procedure for inference about regression models with discrete, multivariate responses. PLoS One, 12(7), e0181790.
manyany
, anova.manyglm
.
## Try fitting Tikus Islands data with Tweedie models with power parameter 1.5, ## to test for compositional effect: data(tikus) coral <- as.matrix(tikus$abund[1:20,]) sumSpp = apply(coral>0,2,sum) coral <- coral[,sumSpp>6] ## cutting to just species with seven(!) or more presences to cut ## computation time. Maybe rerun with less (e.g. 4 or more presences) if curious and patient. coralX <- tikus$x[1:20,] require(tweedie) require(statmod) ftTimeRep <- manyany(coral ~ time+rep, "glm", data=coralX, family=tweedie(var.power=1.5, link.power=0), var.power=1.5, composition=TRUE) ftRep <- manyany(coral ~ rep, "glm", data=coralX, family=tweedie(var.power=1.5, link.power=0), var.power=1.5, composition=TRUE) anova(ftRep,ftTimeRep,nBoot=9) #this takes a few seconds to run even for just 9 resamples ## This should be rerun for nBoot=999, which could a few minutes... ## Not run: library(ordinal) ## First construct an ordinal dataset: ## Not run: spidOrd = spider$abund ## Not run: spidOrd[spider$abund>1 & spider$abund<=10]=2 ## Not run: spidOrd[spider$abund>10]=3 ## Now fit a model using the clm function: ## Not run: manyOrd=manyany(spidOrd~bare.sand+fallen.leaves,"clm",data=spider$x) ## Test to see if fallen.leaves needs to be there: ## Not run: manyOrd0=manyany(spidOrd~bare.sand,"clm",data=spider$x) ## Not run: anova(manyOrd0,manyOrd,nBoot=19)
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.