# Permutation tests in Vegan

### Description

From version 2.2-0, vegan has significantly improved access to restricted permutations which brings it into line with those offered by Canoco. The permutation designs are modelled after the permutation schemes of Canoco 3.1 (ter Braak, 1990).

vegan currently provides for the following features within permutation tests:

- Free permutation of *DATA*, also known as randomisation,
- Free permutation of *DATA* within the levels of a grouping variable,
- Restricted permutations for line transects or time series,
- Permutation of groups of samples whilst retaining the within-group ordering,
- Restricted permutations for spatial grids,
- Blocking, samples are never permuted *between* blocks, and
- Split-plot designs, with permutation of whole plots, split plots, or both.

Above, we use *DATA* to mean either the observed data themselves
or some function of the data, for example the residuals of an
ordination model in the presence of covariables.

These capabilities are provided by functions from the permute
package. The user can request a particular type of permutation by
supplying the `permutations` argument of a function with an
object returned by `how`, which defines how samples should
be permuted. Alternatively, the user can simply specify the required
number of permutations and a simple randomisation procedure will be
performed. Finally, the user can supply a matrix of permutations (with
number of rows equal to the number of permutations and number of
columns equal to the number of observations in the data) and
vegan will use these permutations instead of generating new
permutations.
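The three ways of supplying `permutations` described above can be sketched as follows. This is a minimal illustration rather than a recommended analysis: the `dune` data and the `rda` model are chosen here only to have something to test, and `anova` is used as one example of a vegan function that accepts the `permutations` argument.

```r
library(vegan)
library(permute)

data(dune)
data(dune.env)

# A constrained ordination, used here only as a model to test
ord <- rda(dune ~ Management, data = dune.env)

set.seed(42)

# 1. A permutation design created by how(): 199 free permutations
a1 <- anova(ord, permutations = how(nperm = 199))

# 2. Only the number of permutations: simple randomisation
a2 <- anova(ord, permutations = 199)

# 3. A matrix of permutations, one row per permutation and
#    one column per observation
perms <- shuffleSet(nrow(dune), nset = 199)
a3 <- anova(ord, permutations = perms)
```

Supplying a matrix is useful when the same set of permutations should be reused across several tests.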

The majority of functions in vegan allow for the full range of
possibilities outlined above. Exceptions include
`kendall.post` and `kendall.global`.

The Null hypothesis for the first two types of permutation test listed
above assumes free exchangeability of *DATA* (within the levels
of the grouping variable, if specified). Dependence between
observations, such as that which arises due to spatial or temporal
autocorrelation, or more-complicated experimental designs, such as
split-plot designs, violates this fundamental assumption of the test
and requires more complex restricted permutation test designs. It is
these designs that are available via the permute package and to
which vegan provides access from version 2.2-0 onwards.
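As a rough sketch of how such restricted designs might be declared with permute, the following builds a few designs with `how`, `Within` and `Plots`. The grouping factor `grp` and the sample size are hypothetical, and the mapping of each design to the schemes listed earlier is our reading of the permute interface; consult the help for `how` for the authoritative description.

```r
library(permute)

n   <- 12
grp <- gl(3, 4)   # hypothetical grouping: 3 groups of 4 samples

# Time series or line transect: the sample order is shifted cyclically
h_series <- how(within = Within(type = "series"))

# Spatial grid of 3 rows by 4 columns
h_grid <- how(within = Within(type = "grid", nrow = 3, ncol = 4))

# Blocking: samples are permuted only within the levels of 'grp'
h_block <- how(blocks = grp)

# Whole plots are permuted freely, the ordering within plots is retained
h_plots <- how(plots  = Plots(strata = grp, type = "free"),
               within = Within(type = "none"))

set.seed(1)
p_series <- shuffle(n, control = h_series)
p_block  <- shuffle(n, control = h_block)
p_plots  <- shuffle(n, control = h_plots)
```

Each call to `shuffle` returns a single permutation of the indices `1:n` that respects the design.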

Unless otherwise stated in the help pages for specific functions, permutation tests in vegan all follow the same format/structure:

1. An appropriate test statistic is chosen. Which statistic is chosen should be described on the help pages for individual functions.
2. The value of the test statistic is evaluated for the observed data and analysis/model and recorded. Denote this value *x[0]*.
3. The *DATA* are randomly permuted according to one of the above schemes, and the value of the test statistic for this permutation is evaluated and recorded.
4. Step 3 is repeated a total of *n* times, where *n* is the number of permutations requested. Denote these values as *x[i]*, where *i = 1, …, n*.
5. Count the number of values of the test statistic, *x[i]*, in the Null distribution that are as extreme as the test statistic for the observed data, *x[0]*. Denote this count as *N*. We use the phrase *as extreme* to include cases where a two-sided test is performed and large negative values of the test statistic should be considered.
6. The permutation p-value is computed as

   *(N + 1) / (n + 1)*
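The procedure above can be sketched in base R for the simplest case of free permutation. The data, the grouping and the choice of test statistic (a difference in group means) are all hypothetical; vegan functions use their own statistics and permute for the shuffling.

```r
set.seed(123)

# Hypothetical data: two groups of 10 observations each
g <- rep(c("a", "b"), each = 10)
y <- c(rnorm(10, mean = 0), rnorm(10, mean = 1))

# Step 1: the test statistic, here the difference in group means
stat <- function(y, g) mean(y[g == "b"]) - mean(y[g == "a"])

# Step 2: value of the statistic for the observed data, x[0]
x0 <- stat(y, g)

# Steps 3 and 4: n free permutations of the data, giving x[1], ..., x[n]
n <- 999
x <- replicate(n, stat(sample(y), g))

# Step 5: count permuted values as extreme as x[0] (two-sided)
N <- sum(abs(x) >= abs(x0))

# Step 6: the permutation p-value
p <- (N + 1) / (n + 1)
```

Taking absolute values in step 5 is one way to implement a two-sided test for a statistic that is symmetric about zero under the null hypothesis.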

The above description illustrates why the default number of
permutations specified in vegan functions takes values of 199 or
999 for example. Pretty *p* values are achieved because the
*+ 1* in the denominator results in division by 200 or 1000, for
the 199 or 999 random permutations used in the test.

The simple intuition behind the presence of *+ 1* in the numerator
and denominator is that these represent the inclusion of the observed
value of the statistic in the Null distribution (e.g. Manly 2006).
Phipson & Smyth (2010) present a more compelling explanation for the
inclusion of *+ 1* in the numerator and denominator of the
*p* value calculation.

Fisher (1935) had in mind that a permutation test would involve
enumeration of all possible permutations of the data yielding an exact
test. However, doing this complete enumeration may not be feasible in
practice owing to the potentially vast number of arrangements of the
data, even in modestly-sized data sets with free permutation of
samples. As a result we evaluate the *p* value as the tail
probability of the Null distribution of the test statistic directly
from the random sample of possible permutations. Phipson & Smyth
(2010) show that the naive calculation of the permutation *p*
value is

*p = (N / n)*

which leads to an invalid test with incorrect type I error rate. They
go on to show that replacing the unknown tail probability (the
*p* value) of the Null distribution with the biased estimator

*p = (N + 1) / (n + 1)*

induces a positive bias of just the right size to account for the uncertainty in the estimation of the tail probability from the set of randomly sampled permutations, yielding a test with the correct type I error rate.
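A small base R sketch makes the difference between the two estimators concrete. The helper `perm_pvalues` is hypothetical and written only for this illustration; it assumes a one-sided test in which large values of the statistic are extreme.

```r
# Hypothetical helper returning both estimators for a one-sided test
perm_pvalues <- function(x0, x) {
  n <- length(x)      # number of permutations
  N <- sum(x >= x0)   # permuted values as extreme as the observed one
  c(naive = N / n, corrected = (N + 1) / (n + 1))
}

# Simulated null values and an observed statistic larger than all of them
set.seed(1)
x  <- rnorm(199)
x0 <- max(x) + 1

res <- perm_pvalues(x0, x)
res
```

Here no permuted value reaches the observed statistic, so the naive estimator returns a *p* value of exactly zero, whereas the corrected estimator returns 1 / 200 = 0.005.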

The estimator described above is correct for the situation where
permutations of the data are sampled randomly *without*
replacement. This is not strictly what happens in vegan because
permutations are drawn pseudo-randomly, independently of one
another, so the same permutation can be drawn more than once. The
chance of this happening in practice is small, but the functions in
permute do not guarantee to generate a unique set of permutations
unless complete enumeration of permutations is requested. Complete
enumeration is feasible only for the smallest of data sets or the most
restrictive of permutation designs. Where it is not feasible, the
chance of drawing a set of permutations with repeats lessens as the
sample size, and hence the size of the set of all possible
permutations, increases.
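To give a feel for how quickly the chance of repeats falls away, the sketch below applies the standard birthday-problem formula. It assumes free permutation with every one of the *M* possible permutations equally likely on each draw; the function `p_repeat` is hypothetical and is not part of vegan or permute.

```r
# Probability that n independent, uniform draws from M possible
# permutations contain at least one repeated permutation
p_repeat <- function(n, M) {
  # 1 - prod(1 - k / M) for k = 0, ..., n - 1, computed on the log scale
  -expm1(sum(log1p(-(0:(n - 1)) / M)))
}

# 999 permutations of 8 samples (8! possible orderings)
p_small <- p_repeat(999, factorial(8))

# 999 permutations of 20 samples (20! possible orderings)
p_large <- p_repeat(999, factorial(20))
```

Comparing `p_small` with `p_large` shows the effect of sample size on the chance of drawing the same permutation twice.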

Under the situation of sampling permutations with replacement then,
the tail probability *p* calculated from the biased estimator
described above is somewhat **conservative**, being too large by
an amount that depends on the number of possible values that the test
statistic can take under permutation of the data (Phipson & Smyth,
2010). This represents a slight loss of statistical power for the
conservative *p* value calculation used here. However, unless
sample sizes are small and the permutation design is such that the
set of values that the test statistic can take is also small, this
loss of power is unlikely to be critical.

The minimum achievable p-value is

*p[min] = 1 / (n + 1)*

and hence depends on the number of permutations evaluated. However,
one cannot simply increase the number of permutations (*n*) to
achieve a potentially lower p-value unless the number of observations
available permits such a number of permutations. This is unlikely to
be a problem for all but the smallest data sets when free permutation
(randomisation) is valid, but in restricted permutation designs with a
low number of observations, there may not be as many unique
permutations of the data as you might desire to reach the required
level of significance.

It is currently the responsibility of the user to determine the total
number of possible permutations for their *DATA*. The number of
possible permutations allowed under the specified design can be
calculated using `numPerms` from the
permute package. Heuristics employed within the
`shuffleSet` function used by vegan can be
triggered to generate the entire set of permutations instead of a
random set. The settings controlling the triggering of the complete
enumeration step are contained within a permutation design created
using `how` and can be set by the user. See
`how` for details.
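A minimal sketch of this behaviour, assuming a hypothetical short time series of eight observations: far fewer permutations exist than the 199 requested, so `shuffleSet` is expected to fall back on generating the available set rather than sampling at random. The exact number of rows returned depends on settings in `how` (for example whether the observed ordering is included), so it is not stated here.

```r
library(permute)

# Hypothetical design: a time series, with 199 permutations requested
h <- how(within = Within(type = "series"), nperm = 199)

# Size of the set of possible permutations for 8 observations
total <- numPerms(seq_len(8), control = h)

# Request more permutations than exist under the design
perms <- shuffleSet(8, nset = 199, control = h)

# Far fewer rows than the 199 requested
nrow(perms)
```

Checking `numPerms` before running a test is a cheap way to find out whether the requested number of permutations is attainable.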

Limits on the total number of permutations of *DATA* are more
severe in temporally or spatially ordered data or experimental designs
with low replication. For example, a time series of *n = 100*
observations has just 100 possible permutations **including** the
observed ordering.
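The count quoted above can be checked with `numPerms`. The sketch assumes a plain time series design, `Within(type = "series")` with its default settings, and compares it with free permutation of the same 100 observations.

```r
library(permute)

obs <- seq_len(100)   # stand-in for 100 ordered observations

# Restricted permutation for a time series
total_series <- numPerms(obs, control = how(within = Within(type = "series")))

# Free permutation of the same observations
total_free <- numPerms(obs, control = how())

# Smallest p-value attainable when the whole set is enumerated and
# the observed ordering is counted as one of its members
p_min_series <- 1 / total_series
```

With so few permutations available under the series design, the attainable *p* values are limited to multiples of `p_min_series`.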

In situations where only a low number of permutations is possible due
to the nature of *DATA* or the experimental design, enumeration
of all permutations becomes important and achievable computationally.

Above, we have provided only a brief overview of the capabilities of
vegan and permute. To get the best out of the new
functionality and for details on how to set up permutation designs
using `how`, consult the vignette
*Restricted permutations; using the permute package* supplied
with permute and accessible via

```
vignette("permutations", package = "permute")
```

### Author(s)

Gavin L. Simpson

### References

Manly, B. F. J. (2006). *Randomization, Bootstrap and Monte Carlo
Methods in Biology*, Third Edition. Chapman and Hall/CRC.

Phipson, B., & Smyth, G. K. (2010). Permutation P-values should never
be zero: calculating exact P-values when permutations are randomly
drawn. *Statistical Applications in Genetics and Molecular
Biology*, **9**, Article 39. DOI: 10.2202/1544-6115.1585

ter Braak, C. J. F. (1990). *Update notes: CANOCO version
3.1*. Wageningen: Agricultural Mathematics Group. (UR).

See also:

Davison, A. C., & Hinkley, D. V. (1997). *Bootstrap Methods and
their Application*. Cambridge University Press.

### See Also

`permutest` for the main interface in vegan. See
also `how` for details on permutation design
specification, `shuffleSet` for the code used to
generate a set of permutations, `numPerms` for
a function to return the size of the set of possible permutations
under the current design.