TBI: TBI: Difference between multivariate observations at T1 and...

View source: R/TBI.R

TBIR Documentation

TBI: Difference between multivariate observations at T1 and T2

Description

The function computes and tests Temporal Beta-diversity Indices (TBI) between multivariate observations (frequency or presence-absence data) forming pairs observed at time 1 (T1) and time 2 (T2). The data matrices may contain abundance or presence-absence data, or other types of frequency-like data (e.g. biomasses). TBI are dissimilarity indices that measure beta differentiation through time. The indices are computed between T1 and T2 for each site. The difference between species (or abundances-per-species) gains (C/den) and losses (B/den) can be printed out and tested for significance.

Usage

TBI(
  mat1,
  mat2,
  method = "%difference",
  pa.tr = FALSE,
  nperm = 99,
  BCD = TRUE,
  replace = FALSE,
  test.BC = TRUE,
  test.t.perm = FALSE,
  save.BC = FALSE,
  seed. = NULL,
  clock = FALSE
)

Arguments

mat1, mat2

Two multivariate community composition or gene frequency data matrices (class data.frame or matrix) with the same number of rows and columns. The rows must correspond to the same objects (e.g. sites) and the columns to the same variables, e.g. species or alleles. – The input files are checked for having equal numbers of rows and columns, for rows that are empty in both mat1 and mat2, and for the presence of negative values, which cannot be frequencies.

method

One of the following dissimilarity coefficients: "%difference" (aka Bray-Curtis), "ruzicka", "chord", "hellinger", "log.chord", "sorensen", "jaccard", "ochiai", "euclidean". See Details. Names can be abbreviated to a non-ambiguous set of first letters. Default: method="%difference".

pa.tr

If pa.tr=TRUE, the data are transformed to binary (i.e. presence-absence, or pa) form. If pa.tr=FALSE, they are not.

nperm

Number of permutations for the tests of significance of the temporal beta indices and the permutation test of the B-C difference. Use nperm = 999 or 9999 in real studies.

BCD

If BCD=TRUE, the B and C components of the percentage difference (method="%difference") and Ruzicka (method"ruzicka") indices are computed and presented in an output table with three columns: B/den, C/den, D=(B+C)/den, where den is the denominator of the index, i.e. (2A+B+C) for the percentage difference index and (A+B+C) for the Ruzicka index. See Details and Value. If pa.tr=TRUE, the B and C components are the numbers of species lost or gained, and D is either the Sorensen or the Jaccard dissimilarity. In the BCD output table, column B contains B/den, C/den, D=(B+C)/den, as in the case of the percentage difference and Ruzicka indices.

If BCD=FALSE, that table is not produced. No table can be computed for indices other than the Ruzicka and percentage difference or their binary forms.

replace

If FALSE (default value), sampling is done without replacement, producing a regular permutation test. If TRUE, sampling is done with replacement for the test of significance; the method is then bootstrapping.

test.BC

If TRUE, the difference between species (or abundances-per-species) gains (C/den) and species (or abundances-per-species) losses (B/den) is tested in a parametric paired t-test computed by function t.test of stats. If FALSE, the test is not computed.

test.t.perm

If TRUE, the difference between species (or abundances-per-species) gains (C/den) and species (or abundances-per-species) losses (B/den) is also tested in a permutational paired t-test computed by function t.paired.perm.If FALSE, the test is not computed.

save.BC

If TRUE, the original B and C values are returned in a matrix called BC, without division by den as in the output matrix BCD.mat. If FALSE, BC will get the value NA in the output list.

seed.

Seed for random number generator. If NULL, the random number generator keeps going from the point it had reached in previous calculations. If seed. is an integer value instead of NULL, the random number generator is reset using that value. This allows users to repeat exactly a previous calculation launched with the same value of seed.; the sequence of generated random numbers will be exactly the same.

clock

If clock=TRUE, the computation time is printed. This option is useful to predict the calculation time when n and nperm are large.

Details

For each object, the function tests the hypothesis (H0) that a species assemblage is not exceptionally different between T1 and T2, compared to assemblages that could have been observed at this site at T1 and T2 under conditions corresponding to H0. If H0 is rejected, the object is recognized as exceptionally different from the other objects for its difference between T1 and T2.

To fix ideas, an example in palaeoecology - A researcher is studying ancient and modern diatom communities in sediment cores. If a site displays an exceptional difference between T1 and T2, the researcher is justified to examine the reason for that difference. It could, for example, be caused by a change in land use at that site, which has caused the difference to be larger than at the other sites, compared to the differences caused by climate change at all sites.

The temporal beta diversity indices available in this function belong to four groups, computed in different ways.

  • Method "%difference" computes the percentage difference index, erroneously called the Bray-Curtis index in some software packages ; it is the quantitative form of the Sorensen index. Method "ruzicka" computes the Ruzicka dissimilarity; this is one of the quantitative coefficients corresponding to the Jaccard dissimilarity for binary data. When these indices are used to compute ordinations by principal coordinate analysis, it is recommended to take the square root of the dissimilarities before the ordination analysis because these indices do not have the property of being Euclidean. However, that precaution is not important here; the results of permutation tests will be the same for these dissimilarities, square-rooted or not. If pa.tr=TRUE, either the Sorensen or the Jaccard coefficient are obtained by computing these two coefficients.

  • Methods "chord" (chord distance), "hellinger" (Hellinger distance) and "log.chord" (log.chord distance) are obtained by transformation of the species data, as described by Legendre & Borcard (2018), followed by calculation of the Euclidean distance. For the Hellinger distance, the data are square-rooted, then subjected to the chord transformation and the Euclidean distance. For the log.chord distance, the data are transformed by y' = log(y+1) using function log1p() of R, then subjected to the chord transformation and the Euclidean distance. These three distances have the Euclidean property (Legendre & Legendre 2012, Legendre & De Caceres 2013). If pa.tr=TRUE, the Ochiai distance for binary data, sqrt(2)*sqrt(1-Ochiai similarity), is obtained from these three coefficients.

  • Methods "jaccard", "sorensen", "ochiai" implement the Jaccard, Sorensen and Ochiai dissimilarities. For these coefficients, the data are first transformed to presence-absence (pa.tr receives the value TRUE), then the dissimilarities are computed using the corresponding quantitative coefficients (Ruzicka, percentage difference, and chord).

  • The Euclidean distance is also available in this function. It is not recommended for community composition or allele frequency data. One can compute it for log-transformed abundance data that do not contain zeros, or very few zeros (short gradients).

The temporal beta indices are tested for significance using permutation tests. The hypotheses are the following:

  • H0: the site under study (e.g. a species assemblage) is not exceptionally different between T1 and T2, compared to assemblages that could have been observed at this site at T1 and T2 under conditions corresponding to H0. The differences between T1 and T2 all belong to the same statistical population of differences.

  • H1: the site under study is exceptionally different between times T1 and T2.

In the decomposition of the Ruzicka and percentage difference dissimilarities or their presence-absence forms (Jaccard, Sorensen), the components B and C are computed as follows:

  • bj is the part of the abundance of species j that is higher at time 1 than at time 2: bj = (y1j - y2j) if y1j > y2j ; bj = 0 otherwise. B is the sum of the bj values for all species in the group of species under study. It is the unscaled sum of species losses between time 1 and time 2. In the BCD output table BCD.mat, column 1 contains B/den where den is the denominator of the index, i.e. (2A+B+C) for the percentage difference index and (A+B+C) for the Ruzicka index.

  • cj is the part of the abundance of species j that is higher at time 2 than at time 1: cj = (y2j - y1j) if y2j > y1j ; cj = 0 otherwise. C is the sum of the cj values for all species in the group of species under study. It is the unscaled sum of species gains between time 1 and time 2. In the BCD output table BCD.mat, column 2 contains C/den where den is the denominator of the index, i.e. (2A+B+C) for the percentage difference index and (A+B+C) for the Ruzicka index.

The original values of B and C for each site, without denominator, are also available in the output table BC.

Warning - In real ecological studies, when the TBI test is applied to data where some sites are highly impoverished due to pollution or other extreme environmental situations, this situation may produce sites with very few species (i.e. very low richness) and no species in common for the T1-T2 comparisons due to sampling variation at these impoverished sites. The TBI dissimilarity will be high and the test may indicate a significant T1-T2 difference if most other sites have higher species richness. This would be a correct statistical outcome for the test. When users of the method identify sites showing significant TBI tests in data, they should check the species richness of these sites at T1 and T2. Interpretation of the test results should be done with caution when high and significant TBI indices are associated with very low richness and no species in common between T1 and T2.

Value

Function TBI returns a list containing the following results:

  • TBI The vector of Temporal Beta-diversity Indices (TBI) between observations at times T1 and T2 for each object.

  • p.TBI A corresponding vector of p-values. Significant p-values (e.g. p.TBI <= 0.05) indicate exceptional objects for the difference of their species composition.

  • p.adj The p-values are corrected for multiple testing using function p.adjust of stats. The adjustment is done using method="holm", which is the default option of the p.adjust function.

  • BCD.mat An output table with four columns: B/den, C/den, D=(B+C)/den, and Change. The value den is the denominator of the index, i.e. (2A+B+C) for the percentage difference index and (A+B+C) for the Ruzicka index. The decomposition is such that D = B/den + C/den. Columns B and C indicate which of the D values are associated with large B (losses) or large C values (gains), before proceeding to the analysis and interpretation of the D values, using environmental or spatial explanatory variables, through regression or classification tree analysis. When B > C, the site has lost species or abundances-per-species between time 1 and time 2; this is indicated by a "-" sign in column Change. On the contrary, if B < C, the site has gained species or abundances-per-species between time 1 and time 2; this is indicated by a "+" sign in that column. Sites with equal amounts of losses and gains are marked with a "0". - The B/den and C/den values can be plotted in B-C plots, which are informative about the changes that occurred in the data set between the two surveys under study. - If pa.tr is TRUE, the B and C components are the numbers of spepcies losses and gains, and D is either the Sorensen or the Jaccard dissimilarity. - If BCD=FALSE, that table is not produced. No table is (or can be) computed for indices other than the Ruzicka and percentage difference indices or their binary forms.

  • BCD.summary An output table with six columns: mean(B/den); mean(C/den); mean(D); B/(B+C) (which is mean(B/den) divided by mean(D)); C/(B+C) (which is mean(C/den) divided by mean(D)). These values indicate, over all sites, which of the processes dominated (loss or gain of species or abundances-per-species) when site compositions changed between time 1 and time 2. Change has the same meaning as in table BCD.mat; the sign indicates the direction of the mean change over all sites.

  • t.test_B.C The results of a paired t-test (parametric) of significance of the difference between columns C/den and B/den of the BCD.mat table. If test.t.perm=TRUE, the difference between species gains (C/den) and losses (B/den) is also tested in a permutational paired t-test and the permutational p-value is shown in the output table. This result provides an overall test of the direction of change over all sites. It helps confirm the asymmetry between species (or abundances-per-species) gains (C/den) and species (or abundances-per-species) losses (B/den). A star in column p<=0.05 indicates a significant result of the parametric test at the 0.05 level.

  • BC An output table with two columns: B and C. In this table, the B and C statistics are not divided by a denominator, contrary to the values B/den and C/den found in the output table BCD.mat.

Author(s)

Pierre Legendre pierre.legendre@umontreal.ca

References

Legendre, P. 2019. A temporal beta-diversity index to identify exceptional sites in space-time surveys. Ecology and Evolution (in press).

Legendre, P. & M. De Caceres. 2013. Beta diversity as the variance of community data: dissimilarity coefficients and partitioning. Ecology Letters 16: 951-963.

Legendre, P. & D. Borcard. 2018. Box-Cox-chord transformations for community composition data prior to beta diversity analysis. Ecography 41: 1820-1824.

Legendre, P. & L. Legendre. 2012. Numerical Ecology. 3rd English edition. Elsevier Science BV, Amsterdam.

van den Brink, P. J. & C. J. F. ter Braak. 1999. Principal response curves: analysis of time-dependent multivariate responses of biological community to stress. Environmental Toxicology and Chemistry 18: 138-148.

See Also

plot.TBI

Examples

if(require("vegan", quietly = TRUE)) {

## Example 1 -

## Invertebrate communities subjected to insecticide treatment.

## As an example in their paper on Principal Response Curves (PRC method), van den
## Brink & ter Braak (1999) used observations on the abundances of 178 invertebrate
## species (macroinvertebrates and zooplankton) subjected to treatments in 12 mesocosms by
## the insecticide chlorpyrifos. The mesocosms were sampled at 11 occasions. The data,
## available in the {vegan} package, are log-transformed species abundances, ytranformed =
## log(10*y+1).

## The data of survey #4 will be compared to those of survey #11 in this example.
## Survey #4 was carried out one week after the insecticide treatment, whereas the fauna
## of the mesocosms was considered by the authors to have fully recovered from the
## insecticide treatment at survey #11.

data(pyrifos)

## The mesocosms had originally been attributed at random to the treatments. However,
## to facilitate presentation of the results, they will be listed here in order of
## increased insecticide doses: {0, 0, 0, 0, 0.1, 0.1, 0.9, 0.9, 6, 6, 44, 44} micro g/L.

## Select the 12 data rows of surveys 4 and 11 from the data file and reorder them

ord4 = c(38,39,41,47,37,44,40,46,43,48,42,45)

ord11 = c(122,123,125,131,121,128,124,130,127,132,126,129)

## Run the TBI function

res1 <- TBI(pyrifos[ord4,], pyrifos[ord11,], method = "%diff", nperm = 0, test.t.perm = FALSE)

res1$BCD.mat

## Example 2 -

## This example uses the mite data available in vegan. Let us pretend that sites 1-20
## represent T1 and sites 21-40 represent T2.


data(mite)

# Run the TBI function

res2 <- TBI(mite[1:20,], mite[21:40,], method = "%diff", nperm = 0, test.t.perm = FALSE)

summary(res2)

res2$BCD.mat

}

adespatial documentation built on Sept. 11, 2024, 7:04 p.m.