dist.ldc: Dissimilarity matrices for community composition data

Description Usage Arguments Details Value Author(s) References Examples

View source: R/dist.ldc.R


Compute dissimilarity indices for ecological data matrices. The dissimilarity indices computed by this function are those described in Legendre & De Cáceres (2013). In the name of the function, 'ldc' stands for the author's names. Twelve of these 21 indices are not readily available in other R package functions; four of them can, however, be computed in two computation steps in vegan.


dist.ldc(Y, method = "hellinger", binary = FALSE, samp = TRUE,
  silent = FALSE)



Community composition data. The object class can be either data.frame or matrix.


One of the 21 dissimilarity coefficients available in the function: "hellinger", "chord", "log.chord", "chisquare", "profiles", "percentdiff", "ruzicka", "divergence", "canberra", "whittaker", "wishart", "kulczynski", "jaccard", "sorensen", "ochiai", "ab.jaccard", "ab.sorensen", "ab.ochiai", "ab.simpson", "euclidean", "manhattan", "modmeanchardiff". See Details. Names can be abbreviated to a non-ambiguous set of first letters. Default: method="hellinger".


If binary=TRUE, the data are transformed to presence-absence form before computation of the dissimilarities. Default value: binary=FALSE, except for the Jaccard, Sørensen and Ochiai indices where binary=TRUE.


If samp=TRUE, the abundance-based distances (ab.jaccard, ab.sorensen, ab.ochiai, ab.simpson) are computed for sample data. If samp=FALSE, binary indices are computed for true population data.


If silent=FALSE, informative messages sent to users will be printed to the R console. Use silent=TRUE is called on a numerical simulation loop, for example.


The dissimilarities computed by this function are the following. Indices i and k designate two rows (sites) of matrix Y, j designates a column (species). D[ik] is the dissimilarity between rows i and k. p is the number of columns (species) in Y; pp is the number of species present in one or the other site, or in both. y[i+] is the sum of values in row i; same for y[k+]. y[+j] is the sum of values in column j. y[++] is the total sum of values in Y. The indices are computed by functions written in C for greater computation speed with large data matrices.

The properties of all dissimilarities available in this function (except Ružička D) were described and compared in Legendre & De Cáceres (2013), who showed that most of these dissimilarities are appropriate for beta diversity studies. Inappropriate are the Euclidean, Manhattan, modified mean character difference, species profile and chi-square distances. Most of these dissimilarities have a maximum value of either 1 or sqrt(2). Three dissimilarities (Euclidean, Manhattan, Modified mean character difference) do not have an upper bound and are thus inappropriate for beta diversity studies. The chi-square distance has an upper bound of sqrt(2*(sum(Y))).

The Euclidean, Hellinger, chord, chi-square and species profiles dissimilarities have the property of being Euclidean, meaning that they never produce negative eigenvalues in principal coordinate analysis. The Canberra, Whittaker, percentage difference, Wishart and Manhattan coefficients are Euclidean when they are square-root transformed (Legendre & De Cáceres 2013, Table 2). The distance forms (1-S) of the Jaccard, Sørensen and Ochiai similarity (S) coefficients are Euclidean after taking the square root of (1-S) (Legendre & Legendre 2012, Table 7.2). The D matrices resulting from these three coefficients are outputted in the form sqrt(1-S), as in function dist.binary of ade4, because that form is Euclidean and will thus produce no negative eigenvalues in principal coordinate analysis.

The Hellinger, chord, chi-square and species profile dissimilarities are computed using the two-step procedure developed by Legendre & Gallagher (2001). The data are first transformed using either the row marginals, or the row and column marginals in the case of the chi-square distance. The dissimilarities are then computed from the transformed data using the Euclidean distance formula. As a consequence, these four dissimilarities are necessarily Euclidean. D matrices for other binary coefficients can be computed in two ways: either by using function dist.binary of ade4, or by choosing option binary=TRUE, which transforms the abundance data to binary form, and using one of the quantitative indices of the present function. Table 1 of Legendre & De Cáceres (2013) shows the incidence-based (presence-absence-based) indices computed by the various indices using binary data.

The Euclidean distance computed on untransformed presence-absence or abundance data produces non-informative and incorrect ordinations, as shown in Legendre & Legendre (2012, p. 300) and in Legendre & De Cáceres (2013). However, the Euclidean distance computed on log-transformed abundance data produces meaningful ordinations in principal coordinate analysis (PCoA). Nonetheless, it is easier to compute a PCA of log-transformed abundance data instead of a PCoA; the resulting ordination with scaling 1 will be meaningful. Messages are printed to the R console indicating the Euclidean status of the computed dissimilarity matrices. Note that for the chi-square distance, the columns that sum to zero are eliminated before calculation of the distances, thus preventing divisions by zero in the calculation of the chi-square transformation.


A dissimilarity matrix, with class dist.


Pierre Legendre [email protected] and Naima Madi


Chao, A., R. L. Chazdon, R. K. Colwell and T. J. Shen. 2006. Abundance-based similarity indices and their estimation when there are unseen species in samples. Biometrics 62: 361–371.

Legendre, P. and D. Borcard. (Submitted). Box-Cox-chord transformations for community composition data prior to beta diversity analysis.

Legendre, P. and M. De Cáceres. 2013. Beta diversity as the variance of community data: dissimilarity coefficients and partitioning. Ecology Letters 16: 951-963.

Legendre, P. and E. D. Gallagher, E.D. 2001. Ecologically meaningful transformations for ordination of species data. Oecologia 129: 271–280.

Legendre, P. and Legendre, L. 2012. Numerical Ecology. 3rd English edition. Elsevier Science BV, Amsterdam.


if(require("vegan", quietly = TRUE)) {
mat1  = as.matrix(mite[1:10, 1:15])   # No column has a sum of 0
mat2 = as.matrix(mite[61:70, 1:15])   # 7 of the 15 columns have a sum of 0

#Example 1: compute Hellinger distance for mat1
D.out = dist.ldc(mat1,"hellinger")

#Example 2: compute chi-square distance for mat2
D.out = dist.ldc(mat2,"chisquare")

#Example 3: compute percentage difference dissimilarity for mat2
D.out = dist.ldc(mat2,"percentdiff")


adespatial documentation built on Sept. 27, 2018, 5:04 p.m.