dbFD: Distance-Based Functional Diversity Indices


Description

dbFD implements a flexible distance-based framework to compute multidimensional functional diversity (FD) indices. It returns the three FD indices of Villéger et al. (2008), namely functional richness (FRic), functional evenness (FEve), and functional divergence (FDiv), as well as functional dispersion (FDis; Laliberté and Legendre 2010), Rao's quadratic entropy (Q; Botta-Dukát 2005), a posteriori functional group richness (FGR; Petchey and Gaston 2006), and the community-level weighted means of trait values (CWM; e.g. Lavorel et al. 2008). FEve, FDiv, FDis, Rao's Q, and CWM can be weighted by species relative abundances. dbFD includes several options for flexibility.


Usage

dbFD(x, a, w, w.abun = TRUE, stand.x = TRUE,
    ord = c("podani", "metric"), asym.bin = NULL,
    corr = c("sqrt", "cailliez", "lingoes", "none"),
    calc.FRic = TRUE, m = "max", stand.FRic = FALSE,
    scale.RaoQ = FALSE, calc.FGR = FALSE, clust.type = "ward",
    km.inf.gr = 2, km.sup.gr = nrow(x) - 1, km.iter = 100,
    km.crit = c("calinski", "ssi"), calc.CWM = TRUE,
    CWM.type = c("dom", "all"), calc.FDiv = TRUE, dist.bin = 2, 
    print.pco = FALSE, messages = TRUE)



Arguments

x: matrix or data frame of functional traits. Traits can be numeric, ordered, or factor. Binary traits should be numeric and only contain 0 and 1. Character traits will be converted to factor. NAs are tolerated.

x can also be a species-by-species distance matrix of class dist, in which case NAs are not allowed.

When there is only one trait, x can also be a numeric vector, an ordered factor, or an unordered factor.

In all cases, species labels are required.



a: matrix containing the abundances of the species in x (or presence-absence, i.e. 0 or 1). Rows are sites and columns are species. Can be missing, in which case dbFD assumes that there is only one community with equal abundances of all species. NAs will be replaced by 0. The number of species (columns) in a must match the number of species (rows) in x. In addition, the species labels in a and x must be identical and in the same order.


w: vector listing the weights for the traits in x. Can be missing, in which case all traits have equal weights.


w.abun: logical; should FDis, Rao's Q, FEve, FDiv, and CWM be weighted by the relative abundances of the species?


stand.x: logical; if all traits are numeric, should they be standardized to mean 0 and unit variance? If not all traits are numeric, Gower's (1971) standardization by the range is automatically used; see gowdis for more details.


ord: character string specifying the method to be used for ordinal traits (i.e. ordered). "podani" refers to Eqs. 2a-b of Podani (1999), while "metric" refers to his Eq. 3. Can be abbreviated. See gowdis for more details.


asym.bin: vector listing the asymmetric binary variables in x. See gowdis for more details.


corr: character string specifying the correction method to use when the species-by-species distance matrix cannot be represented in a Euclidean space. Options are "sqrt", "cailliez", "lingoes", or "none". Can be abbreviated. Default is "sqrt". See ‘details’ section.


calc.FRic: logical; should FRic be computed?


m: the number of PCoA axes to keep as ‘traits’ for calculating FRic (when FRic is measured as the convex hull volume) and FDiv. Options are: any integer > 1, "min" (maximum number of traits that allows the s >= 2^t condition to be met, where s is the number of species and t the number of traits), or "max" (maximum number of axes that allows the s > t condition to be met). See ‘details’ section.


stand.FRic: logical; should FRic be standardized by the ‘global’ FRic that includes all species, so that FRic is constrained between 0 and 1?


scale.RaoQ: logical; should Rao's Q be scaled by its maximal value over all frequency distributions? See divc.


calc.FGR: logical; should FGR be computed?


clust.type: character string specifying the clustering method to be used to create the dendrogram of species for FGR. Options are "ward", "single", "complete", "average", "mcquitty", "median", "centroid", and "kmeans". For "kmeans", other arguments also apply (km.inf.gr, km.sup.gr, km.iter, and km.crit). See hclust and cascadeKM for more details.


km.inf.gr: the number of groups for the partition with the smallest number of groups of the cascade (min). Only applies if calc.FGR is TRUE and clust.type is "kmeans". See cascadeKM for more details.


km.sup.gr: the number of groups for the partition with the largest number of groups of the cascade (max). Only applies if calc.FGR is TRUE and clust.type is "kmeans". See cascadeKM for more details.


km.iter: the number of random starting configurations for each value of K. Only applies if calc.FGR is TRUE and clust.type is "kmeans". See cascadeKM for more details.


km.crit: criterion used to select the best partition. The default value is "calinski" (Calinski and Harabasz 1974). The simple structure index "ssi" is also available. Only applies if calc.FGR is TRUE and clust.type is "kmeans". Can be abbreviated. See cascadeKM for more details.


calc.CWM: logical; should the community-level weighted means of trait values (CWM) be calculated? See functcomp for more details.


CWM.type: character string indicating how nominal, binary and ordinal traits should be handled for CWM. Options are "dom" (the default) or "all". See functcomp for more details.


calc.FDiv: logical; should FDiv be computed?


dist.bin: only applies when x is a single unordered factor, in which case x is coded using dummy variables. dist.bin is an integer between 1 and 10 specifying the appropriate distance measure for binary data. 2 (the default) refers to the simple matching coefficient (Sokal and Michener 1958). See dist.binary for the other options.


print.pco: logical; should the eigenvalues and PCoA axes be returned?


messages: logical; should warning messages be printed in the console?
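The layout that x and a are expected to follow can be sketched in base R. This is only an illustration: the species, sites, trait names, and values below are invented, and the abundance columns are deliberately created in the wrong order to show one way of aligning them with the rows of x.

```r
# Hypothetical trait table: species in rows, traits in columns.
x <- data.frame(
  height = c(12.5, 30.1, NA, 8.4),                        # numeric trait, one NA
  woody  = c(0, 1, 1, 0),                                 # binary trait coded 0/1
  growth = factor(c("forb", "shrub", "shrub", "grass")),  # nominal trait
  row.names = c("sp1", "sp2", "sp3", "sp4")
)

# Hypothetical abundances: sites in rows, species in columns.
# The columns are out of order on purpose.
a <- matrix(c(5, 0, 2, 1,
              0, 3, 0, 4),
            nrow = 2, byrow = TRUE,
            dimnames = list(c("site1", "site2"),
                            c("sp3", "sp1", "sp4", "sp2")))

# Reorder the columns of 'a' so that they follow the rows of 'x'.
a <- a[, rownames(x), drop = FALSE]

# Species labels must be identical and in the same order.
stopifnot(identical(colnames(a), rownames(x)))
```

Running such a check before calling dbFD makes a label mismatch fail early and visibly.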


Details

Typical usage is

dbFD(x, a, ...)

If x is a matrix or a data frame that contains only continuous traits and no NAs, and no weights are specified (i.e. w is missing), a species-species Euclidean distance matrix is computed via dist. Otherwise, a Gower dissimilarity matrix is computed via gowdis. If x is a distance matrix, it is taken as is.
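For the all-continuous case, the computation can be approximated with base R alone. The sketch below assumes that standardization amounts to a call to scale(); the trait names and values are invented, and the exact steps taken inside dbFD may differ.

```r
# Hypothetical continuous traits for four species (no NAs, no weights).
x <- data.frame(
  height = c(12.5, 30.1, 21.0, 8.4),
  sla    = c(18.2, 9.7, 14.3, 22.8),
  row.names = c("sp1", "sp2", "sp3", "sp4")
)

# Standardize each trait to mean 0 and unit variance (cf. stand.x = TRUE),
# then compute the species-by-species Euclidean distances.
x_std  <- scale(x)
x_dist <- dist(x_std)

print(round(as.matrix(x_dist), 2))
```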

When x is a single trait, species with NAs are first excluded to avoid NAs in the distance matrix. If x is a single continuous trait (i.e. of class numeric), a species-species Euclidean distance matrix is computed via dist. If x is a single ordinal trait (i.e. of class ordered), gowdis is used and argument ord applies. If x is a single nominal trait (i.e. an unordered factor), the trait is converted to dummy variables and a distance matrix is computed via dist.binary, following argument dist.bin.
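The dummy-variable coding of a single nominal trait can be illustrated with base R. The sketch below uses invented data and computes only the simple matching similarity between two species; how dist.binary converts that similarity into a distance is not reproduced here.

```r
# Hypothetical nominal trait for four species.
growth  <- factor(c("grass", "forb", "grass", "shrub"))
species <- c("sp1", "sp2", "sp3", "sp4")

# One 0/1 column per factor level.
dummy_vars <- sapply(levels(growth), function(l) as.integer(growth == l))
rownames(dummy_vars) <- species

# Simple matching similarity: share of dummy variables on which
# two species agree.
simple_matching <- function(i, j) {
  mean(dummy_vars[i, ] == dummy_vars[j, ])
}

simple_matching("sp1", "sp3")  # same level, so every column agrees
simple_matching("sp1", "sp2")  # different levels, only the 'shrub' column agrees
```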

Once the species-species distance matrix is obtained, dbFD checks whether it is Euclidean via is.euclid. PCoA axes corresponding to negative eigenvalues are imaginary axes that cannot be represented in a Euclidean space, but simply ignoring these axes would lead to biased estimations of FD. Hence dbFD applies one of four correction methods, following argument corr. "sqrt" simply takes the square root of the distances. However, this approach does not work for all coefficients, in which case dbFD will stop and tell the user to select another correction method. "cailliez" refers to the approach described by Cailliez (1983) and is implemented via cailliez. "lingoes" refers to the approach described by Lingoes (1971) and is implemented via lingoes. "none" creates a distance matrix with only the positive eigenvalues of the Euclidean representation via quasieuclid. See Legendre and Legendre (1998) and Legendre and Anderson (1999) for more details on these corrections.
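The idea behind the Euclidean check and the "sqrt" correction can be sketched in base R. The helper is_euclidean below is hypothetical (it is not the is.euclid function): it double-centers the squared distances and looks for negative eigenvalues, using an arbitrary tolerance. The three-point example is invented and violates the triangle inequality.

```r
# Hypothetical helper: TRUE when the Gower-centered matrix of squared
# distances has no negative eigenvalue (up to a relative tolerance).
is_euclidean <- function(d, tol = 1e-8) {
  d2 <- as.matrix(d)^2
  n  <- nrow(d2)
  centering <- diag(n) - matrix(1 / n, n, n)
  b  <- -0.5 * centering %*% d2 %*% centering
  ev <- eigen(b, symmetric = TRUE, only.values = TRUE)$values
  all(ev > -tol * max(abs(ev)))
}

# Three species whose distances violate the triangle inequality (1 + 1 < 3).
d <- as.dist(matrix(c(0, 1, 3,
                      1, 0, 1,
                      3, 1, 0), nrow = 3))

is_euclidean(d)        # not Euclidean
is_euclidean(sqrt(d))  # the square-root transformation repairs it here
```

The square root does not rescue every dissimilarity coefficient, which is why the other corrections exist.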

Principal coordinates analysis (PCoA) is then performed (via dudi.pco) on the corrected species-species distance matrix. The resulting PCoA axes are used as the new ‘traits’ to compute the three indices of Villéger et al. (2008): FRic, FEve, and FDiv. For FEve, there is no limit on the number of traits that can be used, so all PCoA axes are used. On the other hand, FRic and FDiv both rely on finding the minimum convex hull that includes all species (Villéger et al. 2008). This requires more species than traits. To circumvent this problem, dbFD takes only a subset of the PCoA axes as traits via argument m. This, however, comes at the cost of some loss of information. The quality of the resulting reduced-space representation is returned by qual.FRic, which is computed as described by Legendre and Legendre (1998) and can be interpreted as an R^2-like ratio.
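The two rules for m and the idea of an R^2-like quality ratio can be sketched as follows. The function names are hypothetical, the species count s is simply passed in (which species count dbFD uses internally is not stated here), and the quality ratio assumes that all eigenvalues are positive; the eigenvalues are invented.

```r
# "max" rule: largest number of axes t such that s > t.
axes_max <- function(s) s - 1

# "min" rule: largest number of axes t such that s >= 2^t.
axes_min <- function(s) {
  t <- 0
  while (2^(t + 1) <= s) t <- t + 1
  t
}

# Share of the total variation kept by the first m axes, assuming
# all eigenvalues are positive.
quality <- function(eig, m) sum(eig[seq_len(m)]) / sum(eig)

s   <- 10                      # hypothetical number of species
eig <- c(4, 3, 2, 0.6, 0.4)    # hypothetical PCoA eigenvalues, decreasing

axes_max(s)          # 9
axes_min(s)          # 3, because 2^3 = 8 <= 10 < 2^4 = 16
quality(eig, m = 3)  # 0.9
```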

In dbFD, FRic is generally measured as the convex hull volume. When there is only one continuous trait, it is measured as the range (or the range of the ranks for an ordinal trait). When only nominal and ordinal traits are present, FRic is measured as the number of unique trait value combinations in a community. FEve and FDiv, but not FRic, can account for species relative abundances, as described by Villéger et al. (2008).
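The two special cases of FRic (range of a single continuous trait, and number of unique trait combinations) are simple enough to sketch in base R. The data and object names below are invented, and the code is an illustration of the definitions rather than the code dbFD runs.

```r
# Hypothetical single continuous trait and presence-absence data.
height <- c(sp1 = 1, sp2 = 4, sp3 = 9)
a <- rbind(site1 = c(1, 1, 0),
           site2 = c(0, 1, 1),
           site3 = c(1, 0, 1))
colnames(a) <- names(height)

# One continuous trait: range of the values of the species present.
fric_range <- apply(a, 1, function(ab) diff(range(height[ab > 0])))

# Nominal traits only: number of unique trait value combinations.
nominal <- data.frame(form = c("grass", "forb", "grass"),
                      life = c("annual", "annual", "annual"),
                      row.names = names(height))
fric_unique <- apply(a, 1, function(ab) {
  nrow(unique(nominal[ab > 0, , drop = FALSE]))
})

fric_range   # site1 = 3, site2 = 5, site3 = 8
fric_unique  # site1 = 2, site2 = 2, site3 = 1
```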

Functional dispersion (FDis; Laliberté and Legendre 2010) is computed from the uncorrected species-species distance matrix via fdisp. Axes with negative eigenvalues are corrected following the approach of Anderson (2006). When all species have equal abundances (i.e. presence-absence data), FDis is simply the average distance to the centroid (i.e. multivariate dispersion) as originally described by Anderson (2006). Multivariate dispersion has been proposed as an index of beta diversity (Anderson et al. 2006), and Laliberté and Legendre (2010) have extended it to a FD index. FDis can account for relative abundances by shifting the position of the centroid towards the most abundant species, and then computing a weighted average distance to this new centroid, using again the relative abundances as weights (Laliberté and Legendre 2010). FDis has no upper limit and requires at least two species to be computed. For communities composed of only one species, dbFD returns a FDis value of 0. FDis is by construction unaffected by species richness; it can be computed from any distance or dissimilarity measure (Anderson et al. 2006); it can handle any number and type of traits (including more traits than species); and it is not strongly influenced by outliers.
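The weighted-centroid definition of FDis can be illustrated for a single community when species already have Euclidean coordinates. The function below is a hypothetical sketch, not fdisp: it skips the PCoA step and the handling of negative eigenvalues, and the coordinates and abundances are invented.

```r
# Hypothetical sketch of FDis for one community.
# coords: species-by-axes matrix of Euclidean coordinates.
# abun:   abundances of the same species, in the same order.
fdis <- function(coords, abun) {
  w <- abun / sum(abun)                             # relative abundances
  centroid <- colSums(coords * w)                   # abundance-weighted centroid
  z <- sqrt(rowSums(sweep(coords, 2, centroid)^2))  # distances to the centroid
  sum(w * z)                                        # weighted mean distance
}

coords <- rbind(sp1 = c(0, 0), sp2 = c(2, 0))

fdis(coords, c(1, 1))  # equal abundances: centroid at (1, 0), FDis = 1
fdis(coords, c(3, 1))  # centroid shifts to (0.5, 0), FDis = 0.75
```

With a single species the centroid coincides with that species, so the sketch returns 0, in line with the value dbFD reports for one-species communities.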

Rao's quadratic entropy (Q) is computed from the uncorrected species-species distance matrix via divc. See Botta-Dukát (2005) for details. Rao's Q is conceptually similar to FDis, and simulations (via simul.dbFD) have shown high positive correlations between the two indices (Laliberté and Legendre 2010). Still, one potential advantage of FDis over Rao's Q is that in the unweighted case (i.e. with presence-absence data), it opens possibilities for formal statistical tests for differences in FD between two or more communities through a distance-based test for homogeneity of multivariate dispersions (Anderson 2006); see betadisper for more details.
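As a rough illustration, Rao's Q can be written as the sum of pairwise distances weighted by the product of relative abundances, with each pair counted once. The function below is a hypothetical sketch of that general form with invented data. Scaling conventions (for example whether distances are squared or halved) differ between implementations, so its values need not match what divc returns.

```r
# Hypothetical sketch: sum over pairs i < j of d_ij * p_i * p_j.
rao_q <- function(d, abun) {
  d <- as.matrix(d)
  p <- abun / sum(abun)       # relative abundances
  sum(outer(p, p) * d) / 2    # full double sum counts each pair twice
}

# Two species separated by a distance of 2.
d <- as.dist(matrix(c(0, 2,
                      2, 0), nrow = 2))

rao_q(d, c(3, 1))  # 2 * 0.75 * 0.25 = 0.375
rao_q(d, c(1, 0))  # a single species gives 0
```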

Functional group richness (FGR) is based on the classification of the species by the user from visual inspection of a dendrogram. Method "kmeans" is also available and calls cascadeKM. In that case, the Calinski-Harabasz (1974) criterion or the simple structure index (SSI) can be used to estimate the number of functional groups; see cascadeKM for more details. dbFD then returns the number of functional groups per community, as well as the abundance of each group in each community.
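How group membership translates into per-community group richness and group abundances can be sketched in base R. The example below is an illustration with invented data: the number of groups is fixed at k = 2 instead of being chosen interactively from the dendrogram, and hclust is called with method "ward.D2", which is an assumption about the Ward variant.

```r
# Hypothetical single trait and abundances.
traits <- c(sp1 = 1.0, sp2 = 1.1, sp3 = 5.0, sp4 = 5.2)
abun <- rbind(siteA = c(4, 2, 0, 0),
              siteB = c(0, 1, 3, 0))
colnames(abun) <- names(traits)

# Cluster the species and cut the dendrogram into k groups.
tree   <- hclust(dist(traits), method = "ward.D2")
groups <- cutree(tree, k = 2)   # k fixed here for illustration

# Abundance of each functional group in each community.
gr_abun <- t(rowsum(t(abun), group = groups))

# Number of functional groups present in each community.
fgr <- rowSums(gr_abun > 0)

groups   # sp1 and sp2 in group 1, sp3 and sp4 in group 2
gr_abun
fgr      # siteA = 1, siteB = 2
```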

The community-level weighted means of trait values (CWM) are an index of functional composition (Lavorel et al. 2008) and are computed via functcomp. Species with NAs for a given trait are excluded for that trait.
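For a single numeric trait, the CWM of each community is an abundance-weighted mean. The base R sketch below uses invented data and assumes that excluding a species with a missing trait value means dropping both its value and its abundance from the mean; functcomp itself also handles nominal, binary and ordinal traits, which are not covered here.

```r
# Hypothetical numeric trait with one missing value.
height <- c(sp1 = 10, sp2 = 20, sp3 = NA, sp4 = 30)
a <- rbind(site1 = c(1, 1, 5, 0),
           site2 = c(0, 1, 0, 3))
colnames(a) <- names(height)

# Abundance-weighted mean per community; sp3 is dropped because its
# trait value is missing.
cwm <- apply(a, 1, function(ab) weighted.mean(height, w = ab, na.rm = TRUE))

cwm  # site1 = 15, site2 = 27.5
```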



Value

nbsp: vector listing the number of species in each community


sing.sp: vector listing the number of functionally singular species in each community. If all species are functionally different, sing.sp will be identical to nbsp.


FRic: vector listing the FRic of each community


qual.FRic: quality of the reduced-space representation required to compute FRic and FDiv.


FEve: vector listing the FEve of each community


FDiv: vector listing the FDiv of each community. Only returned if calc.FDiv is TRUE.


FDis: vector listing the FDis of each community


RaoQ: vector listing the Rao's quadratic entropy (Q) of each community


FGR: vector listing the FGR of each community. Only returned if calc.FGR is TRUE.


spfgr: vector specifying functional group membership for each species. Only returned if calc.FGR is TRUE.


gr.abun: matrix containing the abundances of each functional group in each community. Only returned if calc.FGR is TRUE.


CWM: data frame containing the community-level weighted trait means (CWM). Only returned if calc.CWM is TRUE.


x.values: eigenvalues from the PCoA. Only returned if print.pco is TRUE.


x.axes: PCoA axes. Only returned if print.pco is TRUE.


Warning

Users often report that dbFD crashed during their analysis. This generally occurs under Windows and is almost always due to the computation of convex hull volumes. Possible solutions are to set calc.FRic = FALSE, or to reduce the dimensionality of the trait matrix using the m argument.


Note

dbFD borrows code from the F_RED function of Villéger et al. (2008).


Author(s)

Etienne Laliberté etiennelaliberte@gmail.com http://www.elaliberte.info


References

Anderson, M. J. (2006) Distance-based tests for homogeneity of multivariate dispersions. Biometrics 62:245-253.

Anderson, M. J., K. E. Ellingsen and B. H. McArdle (2006) Multivariate dispersion as a measure of beta diversity. Ecology Letters 9:683-693.

Botta-Dukát, Z. (2005) Rao's quadratic entropy as a measure of functional diversity based on multiple traits. Journal of Vegetation Science 16:533-540.

Cailliez, F. (1983) The analytical solution of the additive constant problem. Psychometrika 48:305-310.

Calinski, T. and J. Harabasz (1974) A dendrite method for cluster analysis. Communications in Statistics 3:1-27.

Gower, J. C. (1971) A general coefficient of similarity and some of its properties. Biometrics 27:857-871.

Laliberté, E. and P. Legendre (2010) A distance-based framework for measuring functional diversity from multiple traits. Ecology 91:299-305.

Lavorel, S., K. Grigulis, S. McIntyre, N. S. G. Williams, D. Garden, J. Dorrough, S. Berman, F. Quétier, A. Thebault and A. Bonis (2008) Assessing functional diversity in the field - methodology matters! Functional Ecology 22:134-147.

Legendre, P. and M. J. Anderson (1999) Distance-based redundancy analysis: testing multispecies responses in multifactorial ecological experiments. Ecological Monographs 69:1-24.

Legendre, P. and L. Legendre (1998) Numerical Ecology. 2nd English edition. Amsterdam: Elsevier.

Lingoes, J. C. (1971) Some boundary conditions for a monotone analysis of symmetric matrices. Psychometrika 36:195-203.

Podani, J. (1999) Extending Gower's general coefficient of similarity to ordinal characters. Taxon 48:331-340.

Sokal, R. R. and C. D. Michener (1958) A statistical method for evaluating systematic relationships. The University of Kansas Scientific Bulletin 38:1409-1438.

Villéger, S., N. W. H. Mason and D. Mouillot (2008) New multidimensional functional diversity indices for a multifaceted framework in functional ecology. Ecology 89:2290-2301.

See Also

gowdis, functcomp, fdisp, simul.dbFD, divc, treedive, betadisper


Examples

library(FD)

# mixed trait types, NAs
ex1 <- dbFD(dummy$trait, dummy$abun)

# add variable weights
# 'cailliez' correction is used because 'sqrt' does not work
w <- c(1, 5, 3, 2, 5, 2, 6, 1)
ex2 <- dbFD(dummy$trait, dummy$abun, w, corr = "cailliez")

# if 'x' is a distance matrix
trait.d <- gowdis(dummy$trait)
ex3 <- dbFD(trait.d, dummy$abun)

# one numeric trait, one NA
num1 <- dummy$trait[,1] ; names(num1) <- rownames(dummy$trait)
ex4 <- dbFD(num1, dummy$abun)

# one ordered trait, one NA
ord1 <- dummy$trait[,5] ; names(ord1) <- rownames(dummy$trait)
ex5 <- dbFD(ord1, dummy$abun)

# one nominal trait, one NA
fac1 <- dummy$trait[,3] ; names(fac1) <- rownames(dummy$trait)
ex6 <- dbFD(fac1, dummy$abun)

# example with real data from New Zealand short-tussock grasslands
# 'lingoes' correction used because 'sqrt' does not work in that case
ex7 <- dbFD(tussock$trait, tussock$abun, corr = "lingoes")

## Not run: 
# calc.FGR = TRUE, 'ward'
ex8 <- dbFD(dummy$trait, dummy$abun, calc.FGR = TRUE)

# calc.FGR = TRUE, 'kmeans'
ex9 <- dbFD(dummy$trait, dummy$abun, calc.FGR = TRUE,
    clust.type = "kmeans")

# ward clustering to compute FGR
ex10 <- dbFD(tussock$trait, tussock$abun,
    corr = "cailliez", calc.FGR = TRUE, clust.type = "ward")
# choose 'g' for number of groups
# 6 groups seems to make good ecological sense

# however, the calinski criterion in 'kmeans' suggests
# that 6 groups may not be optimal
ex11 <- dbFD(tussock$trait, tussock$abun, corr = "cailliez",
    calc.FGR = TRUE, clust.type = "kmeans", km.sup.gr = 10)

## End(Not run)
