In EricMarcon/entropart: Entropy Partitioning to Measure Diversity

set.seed(97310)

entropart is a package for R designed to estimate diversity based on HCDT entropy or similarity-based entropy.

Diversity measurement can be done through a quite rigorous framework based on entropy, i.e. the amount of uncertainty calculated from the frequency distribution of a community [@Patil1982; @Jost2006; @Marcon2014a]. Tsallis entropy, also known as HCDT entropy [@Havrda1967; @Daroczy1970; @Tsallis1988], is of particular interest [@Jost2006; @Marcon2014a] namely because it gathers the number of species, Shannon [-@Shannon1948] and Simpson [-@Simpson1949] indices of diversity into a single framework. Interpretation of entropy is not straightforward but one can easily transform it into Hill numbers [@Hill1973] which have many desirable properties [@Jost2007]: mainly, they are the number of equally-frequent species that would give the same level of diversity as the data.

@Marcon2014b generalized the duality of entropy and diversity, deriving the relation between phylogenetic or functional diversity [@Chao2010] and phylogenetic or functional entropy (we will write phylodiversity and phyloentropy for short), as introduced by @Pavoine2009. Special cases are the well-known PD [@Faith1992] and FD [@Petchey2002] indices and Rao's [-@Rao1982] quadratic entropy. The same relation holds between Ricotta and Szeidl's entropy of a community [@Ricotta2006] and similarity-based diversity [@Leinster2012].

The entropart package for R [@R] enables calculation of all these measures of diversity and entropy and their partitioning.

Diversity partitioning means that, in a given area, the $\gamma$ diversity $D_{\gamma}$ of all individuals found may be split into within ($\alpha$ diversity, $D_{\alpha}$) and between ($\beta$ diversity, $D_{\beta}$) local assemblages. $\alpha$ diversity reflects the diversity of individuals in local assemblages whereas $\beta$ diversity reflects the diversity of the local assemblages. @Marcon2014a derived the decomposition of Tsallis $\gamma$ entropy into its $\alpha$ and $\beta$ components, generalized to phylodiversity [@Marcon2014b] and similarity-based diversity [@Marcon2014e].

Estimators of diversity are biased because of unseen species and also because they are not linear functions of probabilities [@Marcon2014a]. $\alpha$ and $\gamma$ diversities are underestimated by naive estimators [@Chao2003; @Dauby2012]. $\beta$ diversity is severely biased too when sampling is not sufficient [@Beck2013]. Bias-corrected estimators of phylodiversity have been developed by @Marcon2014b. Estimators of similarity-based diversity were derived by @Marcon2014e. The package includes them all [@Marcon2014c].

In summary, the framework supported by the package is as follows. First, an information function is chosen to describe the amount of surprise brought by the observation of each individual. In the simplest case of species-neutral diversity, it is just a decreasing function of probability: observing an individual of a rarer species brings more surprise. Various information functions allow evaluating species-neutral, phylogenetic or functional entropy. Surprise is averaged among all individuals of a community to obtain its entropy. Entropy is systematically transformed into diversity for interpretation. Diversity is an effective number of species, i.e. the number of equally-different and equally-frequent species that would give the same entropy as the data. The average entropy of communities of an assemblage is $\alpha$ entropy, while the entropy of the assemblage is $\gamma$ entropy. Their difference is $\beta$ entropy. After transformation, $\beta$ diversity is the ratio of $\gamma$ to $\alpha$ diversity. It is an effective number of communities, i.e. the number of equally-weighted communities with no species in common (and where species are maximally distinct between communities) necessary to obtain the same diversity as the data. Estimation-bias correction is more easily applied to entropy before transforming it into diversity.

This framework is somehow different from that of @Chao2014a who define $\alpha$ diversity in another way [see @Marcon2014b for a detailed comparison], such that $\alpha$ entropy is not the average surprise of an assemblage. They also propose a definition of functional diversity [@Chiu2014b] based in the information brought by pairs of individuals that is not supported in the package.

The successive sections of this paper presents the package features, illustrated by worked examples based on the data included in the package.

Package organization

Data

Most functions of the package calculate entropy or diversity of a community or of an assemblage of communities called a meta-community. Community functions accept a vector of probabilities or of abundances for species data. Each element of the vector contains the probability or the number of occurrences of a species in a given community. Meta-community functions require a particular data organization in a MetaCommunity object described here.

A MetaCommunity is basically a list. Its main components are $Nsi, a matrix containing the species abundances whose lines are species, columns are communities and $Wi, a vector containing community weights. Creating a MetaCommunity object is the purpose of the MetaCommunity function. Arguments are a dataframe containing the number of individuals per species (lines) in each community (columns), and a vector containing the community weights. The following example creates a MetaCommunity made of three communities of unequal weights with 4 species. The weighted average probabilities of occurrence of species and the total number of individuals define the meta-community as the assemblage of communities.

library("entropart")
df <- data.frame(C1=c(10, 10, 10, 10), C2=c(0, 20, 35, 5), C3=c(25, 15, 0, 2))
row.names(df) <- c("sp1", "sp2", "sp3", "sp4")
df
w <- c(1, 2, 1)
MC <- MetaCommunity(Abundances=df, Weights=w)
plot(MC)

Communities (named C1, C2 ad C3) are represented in the left part of the figure, the metacommunity to the right. Bar widths are proportional to community weights. Species abundances are represented vertically (4 species are present in the meta-community, only 3 of them in communities C2 and C3).

A meta-community is partitioned into several local communities (indexed by $i=1, 2,\dots, I$). $n_i$ individuals are sampled in community $i$. Let $s=1,\ 2,\dots ,S$ denote the species that compose the meta-community, $n_{s,i}$ the number of individuals of species $s$ sampled in the local community $i$, $n_s=\sum_i{n_{s,i}}$ the total number of individuals of species $s$, $n=\sum_s{\sum_i{n_{s,i}}}$ the total number of sampled individuals. Within each community $i$, the probability $p_{s,i}$ for an individual to belong to species $s$ is estimated by $\hat{p}{s,i}=n{s,i}/{n_i}$. The same probability for the meta-community is $p_s$.

Communities have a weight $w_i$, satisfying $p_s=\sum_i{w_i p_{s,i}}$. The commonly-used $w_i=n_i/n$ is a possible weight, but the weighting may be arbitrary (e.g. the sampled areas). The component $Ps of a MetaCommunity object contains the probability of occurrence of each species in the meta-community, calculated this way:

MC$Ps

The number of individuals $Ns of a MetaCommunity is theoretically unknown, since communities are just samples of it. The total number of individuals is $N. For simplicity, it is set to the total number of individuals of all communities. If community weights are their number of individuals, $Ns is just the sum of the numbers of individuals per species of communities. Else, $Ns may contain non-integer values, respecting the probabilities $Ps and summing to $N.

A MetaCommunity can be summarized and plotted.

The package contains an example dataset containing the inventory of two 1-ha tropical forest plots in Paracou, French Guiana [@Marcon2012a]:

data("Paracou618")
summary(Paracou618.MC)

Paracou618.MC is a meta-community made of two communities named P006 and P018, containing r round(Paracou618.MC$Nspecies, 0) species (their name is Family_Genus_Species, abbreviated to 4 characters). The values of the abundance matrix are the number of individuals of each species in each community. Sample coverage will be explained later.

The dataset also contains a taxonomy and a functional tree. Paracou618.Taxonomy is an object of class phylo, defined in the package ape [@Paradis2004], namely a phylogenetic tree. This example data is only a taxonomy, containing family, genus and species levels for the sake of simplicity. Paracou618.Functional is an object of class hclust containing a functional tree based on leaf, height, stem and seed functional traits [@Herault2007; @Marcon2014b]. The package also accepts any ultrametric tree of class phylog, from ade4 [@Dray2007] or hclust. Paracou618.dist is the distance matrix (actually a dist object) used to build the functional tree.

Numeric vectors containing species abundances (such as the $Ns component of MetaCommunity) or probabilities (such as $Ps) may be converted to abundance vectors (AbdVector) or probability vectors (ProbaVector) to clarify their content. By default, the as.AbdVector function transforms abundance values into integer if they are not (the $Ns components of a MetaCommunity is typically not an integer vector if community weights are not proportional to their numbers of individuals):

data("Paracou618")
PAbd <- as.AbdVector(Paracou618.MC$Ns)
plot(PAbd)

the Paracou species distribution is plotted as a Rank-Abundance Curve (Whittaker plot).

The as.ProbaVector function transforms abundances to probabilities if necessary:

PProba <- as.ProbaVector(Paracou618.MC$Ps)

AbdVector and ProbaVector objects both are SpeciesDistribution objects which can be plotted.

Utilities

The deformed logarithm formalism [@Tsallis1994] is very convenient to manipulate entropies. The deformed logarithm of order $q$ is defined as:

$$\ln_q{x}=\frac{x^{1-q}-1}{1-q}$$

It converges to $\ln$ when $q\to 1$.

curve(log(x), 0, 1, lty=1, ylab = expression(ln[q](x)))
curve(lnq(x, 0), 0, 1, lty = 2, add = TRUE)
curve(lnq(x, 2), 0, 1, lty = 3, add = TRUE)
curve(lnq(x, 3), 0, 1, lty = 4, add = TRUE)  
legend("bottomright", legend = c(expression(ln[0](x)), "ln(x)", expression(ln[2](x)), expression(ln[3](x))), lty = c(2, 1, 3, 4), inset=  0.02)

The figure shows the curves of $\ln_q{x}$ for different values of $q$ between 0 and 4 ($\ln_1{x}=\ln{x}$).

The inverse function of $\ln_q{x}$ is the deformed exponential:

$$e^x_q=[1+(1-q)x]^{\frac{1}{1-q}}$$

Functions of the package are lnq(x, q) and expq(x, q).

Species-neutral diversity

Community functions

HCDT entropy

Species-neutral HCDT entropy of order $q$ of a community is defined as:

$$^q!H=\frac{1-\sum_s{p^q_s}}{q-1}=-\sum_s{p^q_s}\ln_q{p_s}=\sum_s{p_s}\ln_q{\frac{1}{p_s}}$$

$q$ is the order of diversity (e.g.: 1 for Shannon). Entropy can be calculated by the Tsallis function. Paracou meta-community entropy of order 1 is:

Tsallis(PProba, q = 1)

For convenience, special cases of entropy of order $q$ have a clear-name function: Richess for $q=0$, Shannon for $q=1$, Simpson for $q=2$.

Shannon(PProba)

Entropy values have no intuitive interpretation in general, except for the number of species $^0!H$ and Simpson entropy $^2!H$ which is the probability for two randomly chosen individuals to belong to different species.

Sample coverage

A useful indicator of sampling quality is the sample coverage [@Good1953; @Chao1988; @Zhang2007], that is to say the probability for a species of the community to be observed in the actual sample. It equals the sum of the probability of occurrences of all observed species. Its historical estimator is [@Good1953]:

$$\hat{C}=1-\frac{S^1}{n}$$

$S^1$ is the number of singletons (species observed once) of the sample, and $n$ is its size. The estimator has been improved by taking into account the whole distribution of species [@Zhang2007]. The Coverage function calculates it, allowing to choose the estimator:

Coverage(PAbd)

The sample coverage cannot be estimated from probability data: abundances are required.

Its interpretation is straightforward: some species have not been sampled. Their number is unknown but their total probability of occurence can be estimated accurately. Here, it is a bit less than 8%. From another point of view, the probability for an individual of the community to belong to a sampled species is $C$: 8% of them belong to missed species. The number of missed species may be estimated by Richness but this is not the point here. The sample coverage is the foundation of many estimators of entropy.

Bias corrected estimators

Correction of estimation bias is used to improve the estimation of entropy despite unobserved species and also mathematical issues [@Bonachela2008]. Bias-corrected estimators (often relying on sample coverage) are returned by functions whose names are prefixed by bc, such as bcTsallis. They are similar to the non-corrected ones but they use abundance data and propose several bias-correction techniques to select in the Correction argument. A Best correction is calculated by default, detailed in the help file of each function.

bcTsallis(PAbd, q = 1)

The best correction for Tsallis entropy follows @Chao2015. It combines an unbiased estimator previously derived by @Zhang2014 and an estimate of the remaining bias.

All community functions such as Tsallis are actually generic methods that can handle several types of data the appropriate way: if the first argument of the function is a ProbaVector (or a numeric vector summing to 1), no bias correction is applied. If it is an AbdVector (or an integer vector), the bias-corrected estimator is used (e.g. bcTsallis). Numeric vectors summing to more than 2 are considered as abundances but most bias corrections do not allow non-integer values and return a warning.

The different ways to use the functions are a matter of personal preference. bcTsallis is equivalent to Tsallis with an abundance vector:

Tsallis(PAbd, q = 1)

whilst Tsallis with a probability vector does not allow bias correction:

Tsallis(PProba, q = 1)

Bias-corrected entropy is ready to be transformed into explicit diversity.

Effective numbers of species

Entropy should be converted into true diversity [@Jost2007], i.e. effective number of species equal to @Hill1973 numbers:

$$^q!D={\left(\sum_s{p^q_s}\right)}^{\frac{1}{1-q}}$$

This can be done by the deformed exponential function, or using directly the Diversity or bcDiversity functions (equal to the deformed exponential of order $q$ of Tsallis or bcTsallis)

expq(Tsallis(PAbd, q = 2), q = 2)
Diversity(PAbd, q = 2)

The effective number of species of the Paracou dataset is estimated to be r round(Diversity(PAbd, q = 2)) after bias correction (rather than r round(Diversity(PProba, q = 2)) without it). It means that a community made of r round(Diversity(Ns = PAbd, q = 2)) equally-frequent species has the same Simpson entropy as the actual one. This is much less than the actual r round(Paracou618.MC$Nspecies, 0) sampled species. Simpson's entropy focuses on dominant species.

Hurlbert's diversity

Hurlbert's index of diversity [@Hurlbert1971] of order $k$ is the expected number of species observed in a sample of size $k$.

$$k!S = \sum{s}{\left[ 1-\left( 1-p_s \right)^k \right]}$$

Greater values of $k$ give more importance to rare species.

An unibiased estimator of $_k!S$ has been provided by Hurlbert, for values of $k$ up to the sample size $n$:

$$k!\hat{S} = \sum{s}{\left[ 1- \binom{n-n_s}{k} / \binom{n}{k} \right]}$$

The effective number of species $_k!D$ can be found by solving the following equation [@Dauby2012]:

$$_k!S = {_k!D} \left[1-{\left(1-\frac{1}{_k!D}\right)}^k\right]$$

Hurlbert's index is calculated by the Hurlbert function. Its unbiased estimator is obtained by bcHurlbert (implicitly if an abundance vector is used). Its effective number of species is caclulated by HurlbertD or bcHurlbertD.

Hurlbert(PProba, k = 2)
Hurlbert(PAbd, k = 2)
HurlbertD(PAbd, k = 2)

Hurlbert's diversity of order 2 is identical to Simpson's diversity.

Meta-community functions

Meta-community functions allow partitioning diversity according to Patil and Taillie's concept of diversity of a mixture [@Patil1982], i.e. $\alpha$ entropy of a meta-community is defined as the weighted average of community entropy, following @Routledge1979:

$$^q!H_{\alpha}=\sum_i w_i \,^q_iH_{\alpha}$$

$^q_iH_{\alpha}$ is the entropy of community $i$:

$$^q_i!H_{\alpha}=\frac{1-\sum_s{p^q_{s,i}}}{q-1} =-\sum_s{p^q_{s,i}}\ln_q{p_{s,i}} =\sum_s{p_{s,i}}\ln_q{\frac{1}{p_{s,i}}}$$

Jost's [-@Jost2007] definition of $\alpha$ entropy is not supported explicitly in the package since it only allows partitioning of equally weighted communities. In this particular case, both definitions are identical.

$\gamma$ entropy of the meta-community is defined as $\alpha$ entropy of a community. $\beta$ entropy, the difference between $\gamma$ and $\alpha$, is the generalized Jensen-Shannon divergence between the species distribution of the meta-community and those of communities [@Marcon2014a]:

$$^q!H_{\beta} =^q!H_{\gamma}-^q!H_{\alpha} =\sum_s{p^q_{s,i}\ln_q\frac{p_{s,i}}{p_s}} =\sum_s{p_{s,i}\ln_q\frac{p_s}{p_{s,i}}}$$

$\beta$ entropy should be transformed into diversity, i.e. an effective number of communities: $$^q!D_{\beta}=e^{\frac{^q!H_{\beta}}{1-(q-1)^q!H_{\alpha}}}_q$$

Basic meta-community functions

These values can be estimated by the meta-community functions named AlphaEntropy, AlphaDiversity, BetaEntropy, BetaDiversity. They accept a Metacommunity and an order of diversity $q$ as arguments, and return an MCentropy or MCdiversity object which can be summarized and plotted. GammaEntropy and GammaDiversity return a number. Estimation-bias corrections are applied by default:

e <- AlphaEntropy(Paracou618.MC, q = 1)
summary(e)

The Shannon $\alpha$ entropy of the meta-community is r round(e$Total, 2). It is the weighted average entropy of communities.

The estimation of the diversity of a meta-community whose numbers of individuals$Ns are not integer values can't be done with most corrections, which do require integers. The Grassberger correction can be used. Community data is pooled to obtain a global inventory whose sample coverage is estimated. The Chao-Shen correction can also be applied based on this sample coverage and the actual $Ps values of the meta-community. Finally, the Best correction is the greater of the two values obtained by Chao-Shen and Grassberger [@Marcon2014a].

Diversity Partition of a metacommunity

The DivPart function calculates everything at once. Its arguments are the same but bias correction is not applied by default. It can be, using the argument Biased = FALSE, and the correction chosen by the argumentCorrection. It returns a DivPart object which can be summarized (entropy is not printed by summary) and plotted:

p <- DivPart(q = 1, MC = Paracou618.MC, Biased = FALSE)
summary(p)
p$CommunityAlphaEntropies
plot(p)

The $\alpha$ diversity of communities is r round(p$TotalAlphaDiversity, 0) effective species (it is the exponential of the entropy calculated previously). This is more than Simpson's diversity r round(bcDiversity(Ns = PAbd, q = 2)) species, calculated above) because less frequent species are taken into account. $\gamma$ diversity of the meta-community is r round(p$GammaDiversity, 0) effective species. $\beta$ diversity is r round(p$TotalBetaDiversity, 2) effective communities, i.e. the two actual communities are as different from each other as r round(p$TotalBetaDiversity, 2) ones with equal weights and no species in common.

The figure is the plot of the diversity partition of the meta-community Paracou618.MC. The long rectangle of height 1 represents $\gamma$ diversity, equal to r round(p$GammaDiversity, 0) effective species. The narrower and higher rectangle has the same area: its horizontal size is $\alpha$ diversity (r round(p$TotalAlphaDiversity, 0) effective species) and its height is $\beta$ diversity (r round(p$TotalBetaDiversity, 2) effective communities).}

Diversity Estimation of a metacommunity

The DivEst function decomposes diversity and estimates confidence interval of $\alpha$, $\beta$ and $\gamma$ diversity following @Marcon2012a. If the observed species frequencies of a community are assumed to be a realization of a multinomial distribution, they can be drawn again to obtain a distribution of entropy.

de <- DivEst(q = 1, Paracou618.MC, Biased = FALSE, Correction = "Best", Simulations = 100)
summary(de)
plot(de)

The result is a Divest object which can be summarized and plotted. On the figure of the diversity estimation of the meta-community Paracou618.MC, $\alpha$, $\beta$ and $\gamma$ diversity probability densities are plotted, with a 95% confidence interval.

The uncertainty of estimation is due to sampling: the distribution of the estimators corresponds to the simulated repetitions of sampling in the original multinomial distribution of species. It ignores the remaining bias of the estimator, which is unknown. Yet, except for $q=2$, the corrected estimators are biased (even though much less than the non-corrected ones), especially when $q$ is small. New estimators to reduce the bias are included in the package regularly.

Diversity Profile of a metacommunity

DivProfile calculates diversity profiles, i.e. the value of diversity against its order. The result is a DivProfile object which can be summarized and plotted.

dp <- DivProfile(seq(0, 2, 0.2), Paracou618.MC, Biased = FALSE, NumberOfSimulations = 20)
summary(dp)
plot(dp)

The figure shows the diversity profile of the meta-community Paracou618.MC. Values are the number of effective species ($\alpha$ and $\gamma$ diversity) and the effective number of communities ($\beta$ diversity). Community P006 is represented by the solid line and community P018 by the dotted line. $\alpha$ and $\gamma$ diversity decrease from $q=0$ (number of species) to $q=2$ (Simpson diversity) by construction.}

Small orders of diversity give more weight to rare species. P018 can be considered more diverse than P006 because their profiles (top right of the figure) do not cross [@Tothmeresz1995]: its diversity is systematically higher. The shape of the $\beta$ diversity profile shows that the communities are more diverse when their dominant species are considered.

The bootstrap confidence intervals of the values of diversity [@Marcon2012a; @Marcon2014a] are calculated if NumberOfSimulations is not 0.

Alternative functions

Beta entropy can also be calculated by a set of functions named after the community functions, such as TsallisBeta, bcTsallisBeta, SimpsonBeta, etc. which require two vectors of abundances or probabilities instead of a MetaCommunity object: that of the community and the expected one (usually that of the meta-community). Bias correction is currently limited to Chao and Shen's correction. The example below calculates the Shannon $\beta$ entropy of the first community of Paracou618 and the meta-community.

ShannonBeta(Paracou618.MC$Psi[, 1], PProba)

These functions are available for particular uses, when a MetaCommunity is not available or not convenient to use (e.g. simulations). Meta-community functions are preferred in general.

Phylogenetic diversity

Phylogenetic or functional diversity generalizes HCDT diversity, considering the distance between species [@Marcon2014b]. Here, all species take place in an ultrametric phylogenetic or functional tree.

Hypothetical ultrametric tree

The tree is cut into slices, delimited by two nodes: the hypothetical tree of the figure (a) contains three slices, delimited by two nodes. The first slice starts at the bottom of the tree and ends at the first node. In slice $k$, $L_k$ leaves are found. The length of slices is $T_k$. The probabilities of occurrence of the species belonging to branches that were below leaf $l$ in the original tree are summed to give the grouped probability $u_{k,l}$. Figure (b) focuses on slice 2. The tree without slice 1 is reduced to 3 leaves. Frequencies of collapsed species are $u_{k,l}$. Figure (c) shows slice 3 only.

HCDT entropy can be calculated in slice $k$:

$$^q_k{H}=-\sum_l{u_{k,l}} \ln_q{(1/u_{k,l})}$$

Then, it is summed over the tree slices. Phyloentropy can be normalized or not. We normalize it so that it does not depend on the tree height:

$$^q\overline{H}\left( T \right) = \sum^K_{k=1}{\frac{T_k}{T}{^q_k{!H}}}$$

Unnormalized values are multiplied by the height of the tree, such as $^q!\mathit{PD}(T)$ [@Chao2010].

Phyloentropy is calculated as HCDT entropy along the slices of the trees applying possible estimation-bias corrections, summed, possibly normalized, and finally transformed into diversity:

$$^q\overline{D}\left( T \right) = e^{^q\overline{H}\left( T \right)}_q$$

Community functions

PhyloEntropy and the estimation-bias-corrected bcPhyloEntropy are the phylogenetic analogs of Tsallis and bcTsallis. They accept the same arguments plus an ultrametric tree of class phylo, hclust or phylog and Normalize a boolean to normalize the tree height to 1 (by default).

Phylogenetic diversity is calculated by PhyloDiversity or bcPhyloDiversity analogous to the species-neutral diversity functions Diversity and bcDiversity.

Results are either a PhyloDiversity or a PhyloEntropy object, which can be plotted and summarized.

phd <- bcPhyloDiversity(PAbd, q = 1, Tree = Paracou618.Taxonomy, Normalize = TRUE)
summary(phd)
plot(phd, main = "")

The figure shows the $\gamma$ phylodiversity estimation of the meta-community Paracou618.MC. The effective number of taxa of Shannon diversity is plotted against the distance from the leaves of the phylogenetic tree. Here, the tree is based on a rough taxonomy, so diversity of species, genera and families are the three levels of the curve. The dotted line represents the value of phylodiversity.

The phylogenetic diversity of order 1 of the Paracou dataset is r round(phd$Total, 0) effective species: r round(phd$Total, 0) totally different species (only connected by the root of the tree) with equal probabilities would have the same entropy. It can be compared to its species-neutral diversity: r round(p$GammaDiversity, 0) species. The latter is the diversity of the first slice of the tree. When going up the tree, diversity decreases because species collapse. On the figure, the diversity of the second slice, between $T=1$ and $T=2$, is that of genera (r round(phd$Cuts[2], 0) effective genera) and the last slice containsr round(phd$Cuts[3], 0) effective families. The phylogenetic entropy of the community is the average of the entropy along slices, weighted by the slice lengths. Diversity can not be averaged the same way.

A less trivial phylogeny would contain many slices, resulting in as many diversity levels with respect to $T$.

The AllenH function is close to PhyloEntropy: it also calculates phyloentropy but the algorithm is that of @Allen2009 for $q=1$ and that of @Leinster2012 for $q \ne 1$. It is much faster since it does not require calculating entropy for each slice of the tree but it does not allow estimation-bias correction. ChaoPD calculates phylodiversity according to @Chao2010, with the same advantages and limits compared to PhyloDiversity.

For convenience, PDFD and Rao functions are provided to calculate unnormalized phyloentropy of order 0 and 2.

Meta-community functions

DivPart, DivEst and DivProfile functions return phylogenetic entropy and diversity values instead of species-neutral ones if a tree is provided in the arguments.

dp <- DivPart(q = 1, Paracou618.MC, Biased = FALSE, Correction = "Best", Tree = Paracou618.Taxonomy)
summary(dp)

The decomposition is interpreted as the species-neutral one: $\gamma$ diversity is r round(dp$GammaDiversity, 0) effective species, made of r round(dp$TotalBetaDiversity, 1) effective communities of r round(dp$TotalAlphaDiversity, 0) effective species.

Other meta-community functions, such as AlphaEntropy behave the same way:

summary(BetaEntropy(Paracou618.MC, q = 2, Tree = Paracou618.Taxonomy, Correction = "None", Normalize = FALSE))

Compare with Rao's divc computed by ade4:

library("ade4")
divc(as.data.frame(Paracou618.MC$Wi), disc(as.data.frame(Paracou618.MC$Nsi), Paracou618.Taxonomy$Wdist))

The decomposition of the diversity of meta-communities with non integer $Ns starts with the estimation of $\gamma$ diversity. The best estimator is found for each slice of the tree. It is then used to estimate $\alpha$ diversity.

Similarity-based diversity

@Leinster2012 introduced similarity-based diversity of a community $^qD^Z$. A matrix $\mathbf{Z}$ describes the similarity between pairs of species, defined between 0 and 1. A species ordinariness is its average similarity with all species (weighted by species frequencies), including similarity with itself (equal to 1). Similarity-based diversity is the reciprocal of the generalized average of order $q$ [@Hardy1952] of the community species ordinariness.

The Dqz function calculates similarity-based diversity. Its arguments are the vector of probabilities of occurrences of the species, the order of diversity and the similarity matrix $\mathbf{Z}$. The bcDqz function allows estimation-bias correction [@Marcon2014e].

This example calculates the $\gamma$ diversity of the meta-community Paracou. First, the similarity matrix is calculated from the distance matrix between all pairs of species as 1 minus normalized dissimilarity.

DistanceMatrix <- as.matrix(Paracou618.dist)
Z <- 1 - DistanceMatrix/max(DistanceMatrix)
bcDqz(PAbd, q = 2, Z)

If $\mathbf{Z}$ is the identity matrix, similarity-based diversity equals HCDT diversity:

Dqz(PProba, q = 2, Z = diag(length(PProba)))
Diversity(PProba, q = 2)

Functional diversity of order 2 is only r round(bcDqz(PAbd, q = 2, Z), 2) effective species, which is very small compared to r round(Diversity(PProba, q = 2), 0) effective species for Simpson diversity. r round(bcDqz(PAbd, q = 2, Z), 2) equally-frequent species with similarity equal to 0 would have the same functional diversity as the actual community (made of r round(Paracou618.MC$Nspecies, 0) species). This means that species are very similar from a functional point of view. The very low values returned by $^qD^Z$ are questioned by @Chao2014a and discussed in depth by @Marcon2014e: the choice of the similarity matrix is not trivial.

The similarity-based entropy of a community $^qH^Z$ [@Leinster2012; Ricotta2006] has the same relations with diversity as HCDT entropy and Hill numbers. The Hqz function calculates it:

Hqz(PProba, q = 2, Z)
lnq(Dqz(PProba, q = 2, Z), q = 2)

As species-neutral entropy, $^qH^Z$ has no straightforward interpretation beyond the average surprise of a community.

All meta-community functions can be used to estimate similarity-based diversity: argument Z must be provided:

e <- AlphaEntropy(Paracou618.MC, q = 1, Z = Z)
summary(e)

The $\alpha$ functional entropy of the meta-community is the average entropy of communities.

Advanced tools

The package comes with a set of tools to realize frequents tasks: run Monte-Carlo simulations on a community, quickly calculate its diversity profile, apply a function to a species distribution along a tree, and manipulate meta-communities.

Random communities

The rCommunity function allows creating random communities. Their species probability distribution can be drawn in a well-known distribution (such as a log-normal one) or obtained from the data, just by dividing abundances by the total number of individuals [@Marcon2012a], or derived from a more sophisticated model by @Chao2015. Finally, the specified number of communities are drawn in a multinomial distribution.

The log-normal [@Preston1948], the log-series [@Fisher1943], the geometric [@Motomura1932], and the broken-stick [@MacArthur1957] distributions can be simulated.

This example code draws a single community of 1000 individuals according to a log-normal distribution with 300 species. Many species are not observed in the 1000-individual sample: the observed number of species is shown, with an estimation of the actual number (which should be 300). The simulated community is plotted (a Whittaker plot), with its log-normal distribution fitted from the data. Estimated parameters can be compared to the original ones.

rCommunity(n = 1, size = 1000, S=300, Distribution = "lnorm", sd=1) -> NsRef
Richness(as.ProbaVector(NsRef))
Richness(NsRef)
plot(NsRef, Distribution="lnorm")

Entropy of Monte-Carlo simulated communities

The EntropyCI function is a versatile tool to simplify simulations. Simulated communities are obtained by random draws in a multinomial distribution of species and their entropy is calculated. The arguments of EntropyCI are an entropy function (any entropy function of the package accepting a vector of species abundances, such as bcTsallis), the number of simulations to run, the observed species frequencies and the method to obtain probabilities for the multinomial distribution (the same as that of rCommunity).

The result is a numeric vector containing the entropy value of each simulated community. Entropy can be finally transformed into diversity (but it is not correct to use a diversity function in simulations because the average simulated value must be calculated and only entropy can be averaged).

This example shows how to use the function. First, the distribution of the $\gamma$ HCDT entropy of order 1 (Shannon entropy) of the Paracou meta-community is calculated and transformed into diversity. Then, the actual diversity is calculated and completed by the 95\% confidence interval of the simulated values.

SimulatedDiversity <- expq(EntropyCI(FUN = Tsallis, Simulations = 100, Ns = PAbd, q = 1), q = 1)
Diversity(PAbd, q = 1)
quantile(SimulatedDiversity, probs = c(0.025, 0.975))

These results are identical to those of the DivEst function but a single community can be addressed (DivEst requires a MetaCommunity).

Diversity or Entropy Profile of a community

This function is used to calculate diversity or entropy profiles based on community functions such as Tsallis or ChaoPD. It is similar to DivProfile but does not require a Metacommunity for argument. It can compute a bootstrap confidence envelope of the estimation of the profile, like EntropyCI. It returns a CommunityProfile object which can be plotted. Profiles can be added to an existing plot by the CEnvelope function.

This example evaluates bias correction on the diversity profile of the Paracou dataset. First, diversity profiles are calculated with and without bias correction. The corrected profile is calculated with its confidence envelope:

bcProfile <- CommunityProfile(Diversity, PAbd, NumberOfSimulations = 100)
Profile <- CommunityProfile(Diversity, PProba)

Then, they can be plotted altogether to obtain the $\gamma$ diversity profile of the the meta-community Paracou618.MC, without bias correction (dotted line) and with correction (solid line):

plot(bcProfile)
CEnvelope(Profile, lty=3)
legend("topright", c("Bias Corrected", "Biased"), lty=c(1,3), inset=0.02)

Applying a Function over a Phylogenetic Tree

The PhyloApply function is used to apply an entropy community function (generally bcTsallis) along a tree, the same way lapply works with a list.

This example shows how to calculate Shannon entropy along the tree containing the taxonomy to obtain species, genus and family entropy:

pa <- PhyloApply(Tree=Paracou618.Taxonomy, FUN=bcTsallis, NorP=PAbd)
summary(pa)
exp(pa$Cuts)
exp(pa$Total)

Manipulation of meta-communities

Several meta-communities, combined in a list, can be merged two different ways. The MergeMC function simplifies hierarchical partitioning of diversity: it considers the aggregated data of each meta-community as a community and builds an upper-level meta-community with them. The $\alpha$ entropy of the new meta-community is the weighted average $\gamma$ entropy of the original meta-communities.

MergeC combines the communities of several meta-communities to create a single meta-community containing them all. Last, ShuffleMC randomly shuffles communities accross meta-communities to allow simulations to test differences between meta-communities.

This example shows how to do this. A first meta-community is created, weights of communities are proportional to their number of individuals:

(df <- data.frame(C1 = c(10, 10, 10, 10), C2 = c(0, 20, 35, 5),
  C3 = c(25, 15, 0, 2), row.names = c("sp1", "sp2", "sp3", "sp4")))
w <- colSums(df)
MC1 <- MetaCommunity(Abundances = df, Weights = w)

Then a second one:

(df <- data.frame(C1 = c(10, 4), C2 = c(3, 4), row.names = c("sp1", "sp5")))
w <- colSums(df)
MC2 <- MetaCommunity(Abundances = df, Weights = w)

They can be merged to obtain a single meta-community containing all original communities:

mergedMC1 <- MergeC(list(MC1, MC2))
mergedMC1$Nsi

They can also be merged considering each of them as a community of a higher-level meta-community:

mergedMC2 <- MergeMC(list(MC1, MC2), Weights = vapply(list(MC1, MC2), function(x) (x$N), FUN.VALUE=0.0))
mergedMC2$Nsi

Hierarchical diversity partitioning can then be achieved:

dpAll <- DivPart(q=1, MC=mergedMC2)
summary(dpAll)

The $\gamma$ diversity of the top assemblage (MC1 and MC2) is r round(dpAll$GammaDiversity, 2) effective species, made of r round(dpAll$TotalBetaDiversity, 2) effective meta-communities of r round(dpAll$TotalAlphaDiversity, 2) effective species. The $\alpha$ diversity of each meta-community of the top assemblage is their $\gamma$ diversity when it is partitioned in turn:

dpMC1 <- DivPart(q=1, MC=MC1)
summary(dpMC1)

The $\gamma$ diversity of MC1 is r round(dpMC1$GammaDiversity, 2) effective species, made of r round(dpMC1$TotalBetaDiversity, 2) effective meta-communities of r round(dpMC1$TotalAlphaDiversity, 2) effective species. The same decomposition can be applied to MC2.

References

EricMarcon/entropart documentation built on Feb. 3, 2025, 1:35 a.m.

rdrr.io home R language documentation Run R code online

CRAN packages Bioconductor packages R-Forge packages GitHub packages

Note that we can't provide technical support on individual packages. You should contact the package authors for that.

EricMarcon/entropart
Entropy Partitioning to Measure Diversity

In EricMarcon/entropart: Entropy Partitioning to Measure Diversity

Package organization

Data

Utilities

Species-neutral diversity

Community functions

HCDT entropy

Sample coverage

Bias corrected estimators

Effective numbers of species

Hurlbert's diversity

Meta-community functions

Basic meta-community functions

Diversity Partition of a metacommunity

Diversity Estimation of a metacommunity

Diversity Profile of a metacommunity

Alternative functions

Phylogenetic diversity

Community functions

Meta-community functions

Similarity-based diversity

Advanced tools

Random communities

Entropy of Monte-Carlo simulated communities

Diversity or Entropy Profile of a community

Applying a Function over a Phylogenetic Tree

Manipulation of meta-communities

References

R Package Documentation

Browse R Packages

We want your feedback!

EricMarcon/entropart Entropy Partitioning to Measure Diversity

In EricMarcon/entropart: Entropy Partitioning to Measure Diversity

Package organization

Data

Utilities

Species-neutral diversity

Community functions

HCDT entropy

Sample coverage

Bias corrected estimators

Effective numbers of species

Hurlbert's diversity

Meta-community functions

Basic meta-community functions

Diversity Partition of a metacommunity

Diversity Estimation of a metacommunity

Diversity Profile of a metacommunity

Alternative functions

Phylogenetic diversity

Community functions

Meta-community functions

Similarity-based diversity

Advanced tools

Random communities

Entropy of Monte-Carlo simulated communities

Diversity or Entropy Profile of a community

Applying a Function over a Phylogenetic Tree

Manipulation of meta-communities

References

R Package Documentation

Browse R Packages

We want your feedback!

EricMarcon/entropart
Entropy Partitioning to Measure Diversity