Diversity Measures

knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)
# Load packages
library(tabula)
library(folio) # Datasets
library(magrittr)

Thereafter, we denote by:

Diversity in ecology describes complex interspecific interactions between and within communities under a variety of environmental conditions [@bobrowsky1989]. This concept covers different components, allowing different aspects of interspecific interactions to be measured.

$\alpha$-diversity

Richness and Rarefaction

The number of different taxa, provides an instantly comprehensible expression of diversity. While the number of taxa within a sample is easy to ascertain, as a term, it makes little sense: some taxa may not have been seen, or there may not be a fixed number of taxa [e.g. in an open system; @peet1974]. As an alternative, richness ($R$) can be used for the concept of taxa number [@mcintosh1967]. Richness refers to the variety of taxa/species/types present in an assemblage or community [@bobrowsky1989] as "the number of species present in a collection containing a specified number of individuals" [@hurlbert1971].

It is not always possible to ensure that all sample sizes are equal and the number of different taxa increases with sample size and sampling effort [@magurran1988]. Then, rarefaction ($\hat{S}$) is the number of taxa expected if all samples were of a standard size $n$ (i.e. taxa per fixed number of individuals). Rarefaction assumes that imbalances between taxa are due to sampling and not to differences in actual abundances.

Definitions

| Measure | Reference | |:----------------------------------|:-----------------| | $$ R_{1} = \frac{S - 1}{\ln N} $$ | @margalef1958 * | | $$ R_{2} = \frac{S}{\log N} $$ | @odum1960 | | $$ R_{3} = \frac{S}{\sqrt{N}} $$ | @menhinick1964 * | | $$ R_{4} = \frac{S}{\log A} $$ | @gleason1922 | Table: Richness measures [@bobrowsky1989]. *: implemented.

| Measure | Reference | |:----------------------------------|:---------------| | $$ \hat{S}{1} = \alpha - \left[ \ln \left( 1 - x \right) \right] $$ | @fisher1943 | | $$ \hat{S}{2} = y_{0} \hat{\sigma} \sqrt{2 \pi} $$ | @preston1948 | | $$ \hat{S}{3} = 2.07 \left( \frac{N}{m} \right)^{0.262} $$ | @preston1962, @preston1962a | | $$ \hat{S}{4} = 2.07 \left( \frac{N}{m} \right)^{0.262} A^{0.262} $$ | @macarthur1965 | | $$ \hat{S}{5} = k A^{d} $$ | @kilburn1966 | | $$ \hat{S}{6} = \frac{a N}{1 + b N} $$ | @decaprariis1976 | | $$ \hat{S}{7} = \sum{i = 1}^{S} 1 - \frac{{N - N_i} \choose n}{N \choose n} $$ | @hurlbert1971, @sander1968 * | Table: Rarefaction measures [@bobrowsky1989]. *: implemented.

Where:

Chao1 estimator [@chao1984]

$$ \hat{S}_{Chao1} = \begin{cases} S + \frac{N - 1}{N} \frac{s_1^2}{2 s_2} & s_2 > 0 \ S + \frac{N - 1}{N} \frac{s_1 (s_1 - 1)}{2} & s_2 = 0 \end{cases} $$

In the special case of homogeneous case, a bias-corrected estimator is:

$$ \hat{S}_{bcChao1} = S + \frac{N - 1}{N} \frac{s_1 (s_1 - 1)}{2 s_2 + 1}$$

The improved Chao1 estimator makes use of the additional information of tripletons and quadrupletons [@chiu2014]:

$$ \hat{S}{iChao1} = \hat{S}{Chao1} + \frac{N - 3}{4 N} \frac{s_3}{s_4} \times \max\left(s_1 - \frac{N - 3}{N - 1} \frac{s_2 s_3}{2 s_4} , 0\right)$$

Abundance-based Coverage Estimator [@chao1992]

$$ \hat{S}{ACE} = \hat{S}{abun} + \frac{\hat{S}{rare}}{\hat{C}{rare}} + \frac{s_1}{\hat{C}{rare}} \times \hat{\gamma}^2{rare} $$

Where $\hat{S}{rare} = \sum{i = 1}^{k} s_i$ is the number of rare taxa, $\hat{S}{abun} = \sum{i > k}^{N} s_i$ is the number of abundant taxa (for a given cut-off value $k$), $\hat{C}{rare} = 1 - \frac{s_1}{N{rare}}$ is the Turing's coverage estimate and:

$$ \hat{\gamma}^2_{rare} = \max\left[\frac{\hat{S}{rare}}{\hat{C}{rare}} \frac{\sum_{i = 1}^{k} i(i - 1)s_i}{\left(\sum_{i = 1}^{k} is_i\right)\left(\sum_{i = 1}^{k} is_i - 1\right)} - 1, 0\right] $$

Chao2 estimator [@chao1987]

For replicated incidence data (i.e. a $m \times p$ logical matrix), the Chao2 estimator is:

$$ \hat{S}_{Chao2} = \begin{cases} S + \frac{m - 1}{m} \frac{q_1^2}{2 q_2} & q_2 > 0 \ S + \frac{m - 1}{m} \frac{q_1 (q_1 - 1)}{2} & q_2 = 0 \end{cases} $$

Improved Chao2 estimator [@chiu2014]:

$$ \hat{S}{iChao2} = \hat{S}{Chao2} + \frac{m - 3}{4 m} \frac{q_3}{q_4} \times \max\left(q_1 - \frac{m - 3}{m - 1} \frac{q_2 q_3}{2 q_4} , 0\right)$$

Incidence-based Coverage Estimator [@chao2016]

$$ \hat{S}{ICE} = \hat{S}{freq} + \frac{\hat{S}{infreq}}{\hat{C}{infreq}} + \frac{q_1}{\hat{C}{infreq}} \times \hat{\gamma}^2{infreq} $$

Where $\hat{S}{infreq} = \sum{i = 1}^{k} q_i$ is the number of infrequent taxa, $\hat{S}{freq} = \sum{i > k}^{N} q_i$ is the number of frequent taxa (for a given cut-off value $k$), $\hat{C}{infreq} = 1 - \frac{Q_1}{\sum{i = 1}^{k} iq_i}$ is the Turing's coverage estimate and:

$$ \hat{\gamma}^2_{infreq} = \max\left[\frac{\hat{S}{infreq}}{\hat{C}{infreq}} \frac{m_{infreq}}{m_{infreq} - 1} \frac{\sum_{i = 1}^{k} i(i - 1)q_i}{\left(\sum_{i = 1}^{k} iq_i\right)\left(\sum_{i = 1}^{k} iq_i - 1\right)} - 1, 0\right] $$

Where $m_{infreq}$ is the number of sampling units that include at least one infrequent species.

Usage

mississippi %>%
  as_count() %>%
  index_richness(method = "margalef")
mississippi %>%
  as_count() %>%
  index_composition(method = "chao1")
mississippi %>%
  as_count() %>%
  rarefaction(sample = 10, method = "hurlbert", simplify = TRUE) %>%
  head()

Heterogeneity and Evenness

Definitions

Diversity measurement assumes that all individuals in a specific taxa are equivalent and that all types are equally different from each other [@peet1974]. A measure of diversity can be achieved by using indices built on the relative abundance of taxa. These indices (sometimes referred to as non-parametric indices) benefit from not making assumptions about the underlying distribution of taxa abundance: they only take relative abundances of the species that are present and species richness into account. @peet1974 refers to them as indices of heterogeneity ($H$).

Diversity indices focus on one aspect of the taxa abundance and emphasize either richness (weighting towards uncommon taxa) or dominance [weighting towards abundant taxa; @magurran1988].

Evenness ($E$) is a measure of how evenly individuals are distributed across the sample.

Information theory index

Shannon-Wiener diversity index

The Shannon-Wiener index [@shannon1948] assumes that individuals are randomly sampled from an infinite population and that all taxa are represented in the sample (it does not reflect the sample size). The main source of error arises from the failure to include all taxa in the sample: this error increases as the proportion of species discovered in the sample declines [@peet1974; @magurran1988]. The maximum likelihood estimator (MLE) is used for the relative abundance, this is known to be negatively biased by sample size.

Heterogeneity for an infinite sample:

$$ H' = - \sum_{i = 1}^{S} p_i \ln p_i $$

Heterogeneity for a finite sample:

$$ H' = - \sum_{i = 1}^{S} \frac{n_i}{N} \ln \frac{n_i}{N} $$

Evenness:

$$ E = \frac{H}{H_{max}} = \frac{H'}{\ln S} = - \sum_{i = 1}^{S} p_i \log_S p_i $$

When $p_i$ is unknown in the population, an estimate is given by $\hat{p}_i =\frac{n_i}{N}$ (maximum likelihood estimator - MLE). As the use of $\hat{p}_i$ results in a biased estimate, @hutcheson1970 and @bowman1971 suggest the use of:

$$ \hat{H}' = - \sum_{i = 1}^{S} \hat{p}i \ln \hat{p}_i - \frac{S - 1}{N} + \frac{1 - \sum{i = 1}^{S} \hat{p}i^{-1}}{12N^2} + \frac{\sum{i = 1}^{S} (\hat{p}_i^{-1} - \hat{p}_i^{-2})}{12N^3} + \cdots $$

This error is rarely significant [@peet1974], so the unbiased form is not implemented here (for now).

Brillouin diversity index

The Brillouin index [@brillouin1956] describes a known collection: it does not assume random sampling in an infinite population. @pielou1975 and @laxton1978 argues for the use of the Brillouin index in all circumstances, especially in preference to the Shannon index.

Diversity:

$$ H' = \frac{\ln (N!) - \sum_{i = 1}^{S} \ln (n_i!)}{N} $$

Evenness:

$$ E = \frac{H'}{H'_{max}} $$

with:

$$ H'_{max} = \frac{1}{N} \ln \frac{N!}{\left( \lfloor \frac{N}{S} \rfloor! \right)^{S - r} \left[ \left( \lfloor \frac{N}{S} \rfloor + 1 \right)! \right]^{r}} $$

where: $r = N - S \lfloor \frac{N}{S} \rfloor$.

Dominance index

The following methods return a dominance index, not the reciprocal or inverse form usually adopted, so that an increase in the value of the index accompanies a decrease in diversity.

Simpson index

The Simpson index [@simpson1949] expresses the probability that two individuals randomly picked from a finite sample belong to two different types. It can be interpreted as the weighted mean of the proportional abundances. This metric is a true probability value, it ranges from $0$ (perfectly uneven) to $1$ (perfectly even).

Dominance for an infinite sample:

$$ D = \sum_{i = 1}^{S} p_i^2 $$

Dominance for a finite sample:

$$ D = \sum_{i = 1}^{S} \frac{n_i \left( n_i - 1 \right)}{N \left( N - 1 \right)} $$

McIntosh index

The McIntosh index [@mcintosh1967] expresses the heterogeneity of a sample in geometric terms. It describes the sample as a point of a $S$-dimensional hypervolume and uses the Euclidean distance of this point from the origin.

Dominance:

$$ D = \frac{N - U}{N - \sqrt{N}} $$

Evenness:

$$ E = \frac{N - U}{N - \frac{N}{\sqrt{S}}} $$

where $U$ is the distance of the sample from the origin in an $S$ dimensional hypervolume:

$$U = \sqrt{\sum_{i = 1}^{S} n_i^2}$$

Berger-Parker index

The Berger-Parker index [@berger1970] expresses the proportional importance of the most abundant type. This metric is highly biased by sample size and richness, moreover it does not make use of all the information available from sample.

Dominance:

$$ D = \frac{n_{max}}{N} $$

Usage

mississippi %>%
  as_count() %>%
  index_heterogeneity(method = "shannon")

Note that berger, mcintosh and simpson methods return a dominance index, not the reciprocal form usually adopted, so that an increase in the value of the index accompanies a decrease in diversity.

Corresponding evenness can also be computed :

mississippi %>%
  as_count() %>%
  index_evenness(method = "shannon")
mississippi %>%
  as_count() %>%
  jackknife_evenness(method = "shannon")

$\beta$-diversity

Turnover

Definitions

The following methods can be used to ascertain the degree of turnover in taxa composition along a gradient on qualitative (presence/absence) data. This assumes that the order of the matrix rows (from 1 to $m$) follows the progression along the gradient/transect.

We denote the $m \times p$ incidence matrix by $X = \left[ x_{ij} \right] ~\forall i \in \left[ 1,m \right], j \in \left[ 1,p \right]$ and the $p \times p$ corresponding co-occurrence matrix by $Y = \left[ y_{ij} \right] ~\forall i,j \in \left[ 1,p \right]$, with row and column sums:

\begin{align} x_{i \cdot} = \sum_{j = 1}^{p} x_{ij} && x_{\cdot j} = \sum_{i = 1}^{m} x_{ij} && x_{\cdot \cdot} = \sum_{j = 1}^{p} \sum_{i = 1}^{m} x_{ij} && \forall x_{ij} \in \lbrace 0,1 \rbrace \

y_{i \cdot} = \sum_{j \geqslant i}^{p} y_{ij} && y_{\cdot j} = \sum_{i \leqslant j}^{p} y_{ij} && y_{\cdot \cdot} = \sum_{i = 1}^{p} \sum_{j \geqslant i}^{p} y_{ij} && \forall y_{ij} \in \lbrace 0,1 \rbrace \end{align}

| Measure | Reference | |:----------------------------------|:-----------------| | $$ \beta_W = \frac{S}{\alpha} - 1 $$ | @whittaker1960 * | | $$ \beta_C = \frac{g(H) + l(H)}{2} - 1 $$ | @cody1975 * | | $$ \beta_R = \frac{S^2}{2 y_{\cdot \cdot} + S} - 1 $$ | @routledge1977 * | | $$ \beta_I = \log x_{\cdot \cdot} - \frac{\sum_{j = 1}^{p} x_{\cdot j} \log x_{\cdot j}}{x_{\cdot \cdot}} - \frac{\sum_{i = 1}^{m} x_{i \cdot} \log x_{i \cdot}}{x_{\cdot \cdot}} $$ | @routledge1977 * | | $$ \beta_E = \exp(\beta_I) - 1 $$ | @routledge1977 * | | $$ \beta_T = \frac{g(H) + l(H)}{2\alpha} $$ | @wilson1984 * | Table: Turnover measures. *: implemented.

Where:

Usage

Similarity

Definitions

Similarity between two samples $a$ and $b$ or between two types $x$ and $y$ can be measured as follow.

These indices provide a scale of similarity from $0$-$1$ where $1$ is perfect similarity and $0$ is no similarity, with the exception of the Brainerd-Robinson index which is scaled between $0$ and $200$.

| Measure | Reference | |:----------------------------------|:----------------| | $$ C_J = \frac{o_j}{S_a + S_b - o_j} $$ | Jaccard * | | $$ C_S = \frac{2 \times o_j}{S_a + S_b} $$ | Sorenson * | Table: Qualitative similarity measures (between samples). *: implemented.

| Measure | Reference | |:----------------------------------|:---------------| | $$ C_{BR} = 200 - \sum_{j = 1}^{S} \left| \frac{a_j \times 100}{\sum_{j = 1}^{S} a_j} - \frac{b_j \times 100}{\sum_{j = 1}^{S} b_j} \right|$$ | @brainerd1951, @robinson1951 * | | $$ C_N = \frac{2 \sum_{j = 1}^{S} \min(a_j, b_j)}{N_a + N_b} $$ | @bray1957, Sorenson * | | $$ C_{MH} = \frac{2 \sum_{j = 1}^{S} a_j \times b_j}{(\frac{\sum_{j = 1}^{S} a_j^2}{N_a^2} + \frac{\sum_{j = 1}^{S} b_j^2}{N_b^2}) \times N_a \times N_b} $$ | Morisita-Horn * | Table: Quantitative similarity measures (between samples). *: implemented.

| Measure | Reference | |:----------------------------------|:---------------| | $$ C_{Bi} = \frac{o_i - N \times p}{\sqrt{N \times p \times (1 - p)}} $$ | @kintigh2006 * | Table: Quantitative similarity measures (between types). *: implemented.

Where:

Usage

# Brainerd-Robinson (similarity between assemblages)
mississippi %>%
  as_count() %>%
  similarity(method = "brainerd") %>%
  plot_spot() +
  khroma::scale_colour_YlOrBr()

# Binomial co-occurrence (similarity between types)
mississippi %>%
  as_count() %>%
  similarity(method = "binomial") %>%
  plot_spot() +
  khroma::scale_colour_PRGn()

Sample Size and Significance

@kintigh1989

## Data from Conkey 1980, Kintigh 1989, p. 28
chevelon <- as_count(chevelon)

sim_evenness <- simulate_heterogeneity(chevelon, method = "shannon")
plot(sim_evenness) +
  ggplot2::theme_bw() +
  ggplot2::theme(legend.position = "bottom")

sim_richness <- simulate_richness(chevelon, method = "none")
plot(sim_richness) +
  ggplot2::theme_bw() +
  ggplot2::theme(legend.position = "bottom")

Abundance models

Ranks vs abundance plot can be used for abundance models (model fitting will be implemented in a future release):

mississippi %>%
  as_count() %>%
  plot_rank(log = "xy") +
  ggplot2::theme_bw() +
  khroma::scale_color_discreterainbow()

References



Try the tabula package in your browser

Any scripts or data that you put into this service are public.

tabula documentation built on May 25, 2021, 5:11 p.m.