knitr::opts_chunk$set(echo = TRUE)

Detour 1: Why to standardize alpha and gamma scale to the same sample size?

Example: Beta-diversity of the BCI data set from vegan

Let's look at the beta_diversity of the BCI data set that comes with the package vegan. It's a site by species abundance matrix with 50 plots and 225 species from the Barro Colorado Island in Panama. If you want to know more about the data set type ?BCI after loading the the vegan package. We are interested in the beta-diversity of the island because it tells us something about the spatial structure of species diversity. First, let's calculate Whittaker's multiplicative beta-diversity as

formula

where \gamma is the gamma species richness (i.e. all plots combined) and \overline{\alpha} is the alpha species richness (i.e. the average plot richness).

library(betaC)
library(vegan)
data(BCI)

# Whittakers multiplicative beta-diversity 
gamma=specnumber(colSums(BCI))
alpha= mean(specnumber(BCI))
beta_BCI=gamma/alpha
beta_BCI

The samples have a beta-diversity of r round(beta_BCI,2) The unit of this value is "effective number of distinct sampling units". This means that it takes r round(beta_BCI,2) hypothetical plots with complete species turnover to produce the same beta-diversity that we observe in this data-set. Note that the maximum possible value of beta is 50 here. In that case all 50 plots would be completely unique in their species identities. The minimum value of beta is 1 in which case all the plots share the same species. However, this assumes that the sample size is big enough to randomly sample all the species in the area - in other words that the samples are complete. While this may be true on the gamma scale, where we have a really large sample size, the alpha scale has a way smaller number of individuals. Remember that the gamma scale has all the individuals of the 50 subplots combined, while the average plot has only a 50th of this sample size. Let's have a look at this difference:

N_alpha= mean(rowSums(BCI))
N_gamma= sum(BCI)
barplot(c("gamma"=N_gamma,"alpha"= N_alpha), ylab = "Number of individuals", xlab = "Scale")

To control for this vast difference in sample size between the scales we use a method called individual-based rarefaction (IBR). IBR lets you ask the question "How many species can I expect to find if I sampled n Individuals rather than the actual sample size of N?" This enables us to rescale our gamma diversity estimate to the sample size observed at the alpha scale. The relationship between this "rarefied richness" and the sample size n is called a individual based rarefaction curve. The r package vegan provides functions to carry out IBR. Let's plot the IBR curve for the gamma scale and see how the species richness changes when we rarefy it down to the sample size of the alpha scale.

#gamma
gamma_curve= as.numeric(rarefy(colSums(BCI),sample = 1:N_gamma))

# plot
plot(gamma_curve, xlab = "Number of individuals", ylab= "Rarefied richness", type = "l", lwd=2)
abline(v = N_alpha, col="grey", lty=3, lwd=2)
abline(h = gamma_curve[N_alpha], col="grey", lty=3, lwd=2)
legend("bottomright", legend = "gamma", col = 1, lty = 1,lwd = 2,bty = "n")

This black curve is the gamma scale IBR curve. Typically, accumulation or rarefaction curves have this non-linear shape of a saturation curve. The grey vertical line marks the alpha scale sample size and the horizontal grey line is the rarefied richness of the gamma scale that corresponds to this sample size. It looks like about half of the species on the gamma scale have only been found because the gamma scale as way more individuals than the alpha scale. Let's look at the alpha scale to see how it compares.

# alpha 
alphas_curves<-rarecurve(BCI)

These are all the alpha scale rarefaction curves. We are usually interested in the average sample Therefore let's take the mean of these curves. Also, let's put them in the in the same graph as the gamma scale to see how they compare.

N_min= min(rowSums(BCI))
mean_alpha_curve= sapply(alphas_curves,function(x) return(as.numeric(x[1:N_min])) ) %>% rowMeans()


plot(gamma_curve, xlab = "Number of individuals", ylab= "Rarefied richness", type = "l", lwd=2)

lines(mean_alpha_curve, col =3, lwd=2)

Now, the green curve that we added is the mean alpha scale IBR. It sits slightly below the gamma scale but most of the difference in species richness between the to scales is really due to the more-individuals effect. If the curves fall right on top of each other, species are randomly distributed and if the alpha scale is below the gamma scale this indicates intraspecific spatial aggregation. We quantify this aggregation as the deviation between the two curves at a common number of individuals (usually that is the smallest number of individuals found at the alpha scale)

Like Whittaker's beta we calculate an index of gamma over alpha. However, here we use the rarefied richness estimates rather than the observed species richness. The resulting index is called beta_Sn. Let's calculate it with the function beta_SN provided by our package.

beta_Sn_BCI<-beta_SN(BCI, N_min)
beta_Sn_BCI

The value is larger than 1, which indicates the spatial aggregation that we saw from looking at the IBR curves. However, it is much smaller than the traditional beta-diversity that was inflated due to the more-individuals effect. Instead of r round(beta_BCI,2) times as many species, now the gamma scale only has r round(beta_Sn_BCI,2) times as many species as the alpha scale when controlled for sample size. This is the true effect of spatial aggregation, conditional on the sample size ofr N_min.

So, we saw that the deviation between alpha and gamma scale is due to non-random spatial structure in diversity. However, the magnitude of this gap depends on how far out along the IBR curve it is calculated. Let's use the same example from above but now, instead of calculating beta_Sn at for a sample size of r N_min, let's use a much smaller value of n=10.

beta_SN(BCI, 10)

This estimate corresponds to a point of the IBR that is much further away from the asymptote than the previous one. It turns out that the closer we get to the asymptote of the gamma scale IBR, the bigger this gap can be. This becomes important if we want to compare beta_SN values from one community to the next. Because "farawayness" from the asymptote cannot be described by sample size alone.

Detour 2: Why to standardize beta-diversity by coverage?

Sample size is always relative

While sample size is surely related to the distance to the asymptote of the IBR, it is only part of the story. The asymptote or more generally the slope at any point along the IBR curve relates to something called sample completeness. This differs from sample size in that it depends on the species pool of the community that you are sampling.

For example consider a forest that has a total of 500 species and for the sake of this exercise let's assume that all of them are equally common. Now, we take a random sample of 500 trees. It is highly unlikely that all 500 species will be captured by this sample. A very large fraction of the species pool will be missing, while others will pop up multiple times. In other words: In this first forest with a large species pool, the sample size of 500 corresponds to a low sample completeness and is relatively far away from the asymtote of the IBR curve.

Now, imagine a second forest that has way fewer species, let's say 100. Again, we take a sample of 500 trees. This time we can probably expect that a good proportion of the species in this forest will be sampled, probably not all of them, but surely a much bigger fraction than in the fist forest. In other words: In this second forest with a smaller species pool, the sample size of 500 corresponds to a high sample completeness and is relatively close to the asyptote .

The figure below shows the IBR curves corresponding the large (A) and small (B) species pool of this example.

include_graphics("man/figures/Figure2.jpg") 

Key references



T-Engel/betaC documentation built on May 2, 2021, 3:19 a.m.