library(learnr) knitr::opts_chunk$set(error = TRUE) set.seed(123)
In this session you will take the biodiversity data package you have created and write a demo that calculates a variety of diversity measures on the data.
Now you have a package with your data in it, we can easily run analyses, simply
starting each analysis script with the R command library(githubusernameBCI)
.
There are an enormous number of different ways of measuring diversity, to the extent that diversity itself has been declared a "non-concept" in the past (by Hurlbert, 1971), but nowadays they fit broadly within Whittaker's framework of three basic types of diversity: alpha, beta and gamma diversity.
All of these types of diversity can be measured in several different ways. The standard approach is often called "species diversity", where we treat all species as distinct and all of our measures of alpha and gamma diversity are counting in terms of effective numbers of species. However, we often need to consider similarity between species, and this results in a wide variety of measures, from closely related ones such as counts of "distinct species" (species that are taxonomically unrelated) or "distinct evolutionary histories" (species that have no common ancestor in a phylogeny), to ones that are more peripherally connected such as "distinct functional groups" (species that are functionally distinct). Leinster and Cobbold (2012) have developed a framework for working with all of these types of similarity consistently. Because beta diversity measures are comparing subcommunities, they are generally unitless (measuring fractions or percentages) or are measured in terms of numbers of subcommunities; how similar two subcommunities are seen to be still entirely depends on the type of similarity though, resulting in, for instance a count of "the effective number of phylogenetically / functionally / taxonomically distinct subcommunities". The relationship between alpha, beta and gamma has been the source of many arguments -- see, for example, Whittaker (1972), Jost (2007) and Reeve et al. (2016).
Finally, I used the phrase "effective number" several times above. At one extreme, when we disregard abundance and just look at presence-absence data, we measure "species richness" by simply counting the number of species present; at the opposite extreme, we have measures such as Berger-Parker which only look at dominance of the most abundant species. In between, we find measures such as Simpson and Shannon, where we weight our counting of species by their abundance. For instance, Shannon entropy is a measure of the "surprise" or uncertainty expected from the next observation, which is low if there are few species or one is dominant, but high when many are equally abundant. In all cases, the highest diversity is achieved when all species are equally abundant (and completely distinct, when we are considering similarity). We therefore talk about the "effective number of species" as:
the number of completely distinct, equally abundant species that would give us the same value of diversity as the observed community.
This acts as a reference to allow fair comparisons to be made. In the case of species richness, this is just the richness itself, but for instance in the case of Shannon entropy, the effective numbers equivalent is in fact the exponential of the entropy. For diversity measures that work in terms of these effective numbers, the extent to which we weight by abundance is determined by a parameter $q$, which varies from 0, where relative abundance is ignored and only presence-absence is considered, to $\infty$, where we only consider the most abundant species. This standard for dealing with diversities as effective numbers, while far from universal, has now been pursued for over 40 years and is broadly accepted -- originally by Hill (1973), and subsequently, for instance, by Jost (2007), Leinster and Cobbold (2012), and Reeve et al. (2016).
First, remember as you add in library(xxx)
calls into your demo, you'll need to
add them in as dependencies to the package using usethis::use_package("xxx")
,
otherwise other people may not be able to run the demo. To test the demo as you
go along you may also need to install the packages in advance using
install.packages("xxx")
.
As in the last practical series, we're going to create a demo inside the package. In the Files tab, you may need to create a new demo folder. Then create an R script in that folder to contain your demo, call it diversity-of-BCI.R. Note in general you just have to make sure the name of the script has no spaces in it. Then (again, as before) create a new text file also in that folder called 00Index, with no file extension to contain the list of the demos in your package. As you should know by now, each demo you create is entered into one line of this file with the demo name followed by a one-line description. For example, here, the demo file diversity-of-BCI.R would have a line like:
diversity-of-BCI Showing the diversity of the BCI forest plot.
For more details on how to create a demo go here.
Now you're going to write the R script that will be the demo. The script will be
measuring several kinds of alpha, beta and gamma diversity, as detailed below.
But in order to do some work with measures of taxonomic diversity, the taxonomy
will first need to match the species abundance data from the BCI dataset,
exactly. Here I'm assuming that you created the package githubusernameBCI
, and
in it bci_taxa
contains the taxonomy of all BCI species and bci_2010
contains the 2010 BCI census.
library(githubusernameBCI) # Make a copy of the taxonomy as a data frame taxonomy <- as.data.frame(bci_taxa) # Make sure the species are ordered correctly rownames(taxonomy) <- taxonomy$GenusSpecies taxonomy <- taxonomy[rownames(bci_2010),]
To create a metacommunity to analyse in the rdiversity
package, you will need
to create an object of that type using metacommunity()
, and then you can
calculate, for example, the gamma diversity of the whole metacommunity with the
function meta_gamma()
and plot the output with plot()
:
library(rdiversity) meta <- metacommunity(bci_2010) results <- meta_gamma(meta, qs = seq(from = 0, to = 5)) plot(diversity ~ q, type = "l", data = results)
To create a metacommunity that takes account of the similarity between species,
you need to input the similarity as a second argument to metacommunity()
.
First use tax2dist()
to create a distance matrix from a taxonomy, then use
dist2sim()
to create a similarity matrix from the distance matrix:
library(magrittr) taxSim <- taxonomy %>% tax2dist(c(GenusSpecies=0, Genus=1, Family=2, Other=3)) %>% dist2sim("linear") metatax <- metacommunity(bci_2010, taxSim) results4tax <- meta_gamma(metatax, qs = seq(from = 0, to = 5)) lines(diversity ~ q, col=2, data = results4tax)
Finally, to create a metacommunity to analyse in vegan
, you need to transpose
the axes of the abundance data matrix:
library(vegan) # Swap rows and columns abundances <- t(bci_2010) # Calculate species richness (gamma diversity) specnumber(abundances, groups = 1)
This loads the dataset and then calculates the species richness of the whole
metacommunity. It's important to always use the abundances
object (created
above) when using the vegan
package, and always use the bci_2010
object when
using rdiversity
-- they need the data in opposite formats!
Note the documentation and reference manual for the vegan
package is available
at https://cran.r-project.org/web/packages/vegan/index.html and the package
itself is developed on GitHub at https://github.com/vegandevs/vegan.
Because we lost two days from the course, the material here wasn't fully
covered, so is optional if you have enough time to look further at the
documentation for the rdiversity
and vegan
packages.
First, read the rdiversity
package documentation and examples. You can find
this by going to the GitHub repository (https://github.com/boydorr/rdiversity),
and clicking on the docs: rdiversity badge,
or by clicking the link to https://boydorr.github.io/rdiversity.
Then add to the script you have started above to calculate and plot the gamma
diversity and the normalised alpha diversity for the 2010 BCI census, for a
range of values of $q$, say qs = 0:5
; individuals of the same species have a
taxonomic distance of zero and each species is taxonomically distinct. Then
repeat these calculations using taxonomic similarity, showing that there are
fewer taxonomically distinct species than there are species. Then do this again
counting the effective number of genera and families. Note that this can be
achieved by using tax2dist()
with
c(GenusSpecies=0, Genus=0, Family=1, Other=1)
and
c(GenusSpecies=0, Genus=0, Family=0, Other=1)
, respectively -- these distance
measurements say that two species in the same genus (or family) are zero
distance apart, just like two individuals in the same species. As a result, the
diversity measures following these instructions will simply count in effective
numbers of genera or families, respectively, rather than species.
Moving on to beta diversity, we now want to find out which quadrats in the BCI
study site are the most distinct. We can measure the average distinctiveness of
all of the quadrats in the plot using raw_meta_beta()
, but you can also ask
for the distinctiveness of every quadrat individually using raw_sub_beta()
.
Calculate that for a single value of $q$, say qs=1
, and show a histogram of
the distinctivenesses (the diversity column) of the subcommunities. Sort the
data frame by distinctiveness and output just the highest values. Bearing in
mind that the first two digits of the quadrat name are an x-coordinate, and the
second two are a y-coordinate, is there any pattern about these distinct areas?
If so write about it in the report.
A similar option in vegan
is the function betadiver()
. Look at the
documentation in R for this function -- you will see that it can calculate a very
large range of different beta diversity measures -- and calculate the beta
diversities for the subcommunities using the "j"
Jaccard and "sor"
Sorensen
similarity measures. This function calculates pairwise similarities between the
quadrats as dist
objects. You can convert these to a matrix and then calculate
the average similarities to all of the quadrats by using as.matrix()
and then
rowMeans()
. You should be able to show that some of the identified quadrats
(with the lowest average similarity to the other quadrats) are the same, even
though the exact beta diversity measure, and indeed even its direction, is
different.
Remember to add some text chunks into the report to describe the results for when you compile a report rather than run the demo. Now reinstall the package and make sure the demo runs successfully too. Don't forget to delete any html files that have been generated and don't commit them to the repo.
Now push all of your changes (your commits) to GitHub.
If you've installed a package directly using devtools::install_github()
you can still generate a report from the demo as follows without cloning the
repo:
rmarkdown::render(system.file("demo", "diversity-of-BCI.R", package = "mypartnersBCIpackage"), output_dir = ".")
The report will be saved to the base folder of your current project (or whatever your current working directory is). Check that it looks okay by opening it in a web browser (and delete the html file afterwards).
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.