library(learnr) tutorial_options(exercise.reveal_solution = FALSE) gradethis::gradethis_setup() knitr::opts_chunk$set(error = TRUE) set.seed(123)
In this series of practicals you will develop the code to calculate a few diversity measures. In this first exercise we will calculate the species richness and Simpson index of a population.
R provides us with many built-in datasets, but sadly none of them are counts of species abundances, so you will use the famous Barro Colorado Island dataset, a 30+ year study of a 50-hectare forest plot in Panama. We will use the number of trees in the study site in 2010, which you have access to from the package you developed in Practical Series 4.
You will also construct a couple of fake datasets that you will use for comparison. Calculating the diversity of these populations (remembering that there are many, in fact a continuum of, diversity measures) will tell us how many observed species there are -- the species richness -- and give us measures of how evenly distributed the trees are amongst the tree species (the evenness of the species distribution) by various measures. Species richness is a very commonly used measure of diversity, which simply counts the number of non-zero species abundances; Simpson index is also commonly used: it simply calculates the species proportions from the counts (so the total sums to 1), squares them all individually, and add them all together -- it tells us (amongst other things) how likely two randomly chosen individuals are to be from the same species.
This project will be written in the githubusernameDiversity package that
you set up in the last exercise, with functions being saved in the
R folder and documented so that they work
with the standard R help system, and scripts being saved as demos in the
demo folder so that they can be run using
the demo()
function or compiled into reports to read the results with suitable
reporting explaining what is happening. In each project exercise you will have
to write some functions and (most likely) one script. So now create a demo
script for this project and add an entry for it in the
00Index file in the same folder.
We are taking the data from the 2010 census -- the true abundances for the whole site can be calculated as follows:
library(githubusernameBCI) tree.pop <- rowSums(bci_2010)
Note that if you don't want to use your own BCI package, you can use ours instead. It's available as
soniamitchellBCI
in the SBOHVM organisation on GitHub just like yours. Either way you should install the package you are using by runningdevtools::install_github()
, and then add it to yourgithubusernameDiversity
package dependencies by runningusethis::use_dev_package()
appropriately.
The total counts of trees of different species in this forest plot are now
available in the variable tree.pop
. We can then do some simple extraction of
information from the full species counts:
species.names <- names(tree.pop) num.sp <- length(tree.pop) num.trees <- sum(tree.pop) mean.trees <- mean(tree.pop)
The next two lines sample a single random quadrat and sum up 10 random
quadrats, and store the results in quadrat.pop
and quadrat10.pop
,
respectively:
quadrat.pop <- rowSums(sample_by_subcommunities(bci_2010, 1)) quadrat10.pop <- rowSums(sample_by_subcommunities(bci_2010, 10))
You will also use a handful of fake datasets for comparison:
one.pop <- c(num.trees, rep(0, num.sp - 1)) uneven.pop <- c(num.trees / 2, rep(0, num.sp - 1)) + rep(mean.trees / 2, num.sp) mixed.pop <- (1:num.sp) / sum(1:num.sp) * num.trees even.pop <- rep(mean.trees, num.sp) rand.pop <- as.vector(rmultinom(1, num.trees, rep(1 / num.sp, num.sp))) rand50.pop <- as.vector(rmultinom(1, round(num.trees / 50), rep(1 / num.sp, num.sp)))
Remember that rep(x, size)
repeats x
, size
times to make a vector of
length size
, and rmultinom()
draws from the multinomial distribution,
in this case, rand.pop
is a population of 222,715 individuals
drawn from 299 species, each equally abundant ($\frac{1}{299}$). Note that
rmultinom()
is is in the stats
package, so you need to add this to the
dependencies of your package now. We won't tell you about any future
dependencies you need to add, but note that you can check what package a
function is in by looking in the help file for the function:
```{R, eval = FALSE} ?rmultinom
If you run this in the console, the **Multinom {stats}** in the top left of the help page tells you that this function is in the `stats` package. ### A species richness function Then write and document a function that takes a vector of abundances like the data above as an argument and calculates the species richness from it. Remember that the species richness is the number of species whose abundance is not zero. The name of this function should start with the letters `p1_`, so if you decide to call it `species_richness()`, you should actually call it `p1_species_richness()`. This is because you will be making a series of similar functions throughout the project, so they will need to be called `p1_species_richness()`, `p2_species_richness()`, `p3_species_richness()`, *etc.* to distinguish them. *Hint:* Applying a test to a vector of numbers, will apply the test to all of the numbers in the vector and produce a vector of booleans (`TRUE` or `FALSE`) values, so: ```{R, comment = ""} c(1, 3, 5, 7) > 4
1 and 3 are not > 4
and 5 and 7 are. And then sum()
will count the number of
TRUE
s in such a vector of booleans. So:
```{R, comment = ""} over4s <- (c(1, 3, 5, 7) > 4) sum(over4s)
If you think about it, you should be able to see how to use these two ideas to count the number of non-zero abundances in a population. You can see more about this in Practical A-1. ### A Simpson index function Then write and document a second function that will calculate the Simpson index of the population, remembering, first, that this measure works with the relative proportions of the species not the total counts and, second, that we can do calculations with the whole vector at once. ### Running the code Look at the values in the nine population vectors -- you could try sorting them using `sort(x.pop)` and plotting them using `plot(sort(x.pop))`, `barplot(sort(x.pop))` (or something more attractive?) to get a clearer idea of what they look like -- and decide which you think have the richest species diversity. Run the two functions on the populations provided as see whether the results of the two diversity measures agree with your assessments. Note that a lower value means a higher diversity for the Simpson index measure. **NB** Remember that to make the functions you have written accessible to your demo, you need to document and install them: ```{R, eval = FALSE} devtools::document() devtools::install() library(githubusernameDiversity) ?p1_species_richness
You also need to install the package to use the demos:
{R, eval = FALSE}
devtools::install()
demo(package = "githubusernameDiversity")
Edit the demo that you have started above so that it loads in the functions (by
calling library(githubusernameDiversity)
), reads in the BCI data, and
generates a report that shows some kind of plot of each of the series of counts
and then calculates their species richness and Simpson index Check that
your functions also generate appropriate help files
(e.g. ?p1_species_richness
).
Remember to commit the code once you are happy before moving on to the next exercise -- you can always come back later and make further commits if you realise you need to change something. I won't remind you about this in subsequent exercises.
As we said above, note that you will often need to refer back to functions in
later exercises (as you did in the practicals) and you will be writing several
very similar functions in successive exercises. You will need to make sure they
do not have identical names or you won't be able to call them all.
The simplest way of doing this might be to label them by the project exercise
number and call them p1_species_richness
, p2_species_richness
, etc.
depending on what exercise you are in, but you are welcome to find a more
elegant solution -- anything that distinguishes them is fine.
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.