In IBAHCM/RPiR: Reproducible Programming in R

library(learnr)
tutorial_options(exercise.reveal_solution = FALSE)
gradethis::gradethis_setup()

knitr::opts_chunk$set(error = TRUE)
set.seed(123)

Overview

In this series of practicals you will develop the code to calculate a few diversity measures. In this first exercise we will calculate the species richness and Simpson index of a population.

Background

R provides us with many built-in datasets, but sadly none of them are counts of species abundances, so you will use the famous Barro Colorado Island dataset, a 30+ year study of a 50-hectare forest plot in Panama. We will use the number of trees in the study site in 2010, which you have access to from the package you developed in Practical Series 4.

You will also construct a couple of fake datasets that you will use for comparison. Calculating the diversity of these populations (remembering that there are many, in fact a continuum of, diversity measures) will tell us how many observed species there are -- the species richness -- and give us measures of how evenly distributed the trees are amongst the tree species (the evenness of the species distribution) by various measures. Species richness is a very commonly used measure of diversity, which simply counts the number of non-zero species abundances; Simpson index is also commonly used: it simply calculates the species proportions from the counts (so the total sums to 1), squares them all individually, and add them all together -- it tells us (amongst other things) how likely two randomly chosen individuals are to be from the same species.

Tasks

This project will be written in the githubusernameDiversity package that you set up in the last exercise, with functions being saved in the R folder and documented so that they work with the standard R help system, and scripts being saved as demos in the demo folder so that they can be run using the demo() function or compiled into reports to read the results with suitable reporting explaining what is happening. In each project exercise you will have to write some functions and (most likely) one script. So now create a demo script for this project and add an entry for it in the 00Index file in the same folder.

The study populations

We are taking the data from the 2010 census -- the true abundances for the whole site can be calculated as follows:

library(githubusernameBCI) 
tree.pop <- rowSums(bci_2010)

Note that if you don't want to use your own BCI package, you can use ours instead. It's available as soniamitchellBCI in the SBOHVM organisation on GitHub just like yours. Either way you should install the package you are using by running devtools::install_github(), and then add it to your githubusernameDiversity package dependencies by running usethis::use_dev_package() appropriately.

The total counts of trees of different species in this forest plot are now available in the variable tree.pop. We can then do some simple extraction of information from the full species counts:

species.names <- names(tree.pop) 
num.sp <- length(tree.pop) 
num.trees <- sum(tree.pop) 
mean.trees <- mean(tree.pop)

The next two lines sample a single random quadrat and sum up 10 random quadrats, and store the results in quadrat.pop and quadrat10.pop, respectively:

quadrat.pop <- rowSums(sample_by_subcommunities(bci_2010, 1)) 
quadrat10.pop <- rowSums(sample_by_subcommunities(bci_2010, 10))

You will also use a handful of fake datasets for comparison:

one.pop <- c(num.trees, rep(0, num.sp - 1)) 
uneven.pop <- c(num.trees / 2, rep(0, num.sp - 1)) + rep(mean.trees / 2, num.sp) 
mixed.pop <- (1:num.sp) / sum(1:num.sp) * num.trees
even.pop <- rep(mean.trees, num.sp)
rand.pop <- as.vector(rmultinom(1, num.trees, rep(1 / num.sp, num.sp)))
rand50.pop <- as.vector(rmultinom(1, round(num.trees / 50), rep(1 / num.sp, num.sp)))

Remember that rep(x, size) repeats x, size times to make a vector of length size, and rmultinom() draws from the multinomial distribution, in this case, rand.pop is a population of 222,715 individuals drawn from 299 species, each equally abundant ($\frac{1}{299}$). Note that rmultinom() is is in the stats package, so you need to add this to the dependencies of your package now. We won't tell you about any future dependencies you need to add, but note that you can check what package a function is in by looking in the help file for the function:

```{R, eval = FALSE} ?rmultinom

If you run this in the console, the **Multinom {stats}** in the top left of the
help page tells you that this function is in the `stats` package.

### A species richness function

Then write and document a function that takes a vector of abundances like the
data above as an argument and calculates the species richness from it. Remember
that the species richness is the number of species whose abundance is not zero.
The name of this function should start with the letters `p1_`, so if you decide
to call it `species_richness()`, you should actually call it
`p1_species_richness()`. This is because you will be making a series of similar
functions throughout the project, so they will need to be called
`p1_species_richness()`, `p2_species_richness()`, `p3_species_richness()`,
*etc.* to distinguish them.

*Hint:* Applying a test to a vector 
of numbers, will apply the test to all of the numbers in the vector and produce 
a vector of booleans (`TRUE` or `FALSE`) values, so:

```{R, comment = ""}
c(1, 3, 5, 7) > 4

1 and 3 are not > 4 and 5 and 7 are. And then sum() will count the number of TRUEs in such a vector of booleans. So:

```{R, comment = ""} over4s <- (c(1, 3, 5, 7) > 4) sum(over4s)

If you think about it, you should be able to see how to use these two ideas to
count the number of non-zero abundances in a population. You can see more about
this in Practical A-1.

### A Simpson index function

Then write and document a second function that will calculate the Simpson 
index of the population, remembering, first, that this measure works with 
the relative proportions of the species not the total counts and, second, that 
we can do calculations with the whole vector at once.

### Running the code

Look at the values in the nine population vectors -- you could try sorting them 
using `sort(x.pop)` and plotting them using `plot(sort(x.pop))`, 
`barplot(sort(x.pop))` (or something more attractive?) to get a clearer idea of 
what they look like -- and decide which you think have the richest species 
diversity.

Run the two functions on the populations provided as see whether the results of 
the two diversity measures agree with your assessments. Note that a lower value 
means a higher diversity for the Simpson index measure.

**NB** Remember that to make the functions you have written accessible to
your demo, you need to document and install them:

```{R, eval = FALSE}
devtools::document()
devtools::install()
library(githubusernameDiversity)
?p1_species_richness

You also need to install the package to use the demos:

{R, eval = FALSE} devtools::install() demo(package = "githubusernameDiversity")

Demo

Edit the demo that you have started above so that it loads in the functions (by calling library(githubusernameDiversity)), reads in the BCI data, and generates a report that shows some kind of plot of each of the series of counts and then calculates their species richness and Simpson index Check that your functions also generate appropriate help files (e.g. ?p1_species_richness).

Remember to commit the code once you are happy before moving on to the next exercise -- you can always come back later and make further commits if you realise you need to change something. I won't remind you about this in subsequent exercises.

Note on function names

As we said above, note that you will often need to refer back to functions in later exercises (as you did in the practicals) and you will be writing several very similar functions in successive exercises. You will need to make sure they do not have identical names or you won't be able to call them all. The simplest way of doing this might be to label them by the project exercise number and call them p1_species_richness, p2_species_richness, etc. depending on what exercise you are in, but you are welcome to find a more elegant solution -- anything that distinguishes them is fine.

IBAHCM/RPiR documentation built on Jan. 12, 2023, 7:41 p.m.

rdrr.io home R language documentation Run R code online

CRAN packages Bioconductor packages R-Forge packages GitHub packages

Note that we can't provide technical support on individual packages. You should contact the package authors for that.

Tweet to @rdrrHQ

GitHub issue tracker

ian@mutexlabs.com