Alpha diversity and geographic range size of Carboniferous and Permian tetrapods
In palaeoverse: Prepare and Explore Data for Palaeobiological Analyses

Authors: The Palaeoverse Development Team

Last updated: 2024-10-14

# Introduction `palaeoverse` is an R package developed by palaeobiologists, for palaeobiologists. The aim of the package is to generate a community-driven toolkit of generic functions for the palaeobiological community. The package does not provide implementations of statistical analyses, rather it provides auxiliary functions to help streamline analyses and improve code readability and reproducibility. This vignette (or tutorial if you prefer) is provided to guide you through the installation process and some of the functionality available within `palaeoverse`. To do so, we will focus on a usage example looking at various trends in tetrapods across the Carboniferous/Permian boundary. ## Installation The `palaeoverse` package can be installed via CRAN or its dedicated [GitHub repository](https://github.com/palaeoverse/palaeoverse) if the development version is preferred. To install via the CRAN, simply use: wzxhzdk:0 To install the development version, first install the `devtools` package, and then use `install_github` to install `palaeoverse` directly from GitHub. wzxhzdk:1 You can now load `palaeoverse` using the default `library` function: wzxhzdk:2 **Before we get onto the good stuff, the development team has a small request**. If you use `palaeoverse` in your research, please cite the associated publication. This will help us to continue our work in supporting you to do yours. You can access the appropriate citation via: wzxhzdk:3 ## Now that that's out of the way, let's begin! The Carboniferous and early Permian have been regarded as critical intervals for early four-limbed vertebrates, in terms of both diversification and biogeography. During this time several major groups of tetrapods emerged (including crown amniotes), and the extensive tropical rainforests of the Carboniferous gave way to dryland vegetation during the Permian, in what's known as the 'Carboniferous Rainforest Collapse' (CRC) (for more information, see Dunne et al., 2018). In this vignette we’ll use the `tetrapods` dataset to demonstrate how to use `palaeoverse` functions to examine how alpha diversity and range size vary across this interval. # The Tetrapods Dataset Functions in `palaeoverse` are designed around the assumption that most users' data are stored in an occurrence `data.frame`. As such, this is the expected input data format for most functions. We provide two example occurrence datasets (`tetrapods` and `reefs`) within the package from different data sources. While these two datasets have similar structures, the data (variables) available in each differ. The `tetrapods` dataset is a compilation of Carboniferous--Early Triassic tetrapod occurrences (*n* = 5,270) from the [Paleobiology Database](https://paleobiodb.org/#). We'll be using this dataset to demonstrate some of the available functionality in `palaeoverse`. Let's get started by exploring the dataset. wzxhzdk:4 wzxhzdk:5 wzxhzdk:6 wzxhzdk:7 As you can see, we have a variety of variables to play around with! For this vignette we'll mostly be focusing on some of the spatial information, such as latitude and longitude. But first, we need to do some data prep... # Exploring the functions If we're looking at patterns of tetrapod diversity and distribution over the Carboniferous and Permian, we'll first have to bin our occurrences by time and by space. We can do this easily with a few functions in the `palaeoverse` package. Let's first look at binning by time. There are two `palaeoverse` functions that help us to assign occurrences to different intervals, or 'bins', of time. First, we'll use `time_bins` to make some suitable time bins for our study. The function requires a maximum and a minimum time interval to run. As we know we're interested in looking at the Carboniferous and the Permian, we can use these as our intervals to get the suitable stage-level bins that we're after: wzxhzdk:8 wzxhzdk:9 Before we can bin our occurrences into these time bins, we will need the ages of the occurrences in millions of years. Since the numeric ages may have changed since we downloaded the data from the PBDB, we'll use `look_up` to get the numeric ages for the occurrences within `tetrapods` dataset based on their assigned interval names: wzxhzdk:10 wzxhzdk:11 As you can see we've got some new columns with updated interval ages. Handy! Now we're ready to bin the occurrences into these time bins using `bin_time`. For now, we'll go with the 'mid' method, but we'll talk about this more in a minute. wzxhzdk:12 wzxhzdk:13 Oh no! We've hit an error. This is because there's some occurrences that sit outside of the time intervals we're interested in. `bin_time` can't bin occurrences which sit outside of the time intervals you've chosen, so for it to work, we'll first have to remove these occurrences. wzxhzdk:14 wzxhzdk:15 That's better! Let's see how our dataset has changed. wzxhzdk:16 wzxhzdk:17 As you can see, four new columns (variables) have been added to our dataset. Two of these are especially important. First is the `bin_assignment`, which shows which number bin the occurrence has been added to. Second is `n_bins` - this shows how many bins the occurrence could potentially have been placed within. Let's examine this further. wzxhzdk:18 wzxhzdk:19 wzxhzdk:20 wzxhzdk:21 As you can see, only ~25% of the dataset can be assigned to just one bin - the other occurrences can't be confidently assigned to an individual time interval. This is where the other methods for the `bin_time` function come in handy. They provide the user with different options when assigning occurrences to bins. Instead of the 'mid' method, let's try the 'majority' method instead, which places the occurrences in the bin which covers the majority of their potential time range. wzxhzdk:22 We'd recommend playing around with these different options and seeing how they impact your results! Next, we'll also bin our occurrences spatially. We need a couple of functions to do this. Firstly, we need to add palaeocoordinates to our dataset. We can use the `palaeorotate` function to do this. wzxhzdk:23 Now let's check that it's worked... wzxhzdk:24 wzxhzdk:25 Looks good to me! Now that we have palaeocoordinates for our occurrences, we can put them into spatial bins. We can do this using the `bin_space` function, which assigns each occurrence to one of many discrete equal-area hexagonal grid cells (based on [Uber's H3 system](https://h3geo.org/)). wzxhzdk:26 wzxhzdk:27 wzxhzdk:28 wzxhzdk:29 `bin_space` adds some new columns, including one with the unique grid cell ID, as well as latitudinal and longitudinal coordinates for the centroid of that cell. Now that we have both temporal and spatial bins, we can use these data to see how the spatial coverage of tetrapod fossils in our dataset varies through time. wzxhzdk:30 Let's save that for now and move on to some of our analyses! # Unique taxa and alpha diversity In many cases palaeobiologists count unique taxa by retaining only unique occurrences identified to a given taxonomic resolution (e.g. genera or species); however, this can result in some useful information being discarded. The `tax_unique` function retains occurrences identified to a coarser taxonomic resolution which are not already represented within the dataset. Take, for example, the following occurrences: - *Albertosaurus sarcophagus* - *Ankylosaurus* sp. - *Ceratopsidae* indet. - *Ornithominus* sp. - *Tyrannosaurus rex* A filter for species-level identifications would reduce the species richness to just two (*A. sarcophagus* and *T. rex*). However, none of these clades are nested within one another, so each of the indeterminately identified occurrences represents at least one species not already represented in the dataset. `tax_unique` is designed to deal with such taxonomic data and would retain all seven 'species' in this example. For further details see `?tax_unique`. This function can be especially useful for when looking at alpha diversity (local richness). Let's apply it to our dataset by filtering it to genus-level. wzxhzdk:31 We can compare this to the genera from the original dataset to see how many genera were added by using `tax_unique()`: wzxhzdk:32 wzxhzdk:33 wzxhzdk:34 wzxhzdk:35 As you can see we've added two taxa that otherwise would have been ignored! To see more specifically how the function has performed, take a look at row 718, which records an indeterminate occurrence of the family *Dicynodontidae*. This occurrence now has a unique name of 'Dicynodontidae indet.' wzxhzdk:36 wzxhzdk:37 Now, let's use this function to extract the unique genera for each collection (=locality) with `group_apply`: wzxhzdk:38 wzxhzdk:39 Next, let's count the number of unique genera per collection: wzxhzdk:40 wzxhzdk:41 To plot an alpha diversity (local richness) plot, we will need to add interval age data to these collections: wzxhzdk:42 wzxhzdk:43 Done. Now to plot alpha diversity through time! Let's start with a simple scatterplot: wzxhzdk:44

plot of chunk alpha-div-plot-1

That looks good, but it's a bit difficult to figure out what stages contain which data points. `axis_geo()` adds a geological timescale between the plot and the axis. Taking our plot from above, let's use this function to add a timescale to the x-axis: wzxhzdk:45

plot of chunk alpha-div-plot-2

Amazing! Now we can clearly see that alpha diversity appears to be a lot higher in the Permian than the Carboniferous. But how has the spatial distribution of tetrapods changed across this boundary? Let's use some of the other `palaeoverse` functions to find out! # Exploring range size/spatial issues through time Another factor that we can examine is how range size of taxa changed across the C/P boundary. We can examine this spatial distribution through time using the `tax_range_space` function, which provides several different methods for estimating the distribution of taxonomic occurrences. For now we'll use the method `con` which calculates the geographic range of taxa using a convex hull, but the rest of the methods are described in detail in the function documentation which can be accessed via `?tax_range_space`. We can use this in combination with `group_apply` to calculate the average for each time interval for our dataset: wzxhzdk:46 wzxhzdk:47 As you can see, the combination of `group_apply` and `tax_range_space` produces a dataset with the area of geographic range for each taxon in kilometers squared for each time interval. Note that an area of zero here means that there is only one occurrence of that taxon in that time bin. From this information we can now plot how the average range size of tetrapods changed throughout the Carboniferous and Permian: wzxhzdk:48

plot of chunk alpha-div-plot-3

From this plot it seems like the average geographic range size of tetrapods increased during the Permian. However, we first need to be sure we're not picking up on another pattern. Remember how we calculated the number of unique geographic cells in each time interval earlier? This provides a measure of the spatial spread of the fossil record. It might be that we're not picking up on an increased range size but just a larger geographic spread of fossil bearing rocks within the Permian. To see if this is the case, we can run a simple correlation test between the average geographic range size of tetrapods and the number of unique grid cells in each time interval: wzxhzdk:49 wzxhzdk:50 The high rho value and a p-value less than our cut off for significance (0.05) indicate a strong and statistically significant correlation between average geographic range size and the spatial spread of data. Seems like this work is far from over... ...but we'll leave the rest to you! Hopefully this vignette has shown you the potential uses for `palaeoverse` functions and helped provide a workflow for your own analyses. If you have any questions about the package or its functionality, please feel free to join our [Palaeoverse Google group](https://groups.google.com/g/palaeoverse) and leave a question; we'll aim to answer it as soon as possible! # References Dunne, E.M., Close, R.A., Button, D.J., Brocklehurst, N., Cashmore, D.D., Lloyd, G.T. and Butler, R.J., 2018. Diversity change during the rise of tetrapods and the impact of the ‘Carboniferous rainforest collapse’. Proceedings of the Royal Society B: Biological Sciences, 285(1872), 20172730.