Quantifying the geological completeness of paleontological sampling in North America

Authors: Palaeoverse Development Team

Last updated: 2024-10-17

# Introduction

`rmacrostrat` is an R package which allows users to easily retrieve geological data from the [Macrostrat](https://macrostrat.org) database and facilitates analyses of these data within the R environment. This vignette (or tutorial, if you prefer) is provided to guide you through the installation process and some of the functionality available within `rmacrostrat`. Specifically, we will focus on reproducing a classic figure (Fig. 14) from [Peters & Heim (2010)](https://www.jstor.org/stable/25609444), which visualizes the changing number (and proportion) of sedimentary lithostratigraphic units from different paleoenvironments throughout the Phanerozoic. Let's get started!

# Installation

The `rmacrostrat` package can be installed via CRAN, or via its dedicated [GitHub repository](https://github.com/palaeoverse/rmacrostrat) if the development version is preferred. To install via CRAN, simply use:

wzxhzdk:0

To install the development version, first install the `devtools` package, and then use `install_github` to install `rmacrostrat` directly from GitHub.

wzxhzdk:1

You can now load `rmacrostrat` using the standard `library` function:

wzxhzdk:2

**Before we get into the good stuff, the development team has a small request**. If you use `rmacrostrat` in your research, please cite the associated publication. This will help us to continue our work in supporting you to do yours. You can access the appropriate citation via:

wzxhzdk:3

wzxhzdk:4

# Context

Quantifying the spatiotemporal distribution of sedimentary lithostratigraphic units is fundamental to understanding how environments, and the biodiversity within them, have evolved throughout Earth's history. Previous work by [Peters & Heim (2010)](https://www.jstor.org/stable/25609444) focused on *"the geological completeness of paleontological sampling in North America"*, with the aim of recognizing and overcoming geologically controlled sampling biases.
One component of this article involved quantifying how the number and proportion of lithostratigraphic units preserving different paleoenvironment types (e.g., marine, marginal, terrestrial) vary through time. In this vignette, we will revisit this question and examine how the number and proportion of sedimentary lithostratigraphic units, grouped by paleoenvironment type, change over the Phanerozoic. To do so, we will use the `rmacrostrat` package to fetch data from the [Macrostrat](https://macrostrat.org) database.

# Retrieving data

To quantify how the number and proportion of different lithostratigraphic units change through time, we will focus on the following paleoenvironmental groupings: marine, marginal, and terrestrial. The first step in establishing these groupings is to see which environments are actually available in [Macrostrat](https://macrostrat.org):

*Hint: Remember that all definitions of data stored in [Macrostrat](https://macrostrat.org) are available via the `def_` suite of functions.*

wzxhzdk:5

wzxhzdk:6

From this call, we can see that a whole suite of inferred paleoenvironments is available, conveniently grouped by type (e.g., carbonate, fluvial, lacustrine) and class (i.e., marine and non-marine). Class is helpful for discerning between marine and non-marine environments, but we also want a marginal category, so we need to reclassify some of these environment definitions. We can define which paleoenvironments are marginal using the additional information available in the `name` column of `environments`. For this definition, let's consider any environment classified as a delta, beach, barrier island, estuary, lagoon, or tidal flat to be marginal. Of the available environments (*n* = 87), we can identify 22 as marginal.
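If you'd like to see the logic of this keyword rule in isolation, here is a minimal base R sketch. It assumes a `data.frame` with `name` and `class` columns, like the `environments` object returned by `def_environments()`, but the rows are hypothetical toy values rather than real Macrostrat output. The regular expression is just one possible encoding of the delta/beach/barrier island/estuary/lagoon/tidal flat rule, and the vignette's own code may differ. Object names carry a `toy_` prefix so they don't overwrite your real data.

```r
# Hypothetical toy stand-in for the data.frame returned by def_environments();
# real names, IDs, and classes will differ
toy_envs <- data.frame(
  environ_id = 1:6,
  name  = c("tidal flat", "delta front", "barrier island",
            "estuary/bay", "fluvial indet.", "open shallow subtidal"),
  class = c("marine", "marine", "marine",
            "marine", "non-marine", "marine"),
  stringsAsFactors = FALSE
)

# Assumed keyword pattern for the marginal environments listed above
marginal_pattern <- "delta|beach|barrier|estuar|lagoon|tidal flat"

# Flag names containing any keyword (case-insensitive) and reclassify them
is_marginal <- grepl(marginal_pattern, toy_envs$name, ignore.case = TRUE)
toy_envs$class[is_marginal] <- "marginal"

# Relabel "non-marine" as "terrestrial" for clarity
toy_envs$class[toy_envs$class == "non-marine"] <- "terrestrial"

table(toy_envs$class)
```

Because the pattern matches "tidal flat" rather than just "tidal", "open shallow subtidal" stays marine. How loosely you match keywords directly changes how many environments end up as marginal.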
This classification is perhaps a little subjective, but for this vignette it provides a fair representation of marginal environments:

wzxhzdk:7

Now that we've defined our marginal paleoenvironments, we can reclassify the `class` column in `environments` so that we have three categories. We can also replace the label "non-marine" with "terrestrial" for clarity:

wzxhzdk:8

For simplicity, we will create three separate `data.frame`s, one for each paleoenvironment.

wzxhzdk:9

Great! We now have our groupings, but to investigate how lithostratigraphic units change through time, we need a little more information. Specifically, we need to know the inferred paleoenvironment and age of each lithostratigraphic unit. We can retrieve this information using the `get_units` function. Conveniently, we can request these data for each paleoenvironmental class using the `environ_id` column.

*Hint: Remember that to retrieve data stored in [Macrostrat](https://macrostrat.org) (e.g., Macrostrat columns, sections, and units), the `get_` suite of functions must be used.*

wzxhzdk:10

Nice! That was pretty straightforward. However, some `unit_id` values in the `marine` dataframe are also present in the marginal and terrestrial dataframes, and vice versa:

wzxhzdk:11

wzxhzdk:12

On further investigation, we can see that some lithostratigraphic units are associated with multiple paleoenvironments (various combinations of marine, marginal, and terrestrial). Interesting, but perhaps to be expected given transgression and regression cycles! For clarity, we should treat these separately from our pre-existing groupings. Let's make a new paleoenvironment class called "mixed" and update the environment class accordingly.
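The idea behind this step can be sketched in base R on hypothetical toy tables. The `unit_id` values below are made up; in the vignette, the real tables come from `get_units()`. Any unit returned for more than one class is relabeled "mixed", and then a single row per unit is kept. This is a sketch under those assumptions, not necessarily how the vignette's own code does it.

```r
# Hypothetical toy unit tables standing in for the get_units() results
# requested for each paleoenvironment class via environ_id
toy_marine      <- data.frame(unit_id = c(1, 2, 3), class = "marine",
                              stringsAsFactors = FALSE)
toy_marginal    <- data.frame(unit_id = c(3, 4), class = "marginal",
                              stringsAsFactors = FALSE)
toy_terrestrial <- data.frame(unit_id = c(4, 5, 6), class = "terrestrial",
                              stringsAsFactors = FALSE)

# Stack the three tables; units 3 and 4 now appear twice
toy_units <- rbind(toy_marine, toy_marginal, toy_terrestrial)

# IDs of units that were returned for more than one class
multi_env <- unique(toy_units$unit_id[duplicated(toy_units$unit_id)])

# Relabel those units as "mixed", then keep one row per unit
toy_units$class[toy_units$unit_id %in% multi_env] <- "mixed"
toy_units <- toy_units[!duplicated(toy_units$unit_id), ]

table(toy_units$class)
```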
wzxhzdk:13

Great, we can now see that we no longer have any duplicate units in our dataset:

wzxhzdk:14

wzxhzdk:15

OK, we now have four lithostratigraphic groupings for our units, but Figure 14 from [Peters & Heim (2010)](https://www.jstor.org/stable/25609444) includes a fifth group, "unknown". For the sake of completeness, let's get all sedimentary lithostratigraphic units not already covered by our other four groupings (marine, marginal, mixed, and terrestrial). As we cannot search [Macrostrat](https://macrostrat.org) by "unknown", we will first pull all Phanerozoic sedimentary units, and then filter out units already present in our `litho_units` dataset.

wzxhzdk:16

wzxhzdk:17

Wow, updates to Macrostrat mean that no sedimentary units now have an "unknown" paleoenvironment. Great! But we've just remembered that [Macrostrat](https://macrostrat.org) doesn't only cover North America, and we've pulled all units available worldwide. To reproduce Figure 14 from [Peters & Heim (2010)](https://www.jstor.org/stable/25609444) more closely, we should focus on North America only. Conveniently, [Macrostrat](https://macrostrat.org) is split into different projects, which tend to cover different regions:

wzxhzdk:18

wzxhzdk:19

From this call, we can see that North America has a `project_id` of 1. We can use this information to further filter `litho_units`:

wzxhzdk:20

Now that we've got our data and done a bit of wrangling, let's get on with summarizing and visualizing it.

# Summarizing data

We are interested in how the number and proportion of lithostratigraphic units representing different paleoenvironment classes change through time. Conveniently, lithostratigraphic units in [Macrostrat](https://macrostrat.org) have associated age information, specifically the minimum (`t_age`) and maximum (`b_age`) age in millions of years before present.
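Because each unit spans an age range (from `b_age` to `t_age`) rather than a single age, one unit can contribute to several time bins. The vignette relies on `palaeoverse` helper functions for this step; purely to illustrate the idea, the base R sketch below swaps in a simple overlap rule instead, counting a unit in every bin whose interval its age range overlaps. The units and bin boundaries are hypothetical toy values, not real Macrostrat units or international stage boundaries.

```r
# Hypothetical toy units: b_age is the maximum (oldest) and t_age the
# minimum (youngest) age of each unit, in millions of years
toy_units <- data.frame(
  unit_id = 1:4,
  class   = c("marine", "marine", "terrestrial", "mixed"),
  b_age   = c(300, 250, 200, 120),
  t_age   = c(280, 190, 150, 100),
  stringsAsFactors = FALSE
)

# Hypothetical time bins (NOT real international stage boundaries)
toy_bins <- data.frame(
  bin    = c("bin A", "bin B", "bin C"),
  max_ma = c(300, 200, 150),
  min_ma = c(200, 150, 100),
  stringsAsFactors = FALSE
)

toy_classes <- c("marine", "marginal", "mixed", "terrestrial")

# Count units per class whose age range overlaps bin i
# (strict inequalities: merely touching a boundary is not an overlap)
count_bin <- function(i) {
  in_bin <- toy_units$b_age > toy_bins$min_ma[i] &
    toy_units$t_age < toy_bins$max_ma[i]
  table(factor(toy_units$class[in_bin], levels = toy_classes))
}

# One row per bin, one column per class
toy_counts <- t(sapply(seq_len(nrow(toy_bins)), count_bin))
rownames(toy_counts) <- toy_bins$bin
toy_counts
```

With strict inequalities, unit 3 (spanning exactly 200 to 150 Ma) is counted only in bin B, while unit 2 (250 to 190 Ma) is counted in both bin A and bin B. Whether boundary-touching units should count is a choice worth making explicitly in your own analyses.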
Using these data, with some handy support functions from the `palaeoverse` R package, we can count the number of lithostratigraphic units within each paleoenvironmental group and within each time bin. We will use international stratigraphic stages as our time bins:

wzxhzdk:21

# Visualizing data

Now that we have our summary of the lithostratigraphic units, we can visualize these data. First, we will plot the number of units within each paleoenvironment group through time. To support this visualization, we will make use of the `ggplot2` (plotting) and `deeptime` (adding a geological timescale axis) R packages.

wzxhzdk:22
[Figure: number of lithostratigraphic units in each paleoenvironment group through time (chunk `visualize_counts`)]

We can also plot the proportion of units within each paleoenvironment group through time.

wzxhzdk:23
[Figure: proportion of lithostratigraphic units in each paleoenvironment group through time (chunk `visualize_proportions`)]
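The proportions in this plot amount to dividing each group's count by the total number of units in the same time bin. Here is a minimal base R sketch of that normalization, assuming a matrix of counts with time bins as rows and paleoenvironment groups as columns; the numbers are hypothetical toy values, not Macrostrat results, and the vignette's own code may compute this step differently.

```r
# Hypothetical count matrix: rows are time bins, columns are
# paleoenvironment groups (toy numbers, not Macrostrat results)
toy_counts <- matrix(
  c(6, 2, 1, 1,
    3, 1, 2, 4),
  nrow = 2, byrow = TRUE,
  dimnames = list(c("bin A", "bin B"),
                  c("marine", "marginal", "mixed", "terrestrial"))
)

# Divide each row by its row total so the groups sum to 1 within each bin
toy_props <- prop.table(toy_counts, margin = 1)
round(toy_props, 2)
```

Normalizing by row (`margin = 1`) answers "what share of this bin's units is terrestrial?", which is the question behind the proportion plot; normalizing by column would instead describe how one group's units are spread across time.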

And that's it! Not too bad, right? There are some clear differences between our plots and those from the original [Peters & Heim (2010)](https://www.jstor.org/stable/25609444) paper (Fig. 14). This is to be expected, given that substantially more data are now available in [Macrostrat](https://macrostrat.org) (e.g., there are no longer any "unknown" paleoenvironments) and that our definition of "marginal" environments may differ from theirs. Despite these differences, some broad-scale patterns are largely consistent, such as the higher proportion of terrestrial lithostratigraphic units in the Mesozoic and Cenozoic compared to the Paleozoic.

# Bonus plot: rock classes

We can use very similar code to explore many different geological questions. For example, what if we were interested in the temporal distribution of igneous and metamorphic rocks, as well as sedimentary rocks? Below we extract and plot data showing exactly this.

wzxhzdk:24
[Figure: proportions of igneous, metamorphic, and sedimentary units through time (chunk `class_proportions`)]

Hopefully this vignette has shown you some potential uses for `rmacrostrat` functions and helped provide a workflow for your own analyses. If you have any questions about the package or its functionality, please feel free to join our [Palaeoverse Google group](https://groups.google.com/g/palaeoverse) and leave a comment; we'll aim to answer it as soon as possible!

If you're interested in learning more about `rmacrostrat`, don't forget to check out our other vignettes! You can see which ones are available by calling `vignette(package = "rmacrostrat")`.

# References

Gearty, W. 2024. deeptime: Plotting Tools for Anyone Working in Deep Time. R package version 1.1.1.

Jones, L.A., Gearty, W., Allen, B.J., Eichenseer, K., Dean, C.D., Galván, S., Kouvari, M., Godoy, P.L., Nicholl, C.S.C., Dillon, E.M., Flannery-Sutherland, J.T., Chiarenza, A.A. 2022. palaeoverse: A community-driven R package to support palaeobiological analysis. *Methods in Ecology and Evolution*, 14(9), 2205–2215. doi: 10.1111/2041-210X.14099.

Peters, S.E. and Heim, N.A. 2010. The Geological Completeness of Paleontological Sampling in North America. *Paleobiology*, 36(1), 61–79.

Wickham, H. 2016. ggplot2: Elegant Graphics for Data Analysis. *Springer-Verlag New York*.



rmacrostrat documentation built on Oct. 18, 2024, 5:10 p.m.