Plotting a stratigraphic column with rmacrostrat

Authors: Palaeoverse Development Team

Last updated: 2024-10-17

# Introduction

`rmacrostrat` is an R package that allows users to easily retrieve geological data from the [Macrostrat](https://macrostrat.org) database and facilitates analyses of these data within the R environment. This vignette (or tutorial, if you prefer) guides you through the installation process and some of the functionality available within `rmacrostrat`. Specifically, we will focus on obtaining and plotting a geologic column containing stratigraphic and lithological information for the San Juan Basin, located in the southwestern United States.

# Installation

The `rmacrostrat` package can be installed via CRAN, or from its dedicated [GitHub repository](https://github.com/palaeoverse/rmacrostrat) if the development version is preferred. To install via CRAN, simply use:

wzxhzdk:0

To install the development version, first install the `devtools` package, and then use `install_github` to install `rmacrostrat` directly from GitHub.

wzxhzdk:1

You can now load `rmacrostrat` using the default `library` function, alongside some packages we will need for plotting, namely `ggplot2`, `ggrepel`, and `deeptime` (don't forget to install these packages if you don't have them already!):

wzxhzdk:2

**Before we get into the good stuff, the development team has a small request**. If you use `rmacrostrat` in your research, please cite the associated publication. This will help us to continue our work in supporting you to do yours. You can access the appropriate citation via:

wzxhzdk:3 wzxhzdk:4

# Context

The San Juan Basin is a large structural depression which spans parts of New Mexico, Colorado, Utah, and Arizona. It is renowned for its oil and natural gas reserves, but it is also well known for its Late Cretaceous dinosaurs. In this vignette, we will investigate the geologic attributes of the rocks of the San Juan Basin and use this information to plot a stratigraphic column.
To do this, we will use the `rmacrostrat` package to fetch data from the [Macrostrat](https://macrostrat.org) database.

# Retrieving data

Our first step is to search for geologic columns named "San Juan Basin" and retrieve some basic information about them, using the `def_columns` function:

wzxhzdk:5 wzxhzdk:6

From this call, we can see there is a single column named "San Juan Basin". We are also given some information about this column, such as its geographic location (as a latitude and longitude), the area it covers in km^2^, its thickness in meters, and the number of geologic units it contains (`t_units`). We are also given its `col_id`, 489, which we can use to retrieve more information through other functions in `rmacrostrat`. We will do this now, specifically using `get_columns`:

wzxhzdk:7

# Exploring data

We now have much more information relating to the San Juan Basin column. For example, we can look at the age range it spans:

wzxhzdk:8 wzxhzdk:9 wzxhzdk:10 wzxhzdk:11 wzxhzdk:12 wzxhzdk:13 wzxhzdk:14 wzxhzdk:15

We can see that our column spans from 2050 to 32 million years ago, from the Orosirian (Paleoproterozoic) to the Rupelian (Cenozoic). Let's take a look at the mix of lithologies contained in the column:

wzxhzdk:16 wzxhzdk:17

So our column contains a total of 25 different lithologies, including sedimentary, metamorphic, and igneous rocks. We can quickly visualize the proportion of the column made up of these different rocks, colored by their class:

wzxhzdk:18
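
As a minimal sketch of this kind of plot, the example below uses a small, made-up data frame in place of the lithology table returned by Macrostrat. The column names (`name`, `class`, `prop`) and all values are assumptions for illustration, so check them against your own data before reusing the code.

```r
library(ggplot2)

# Hypothetical stand-in for the column's lithology table (values are made up)
liths <- data.frame(
  name = c("sandstone", "shale", "limestone", "granite", "quartzite"),
  class = c("sedimentary", "sedimentary", "sedimentary", "igneous",
            "metamorphic"),
  prop = c(0.40, 0.35, 0.10, 0.10, 0.05)
)

# Order lithologies from most to least abundant
liths$name <- reorder(liths$name, -liths$prop)

# Bar chart of proportions, colored by lithology class
p <- ggplot(liths, aes(x = name, y = prop, fill = class)) +
  geom_col() +
  labs(x = "Lithology", y = "Proportion of column", fill = "Class") +
  theme_bw() +
  theme(axis.text.x = element_text(angle = 45, hjust = 1))
p
```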
plot of chunk lith_plot

It seems the San Juan Basin is dominated by sedimentary rocks. Perhaps this is not surprising given that we know it is famed for fossils and energy resources! Speaking of these, let's take a look at the column's economic attributes:

wzxhzdk:19 wzxhzdk:20

Here we can see that not only does the San Juan Basin contain coal, oil, and natural gas, but it also has some uranium ore and aquifers. We can also see the number of [Paleobiology Database](https://paleobiodb.org) collections linked to the San Juan Basin in our column information:

wzxhzdk:21 wzxhzdk:22

To view more information about these fossil data, we need to use a different `rmacrostrat` function, `get_fossils`:

wzxhzdk:23

This table contains information about each of the ~650 [Paleobiology Database](https://paleobiodb.org) collections associated with the San Juan Basin, including their ages and the number of occurrences they contain. Let's find out how many fossil occurrences we have in total:

wzxhzdk:24 wzxhzdk:25

So over 2000 fossil occurrences are known from this single basin. We can also visualize the temporal distribution of these fossil collections, using the midpoint of their age ranges:

wzxhzdk:26
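
The two steps described here, summing occurrences and plotting collections by age midpoint, can be sketched as follows. The data frame is a made-up stand-in for the fossil collections table, and the column names `b_age`, `t_age`, and `pbdb_occs` are assumptions that may not match the real output.

```r
library(ggplot2)

# Hypothetical stand-in for the fossil collections table (values are made up)
fossils <- data.frame(
  b_age = c(83.6, 72.1, 72.1, 66.0, 66.0, 61.6),  # oldest age (Ma)
  t_age = c(72.1, 66.0, 66.0, 61.6, 61.6, 59.2),  # youngest age (Ma)
  pbdb_occs = c(3, 12, 7, 20, 5, 1)               # occurrences per collection
)

# Total number of occurrences across all collections
total_occs <- sum(fossils$pbdb_occs)

# Midpoint of each collection's age range
fossils$mid_age <- (fossils$b_age + fossils$t_age) / 2

# Histogram of collections through time, oldest on the left
p <- ggplot(fossils, aes(x = mid_age)) +
  geom_histogram(binwidth = 5, colour = "black", fill = "grey80") +
  scale_x_reverse(name = "Age (Ma)") +
  ylab("Number of collections") +
  theme_bw()
p
```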
plot of chunk fossil_plot

From this plot, we can see that the San Juan Basin fossils range in age from the Permian to the Paleogene, with the highest concentration of fossil collections around the K-Pg boundary.

Now that we have explored the data in the [Macrostrat](https://macrostrat.org) database on the San Juan Basin, it is time to plot our stratigraphic column.

# Plotting the stratigraphic column

To plot the stratigraphic column, we will need to obtain data for each lithological unit contained within the San Juan Basin column. We will do this using another `rmacrostrat` function, `get_units`, and referencing the `column_id` for the San Juan Basin. To keep our column plot contained, we will limit it to geological units which are Cretaceous in age:

wzxhzdk:27 wzxhzdk:28 wzxhzdk:29 wzxhzdk:30

We now have information for each of the 17 Cretaceous geologic units contained within the San Juan Basin, including the age of the top and bottom of each, which is what we will use to plot our stratigraphic column. Note that the y-axis on our plot is going to be time rather than height or thickness, so any unconformities present in the column will be evident.

We can start out very simply, by using `geom_rect` in `ggplot2` to plot a rectangle corresponding to the age range of each unit in the section.

wzxhzdk:31
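
A minimal sketch of the `geom_rect` approach is shown below. It uses a made-up data frame rather than the San Juan Basin units: the unit names and ages are hypothetical, and the columns `t_age` and `b_age` (top and bottom age in Ma) are assumed to mirror those in the units table.

```r
library(ggplot2)

# Hypothetical units (names and ages are made up); ages in Ma
units <- data.frame(
  unit_name = c("Unit A", "Unit B", "Unit C", "Unit D"),
  t_age = c(66, 75, 84, 84),  # age of the top of each unit
  b_age = c(72, 84, 90, 88)   # age of the bottom of each unit
)

# One rectangle per unit; the y-axis is time, reversed so older is lower
p <- ggplot(units) +
  geom_rect(aes(xmin = 0, xmax = 1, ymin = b_age, ymax = t_age),
            fill = "grey80", colour = "black", alpha = 0.5) +
  scale_y_reverse(name = "Time (Ma)") +
  theme_bw() +
  theme(axis.title.x = element_blank(),
        axis.text.x = element_blank(),
        axis.ticks.x = element_blank())
p
```

In this toy example, the gap between 72 and 75 Ma appears as empty space, which is how an unconformity would show up on a time-scaled column.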
plot of chunk column_a

We can already see something that roughly resembles a stratigraphic column. One thing to notice here is that we seem to have some overlap between our units, resulting in a darker shade of gray. We can take a closer look at this by dodging the units horizontally.

wzxhzdk:32
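
Alongside the visual check, overlaps can also be confirmed numerically. The sketch below is not the approach used in this vignette: it compares every pair of units in a made-up data frame (hypothetical names and ages) and flags those whose age ranges intersect.

```r
# Hypothetical units (names and ages are made up); ages in Ma
units <- data.frame(
  unit_name = c("Unit A", "Unit B", "Unit C", "Unit D"),
  t_age = c(66, 75, 84, 84),
  b_age = c(72, 84, 90, 88)
)

# All pairwise combinations of row indices (one pair per row)
idx <- t(combn(nrow(units), 2))

# Two units overlap if each one's top is younger than the other's bottom;
# units that merely share a boundary age are not counted
overlaps <- units$t_age[idx[, 1]] < units$b_age[idx[, 2]] &
  units$t_age[idx[, 2]] < units$b_age[idx[, 1]]

overlapping <- data.frame(
  unit_1 = units$unit_name[idx[overlaps, 1]],
  unit_2 = units$unit_name[idx[overlaps, 2]]
)
overlapping
```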
plot of chunk column_b

Indeed, there are two units that overlap with each other: the Gallup Sandstone and the Upper Shale Member of the Mancos Shale. We can make these units plot next to each other by adding columns to our dataframe that define the x-axis values.

wzxhzdk:33
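
One way to set this up is sketched below, again with made-up units. The column names `x_min` and `x_max` are our own choice for illustration, and the two overlapping units ("Unit C" and "Unit D") are hypothetical stand-ins.

```r
library(ggplot2)

# Hypothetical units (names and ages are made up); ages in Ma
units <- data.frame(
  unit_name = c("Unit A", "Unit B", "Unit C", "Unit D"),
  t_age = c(66, 75, 84, 84),
  b_age = c(72, 84, 90, 88)
)

# By default, every unit spans the full width of the column
units$x_min <- 0
units$x_max <- 1

# Split the width between the two overlapping units
units$x_max[units$unit_name == "Unit C"] <- 0.5
units$x_min[units$unit_name == "Unit D"] <- 0.5

# Map the new columns to the horizontal extent of each rectangle
p <- ggplot(units) +
  geom_rect(aes(xmin = x_min, xmax = x_max, ymin = b_age, ymax = t_age),
            fill = "grey80", colour = "black") +
  scale_y_reverse(name = "Time (Ma)") +
  theme_bw()
p
```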
plot of chunk column_c

However, there is a lot we can do to improve the aesthetics of our plot. For example, the column named `color` in our dataframe specifies the hexadecimal color corresponding to the dominant lithology of the unit. We can use this to color-code the units by lithology.

wzxhzdk:34
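
A minimal sketch of coloring by a hexadecimal column follows. It maps `color` to the fill aesthetic and uses `scale_fill_identity()` so that the hex codes are used as given; this is one possible approach and not necessarily the one used in this vignette. The units and colors are made up.

```r
library(ggplot2)

# Hypothetical units with a hex color per unit (all values are made up)
units <- data.frame(
  unit_name = c("Unit A", "Unit B", "Unit C"),
  t_age = c(66, 75, 84),
  b_age = c(72, 84, 90),
  color = c("#F2D16B", "#A6A6A6", "#7FB3D5")
)

# scale_fill_identity() tells ggplot2 to use the hex codes directly
# instead of treating them as categories to be assigned colors
p <- ggplot(units) +
  geom_rect(aes(xmin = 0, xmax = 1, ymin = b_age, ymax = t_age,
                fill = color),
            colour = "black") +
  scale_fill_identity() +
  scale_y_reverse(name = "Time (Ma)") +
  theme_bw()
p
```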
plot of chunk column_d

Great! Now let's add labels indicating the names of the different units.

wzxhzdk:35
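
Labels can be placed at the midpoint of each unit, as in the sketch below. It uses `geom_text_repel` from `ggrepel` to keep labels from colliding; the units are made up, and the helper column `mid_age` is our own addition for illustration.

```r
library(ggplot2)
library(ggrepel)

# Hypothetical units (names and ages are made up); ages in Ma
units <- data.frame(
  unit_name = c("Unit A", "Unit B", "Unit C"),
  t_age = c(66, 75, 84),
  b_age = c(72, 84, 90)
)

# Vertical position of each label: the midpoint of the unit's age range
units$mid_age <- (units$t_age + units$b_age) / 2

p <- ggplot(units) +
  geom_rect(aes(xmin = 0, xmax = 1, ymin = b_age, ymax = t_age),
            fill = "grey80", colour = "black") +
  # Labels are centered horizontally and nudged apart if they overlap
  geom_text_repel(aes(x = 0.5, y = mid_age, label = unit_name),
                  size = 3.5) +
  scale_y_reverse(name = "Time (Ma)") +
  theme_bw()
p
```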
plot of chunk column_e

Oh, we've just noticed that some of the unit names seem to have some mistakes and do not match the [USGS Geolex](https://ngmdb.usgs.gov/Geolex/search). Let's go ahead and update those names.

wzxhzdk:36

And finally, we can add a column along the y-axis indicating the different stages of the Cretaceous, using the R package `deeptime`:

wzxhzdk:37
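
As a rough sketch of the last step, `coord_geo` from `deeptime` can draw a timescale along the left-hand axis. The units below are made up, and the arguments shown (`pos`, `dat`, `rot`) are one plausible configuration, not necessarily the settings used in this vignette.

```r
library(ggplot2)
library(deeptime)

# Hypothetical units (names and ages are made up); ages in Ma
units <- data.frame(
  unit_name = c("Unit A", "Unit B", "Unit C"),
  t_age = c(66, 75, 84),
  b_age = c(72, 84, 90)
)

p <- ggplot(units) +
  geom_rect(aes(xmin = 0, xmax = 1, ymin = b_age, ymax = t_age),
            fill = "grey80", colour = "black") +
  scale_y_reverse(name = "Time (Ma)") +
  # Add international stages on the left side, with rotated labels
  coord_geo(pos = "left", dat = "stages", rot = 90) +
  theme_classic()
p
```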
plot of chunk column_f

Hopefully this vignette has shown you the potential uses for `rmacrostrat` functions and helped provide a workflow for your own analyses. If you have any questions about the package or its functionality, please feel free to join our [Palaeoverse Google group](https://groups.google.com/g/palaeoverse) and leave a question; we'll aim to answer it as soon as possible!

If you're interested in learning more about `rmacrostrat`, don't forget to check out our other vignettes! You can see which ones are available by calling `vignette(package = "rmacrostrat")`.

# References

Gearty, W. 2024. deeptime: Plotting Tools for Anyone Working in Deep Time. R package version 1.1.1.

Jones, L.A., Gearty, W., Allen, B.J., Eichenseer, K., Dean, C.D., Galván, S., Kouvari, M., Godoy, P.L., Nicholl, C.S.C., Dillon, E.M., Flannery-Sutherland, J.T., Chiarenza, A.A. 2022. palaeoverse: A community-driven R package to support palaeobiological analysis. *Methods in Ecology and Evolution*, 14(9), 2205--2215. doi: 10.1111/2041-210X.14099.

Wickham, H. 2016. ggplot2: Elegant Graphics for Data Analysis. *Springer-Verlag New York*.


rmacrostrat documentation built on Oct. 18, 2024, 5:10 p.m.