High Level Introduction

Paleovegetation reconstruction is important for understanding Earth system proceses, in particular, reliable models of paleovegetation that have well defined uncertainty and are spatially explicit can be used to constrain 'slow' Earth system processes.

STEPPS represents a new generation of spatio-temporal Bayesian models for vegetation reconstruction from pollen. It uses information from a network of pollen sites, calibrated against vegetation data to estimate relevant parameters needed to constrain pollen production, transport and deposition across the landscape, which makes spatially structured predictions possible.

![Figure?]

Data Preparation

As mentioned above, we require two key data products: (1) pollen data from sedimentary archives, and (2) some form of gridded (or grid-able) vegetation data that is co-located with the pollen data. The data should have a common spatial projection (preferably with an isotropic coordinate system, so that unit steps are equidistance in all directions across the region of interest).

We will present here data from two sources. Pollen data will be obtained from the Neotoma Paleoecological Database using the R package neotoma (Goring &al, 2015). Vegetation data will be obtained from the supplemental material of Paciorek et al (). This data is the product of a conditional-autoregressive model the spatially smooths Pre-EuroAmerican Settlement forest cover data from the Upper Midwestern United States (Goring et al., 2016) and the Eastern United States (Cogbill). This data comes in a netCDF format and is very large. A gist shows how to download this data and do some preliminary processing, but this vignette assumes that the vegetation data is gridded, with both mean cell values and standard deviations of the posterior draws from each cell & each taxon.

library(stepps)
library(dplyr)
library(DT)
library(readr)
library(tidyr)

data(plss_vegetation)
plss_vegetation %>% 
  gather(key = 'Taxon',
         value = 'proportion',
         gathercols = colnames(plss_vegetation)[5:ncol(plss_vegetation)]) %>% 
  group_by(Taxon) %>% 
  summarise( mean = round(mean(proportion[proportion > 0.01]), 2),
              max = round(max(proportion[proportion > 0.01]), 2),
            cells = round(sum(proportion > 0.01) / nrow(plss_vegetation), 2)) %>% 
  DT::datatable(options = list(dom='t', pageLength=30),
                colnames = c("Tree Taxa", "Mean", "Maximum", "Presence"))

We see one of several things here. There are r ncol(plss_vegetation) taxa, values sum to 1, and, it's posible to note, some of these taxa do not have direct, or clear, equivalents in the pollen data.

Choosing a bounding box

When we examine the Pre-Settlement data (PSD) we see that it is projected in a Great Lake St Lawrence Albers projection (+init=espg:3175). Neotoma data can be queried using a bounding box, but only using lat/long coordinates with WGS84 datum (+init=epsg:4376).

We are going to extract the bounding box from the veg data and query Neotoma. The bbox_tran() helper function is supplied in the stepps package to facilitate easy access to Neotoma's pollen data.

pol_box <- bbox_tran(plss_vegetation, '~ x + y',
                             '+init=epsg:3175', 
                             '+init=epsg:4326')

veg_box <- bbox_tran(plss_vegetation, '~ x + y',
                             '+init=epsg:3175', 
                             '+init=epsg:3175')

reconst_grid <- build_grid(veg_box, resolution = 8000, proj = '+init=epsg:3175')

Obtaining Pollen Data

We can then use the neotoma package's get_dataset() command to locate all pollen data within the bounding box of the vegetation data. It is possible to access more pollen data with other parameters, but we assume here the user is only intereseted in the pollen with the highest degree of overlap.

datasets <- neotoma::get_dataset(loc = pol_box, datasettype = 'pollen')

neotoma::plot_leaflet(datasets)

This gives us an interactive map in R, with information about individual sites. The map uses aggregate markers, which explains why points here are of different colors, regardless it is possible to see the utility of the bounding box in defining the area of interest for this query. If we were to download the pollen data for each of these sites we could further investigate rates of change, and pollen proportions within this region.

Spatial & Taxonomic Cleaning

We need to download the pollen data, and do the taxonomic cleaning.

# downloads <- neotoma::get_download(datasets)

# This was added as a data object, based on data accessed April, 19, 2018
# --sjg

data(downloads)
# This may take a long time if run, the package has included the response
# as a data object to be called using:
# data(downloads)

downloads <- neotoma::get_download(datasets)

We want to get the samples in the right age bin. This function extracts pollen data from the Neotoma download object and returns the results in a data.frame. You can use your own data.frame if you aren't using Neotoma, but the subsequent functions expect certain column names, in particular:

calib_dialect <- pull_timebins(downloads, calib_range = c(150, 350), aggregate = TRUE)
head(calib_dialect[!duplicated(calib_dialect$site.name),])[,1:10] %>% 
  select(-`dataset`) %>% 
  knitr::kable()

These 10 columns are important. They indicate the age-model type, which may or may not be critical for your predictions (see Dawson et al., in prep), they indicate site location and time period, and they provide links (through the .id field) for functions in the neotoma package, to obtain more information about the underlying data.

Taxonomic Standardization

Ensuring that pollen and vegetation data represent comparable taxonomic units is one of the most critical elements. The pollen data contains r ncol(calib_dialect) - 10 unique pollen morphtypes, while the vegetation includes r ncol(plss_vegetation) - 4. While there an be a one-to-one relationship for some taxa, it's important to recognize that there is a hierarchy to both pollen and vegetation taxa that can be taken into account to help relate the two datasets.

To align the records a user must create a translation table that can align vegetation and pollen. In this case we will put it into a data/translation folder, so we can keep track of things. Make sure this folder exists in your directory structure if you wish to put files there.

generate_tables(plss_vegetation, output = 'output/veg_trans.csv')
generate_tables(downloads, output = 'ouput/pol_trans.csv')

These tables need to be edited by hand to represent the taxa that the user intends to reconstruct. For example, one might edit:

pol_table <- system.file("extdata", "pol_trans.csv", package="stepps") %>% 
 readr::read_csv()
pol_table %>% 
  dplyr::select(-variable.element, -variable.context, -alias) %>% 
  distinct %>% 
  DT::datatable(options = list(pageLength=10))

To assign genera-level assignments to trees and shrubs that can be obtained from the vegetation data as well:

pol_table <- system.file("extdata", "pol_trans_edited.csv", package="stepps") %>% 
  readr::read_csv()

pol_table %>% 
  filter(ecological.group == "TRSH") %>% 
  dplyr::select(-variable.element, -variable.context, -alias) %>% 
  DT::datatable()

So we've assigned "Acer" to the pollen morphotaxon "Acer", and also to "Acer negundo". Now we need to transform the data so that the new table reflects the new assignments:

calib_trans <- translate_taxa(calib_dialect, 
                              pol_table,
                              id_cols = colnames(calib_dialect)[1:10])

We do the same with the vegetation data, taking care to ensure that all the vegetation assignments are provided in the pollen data, and vice versa. Similarly, the vegetation data can be edited

veg_table <- system.file("extdata", "veg_trans.csv", package="stepps") %>% 
  readr::read_csv()

veg_table %>% DT::datatable()

The STEPPS function requires that the taxon components of the vegetation and pollen data are equivalent. We need the unique vector of all vegetation/taxa we're using as our targets:

veg_table <- system.file("extdata", "veg_trans_edited.csv", package="stepps") %>% 
  readr::read_csv()

target_taxa <- na.omit(unique(veg_table$match))

veg_trans <- translate_taxa(plss_vegetation, 
                            veg_table,
                            id_cols = colnames(plss_vegetation)[1:4])

To run the proper model there are a number of elements that are specifically required by the model. Some of these are relatively straightforward, for example, the STEPPS model requires the value $K$, for the number of taxa. To simplify the process for the end user we have constructed the prep_input() function. This internalizes many of the procedures and outputs a list that can be directly exported to STAN:

veg_table <- to_stepps_shape(veg_trans,   '~ x + y',      '+init=epsg:3175')
pol_table <- to_stepps_shape(calib_trans, '~ long + lat', '+init=epsg:4326')

stepps_input <- prep_input(veg    = veg_table, 
                 pollen = pol_table, 
                 target_taxa = target_taxa,
                 grid   = reconst_grid)

We need to make the specific objects:



PalEON-Project/stepps-cal documentation built on May 14, 2019, 8:20 a.m.