knitr::opts_chunk$set(message = FALSE, warning = FALSE, fig.width = 7)
In a previous tutorial, we have illustrated how to use the EPDr package to work with data of single entities (e.g., sediment cores) in the European Pollen Database (EPD). However and as in many other databases from international efforts, the spatial extent of the EPD makes it suitable for integrative studies where data from multiple entities can be analyzed together. EPDr
does not provide specific objects to store such data, but this can be easily accomplished in R using list
objects and functions in the lapply
family. EPDr
does, however, provide some functions to work on list of epd.entity.df
objects. Here, we ilustrate how to combine EPDr
with list
objects and lapply
functions, and how to use the EPDr
functions that work on lists of objects.
The very first thing is to load the library and stablish a valid connection to the database server. Remember that this need to be done for each session we start, and that it automatically closes the connection if we re-start R session and/or Rstudio.
epd.connection <- connect_to_epd(database = "epd", user = "epdr", password = "epdrpw")
epd.connection <- connect_to_epd(database = "epd", user = "epdr", password = "epdrpw", host = "rabbot19.uco.es")
Then, you need to identify the ids for the entities of interest (i.e., e_
). Here, the list_
functions, and specially list_e
, are your friends. For instance, if you want to work in a specific country. First, you can check which countries are represented in the EPD.
list_countries(epd.connection)
Once you know the exact spelling of the country names of interest, you can pass them as an argument to the list_e
function to get the list of all entities ids that fall into your study area.
e_ids <- list_e(epd.connection, country = c("Spain", "Portugal")) e_ids <- e_ids$e_
Note that
list_
functions return a data.frame (table) from where we only need the entities ids. This is the reason we subset in the second line to extract just the numbers in thee_
column.
Alternatively, you could filter entities by any other criteria accepted by list_e
check ?list_e
for further details. On the contrary, if you were interested to work with the whole EPD, you could just query the whole entities ids without passing any searching criteria to the function, which would return all entities.
# e_ids <- list_e(epd.connection) # e_ids <- e_ids$e_
Note that we did not run this example, and just provide it to acknowledge the posibility.
Now, you can use the lapply
function combined with get_entity
to receive a list of epd.entity
objects for all the entities of interest.
epd_all <- lapply(e_ids, get_entity, epd.connection) class(epd_all) class(epd_all[[1]])
Now, we can use the entity_to_matrices
function to calculate the epd.entity.df
objects from the list of epd.entity
objects.
epd_all <- lapply(epd_all, entity_to_matrices) class(epd_all[[1]])
Because each entity may have been analyzed by different researcher (with different taxonomical/morphological criteria, interests, etc.), these data need to be standardized before any analysis can be carried out.
EPDr
provides several functions that work on list of EPDr
objects (epd.entity
or epd.entity.df
) that facilitate that standardization. For instance, you might decide to not use entities with some restrictions on their use. The remove_restricted
function look into all the objects in a list and return only those that are unrestricted.
length(epd_all) vapply(epd_all, check_restriction, FUN.VALUE=logical(1)) epd_all <- remove_restricted(epd_all) length(epd_all)
Note that we have lost five entities, that have restricted the use of the data.
Usually, you will also remove all the entities that have no ages for the biological counts. You can use here the remove_wo_ages
function.
vapply(epd_all, check_defaultchron, FUN.VALUE=logical(1)) epd_all <- remove_wo_ages(epd_all) length(epd_all)
Note that we have removed another 22 entities for which there are no ages.
Now, you may need to know what are exactly the entities you have lost in these filters. You can use the extract_e
and vapply
functions to get a list of the entities id numbers.
vapply(epd_all, extract_e, FUN.VALUE=numeric(1))
Becasue, Giesecke et al. [-@giesecke_2013] provides updated chronologies for a lot of entities, we want to be sure to use those chronologies when they are available. To do so, we need to change the default chronology in each object to 9999
(specific number for giesecke) when available.
epd_all <- lapply(epd_all, giesecke_default_chron)
Next, let assume we are interested in pollen data. Here, we need to combine again filter_taxagroups
function with lapply
.
epd_all <- lapply(epd_all, filter_taxagroups, c("HERB", "TRSH", "DWAR", "LIAN", "HEMI", "UPHE", "INUN"))
Taxagroups represented here are all different type of pollen from plants of different life-forms: HERB for herbs, TRSH for trees and shrubs, DWARF for dwarf shrubs, LIAN for lianas, HEMI for hemiparasitic, and UPHE for upland herbs, and INUN for indeterminables and unknowns.
Now, we can change each taxa name in the objects for the accepted name according to the EPD, using the combination of the lapply
and taxa_to_accepted
functions. You could also modify the taxonomy to the higher taxonomical level using taxa_to_highertaxa
.
epd.taxonomy <- get_taxonomy_epd(epd.connection) epd_all <- lapply(epd_all, taxa_to_acceptedtaxa, epd.taxonomy)
Finally, we want to all entities reflecting the same taxa. unify_taxonomy
work on list of epd.entity.df
objects. It looks for the taxa in all the objects and increment all datasets to reflect the same taxa in all. Taxa present in one entity but not in others are included with value = 0
.
epd_all <- unify_taxonomy(epd_all, epd.taxonomy)
The following would be to transform raw pollen counts into pollen percentages. Here, again, we combine the lapply
function with the counts_to_percentages
function.
epd_all <- lapply(epd_all, counts_to_percentages)
Because each entity in our list will have samples at different ages, if we need to run an analysis with sites refering to the same time period, we need to standardize the ages across all epd.entity.df
objects in the list. Here, you can use lapply
combined with interpolate_counts
or intervals_counts
depending on the objectives of your study. Here, we will illustrate the use of interpolate_counts
.
epd_all <- lapply(epd_all, interpolate_counts, seq(0, 22000, by = 1000)) epd_all[[2]]@commdf@counts[, 1:7]
Note that here we request to get a pollen concentration for every 500 years since 22000 to 0 calibrated years BP. For those entities with data partially covering this range, the data falling outside of the range are filled with
NA
to acknowledge that this periods are outside of the real data range.
Counting data have at least one source of uncertainty associated: the age estimated for each sample comes from an age-depth model fit using control data (e.g., radiocarbon data or any other geochronological information). In the case of interpolated data, the uncertainty have a second dimension, depending on how far (in time) we are interpolating countings from a real sample. [@blois_2011] proposed an index to summarize these two uncertainties. See ?blois_quality
and Blois et al. [-@blois_2011] for details. EPDr
provides a function to calculate data quality according to the Blois index (blois_quality
) on epd.entity.df
objects.
epd_all[[1]]@agesdf@dataquality epd_all <- lapply(epd_all, blois_quality) epd_all[[1]]@agesdf@dataquality
The basic interpretation of the index is that closer to 1 is high quality data, whereas close to 0 is poor quality data.
When all standardizations have been accomplished, the data can be prepared for further analysis by preparing tables or can be plotted in the form of maps, for instance.
In the first version of the package, EPDr
have a function called table_by_taxa_age
that allow to search in a epd.entity.df
object and extract the data for a particular taxa and sample (by the sample label).
epd_tables <- lapply(epd_all, table_by_taxa_age, c("Quercus"), c("1000", "2000")) epd_tables[[1]]
Note that this function returns a list of tables with the entity number, coordinates for that entity (londd and latdd), count, sample_label and taxa. You can combine all those tables into a single one, by
epd_table <- do.call(rbind, epd_tables) epd_table
This function is also used internally in the map_taxa_age
function, which use the data prepared in this way to plot a map of the pollen counts for a particular taxa in a particular time period. In this case, the map_taxa_age
was designed to work on list of epd.entity.df
objects. Passing the list of objects, the desired taxa name, and the desired age label of interest will produce a ggplot
map with pollen counts.
map_taxa_age(epd_all, "Pinus", "1000")
Note that the function automatically recognize that counts are in percentages, and properly acknoledge that in the legend. If the counts were not converted into percentages, the function will plot differently to accomodate this sort of data. Note also how entities without data for that particular time period are plotted with NA (small dark grey dots).
While other functions ignore multiple taxa names or entity ids, the map_taxa_age
function accept multiple taxa names, but not sample labels. When multiple taxa names are provided, the plot represent the sum of all those taxa passed to the function. This can be usefull when taxonomy has not been standardized or we are interested in data at higher taxonomical levels than those recognized by the EPD taxonomy.
pinus <- c("Pinus", "Pinus cembra-type", "Pinus halepensis-type", "Pinus pinaster", "Pinus sylvestris-type") map_taxa_age(epd_all, pinus, "1000")
Note how some entities show now a higher value when putting all Pinus species in the same map.
The function also allow for 'on the fly' conversion to presence absence maps, by using a threshold to transform pollen counts (or percentages) into presence-absence data.
map_taxa_age(epd_all, pinus, "1000", pres_abse = T, pollen_thres = 0)
Note how the legend of the map acknoledge the use and value of a threshold. Here, the default value is zero.
map_taxa_age(epd_all, pinus, "1000", pres_abse = T, pollen_thres = 1)
Note the differences in the map when changing the pollen threshold from the default (zero) to 1, and how this change is acknoledge in the legend.
The function allows for many other configuration on dots colors, sizes, etc. It also allow to zoom into particular areas using coordinates. For a full description on all these map tunning arguments check ?map_taxa_age
.
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.