The earthlife R package is a wrapper for the shared API of the Neotoma Paleoecological Database (http://neotomadb.org) and the Paleobiology Database (http://paleodb.org). Each individual resource maintains a unique API system, and each has associated tools that makes use of their respective API system (e.g., the neotoma R package for Neotoma, the paleodb package for the Paleobiology Database). The earthlife package specifically binds the joint API, and provides basic functionality to search for biological data from the present to the origin of life on earth.

Resource Overview

Paleobiology Database

Neotoma

API Overview

The API serving both databases operates at the level of the occurrence. An occurrence is the presence of an organism at a location at some point in time. Philosophically and practically, the Paleobiology Database treats its data holdings in this way, while Neotoma generally considers organisms, or their organs (e.g., pollen) to be part of a broader analysis unit. This is a function of the history of the database, which arose primarily from the North American Pollen Database [REF], where pollen microfossils are processed from lake sediment or peat as a unit, and counted together as an assemblage at some depth, which can be then represented in time using geochronological markers.

The API itself works at the level of either unique occurrence or site IDs, taxon names, or bounding boxes. Searches can be further limited using time bounds, or services (either Neotoma or the Paleobiology Database, or both). All searches use the same base occs service.

Implementation

Class occurrence

The occurrence class is the workhorse of the earthlife package. It consists of the API response, parsed into a data.frame, and a set of associaed metadata. Recent interest in data provenance has led us to include metadata for the search key sufficient to generate a reproducible citation for the search itself. This uses the standard citation method, usually applied to packages to obtain their citation. While citing the specific search package is not neccessary, it is useful to retain information about the specific search and the access date. The citation method for occurrence data provides a simple citation style so it can be copy-pasted into a document, and retains the core information neccessary for reproducibility.

# The function `get_by_taxon` will be described more fully later.  We want to return an
# `occurrence` object fo illustrate the various methods available:

basic_occ <- get_by_taxon('Poaceae', lower = TRUE)
str(basic_occ)

Methods

print
basic_occ

The print method returns a summary of the occurrence information, with a sample subset of data, the total number of unique taxa returned and the age range of the entre dataset. The complete data table can be obtained using basic_occ[[1]] or ...

citation
citation(basic_occ)

The citation method returns information about the provenance of the dataset within the occurrence object. This includes a formatted citation and BibTeX output for the search, but, primarily contains the URL of the search string sent to the joint API, and the date the search was undertaken.

plot
plot(basic_occ)

The plot method outputs both a plot of the density of sample occurrences through time, but also a map of the spatial distribution of sites. The plotting method uses the maps package to place a mapped background over the sample points.

Key Functions:

get_occurrence

get_by_taxon

get_by_age

get_by_bbox

Helper Functions

bind

It may be the case that a user wishes to combine the results of two seperate searches. This method binds the output within the records section of the occurrence object, but it also provides support for preserving the data provenance by combining meta information from both objects. This results in the generation of a unique key for tagging the individual occurrences and two or more citations for the record. The plot method in this case will return graduated symbols for each component of the broader search, and the print method will indicate that the results were obtained from multiple searches.

Duplicate occurrences are preserved, but the clear_dups function can be used to remove duplicates from the occurrence based on a set of stated rules.

clear_dups



EarthLifeConsortium/earthlife documentation built on May 6, 2019, 3:10 p.m.