knitr::opts_chunk$set(
  fig.path = "img/",
  comment = "#>",
  warning = FALSE,
  message = FALSE
)

Introduction to pangaear

pangaear is a data retrieval interface for the World Data Center PANGAEA (https://www.pangaea.de/). PANGAEA archives published Earth & Environmental Science data under the following subjects: agriculture, atmosphere, biological classification, biosphere, chemistry, cryosphere, ecology, fisheries, geophysics, human dimensions, lakes & rivers, land surface, lithosphere, oceans, and paleontology.

Installation

If you've not installed it yet, install from CRAN:

install.packages("pangaear")

Or the development version:

devtools::install_github("ropensci/pangaear")

Load pangaear

library("pangaear")

Search for data

pg_search is a thin wrapper around the GUI search interface on the page https://www.pangaea.de/. Everything you can do there, you can do here.

For example, query for the term 'water', with a bounding box, and return only three results.

pg_search(query = 'water', bbox = c(-124.2, 41.8, -116.8, 46.1), count = 3)

The resulting data.frame has details about different studies, and you can use the DOIs (Digital Object Identifiers) to get data and metadata for any studies you're interested in.
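The bbox argument in the search above is a plain numeric vector, so a mistyped coordinate is easy to miss. Below is a minimal sketch of a validating helper. The name make_bbox is hypothetical (it is not part of pangaear), and the coordinate order (minimum longitude, minimum latitude, maximum longitude, maximum latitude) is an assumption based on the example values, so check it against the pg_search documentation.

```r
# Hypothetical helper (not part of pangaear): build and sanity-check a
# bounding box before passing it to pg_search().
# Assumed order: min longitude, min latitude, max longitude, max latitude.
make_bbox <- function(min_lon, min_lat, max_lon, max_lat) {
  bbox <- c(min_lon, min_lat, max_lon, max_lat)
  stopifnot(
    is.numeric(bbox),
    !anyNA(bbox),
    all(abs(bbox[c(1, 3)]) <= 180),  # longitudes within [-180, 180]
    all(abs(bbox[c(2, 4)]) <= 90),   # latitudes within [-90, 90]
    min_lat <= max_lat
  )
  bbox
}

oregon <- make_bbox(-124.2, 41.8, -116.8, 46.1)
# pg_search(query = "water", bbox = oregon, count = 3)
```

The longitude order is deliberately not enforced, since a box that crosses the antimeridian could plausibly have a minimum longitude larger than its maximum.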

Another search option

A second search option is the pg_search_es function, an interface to the PANGAEA Elasticsearch service. It offers very flexible searching of PANGAEA data, though it differs from the search you may be used to on the PANGAEA website.

(res <- pg_search_es())

The returned data.frame has a lot of columns. You can limit columns returned with the source parameter.

There are attributes on the data.frame that give you the total number of results found as well as the max score found.

attributes(res)
attr(res, "total")
attr(res, "max_score")

To get to the DOIs for each study, use

# fixed = TRUE treats the pattern as a literal string, not a regular expression
gsub("https://doi.org/", "", res$`_source.URI`, fixed = TRUE)
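If the URIs might use a different resolver form, a slightly more defensive variant can help. The sketch below is illustrative only: strip_doi_prefix is a hypothetical helper (not part of pangaear), and the alternative prefixes it handles (http, dx.doi.org) are assumptions about what may appear in the data rather than something the package guarantees.

```r
# Hypothetical helper (not part of pangaear): strip a DOI resolver prefix
# from one or more URIs, leaving bare DOIs.
strip_doi_prefix <- function(uri) {
  # Anchor at the start and escape the dots so only a leading resolver
  # prefix is removed; accepts http or https, and doi.org or dx.doi.org.
  sub("^https?://(dx\\.)?doi\\.org/", "", uri)
}

# Example URIs for illustration only
uris <- c(
  "https://doi.org/10.1594/PANGAEA.807580",
  "http://dx.doi.org/10.1594/PANGAEA.788382"
)
dois <- strip_doi_prefix(uris)
```

Strings that do not start with a resolver prefix are returned unchanged, so the helper is safe to apply to a column that already contains bare DOIs.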

Get data

The function pg_data fetches datasets for studies by their DOIs.

res <- pg_data(doi = '10.1594/PANGAEA.807580')
res[[1]]

Search for data then pass one or more DOIs to the pg_data function.

res <- pg_search(query = 'water', bbox = c(-124.2, 41.8, -116.8, 46.1), count = 3)
pg_data(res$doi[3])[1:3]
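When fetching several DOIs from a search result, one failed download can stop the whole run. The following is a minimal sketch of one way to guard against that, not a feature of pangaear itself. The helper fetch_each and the stand-in fake_fetch are hypothetical names introduced for illustration. The fetch argument defaults to pg_data but can be swapped for any function, which lets the example run without network access.

```r
# Hypothetical helper (not part of pangaear): fetch several DOIs one at a
# time, so that a failure on one DOI does not abort the rest.
# `fetch` defaults to pangaear::pg_data; pass another function to try it offline.
fetch_each <- function(dois, fetch = pangaear::pg_data) {
  out <- lapply(dois, function(doi) {
    tryCatch(fetch(doi), error = function(e) {
      warning("Could not fetch ", doi, ": ", conditionMessage(e), call. = FALSE)
      NULL
    })
  })
  names(out) <- dois
  # Drop the DOIs that failed
  Filter(Negate(is.null), out)
}

# Offline illustration with a stand-in fetcher that fails on one input
fake_fetch <- function(doi) {
  if (doi == "bad") stop("not found")
  list(doi = doi)
}
fetched <- suppressWarnings(
  fetch_each(c("10.1594/PANGAEA.807580", "bad"), fetch = fake_fetch)
)
```

With the real package loaded, a call such as fetch_each(res$doi) would use pg_data for each DOI and return a list named by DOI.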

OAI-PMH metadata

OAI-PMH is a standard protocol for serving metadata about objects, in this case datasets. If you are already familiar with OAI-PMH, you can use what you know here. If not, it is relatively straightforward.

Note that these functions return only metadata about datasets, not the data itself.

Identify the service

pg_identify()

List metadata formats

pg_list_metadata_formats()

List identifiers

pg_list_identifiers(from = Sys.Date() - 2, until = Sys.Date())

List sets

pg_list_sets()

List records

pg_list_records(from = Sys.Date() - 1, until = Sys.Date())

Get a record

pg_get_record(identifier = "oai:pangaea.de:doi:10.1594/PANGAEA.788382")
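The identifier above appears to be a DOI with a fixed OAI prefix in front of it. Assuming that pattern holds for other datasets (an inference from this one example, not something confirmed here), an identifier could be built from a bare DOI as sketched below. The helper doi_to_oai_id is hypothetical and not part of pangaear.

```r
# Hypothetical helper (not part of pangaear): build an OAI-PMH identifier
# from a bare DOI, following the pattern of the identifier shown above.
doi_to_oai_id <- function(doi) {
  # Expect bare DOIs such as "10.1594/PANGAEA.788382", not full URLs
  stopifnot(is.character(doi), all(grepl("^10\\.", doi)))
  paste0("oai:pangaea.de:doi:", doi)
}

id <- doi_to_oai_id("10.1594/PANGAEA.788382")
# pg_get_record(identifier = id)
```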


ropensci/pangaear documentation built on Nov. 18, 2022, 5:34 p.m.